Algorithm for converting a finite state machine into a regular expression

Darmani · on Oct 6, 2010

Just had a second look at your algorithm.

You're essentially running a well-known DFA-minimization algorithm (see http://en.wikipedia.org/wiki/DFA_minimization or http://www.cs.cmu.edu/~cdm/pdf/minimization.pdf ), and then running Brzozowski. This isn't new at all, though I'm pretty impressed you reinvented the minimization algorithm.

qntm · on Oct 6, 2010

Well, there you go then. I did make an effort to find what I'd done online, but since there is no explicit "minimization" step in the way I've phrased my algorithm I probably didn't stand a chance. Apparently that DFA minimization thing is on any Computer Science degree course, which of course I've never taken (I only discovered state machines as a concept about six months ago).

Thanks for reading, everybody.

Twisol · on Oct 6, 2010

At least you know for sure that what you came up with was worth coming up with. ;) I do that all the time, and it's a mixed bag of pride and annoyance.

jacquesm · on Oct 6, 2010

At this rate you'll be coming up with original stuff before long. 6 months from discovering state machines to here is pretty impressive. Definitely don't let it stop you that this was already done before, keep at it!

qntm · on Oct 5, 2010

I apologise for the site loading slowly, my entirely unrelated page http://qntm.org/uk "The Great British Venn Diagram" is currently being hammered by Redditors.

alttab · on Oct 6, 2010

This is basically a course study in formal languages. What is more fascinating is that you arrived at the same abstraction independently.

Darmani · on Oct 5, 2010

There are several completely different algorithms for converting DFAs into regular expressions. See http://www.cs.cmu.edu/~cdm/pdf/KleeneAlg.pdf for three. One of my favorites involves eliminating states by replacing all pairs of transitions through them with regexes -- it winds up looking almost exactly like the Floyd-Warshall algorithm.

profquail · on Oct 5, 2010

The site wasn't loading for me, but it did load (albeit slowly) from a Coral Cache link:

http://qntm.org.nyud.net:8080/algo

seancron · on Oct 5, 2010

It loads for me, it just loads slowly.

Cached version: http://cache.historious.net/cached/569458/

Edit: P.S. If you used the cached version, can you let me know how it worked for you? I stumbled across historious and I'm testing to see how well it works.

stavros · on Oct 5, 2010

Heh, I went to historify that site and post the cache to help out, but wasn't sure if it would be seen as shameless promotion. I'm proud this feature sees usage, and glad you like it!

EDIT: I'm glad to report that server load is at 0.01 right now, so hammer away, Varnish is pretty much indestructible.

seancron · on Oct 6, 2010

I've been using Historious for the past few days and it has worked pretty well for me so far. We'll see how it works in a month or so when I need to find that Really Useful Article :)

One thing that I did notice about the cache is that when qntm's server was getting hit hard, it was taking ~5-10 seconds to load the cached page. I think this is because even though you cache the page, it was trying to load the CSS, images, and js from qntm's server. Not that big of a deal, but I was expecting it to load faster.

stavros · on Oct 6, 2010

Fantastic, I'm glad it's been good so far. Finding that Really Useful Article is working very well, I've been using it for months and it's great :P

Unfortunately, it can't not load the CSS/JS/images from the other site, as it would take some pretty involved spidering and a bit of additional code on our end to support caching CSS/JS/images, so it's not very much worth it.

What is worth it, though, is building in a sort of version of Readability into the cached page. So you will be able to click "light view" or something and reach a well-formatted page with only the text for the bulk of the article. This is so you can read it better if the parent site is down (it'll load faster without CSS/JS) and you can read it on your mobile, kindle, etc.

profquail · on Oct 5, 2010

The Historious cache link works just fine for me too.

nopassrecover · on Oct 5, 2010

Hmm my relevant theory is a little sketchy, but there are FSMs that aren't modellable by regex right? (e.g. Nested tags).

batterseapower · on Oct 5, 2010

Actually they are exactly equivalent.

See for example http://en.wikipedia.org/wiki/Regular_expression ("Regular expressions in this sense can express the regular languages, exactly the class of languages accepted by deterministic finite automata.")

xtacy · on Oct 6, 2010

Most of the regex engines in languages (like Perl) allow keeping track of state by using back references. If you're talking about this, then theoretically, they aren't "finite state" machines.

andrewf · on Oct 6, 2010

FSMs can't handle nested tags. Pushdown Automata can.

qntm · on Oct 6, 2010

Most "regular" expression engines in modern programming languages can handle back-references (e.g. "(.)\1"), which are actually so complex that you need a full-blown Turing machine to handle them.

jemfinch · on Oct 6, 2010

Nested tags can't be matched by fsms.

Kafka · on Oct 5, 2010

Any mirror?

Groxx · on Oct 5, 2010

Try googling for:

  cache: url you want cached

(I only found this trick out a couple days ago)

bhiggins · on Oct 5, 2010

If you find this interesting, you may be interested in this as well: http://en.wikipedia.org/wiki/Star_height_problem