Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Algorithm for converting a finite state machine into a regular expression (qntm.org)
67 points by pavel_lishin on Oct 5, 2010 | hide | past | favorite | 22 comments


Just had a second look at your algorithm.

You're essentially running a well-known DFA-minimization algorithm (see http://en.wikipedia.org/wiki/DFA_minimization or http://www.cs.cmu.edu/~cdm/pdf/minimization.pdf ), and then running Brzozowski. This isn't new at all, though I'm pretty impressed you reinvented the minimization algorithm.


Well, there you go then. I did make an effort to find what I'd done online, but since there is no explicit "minimization" step in the way I've phrased my algorithm I probably didn't stand a chance. Apparently that DFA minimization thing is on any Computer Science degree course, which of course I've never taken (I only discovered state machines as a concept about six months ago).

Thanks for reading, everybody.


At least you know for sure that what you came up with was worth coming up with. ;) I do that all the time, and it's a mixed bag of pride and annoyance.


At this rate you'll be coming up with original stuff before long. 6 months from discovering state machines to here is pretty impressive. Definitely don't let it stop you that this was already done before, keep at it!


I apologise for the site loading slowly, my entirely unrelated page http://qntm.org/uk "The Great British Venn Diagram" is currently being hammered by Redditors.


This is basically a course study in formal languages. What is more fascinating is that you arrived at the same abstraction independently.


There are several completely different algorithms for converting DFAs into regular expressions. See http://www.cs.cmu.edu/~cdm/pdf/KleeneAlg.pdf for three. One of my favorites involves eliminating states by replacing all pairs of transitions through them with regexes -- it winds up looking almost exactly like the Floyd-Warshall algorithm.


The site wasn't loading for me, but it did load (albeit slowly) from a Coral Cache link:

http://qntm.org.nyud.net:8080/algo


It loads for me, it just loads slowly.

Cached version: http://cache.historious.net/cached/569458/

Edit: P.S. If you used the cached version, can you let me know how it worked for you? I stumbled across historious and I'm testing to see how well it works.


Heh, I went to historify that site and post the cache to help out, but wasn't sure if it would be seen as shameless promotion. I'm proud this feature sees usage, and glad you like it!

EDIT: I'm glad to report that server load is at 0.01 right now, so hammer away, Varnish is pretty much indestructible.


I've been using Historious for the past few days and it has worked pretty well for me so far. We'll see how it works in a month or so when I need to find that Really Useful Article :)

One thing that I did notice about the cache is that when qntm's server was getting hit hard, it was taking ~5-10 seconds to load the cached page. I think this is because even though you cache the page, it was trying to load the CSS, images, and js from qntm's server. Not that big of a deal, but I was expecting it to load faster.


Fantastic, I'm glad it's been good so far. Finding that Really Useful Article is working very well, I've been using it for months and it's great :P

Unfortunately, it can't not load the CSS/JS/images from the other site, as it would take some pretty involved spidering and a bit of additional code on our end to support caching CSS/JS/images, so it's not very much worth it.

What is worth it, though, is building in a sort of version of Readability into the cached page. So you will be able to click "light view" or something and reach a well-formatted page with only the text for the bulk of the article. This is so you can read it better if the parent site is down (it'll load faster without CSS/JS) and you can read it on your mobile, kindle, etc.


The Historious cache link works just fine for me too.


Hmm my relevant theory is a little sketchy, but there are FSMs that aren't modellable by regex right? (e.g. Nested tags).


Actually they are exactly equivalent.

See for example http://en.wikipedia.org/wiki/Regular_expression ("Regular expressions in this sense can express the regular languages, exactly the class of languages accepted by deterministic finite automata.")


Most of the regex engines in languages (like Perl) allow keeping track of state by using back references. If you're talking about this, then theoretically, they aren't "finite state" machines.


FSMs can't handle nested tags. Pushdown Automata can.


Most "regular" expression engines in modern programming languages can handle back-references (e.g. "(.)\1"), which are actually so complex that you need a full-blown Turing machine to handle them.


Nested tags can't be matched by fsms.


Any mirror?


Try googling for:

  cache: url you want cached
(I only found this trick out a couple days ago)


If you find this interesting, you may be interested in this as well: http://en.wikipedia.org/wiki/Star_height_problem




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: