Well, there you go then. I did make an effort to find what I'd done online, but since there is no explicit "minimization" step in the way I've phrased my algorithm I probably didn't stand a chance. Apparently that DFA minimization thing is on any Computer Science degree course, which of course I've never taken (I only discovered state machines as a concept about six months ago).
At least you know for sure that what you came up with was worth coming up with. ;) I do that all the time, and it's a mixed bag of pride and annoyance.
At this rate you'll be coming up with original stuff before long. 6 months from discovering state machines to here is pretty impressive. Definitely don't let it stop you that this was already done before, keep at it!
I apologise for the site loading slowly, my entirely unrelated page http://qntm.org/uk "The Great British Venn Diagram" is currently being hammered by Redditors.
There are several completely different algorithms for converting DFAs into regular expressions. See http://www.cs.cmu.edu/~cdm/pdf/KleeneAlg.pdf for three. One of my favorites involves eliminating states by replacing all pairs of transitions through them with regexes -- it winds up looking almost exactly like the Floyd-Warshall algorithm.
Edit: P.S. If you used the cached version, can you let me know how it worked for you? I stumbled across historious and I'm testing to see how well it works.
Heh, I went to historify that site and post the cache to help out, but wasn't sure if it would be seen as shameless promotion. I'm proud this feature sees usage, and glad you like it!
EDIT: I'm glad to report that server load is at 0.01 right now, so hammer away, Varnish is pretty much indestructible.
I've been using Historious for the past few days and it has worked pretty well for me so far. We'll see how it works in a month or so when I need to find that Really Useful Article :)
One thing that I did notice about the cache is that when qntm's server was getting hit hard, it was taking ~5-10 seconds to load the cached page. I think this is because even though you cache the page, it was trying to load the CSS, images, and js from qntm's server. Not that big of a deal, but I was expecting it to load faster.
Fantastic, I'm glad it's been good so far. Finding that Really Useful Article is working very well, I've been using it for months and it's great :P
Unfortunately, it can't not load the CSS/JS/images from the other site, as it would take some pretty involved spidering and a bit of additional code on our end to support caching CSS/JS/images, so it's not very much worth it.
What is worth it, though, is building in a sort of version of Readability into the cached page. So you will be able to click "light view" or something and reach a well-formatted page with only the text for the bulk of the article. This is so you can read it better if the parent site is down (it'll load faster without CSS/JS) and you can read it on your mobile, kindle, etc.
See for example http://en.wikipedia.org/wiki/Regular_expression ("Regular expressions in this sense can express the regular languages, exactly the class of languages accepted by deterministic finite automata.")
Most of the regex engines in languages (like Perl) allow keeping track of state by using back references. If you're talking about this, then theoretically, they aren't "finite state" machines.
Most "regular" expression engines in modern programming languages can handle back-references (e.g. "(.)\1"), which are actually so complex that you need a full-blown Turing machine to handle them.
You're essentially running a well-known DFA-minimization algorithm (see http://en.wikipedia.org/wiki/DFA_minimization or http://www.cs.cmu.edu/~cdm/pdf/minimization.pdf ), and then running Brzozowski. This isn't new at all, though I'm pretty impressed you reinvented the minimization algorithm.