I was working on a similar idea, but for laws. They also change quite a lot and are not very well presented online.
I halted development after I realized how complicated word diffs can get. I would be interested in the techniques you used. I think it is quite good as it is, but I noticed some common problems, such as:
1. Reusing letters from words that have nothing in common:
> fury over a [-hik-]{+n increas+}e in bus fares [1]
2. Inserting a few paragraphs into one word (first paragraph) [2]
That's ok. There's enough value in raising awareness that these things change frequently, and in providing a change record for those interested in a specific article, that the lack of a neater/cleaner way of conveying the changes can be forgiven. Anyone can figure out how to read the diffs once they sit down to do it, and it still takes a human to interpret the significance of a change. A single changed word can be a simple correction or a complete reversal.
If a better UI is developed later on, it can be retrofitted.
This is amazing. This sort of technology may not be sexy enough for TechCrunch, but it's going to be infinitely more valuable to posterity than photo filters. Historiography will continue to evolve at lightspeed for the next several decades, and I'm excited to see how this sort of accumulated data interacts with coming advances in machine learning.
I love this idea but I think it would be made a lot better by having this information available on the news websites themselves with a browser plugin or something.
I halted development after I realized how complicated word diffs can get. I would be interested in the techniques you used. I think it is quite good as it is, but I noticed some common problems, such as:
1. Reusing letters from words that have nothing in common:
> fury over a [-hik-]{+n increas+}e in bus fares [1]
2. Inserting a few paragraphs into one word (first paragraph) [2]
3. Loads of minor changes, also more of 1. [3]
[1] http://newsdiffs.org/diff/263401/263432/www.nytimes.com/2013...
[2] http://newsdiffs.org/diff/265812/265841/www.washingtonpost.c...
[3] http://newsdiffs.org/diff/265776/265810/www.nytimes.com/2013...
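For what it's worth, here is a minimal sketch of one way to avoid problem 1. It assumes Python's standard `difflib` and is not necessarily what newsdiffs does: split the text into words and whitespace runs before diffing, so an edit can never begin or end inside a word. The names `wdiff`, `by_char`, and `by_word` are made up for illustration, and the two sentences are a guess at the wording behind example [1].

```python
import difflib
import re


def wdiff(old, new, tokenize):
    """Render a diff in [-removed-]{+added+} markup over the given tokens."""
    a, b = tokenize(old), tokenize(new)
    # autojunk=False keeps difflib from treating frequent tokens as junk.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # tokens present only in the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # tokens present only in the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


def by_char(text):
    # Character tokens: unrelated words can share stray letters.
    return list(text)


def by_word(text):
    # Words, whitespace runs, and single punctuation marks as tokens.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


# Assumed before/after wording, for illustration only.
old = "fury over a hike in bus fares"
new = "fury over an increase in bus fares"

char_diff = wdiff(old, new, by_char)
word_diff = wdiff(old, new, by_word)
print(char_diff)
print(word_diff)
```

With character tokens the trailing "e" of "hike" gets reused for "increase"; with word tokens the whole word is replaced. This does nothing for problems 2 and 3, which probably need paragraph-level alignment first and some threshold for collapsing many small edits.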