Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is an interesting question that I'd like to answer now that I have all the data. I am curious to see how long it will take to find a solution as I believe even the most efficient algorithms for this have a high runtime complexity.

And yes, the graph is not connected (there are both nodes with no outgoing links and with no incoming links), but over 99% of the pages are connected, so the answer would still be interesting and worthwhile.



Another interesting question would be what the largest graphs are that are disconnected from the "main" one. Are there any larger than 1 or 2 nodes? Is there some set of pages on some obscure topic that only link within themselves?


Floyd-Warshall solves the all-pairs shortest path in time O(V^3). Running a BFS search rooted from every node would find the shortest path in an unweighted graph in O(V*(V + E)).


Hence, since there's a whole lot of Vs in the wikipedia graph, he's probably going to be satisfied with an approximate solution, unless he has a lot of CPU hours to spare.


There's 5,579,252 articles in English Wikipedia right now, which means the largest component has 5 million and change or so vertices. Each individual BFS should take a few CPU-seconds at most, and you can trivially parallelize the BFS for each node across as many CPUs as you have.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: