
The methodology is explained here: https://refactoringenglish.com/tools/hn-popularity/methodolo...

You won't show up unless your site is listed in this manually curated CSV file: https://github.com/mtlynch/hn-popularity-contest-data/blob/m...



>You won't show up unless your site is listed in this manually curated CSV

Correction: you'll show up even if you're not in the CSV. The CSV just populates metadata for your entry.
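To illustrate the distinction, here is a minimal sketch of how a metadata CSV could enrich entries without gating them. The column names (`domain`, `author`) and both function names are hypothetical, chosen for illustration; the real CSV schema and the site's actual code may differ.

```python
import csv
import io


def load_metadata(csv_text):
    """Parse CSV text into a dict keyed by domain (hypothetical 'domain' column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["domain"]: row for row in reader}


def build_entries(domains, metadata):
    """Create one entry per domain; metadata is optional, never required."""
    entries = []
    for domain in domains:
        row = metadata.get(domain, {})
        # Domains missing from the CSV still get an entry, just without an author.
        entries.append({"domain": domain, "author": row.get("author")})
    return entries
```

A domain that is absent from the CSV still appears in the output, only with its metadata fields left empty.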


How do you filter out the non-blog content? I assume you had an allow-list of known personal blogs.


Everything is default included, and I have a long list of not-blog domains that are excluded.[0] Plus, I exclude the Alexa top 500.

There are still lots of not-blogs in the dataset; I just exclude them as I come across them in the popular views. If you dig through positions 101-5000, I'm sure you'll find lots of domains that don't match my official criteria for a blog.

[0] https://github.com/mtlynch/hn-popularity-contest-data/blob/m...
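The default-include approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `www.` normalization, and the idea of passing the exclusion list and the top-sites list as plain sets are all hypothetical, not taken from the actual project.

```python
def normalize(domain):
    """Lowercase a domain and drop a leading 'www.' (an assumed normalization)."""
    d = domain.strip().lower()
    return d[4:] if d.startswith("www.") else d


def is_candidate_blog(domain, excluded_domains, top_sites):
    """Default-include: keep a domain unless it is on one of the exclusion sets."""
    d = normalize(domain)
    return d not in excluded_domains and d not in top_sites


def filter_domains(domains, excluded_domains, top_sites):
    """Return the domains that survive both exclusion lists, in original order."""
    return [d for d in domains if is_candidate_blog(d, excluded_domains, top_sites)]
```

Because everything is included by default, any not-blog that is missing from both sets slips through, which matches the caveat about lower-ranked positions.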


Thank you for the reply, I'll go and make a PR.



