
That's really interesting. Will you open-source the NLP algorithm?


I am planning to open-source it in a few months. (Our code is not yet well commented or well structured...)

The implementation and algorithm details are as follows.

The categorization process is written in Python.

Using nltk, it builds a corpus from HN topics and comments and represents it with a TF-IDF model. It then trains classifiers on this corpus with the SVM algorithm, using scipy and numpy.
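
A minimal sketch of the TF-IDF plus SVM pipeline described above, not the actual (unreleased) code. To stay self-contained it uses only the Python standard library in place of nltk, scipy, and numpy, trains a linear SVM by subgradient descent on the hinge loss, and handles just two categories. The example titles, the labels, and all function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a stand-in for nltk tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_tfidf(docs):
    # Learn smoothed inverse document frequencies from the training documents.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(x, w, b):
    # Linear decision function: w . x + b over the sparse vector x.
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear_svm(vectors, labels, epochs=200, lr=0.1, lam=0.01):
    # Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1.
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * score(x, w, b)
            for t in w:
                w[t] *= 1.0 - lr * lam  # shrink weights (regularization step)
            if margin < 1.0:
                # Example is inside the margin or misclassified: move toward it.
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(text, idf, w, b):
    return 1 if score(tfidf_vector(text, idf), w, b) >= 0 else -1


# Hypothetical hand-annotated items: +1 = "web tech", -1 = "startup".
train_docs = [
    "jquery plugin for responsive css layouts",
    "javascript framework renders html in the browser",
    "css grid and html forms tutorial",
    "startup raises seed funding from investors",
    "investors value the startup at a billion",
    "founders discuss funding and hiring",
]
train_labels = [1, 1, 1, -1, -1, -1]

idf = fit_tfidf(train_docs)
vectors = [tfidf_vector(d, idf) for d in train_docs]
w, b = train_linear_svm(vectors, train_labels)

print(predict("jquery css html", idf, w, b))
print(predict("startup funding investors", idf, w, b))
```

For more than two categories, one common approach is to train one such classifier per category (one-vs-rest) and pick the category with the highest score; whether the real system does this is not stated in the thread.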

FYI, its web interface is written in Clojure and ClojureScript.


Presumably you've trained it with hand-annotated content, or bootstrapped from a few choice HN searches (e.g. ?q=jquery would give you a web tech category)?


Yes. You are right.

I trained the classifiers on hand annotations (about 1,000 items or so).



