
I created this site using a language model trained with Fast.ai on the Stack Overflow data dump.

Full writeup available here: https://stackroboflow.com/about/index.html

Interesting things I’ve noticed so far:

* It does a remarkably good job of context switching between programming languages based on the semantics of the question! If the question is about SQL it often includes SQL in <code> tags. If it’s about JavaScript it will include JavaScript! The syntax isn’t perfect due to the tokenizer mangling some things but it’s pretty close!

* The English grammar isn’t perfect but it’s pretty good.

* It doesn’t seem to lose track of closing tags and quotes.

* It's learned to sometimes pre-emptively thank people for their answers and to "edit" in "updates" at the end of the post.

If you find any interesting ones you can share them with the permalink! Use the "Fresh Question" button to load a new one.



I found this part of the about page interesting:

> Originally, I wanted to predict the number of upvotes and views questions would get (which, intuitively, I thought would be a good proxy for their quality). Unfortunately, after working on this for about a week straight I've come to the conclusion that there is no correlation between question content and upvotes/views.

> I tried several different models (including adapting an AWD_LSTM classifier, a random forest on a bag of words, and using Google's AutoML) and none of them produced anything better than random noise.

> I also tried using myself as a "human classifier" and given two random questions from StackOverflow I can't predict which one will be more popular.
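For anyone who wants to repeat the "given two random questions, which one is more popular" check on their own data, here is a minimal sketch. It is not the author's code: the function name `pairwise_accuracy`, the `(text, votes)` tuple layout, and the toy data are all assumptions made for illustration. It samples random pairs and measures how often a scoring function picks the more-upvoted question; a result close to 0.5 would be consistent with "no better than random noise".

```python
import random


def pairwise_accuracy(questions, predict, n_pairs=1000, seed=0):
    """Fraction of random pairs where `predict` ranks the more-upvoted
    question higher. `questions` is a list of (text, votes) tuples
    (a hypothetical layout, not the Stack Overflow dump schema)."""
    rng = random.Random(seed)
    correct = 0
    total = 0
    for _ in range(n_pairs):
        (text_a, votes_a), (text_b, votes_b) = rng.sample(questions, 2)
        if votes_a == votes_b:
            continue  # a tie in votes has no "more popular" answer
        # If the predictor scores both texts equally, this guesses B;
        # since pair order is random, that amounts to a coin flip.
        guess_a = predict(text_a) > predict(text_b)
        correct += guess_a == (votes_a > votes_b)
        total += 1
    return correct / total if total else float("nan")


if __name__ == "__main__":
    # Toy data (made up): votes are drawn independently of the text,
    # so any text-based predictor should land near chance.
    rng = random.Random(42)
    toy = [("word " * rng.randint(5, 200), rng.randint(0, 50)) for _ in range(500)]
    print(pairwise_accuracy(toy, predict=len))
```

Any model (or a human typing guesses) can be dropped in as `predict`, which makes it easy to compare against the 0.5 baseline before investing in something heavier.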


Thanks, brilliant work! Some questions are downright hilarious (see a suite of automated packaging techniques [0]), and the broken English just adds extra ESL credibility to the questions.

[0] >I want to write an update statement with a sequence of values I can run through a database. i've written the below code to broken up my character string into the columns. All of the articles i've read seem to suggest that I 'll need a suite of automated packaging techniques for my environment to all update the database.

What s the best way to update the column ids?

Thanks


I wouldn't doubt it was written by a human if I saw it on stackoverflow.


> Answering Questions
>
> Right now the model only generates questions. In version 2 I want to train it to answer questions. If I could get this working it'd actually become a useful tool instead of a fun toy.

Looking forward to that part :D

I mean, those answers are probably not going to be correct, but I wonder how close they will be to something useful.


Yes, many times the questioner does not actually need an answer to the question; they just need to look a little closer at the situation, and that kind of check could potentially be automated. But one should not disguise such automation as an 'answer': it is more like a query autocheck, just more tooled-up.


I wonder what percentage of questions just need a correctly working example because the questioner is unsure of how to use a given API. I imagine automating this could actually be doable.


Thanks for including permalinks to questions, that's great for sharing!


How does this compare to GPT-2 by OpenAI (https://github.com/openai/gpt-2)?


It’s a different model, the AWD_LSTM. The inventor of this model spoke about GPT-2 on this podcast and talked a bit about the differences: https://overcast.fm/+Goog4jsR8



