Interesting things I've noticed so far:

* It does a remarkably good job of context switching between programming languages based on the semantics of the question! If the question is about SQL it often includes SQL in <code> tags. If it's about JavaScript it will include JavaScript! The syntax isn't perfect due to the tokenizer mangling some things but it's pretty close!
* The English grammar isn’t perfect but it’s pretty good.
* It doesn't seem to lose track of closing tags and quotes.
* It's learned to sometimes pre-emptively thank people for their answers and to "edit" in "updates" at the end of the post.
If you find any interesting ones you can share them with the permalink! Use the "Fresh Question" button to load a new one.
> Originally, I wanted to predict the number of upvotes and views questions would get (which, intuitively, I thought would be a good proxy for their quality). Unfortunately, after working on this for about a week straight I've come to the conclusion that there is no correlation between question content and upvotes/views.
> I tried several different models (including adapting an AWD_LSTM classifier, a random forest on a bag of words, and using Google's AutoML) and none of them produced anything better than random noise.
> I also tried using myself as a "human classifier" and given two random questions from StackOverflow I can't predict which one will be more popular.
Thanks, brilliant work! Some questions are downright hilarious (see the one asking for "a suite of automated packaging techniques" [0]), and the broken English just adds extra ESL credibility to the questions.
[0]
> I want to write an update statement with a sequence of values I can run through a database. i've written the below code to broken up my character string into the columns. All of the articles i've read seem to suggest that I 'll need a suite of automated packaging techniques for my environment to all update the database.
> Answering Questions: Right now the model only generates questions. In version 2 I want to train it to answer questions. If I could get this working it'd actually become a useful tool instead of a fun toy.
Looking forward to that part :D
I mean, those answers are probably not going to be correct, but I wonder how close they will be to something useful.
Yes, many times the questioner does not actually need an answer to the question; they just need to look a little more closely at the situation, which could potentially be automated. But one should not disguise such automation as an 'answer': it's more like a query autocheck, just with better tooling.
I wonder what percentage of questions just need a correctly working example because the questioner is unsure of how to use a given API. I imagine automating this could actually be doable.
It’s a different model, the AWD_LSTM. The inventor of this model spoke about GPT-2 on this podcast and talked a bit about the differences: https://overcast.fm/+Goog4jsR8
"I have creating a PNG Image file where I am printing out the image's with different colors and Image Types. Now I am sure I am drawing properly, but what I'm seeing is that the image is not differently jpeg (ie FF or Chrome) and Safari (for Firefox) is different from the one in Firefox. "
As a bit of a connoisseur of babblebots over the decades, one of the interesting things about this generation is that it is producing text that has a very interesting effect in my mind. There is a part of the parsing process where the above text went down smooth; yup, that's what Stack Overflow questions from early developers tend to look like. That part of my brain issues no objection. But the next layer up screams bloody murder about how nonsensical that is. And it's not just "that's a bad question but I still see the order under it", but nonsense.
It's a combination I've not experienced before. Previous generation babblebots could often produce a lot of fun text, but every processing level above raw word processing has always been able to tell it's computer garbage, even when it blundered onto a particularly entertaining piece of garbage. We've actually successfully moved up a level here.
The experience you are describing reminds me of the comparative illusion [0], which is a grammatical illusion where certain sentences seem grammatically correct when you read them, but upon further reflection actually make no sense. The classic example: "More people have been to Russia than I have."
Fascinating. There's a sentence I picked up from a friend in childhood, "Although the Moon is only an eighth the size of the Earth, it is much farther away," which seems to be similar, but not quite a CI, if I'm reading the Wikipedia article correctly. Thanks for the link.
This is like a mental tarpit: you waste time reading, trying to understand what the person is saying, only to realize it was a bot and all your effort was for nothing. That's time you will never get back.
This would be a terrifying way to destroy an online community: just flood it with nonsense content like this.
Terrifying -> inevitable. Imagine a botnet full of fake AI users trained with a corpus of legit HN posts. Let them loose commenting on random articles, beginning slowly but ramping up until they’re 99% of all comments.
In a few years, the standard Silicon Valley “Growth Hacking” job description will include using AI to deploy fake content to your competitors’ sites, destroying their user community.
Two potential solutions: Reputation and a new account fee.
Nonsense flooding will make it more difficult for people to establish their identities on a network, but once it's established, they'll be in the clear. If someone has to pay to have their first thread or two reviewed, it will take serious money to flood a site to death.
(A similar solution to email spam has been waiting to happen for decades -- charge a fraction of a penny per email, and nobody is harmed but spammers. Maybe allow exceptions for officially recognized organizations that have to send a lot of messages, like political campaigns.)
This would be even better: Email recipients grant free access to whoever they want. A tiny price would be charged only when sending to someone who has not granted such access.
Some is deliberately auto-generated, like https://twitter.com/choochoobot ; but yes, there is definitely an awful lot of auto-liked auto-shared fake engagement out there.
I'm positively sure these tactics are already being deployed as a weapon in order to shut down debate of certain inconvenient topics and disrupt problematic communities.
Indeed, there’s something almost unsettling about text that initially appears to follow a sort of internal logic, yet doesn’t. Some of the results read like a programming fever dream:
“I set each thread * pointer, adding a new thread and in a loop inside this function. The thread would be immediately on the thread, but the thread resulted in the exception. If I return the thread to the first thread and finally the thread is left, the thread doesn't hang, and I couldn't kill thread # 1 - because the thread method made first thread calls the native thread. But, the thread is waiting for the thread blocking and all the other threads to be started. In other words, the thread is always destroyed.”
Unsettling is exactly the word. https://stackroboflow.com/#!/question/16662 leaves me trying to work out what on earth the poster's really trying to do - even though I know full well that there is no poster...
A few times, I've come across Stackoverflow questions on technical topics I'm not very familiar with, and the question makes no sense to me (there are clear spelling, grammatical, and consistency errors). But there's an answer, and a comment exchange that seemed to resolve the question. So I conclude that it's just my unfamiliarity that prevents me from seeing through those errors.
A related phenomenon is seeing fundamental errors in a newspaper article on a topic you're an expert in... but believing articles on topics you're not familiar with.
This can operate as a partial Turing test: a gradient for iteration.
"What's the best way to indeed start a process on an OS x machine?
What is the best way to start a process on Mac OS x Snow Leopard?
There I just need to be able to run the OS x.exe from the command line and it's working fine (make it available in Windows). But I'm on an Mac and I haven't figured out how to do this for a Linux machine.
Another reason I ask is that I only have a Unix shell running with the Python process in it (it's my an Ubuntu machine, nothing didn't work in the shell).
Gather equal numbers of the least intelligible questions from SO (possibly using a metric based on low views/upvotes/comments/answers over long time) and a random selection from stackroboflow.
Present human judges with both sets of questions and ask them to tell the difference.
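In sketch form it might look like this (hypothetical Python; the two placeholder lists stand in for the actual question data):

    import random

    # Hypothetical sketch of the proposed test: mix real low-engagement SO
    # questions with generated ones, show them unlabeled, and score the judge.
    real = ["..."]       # least-viewed/least-upvoted SO questions (placeholder)
    generated = ["..."]  # questions sampled from stackroboflow (placeholder)

    trials = [(q, "real") for q in real] + [(q, "generated") for q in generated]
    random.shuffle(trials)

    correct = 0
    for question, label in trials:
        guess = input(question + "\nreal or generated? ")
        correct += guess.strip() == label
    # Accuracy near 50% means the judge can't tell them apart.
    print("judge accuracy: {:.0%}".format(correct / len(trials)))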
Having read numerous SO questions from newbie developers whose grasp of English was tenuous at best, I doubt I could tell the difference.
The next step up: the same test, but with mathematics or scientific papers judged by non-experts in the field.
We may actually be there already - I'm not sure.
All of which makes me wonder when we'll reach the point where the bar has been raised so high that the comparison will need to be against the best SO questions and scientific/mathematics papers judged by subject matter experts.
I can claim to have experience [0] with generating funny nonsense based on Stackoverflow data (what a weird thing to say :))
Seems like you beat me to my plan to make a neural-network-based variant, and I really like the results (especially that they stay on topic instead of totally drifting off into fun nonsense like my Markov chains did).
Have you tried also using other Stackexchange sites as a source? In my experience they result in more fun questions as they have more "human" interactions (especially the more personal advice based sites) which creates things like:
- Do Greeks driving affect the whaling industry?
- Essential windsurfing equipment to fish?
- Do mountaineers eat grass?
- Can I toast
I reviewed 1600 edits at StackOverflow. And I can say that some of the automatically generated questions are more intelligible than the average SO question. For example, this one looks fine to me: https://stackroboflow.com/#!/question/11235
Fascinating. I wonder if our current discussion boards on the interwebs can survive the coming influx of content like this and the next generations of it that follow.
There are a lot of SO questions posted by very weak non-native speakers of English, and some of these generated questions are hard to distinguish from those. Kind of scary!
What possible positive outcomes do you see for this kind of (admittedly inevitable) capability?
I am actually a bit worried that I’m already starting to see search engine traffic coming in...
I hope that the good will outweigh the bad. I’d love to create an answer generator, for example.
Once enough questions are generated I’m going to try creating a classifier to see if a neural net can differentiate between real questions and fake ones.
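A simple first cut at that discriminator might be something like this (a sketch only, not my actual plan; `real_questions` and `fake_questions` are placeholder lists of question texts):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Baseline real-vs-generated classifier: TF-IDF bag of words feeding a
    # logistic regression. The two input lists are assumed placeholders.
    texts = real_questions + fake_questions
    labels = [1] * len(real_questions) + [0] * len(fake_questions)

    X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))

If a baseline like this can't beat chance, that would itself be an interesting result.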
> I am actually a bit worried that I’m already starting to see search engine traffic coming in...
We can discuss hypothetical systems that could maliciously flood us with generated content. The creator of this particular service which is being discussed here and now could also begin taking steps to ensure that his creation does not inadvertently create a problem for some hapless Google user.
Well, no. You have to read further up the thread to see the issue I was referring to.
>I wonder if our current discussion boards on the interwebs can survive the coming influx of content like this and the next generations of it that follow.
Yes, a robots.txt is a good and trivial step he could take to ensure well-behaved robots do not pick up his content. So your comment suggesting robots.txt is a good comment in its narrow frame, but one that misses the larger picture. That minor problem is solved; the interesting problem is of a different nature.
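For reference, opting out every well-behaved crawler takes just two lines of robots.txt:

    User-agent: *
    Disallow: /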
I was clicking through a couple funny ones but as soon as I got one[1] that fell into this uncanny valley I immediately forgot this was generated and tried to understand it, getting super confused.
There is immense value in training these to synthesize test data sets for sensitive information you can't safely put in a preprod environment.
Health information would be the main case I can think of now.
Having synthesized data for testing new services in govt would be a huge improvement.
De-identification is basically impossible and there are a bunch of companies who will lie to you if you pay them to, but synthesized data covers many use cases for de-identification and for homomorphic encryption.
Unfortunately I’ve come to the conclusion that upvotes on Stackoverflow aren’t correlated with question content (or I’m not skilled enough to be able to differentiate between “good” and “bad” questions). Check out the linked write up for more detailed info.
I think the original comment is being sarcastic and suggesting that the actual humans that close discussions and mark them as "off topic" don't understand the question and perform these actions at random. This is a sentiment shared by many who don't "live" in those types of forums.
Could you not also train a classifier that correlates question content with mod decisions? Questions like "what is the best X?" that are obviously subjective, for example.
Maybe even some kind of crazy generative model that learns to post questions that aren't closed by the AI moderator!
"i've been asked to use Json to call a webservice. I don't modify a JSON object at all. However, when calling JSON returned by the Json object, it fails because the object life isn't array!"
Oh, great, now I know what my clueless questions look like to a knowledgeable person! Example:
>"I need to create an image from a imported wav file (for a user - friendly format find enough header for the cookie). I looked for a solution, but that didn't work either."
I presume this is due to tokenization or something, but there's a lot of extra whitespace in the code samples that make them look very unrealistic:
def _ _ init__(self, default):
" " "
See if the default value for the field on a view is
timespan.
" " "
< select >
< option > value < / option >
< option > value < / option >
< / select >
And indentation is also missing completely. Maybe you need to use another NN to guess which language the fake code is in and autoformat it accordingly!
It is, the tokenizer isn't reversible (and it adds spaces all over the place).
But I should be able to handle a lot of these in my regex that converts the output back into a more human-readable format (in the raw output there's a space before every punctuation mark, so I already remove those extraneous spaces before periods, commas, etc).
I just haven't gotten around to adding in any heuristics specifically for code but adding a bit more post-processing is on my to-do list.
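To give a sense of it, the cleanup looks roughly like this (an illustrative sketch in Python; the real version is a pile of regexes in the site's JS):

    import re

    # Illustrative detokenizer (not the site's actual code): collapse the extra
    # spaces the tokenizer inserts and re-attach split-off contractions.
    def detokenize(text):
        text = re.sub(r"\s+([.,;:!?)\]])", r"\1", text)                 # "word ." -> "word."
        text = re.sub(r"([(\[])\s+", r"\1", text)                       # "( word" -> "(word"
        text = re.sub(r"\s+(n't|'s|'re|'ll|'ve|'d|'m)\b", r"\1", text)  # "would n't" -> "wouldn't"
        return text

    print(detokenize("It would n't work , would it ?"))  # It wouldn't work, would it?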
Congratulations, you have simulated a million monkeys at typewriters with a million monkeys at typewriters. Has anyone really been far even as decided to use even go want to do look more like?
Reminds me of those online chatbots I used to torture 10 or 15 years ago. One I started asking for personal information about its creator. It was remarkably evasive, constantly attempting to switch the subject.
Every good invention can be terrifying if it falls into the hands of bad guys (nuclear technology, for example).
It's true for AI as well. I'm sure bad guys are already training similar AI agents by feeding them only fake news, conspiracy theories, etc., and it's easy to build AI agents since there's so much open-source material about AI online.
Think of things like election meddling: propagating truly fake news that caters to what people simply want to be true. Humans are weak against confirmation bias; ten minutes on Facebook will show you that for sure.
I think it would still qualify as a valid "Does Not Exist" website if the generated questions were run through an auto-formatter for the code. The main issue I see is extra spaces everywhere, often in a syntax-breaking way, and missing spaces where formatting needs them (not that all SO questions get those right).
Yeah this is a shortcoming of the tokenizer. It splits things up in ways that are not 1:1 mappable back to their source unfortunately.
I did a bit of post-processing to get it formatted a bit better (re-combining the “would“ and “n’t” tokens and changing html tags to markdown for example) but there’s still room for improvement.
Spacing specifically is different based on the context. Outside of code blocks you want a space after a period; inside you probably don't. But since the tokenizer has one in both places there's no opportunity for the neural net to learn this (it can't see any difference). And my naive formatter doesn't know the difference either. (If you're curious you can find it in the JS file.)
I updated my regexes to clean up some of the tokenizer noise last night, so much of the formatting in the code snippets should look a bit more natural now.
The way the language model is trained is by rewarding it for correctly predicting the next word in a sequence.
The output of the model is a predicted probability distribution over the next word and a "state"; the next iteration takes the state output of the previous iteration and generates another word and state (and this process repeats many times).
Since there’s a probabilistic dimension, what may have happened in this case is that it happened to repeat once by chance and the model had learned that if something repeats 2x it’s likely that it will repeat a third, fourth, and fifth time.
Basically it’s just trying to game the loss function which rewarded it for predicting the next word in the sequence correctly.
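In code, the generation loop looks roughly like this (a toy PyTorch sketch with made-up sizes, not the actual model):

    import torch

    # Toy sketch of the sampling loop: the model maps (previous word, state)
    # to a distribution over the next word plus a new state, repeatedly.
    vocab, emb, hid = 1000, 16, 32
    embed = torch.nn.Embedding(vocab, emb)
    rnn = torch.nn.RNN(input_size=emb, hidden_size=hid)
    head = torch.nn.Linear(hid, vocab)       # maps state to logits over the vocabulary

    token = torch.tensor([0])                # start token id
    state = torch.zeros(1, 1, hid)           # initial hidden state
    for _ in range(50):
        out, state = rnn(embed(token).view(1, 1, -1), state)
        probs = torch.softmax(head(out[0, 0]), dim=0)
        token = torch.multinomial(probs, 1)  # sample the next word id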
Thanks for the explanation. Your description superficially reminds me of a Markov chain (https://en.wikipedia.org/wiki/Markov_chain). Is this related or is it totally different?
I haven't read the paper the work is based on, but if the RNN outputs a probability distribution for the next letter/word then it forms a Markov chain (since the next step depends only on the current state and not on earlier states)!
RNNs are just fancy parametric functions that take a (state, input)-pair and return a new (state', output)-pair.
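That signature is easy to make concrete (a vanilla cell in numpy, with random weights purely for illustration):

    import numpy as np

    # A single vanilla RNN step, written to make the
    # (state, input) -> (state', output) signature explicit.
    W_h, W_x, b = np.random.randn(32, 32), np.random.randn(32, 16), np.zeros(32)

    def rnn_step(state, x):
        new_state = np.tanh(W_h @ state + W_x @ x + b)
        return new_state, new_state  # for a vanilla cell the output is the state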
This looks absolutely amazing! I would be very curious to know how you went about conceptualizing the project and the AI beneath. Do you have a blogpost on it or planning to write one?
You are a very well trained neural net... The concept is based on actual neurons in our brain. Can't tell if you're serious or trolling though lol.
I protest getting -4 points. They used github and stackoverflow. Wrote a function to connect the two based on tags and then randomly generate a question off of that. It's lame. Do something useful or cool.
Full writeup available here: https://stackroboflow.com/about/index.html