1. Scientifically, synthesizing really simple loop-free programs from high-level specs is not new. You can synthesize programs at this level of complexity in a few minutes on a single ten-year-old laptop using 5-10-year-old algorithms.
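To make that concrete, here's a minimal sketch (my own toy example, not code from any particular published system) of the bottom-up enumerative search that those older synthesizers build on: grow a pool of expressions over a tiny grammar, size by size, and return the first one consistent with the input-output examples.

```python
from itertools import product

def synthesize(examples, max_size=5):
    """examples: list of (env, expected) pairs, e.g. ({'x': 2}, 5)."""
    variables = list(examples[0][0].keys())
    # Size-1 expressions: the input variables plus a couple of constants.
    pool = {1: [(v, lambda env, v=v: env[v]) for v in variables]
             + [(str(c), lambda env, c=c: c) for c in (0, 1)]}
    ops = [('+', lambda a, b: a + b),
           ('-', lambda a, b: a - b),
           ('*', lambda a, b: a * b)]
    for size in range(1, max_size + 1):
        pool.setdefault(size, [])
        # Build size-n expressions by joining two smaller ones with an operator.
        for lsize in range(1, size - 1):
            rsize = size - 1 - lsize
            for (ltext, lfun), (rtext, rfun) in product(pool[lsize], pool[rsize]):
                for name, op in ops:
                    pool[size].append((
                        f'({ltext} {name} {rtext})',
                        lambda env, lf=lfun, rf=rfun, op=op: op(lf(env), rf(env)),
                    ))
        # Return the first expression that matches every example.
        for text, fun in pool[size]:
            if all(fun(env) == expected for env, expected in examples):
                return text
    return None

# Recovers f(x) = 2x + 1 from two examples: prints '(x + (x + 1))'.
print(synthesize([({'x': 2}, 5), ({'x': 3}, 7)]))
```

Real tools from that era layer pruning and equivalence reduction on top of this brute-force core, which is what makes them fast enough to be practical.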
2. Scientifically, it's already known that the architectures used in NLP models are also useful for simple program synthesis tasks.
3. From an engineering perspective, there was a good HN discussion about this a few weeks ago. IMO this comment hits the nail on the head (https://news.ycombinator.com/item?id=23256333):
> This technology converts writing code into bug hunting in pre-written code. Finding bugs in code that you did not write is way harder than writing the code yourself. So if anything, this makes programming harder, not easier, and we will need more programmers, not less.
Rings true for this example. I can write my intended implementation much faster than I can read the generated code, find the bug, and think of a comment that captures the correct spec.
It's interesting, but it's not "generating code to solve a novel programming task".
I agree with you, and I doubt we'll get useful AI-generated code before we get strong AI; by the time we get that, I'm sure we'll have much more interesting applications to showcase.
I can definitely see this tech leading to ever more clever linters though. Having an AI read my code, interpret the comments and the variable and function names, and use all of that to look for bugs in my implementation could be highly valuable, assuming the number of false positives isn't overwhelming. "Your comment says that you apply the discount only on palindromes but your condition here is wrong and your code ends up doing the opposite".
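To make that concrete, here's a toy version (my own illustration) of exactly the comment/code mismatch such a tool would need to flag:

```python
def apply_discount(code: str, price: float) -> float:
    # Apply the discount only if the code is a palindrome.
    if code != code[::-1]:  # bug: should be ==; this discounts everything *except* palindromes
        return price * 0.9
    return price
```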
We already have AI-generated code. It's just that compilers work well enough to actually use, we understand their limitations, and so we stopped calling microcode compilers "automatic programmers" back in the 80s.
I think we will have useful source code synthesizers for high-level languages at some point in the next few years. They will probably look more like AlphaZero than GPT-3 though.
I'm not sure that they will be commercially successful. The way you automate away programmers is Wix, not Ruby on Rails program synthesizers.
Yeah. What I want is the inverse. Get me some AI that predicts which lines of the code I write are likely to cause bugs. Kinda like a linter meets a fuzzer.
This is a motte-and-bailey argument if I've ever seen one. It's true that it's simple code. It's true that it's not going to replace programmers anytime soon. It's true that this is not novel computer science. But this is clearly a novel program and it's also clearly not something that other methods could have done.
"not novel code" here was referring GP's "novel programming task", not the synthesis method. I think we're probably using different definitions of "task". Where you mean it in a very particular sense (this exact piece of code) and I mean it in a more "writing if-else blocks inside a single procedure with no loops and no recursion using functions that are in-scope" sense.
The proper way to determine if there's anything interesting here would be to run GPT-3 on some existing program synthesis benchmarks. Literally any program synthesizer can look super impressive if you just show one working example in a YouTube video. My suspicion is that GPT-3 isn't going to do particularly well on those benchmarks, at least out of the box, and that getting it to work as well as SOTA would require a bunch of non-trivial engineering work.
You have a much rosier view of program synthesis than I do. Could you link a paper that you think is particularly impressive? I know Idris can do trivial inferences interactively, but I don't know of anything that can do something non-trivial without also being very slow and very unreliable.
IIUC, the Generalized Program Synthesis Benchmark Suite[1] is still mostly unsolved, including problems like “Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise.”
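For scale, the hand-written target for that problem is a one-liner (my sketch; the benchmark itself specifies the task via input-output examples rather than code):

```python
def compare_string_lengths(n1: str, n2: str, n3: str) -> bool:
    # True iff the lengths are strictly increasing.
    return len(n1) < len(n2) < len(n3)
```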
My point wasn't that current program synthesis is particularly great, although I do think modern program synthesis tools can probably beat GPT-3 on lots of problems (and I allow that the other direction is probably true too...).
My point was that I'm skeptical that GPT-3 would do particularly well on those benchmarks without lots of additional effort. And then, since you can build pretty much anything anyway with enough blood and sweat, the actual question is: would the same amount of effort poured into an alternative approach generate the same/better results but in a way that's far easier to interpret/extend?
It could work. But the YouTube video alone is more "huh, interesting" than "wow, impressive", if that makes sense.
Well, the key difference is that you don't have to think much to get a code-specialized language model, and when you do train one it's much more general (e.g. inferring constraints from text, using user-provided types, correctly naming variables, less prone to exponential complexity as sample length grows, etc.). And then the model just gets better over time as AI improves, and all you have to provide is a comparatively cheap bit of compute.
I got the impression from you saying “You can synthesize programs at this level of complexity in a few minutes on a single ten-year-old laptop using 5-10-year-old algorithms.” that you thought this was generally solved at this level of complexity, rather than merely true for an easier example here and there.
Maybe it would be helpful if you gave an example of the simplest Python function it won't be able to synthesize, and if/when they release the code GPT into the API, we can test your prediction.
Umm https://www.youtube.com/watch?v=y5-wzgIySb4
That was a smaller model fine-tuned on GitHub, IIRC.