The question is ultimately going to come down to - "Is Copilot the same as a human programmer reading a lot of GPL code and rehashing, in a non-infringing way, the algorithms, functions, and designs used in a lot of FOSS? Or is Copilot performing more of a copy and paste pastiche of code that is protected by intellectual property law?"
On a tangential note, I always find the discussions surrounding FOSS licenses and copyright rather amusing in a sad way. There's a certain kind of entitlement a lot of people feel towards FOSS that they certainly do not express towards proprietary software, and I imagine this is a great source of the resentment and burn-out FOSS maintainers feel.
>> "Is Copilot the same as a human programmer reading a lot of GPL code and rehashing, in a non-infringing way, the algorithms, functions, and designs used in a lot of FOSS? Or is Copilot performing more of a copy and paste pastiche of code that is protected by intellectual property law?"
IANAL, but isn't the concept of "derived data" pretty standard? You don't need to copy data for it to be infringing. I've tackled derived-data clauses regularly when negotiating data contracts at work, and there is always verbiage and discussion around it (e.g., are we allowed to publish an average of the purchased data?).
An average is not related to the artistic aspects of the data and so can't be a derivative in the copyright sense (based on international law - one of the principal conventions is fully titled the "Berne Convention for the Protection of Literary and Artistic Works"; that's because copyright protects literary and artistic works).
Provided you have rights to access a body of statistics, then copyright has nothing to say -- save overreaching national caselaw (!) -- on your derivation of mathematical, technical, or scientific data from that work.
But a contractual clause, in general, doesn't care about copyright; if you've contracted not to derive data from a work then that's orthogonal to copyright.
IANA(IP)L, this is my opinion and unrelated to my employment.
> The question is ultimately going to come down to - "Is Copilot the same as a human programmer reading a lot of GPL code and rehashing, in a non-infringing way, the algorithms, functions, and designs used in a lot of FOSS? Or is Copilot performing more of a copy and paste pastiche of code that is protected by intellectual property law?"
Of course it isn't the same as a human programmer doing anything. It's a complex piece of software, which we happen to misuse the term "AI" to describe, but it is not intelligent.
Precisely. I don't see a difference between what Google indexes on its search engine and what CoPilot can recommend. Google has been, and still gets, slapped on the wrist when it doesn't respond to takedown requests. That mechanism seems to be missing from CoPilot currently, which will open them up to a number of lawsuits in the future if it continues to operate as it does.
Except Google isn’t creating anything new, Copilot is. I’ve had it kick out some very interesting short stories based on Sherlock Holmes, so if I published those would they infringe?
Google returned nothing which contained exact matches of some of the more interesting dialogue. It would be a serious find - worthy of a paper - to disprove that GPT-3 is generating novel text/code.
It's easy to prove that it sometimes regurgitates text verbatim — just play with it for a while. Having certainty that any given span of text/code is novel is extraordinarily difficult.
Each search results page is novel and unique. Even two different users making the same query will get different results thanks to the "search personalization" Google is doing these days.
Google search doesn't synthesize anything. It collects results and orders them according to an algorithm. Copilot and similar language models can synthesize new text. That's clearly different than just presenting existing text.
Copilot can't create novel concepts. In the end it is just a complex mathematical formula that returns a set of code references, with lengths determined by the math.
The illusion of creativity is similar to that of technology. Sufficiently advanced technology is indistinguishable from magic, and sufficiently advanced math is indistinguishable from intelligence. The relation between AI and math is the same as the relation between magic and technology.
All of the large language models can emit text that has never been seen in the training set (unless you go so far to consider each character to be a snippet).
> Infringement isn't about how the infringing system works, it's about the product of that work.
Exactly this. It makes zero difference that you produced your infringing work with the help of a program that happens to be extremely complex and marketed as "AI".
But that's the whole point of copyright. The same piece of code you copy from a Google search can legally be used by you if your developer came up with it, and not if Oracle came up with it. Where you copied it from is the entire point.
of course it's not intelligent, but we still have to decide how the law applies to the actions of software, or otherwise re-frame the whole thing to include the co-pilot developers doing the copyright infringement when they trained the model - not the current discussion which gives agency to the IDE plug-in "choosing" a code snippet to paste.
I don't see how the way the software was built is particularly relevant.
It's just a tool used by the developer; the onus is on the developer to ensure they don't infringe the licenses of the source code they incorporate in their software. Since Copilot makes it impossible to know where it's barfing code up from and what license that code is under, a developer who cares about not getting sued probably needs to avoid using Copilot.
Eh, the law is about making a copy. If my IDE plug-in fills in some code, the question is, did I copy the code, did the robot copy the code, or did the developers that wrote "cp github.db ~/trainingset" copy the code?
The authors of the tool created something that can be used for copyright infringement.
The tool itself lacks agency, it did what it was programmed to do.
If you took the tool's suggestions and proceeded to publish a derivative work, you may have infringed.
This really doesn't feel any different from P2P filesharing services. Rightsholders have targeted tool publishers in the past, because they are the largest single target and not anonymous; but ultimately the infringement is performed by the end user.
This isn't complicated at all. You copied the code, which isn't an issue until you then go on to do something which infringes the license (e.g. publish under a different license, publish binaries without publishing source, publish without attribution, whatever it is that the license requires).
A law that uses the archaic terms "copy and paste", referring to a time when people would make an analog photocopy of a document written using a typewriter, trim it out with scissors or a knife, and glue it to their book with the pasty remains from boiling animal collagen cannot be trusted to apply word-for-word in a time when technology has obsoleted the glue, typewriter, xerox machine, and even the paper.
It is not the same as a human, no, but it's not hard to choose a definition of the word "intelligent" that can accurately describe something that can be done by a program.
When a human walks around a puddle, are they demonstrating intelligence? When a horse avoids stepping in a hole, is the horse intelligent? When a robotic vacuum avoids a stairway, is it intelligent? When a self-driving car avoids a bollard, is that intelligent?
Whether or not there's a being inside the device that believes it experiences consciousness, the same outcome happens. Searle's Chinese Room producing copies of Chinese IP, a trained monkey doing so, or a human doing the same thing - the outcome is very similar.
Perhaps it's a little bit like employing a human programmer with an eidetic memory who occasionally remembers entire largish functions.
If he were able to remember a large enough piece of copyrighted code, and reused it, then it still wouldn't be fair use, even if he changed a variable name here or there, or the license message.
Yeah, that's definitely the impression I get from the few Copilot examples I've seen. I've not personally used Copilot so I refrained from making absolute statements about its behavior in my top comment.
But I think the conclusion most people are settling on is that it's definitely infringing.
A possible response that I'd predict from GitHub would be to attribute much/all of the responsibility to the user.
The argument would be along the lines of: you as the user are the one who asked the eidetic programmer (nice terminology, @bencollier49) to produce code for your project; all we did is make the programmer available to you.
Does GitHub own the code generated by GitHub Copilot?
GitHub Copilot is a tool, like a compiler or a pen. GitHub does not own the suggestions GitHub Copilot generates. The code you write with GitHub Copilot’s help belongs to you, and you are responsible for it. We recommend that you carefully test, review, and vet the code before pushing it to production, as you would with any code you write that incorporates material you did not independently originate.
Does GitHub Copilot recite code from the training set?
The vast majority of the code that GitHub Copilot suggests has never been seen before. Our latest internal research shows that about 1% of the time, a suggestion may contain some code snippets longer than ~150 characters that match the training set. Previous research showed that many of these cases happen when GitHub Copilot is unable to glean sufficient context from the code you are writing, or when there is a common, perhaps even universal, solution to the problem.
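For what it's worth, the kind of check that FAQ answer describes can be sketched in a few lines. This is purely a hypothetical illustration - the function name, the sliding-window approach, and treating the ~150-character figure as a threshold parameter are all my own; GitHub's actual filter is not public:

```python
def has_long_overlap(suggestion: str, corpus_docs: list[str],
                     threshold: int = 150) -> bool:
    """Return True if any run of `threshold` consecutive characters
    from the suggestion appears verbatim in a corpus document."""
    if len(suggestion) < threshold:
        return False
    # Every window of `threshold` characters from the suggestion.
    windows = {suggestion[i:i + threshold]
               for i in range(len(suggestion) - threshold + 1)}
    # Naive O(n*m) scan; a real system would index the corpus instead.
    return any(w in doc for doc in corpus_docs for w in windows)
```

A real implementation would of course use a suffix index or n-gram hashing over billions of lines rather than a linear scan, but the measurement being reported ("how often does a suggestion share a long verbatim span with the training set") is this simple in principle.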
I've used Copilot for months and honestly it's become one of my favorite inventions in all of programming - and this is key - even when it screws up (such as by suggesting Ruby-syntax code to autocomplete Elixir code). It tickles the "childlike joy" funnybone in me, the same one that got me into programming to begin with. I don't know how long it will take for typing "#ANSI yellow" (for example) and autocompleting to the right codes to get old, or every time it autocompletes anything considered "boilerplate," but it hasn't, yet!
You know, pretty much all of programming can be summed up as "tedious labor elimination," and this tool directs that same labor elimination at the work of programming itself (I no longer have to constantly google syntax idiosyncrasies etc.), and NOW coders are pissed? I don't get it. Eat your own dog food, people, because this is what it looks and tastes like.
As to the copyright infringement or licensing-violation claims, I have yet to see it autocomplete an entire algorithm correctly, or one copied verbatim from somewhere, although that could be mitigated. You still have to pay attention (kind of like Tesla autopilot), it's not going to eliminate your job.
No one is complaining about copilot making programming easier or automating it.
We're upset because it's quite literally infringing on intellectual property. Infringing on intellectual property that's been set aside for the exclusive use of the commons.
god bless AI for moving human society beyond silly notions like ideas-as-property
copyright was established to increase the innovation and creative will of the arts and sciences, what could increase that creative force more than an AI assistant who has seen every creative work ever made?
Except that is not what is happening here. The problem is that code which was provided to the commons under the explicit condition that anything built with it is also released under the same terms is now being fed to a magic mystery machine to produce code that can supposedly legally be withheld from the commons. The only code this affects is code that was already shared - you won't see Microsoft feeding Windows and Office source code into Copilot anytime soon.
Do you use it? Have you ever used it? How many people making negative comments about it here have actually used it? I don't actually believe many have. I suggest at least trying it out before lighting your torches.
If it infringes everyone equally and everyone equally benefits from the infringement, has a net wrong actually occurred? (which of course begs the "do the ends justify the means" question...)
I don't see how this is any different a form of "infringement" than me copying and pasting snippets of other people's code and then modifying it to suit my particular context, without specific attribution - except that the latter is a much more laborious and time-consuming process than copilot autocomplete, and programming is all about tedium elimination.
> If it infringes everyone equally and everyone equally benefits from the infringement, has a net wrong actually occurred?
It’s not done equally though. Copyleft code is extremely likely to be on GitHub somewhere, while internal proprietary code is often not. Copilot will thus have been trained more on the former than the latter.
> I don't see how this is any different a form of "infringement" than me copying and pasting snippets of other peoples' code, and then modifying it to suit my particular context, without specific attribution
It’s no different, but that is also copyright infringement.
so basically all of Stackoverflow is copyright infringement and has been for decades? Find me the programmer who has never either 1) copied and pasted directly from the internet, or 2) taken an idea found on the internet and massaged it for their own purposes. I mean... this is basically why programming is so lucrative IMHO. Everyone is piggybacking off of everyone else's work (at least in open source)
The tens of thousands of developers in a company I am familiar with have taken a basic training on intellectual property concepts and software licenses.
A typical case mentioned in the training is that code from StackOverflow is (probably) licensed under CC-BY-SA 4.0, and as such can never be copied into their proprietary-licensed code base.
(Recent) StackOverflow contributions are licensed under CC BY-SA 4.0 by default (though the author can of course release it under any additional licenses they choose): https://stackoverflow.com/help/licensing
If the code is really sufficiently trivial (and I’d guess that most code samples you’ll find on StackOverflow are) you may have a fair use argument in the US. Generally speaking though (and especially for anything nontrivial) you need to respect the license. CC BY-SA 4.0 is one-way compatible with GPLv3, though, so that helps if you’re including it in a GPLv3 codebase: https://creativecommons.org/2015/10/08/cc-by-sa-4-0-now-one-...
Even apart from the copyright aspect, it would be nice if we as programmers improved our attitude towards attribution. If researchers can cite the work that has influenced theirs without legal threats, then so can we.
Github explicitly leaves out proprietary code bases, including the Microsoft Windows source code (Microsoft owns Github and uses it for their own products).
If Microsoft included their own source code when training copilot then at least they would be intellectually honest, but they don't. They only consider GPL and other free and open source code to be up for grabs.
This kind of reminds me of when someone reverse engineers a piece of software to document interfaces, protocols or APIs for the purpose of writing compatible software. Then a second person not involved in the RE process implements compatible software from the documentation the first person wrote.
This is to avoid any contamination and verbatim copies of code. Once you have read a piece of code there is a risk of "contamination" and you will be influenced by it. It does not matter if you directly copy it, write it out from memory or use an AI to regurgitate it. It will be a copy of the code. To me this is very clear.
This sounds like “taint” in the M&A space. I’ve very limited experience of it and would be interested in hearing more from the better informed folks on this topic!
My limited experience: my then-employer opted not to acquire a company after doing due diligence. Ultimately we decided that the price of acquisition (both paid out, and also incurred in internal time) exceeded the cost of building a comparable product ourselves.
As the dev who did the tech portion of the due diligence I was now “tainted” by my knowledge of their system. As a result I could not work directly on the effort to build our own comparable solution.
A human who types out the fast inverse square root algorithm line by line won't be exempt from copyright/license infringement just because they remembered it off the top of their head. However, using the same concepts is likely to be fine outside silly jurisdictions where software patents are a thing.
The difference is that AI isn't able to grasp concepts, it's only capable of rehashing patterns. If it is able to understand concepts then it should be shut down and researched immediately, because it's either close to gaining consciousness or already has done so.
The core of copilot is a file or a block of memory laying out a bunch of floating-point numbers that get processed and turned into code. This arrangement of floats is derived from source code, with licenses and copyright notices.
I don't think it's any different from turning code into a compiled program. Any developer will understand that a compiled version of GPL code is a derived work and subject to the GPL license. Why would a compiler that turns code into floats be any different? Sure, those floats get mixed up with the floats from other source code, but linking to GPL'd code does something very similar and is also covered by the license.
It's possible to consider copilot similar to hashing: a SHA hash of a binary isn't subject to the binary's license, that'd be silly. However, hashes are inherently one-way, and copilot isn't.
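To make the one-way point concrete, here is a minimal sketch (the code string is a made-up stand-in for some licensed work):

```python
import hashlib

# A stand-in for some licensed source file.
source = b"float Q_rsqrt(float number) { /* ... */ }"

# SHA-256 maps the input to a fixed-size, 64-hex-character digest.
digest = hashlib.sha256(source).hexdigest()
print(digest)

# Nothing of the original text can be reconstructed from the digest:
# it is a one-way fingerprint. A language model's weights are also
# "just numbers" derived from the input, but unlike a hash they can be
# prompted to emit training text back out, sometimes verbatim - which
# is exactly the asymmetry the comment above points at.
```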
A question I'd like to ask Microsoft is "if I steal the Windows source code and train an AI on it, can that AI be freely distributed and used for Wine/ReactOS/etc?" If Microsoft sticks to the stance that AI isn't subject to the licenses on software then a leaked source AI should be fine, but if they want to protect their intellectual property then they will send cease and desist letters to anyone even thinking about using such an AI model for code completion. My expectation is that Microsoft will act against such an AI.
Regardless, the fact that Github did not ask permission or provide an opt out before training started is a huge middle finger to all open source developers. Even if they can get away with this stuff legally, this approach has surely offended many open source developers who want big tech companies to abide by their code licenses. I don't do much open source work myself but I've been offended by the whole process from the day copilot rolled out and I don't believe I'm alone in this.
> A human who will type out the fast inverse square root algorithm line by line won't be exempt from copyright/license infringement just because he remembered it from the top of their head.
A human would probably try to defend against a copyright infringement suit over that by arguing something like the following.
There isn't sufficient creative expression in fast inverse square root (FISR) to be copyrightable. There is plenty of creativity in that thing, but it is in things that are not copyrightable such as the underlying mathematics that it is using. Copyright covers expression of ideas, not use of ideas (that's patents) or the ideas themselves.
The expression in FISR that they probably are copying from is pretty much all just in choosing the names of variables, and most implementations I've seen just use pretty normal names that follow normal naming conventions that people use when they aren't putting any thought into naming their variables.
That level of expression is arguably not creative enough to support copyright, at least in the US after Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340 (1991) [1].
(I'm assuming that the human didn't do anything stupid, like reproduce the comments too).
I think FISR is one of the few algorithms that I would actually consider creative enough to meet the creativity requirement. It's counterintuitive math that the vast majority of programmers would never be able to come up with. It's an elegant bit-twiddling algorithm that requires one or two blog posts to truly understand; it's not something you read and think "oh, that makes sense, moving on".
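For reference, the trick under discussion, transliterated to Python - `struct`-based bit reinterpretation stands in for the C pointer cast in the original (this is a sketch for illustration, not the verbatim Quake III Arena source):

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    # Reinterpret the 32-bit float's bits as an unsigned integer
    # (the C original does this with a pointer cast).
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    i = 0x5F3759DF - (i >> 1)            # the famous magic constant
    # Reinterpret the integer's bits back as a float.
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson refinement step.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inverse_sqrt(4.0))  # roughly 0.5
```

The counterintuitive part is the integer subtraction: shifting the float's bit pattern right by one approximately halves its exponent, and the magic constant corrects the bias, yielding a surprisingly good first guess at 1/sqrt(x) before any "real" math runs.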
Algorithms for generic mathematical operations such as the dot product or matrix multiplication are often trivial to deduce, though optimized vectorized versions perhaps less so. Most helper functions are unoriginal enough that no reasonable copyright law would protect them, which is also the case for (too) many instances of patented code.
The copyright question does ignore the code license question, though. If a complicated algorithm like FISR is not original enough, then what protects any boring old operating system code? What stands in the way of publicly hosting Microsoft's leaked sources, if the code is all quite trivial? There is very little in an operating system that other operating system developers haven't thought of, or wouldn't reasonably have come up with had they been constrained by the same restrictions.
The variable names are one thing, though they could have been chosen much more descriptively. However, the system also output the comment "// what the fuck?", which is not only terribly nondescriptive, it's also something the system couldn't have come up with had it actually learned from the code rather than memorized it.
The suit you linked is about the difference between information and creativity. However, the case surrounds a data set, something simply factual, rather than a composed piece of information such as code or a book. Code listed on Github is not similar to the listings in a phone book. If they were, all software copyright, proprietary or otherwise, goes down the drain. I think that's impractical to say the least.
FISR could have been patented (and would be in the public domain by now anyway), but only its specific implementation in Quake III is covered by copyright.
Also, your argument follows a composition fallacy: emergent properties exist, and thus you cannot simply say that because each individual piece of a whole is trivial, the whole is trivial. Heck, software pretty much by definition goes against that. For relevant precedent, there is no shortage of information that becomes classified when in aggregate. Knowing where a certain piece of infrastructure is isn't likely classified, but knowing where all the strategically important pieces of infrastructure are certainly is.
Which is why the question isn't whether the users of Copilot are infringing someone's GPL (they'd likely have a solid defense based on the individual piece not being sufficient to hold copyright protection), it's whether Copilot itself constitutes a derivative work of its input data, which it consumed as whole (copyrighted) works.
That's a philosophical question that nobody can know a definitive answer to.
Personally I'd say the difference is understanding why a certain pattern works rather than blindly inserting whatever works. It's the classic Chinese Room thought experiment.
Just reading certain code is enough to taint a human programmer though. Some companies have policies against hiring developers with experience on some OSS projects because they have their own clean room implementation they want to protect.
On your tangential note: I always assumed many in the FLOSS side are actually against most cases of copyright as applied to software, but since it is the regulatory standard, they put a strong emphasis on making it work for their purposes, thus the somewhat ironic “copyleft”.
It’s a “don’t hate the player hate the game” situation for them
This is definitely the orthodox take. If shared source code was the norm and software wasn't subject to copyright (or really if either of those two conditions were met), there'd be no need for FOSS as an ideology. The purpose of copyleft is to ensure that there's a permanent bulwark against code meant for the commons being co-opted by proprietary software vendors and having changes walled off from the community who created the software in the first place.
Source code is essential to FOSS, a public domain binary-only copy of Microsoft Windows definitely would not be FOSS. This is the second item of the open source definition.
Sure, that is a useful condition and a no-brainer to add if you need to leverage copyright anyway.
But would it be enough to spur the open source movement on its own if you could legally decompile all binaries and redistribute that? Probably not.
It's not like source vs. binary is a clear distinction - between code obfuscation, generated code, transpilation, etc., there is a lot of wiggle room as to what should or should not be OK.
The GPL makes it a pretty clear distinction: "preferred form for modification", decided on a case-by-case basis. Obfuscated code is not source, generated code is not source, transpilation is often not source but could be depending on how you use it afterwards, bitmap images are often not source but they can be, executables are usually not source but could be, videos are not source but could be. Some links discussing what source is here:
People on the FLOSS side are for software freedom, copyleft is just one of the tools we can use within the current regulatory framework of copyright. If copyright ever went away, we would have to use different tools but would have different opportunities too.
Those people these days are a vanishing minority. This is not the early-00s anymore.
The reality is that, nowadays, the overwhelming majority of developers touch FOSS code every day and just assume they're entitled to use it as they see fit. The folks who came up with "copyleft", or who care about licenses, are very much not in the driving seat. Blame FAANGs and their hatred of the GPL.
I think the problem goes a bit deeper than that. From an IP perspective, I think it's reasonable to consider that training an AI on some form of work is using said work to build a new one, just like it would be if it was manually copied in or reproduced.
The problem is that, iirc, GPL didn't consider this at all and still uses language focused on copying code, so something like copilot might slip through the cracks of those definitions.
Then again, the license uses this language when it allows usage of the code in the first place, so one could say that either a) this usage is covered by the license, in which case all conditions apply, or b) it is not covered by the license, in which case... github wouldn't be allowed to use the code at all.
To give an analogy: I think feeding code into an AI is essentially analogous to compiling the code. A machine turns it into something more usable and the original human-written content isn't part of the result anymore, but the intellectual property gets dragged through the process nonetheless. Why would it be any different just because the mechanism of transforming the code into executable software gets a bit more complicated through the usage of AI?
> Is Copilot the same as a human programmer reading a lot of GPL code and rehashing, in a non-infringing way, the algorithms, functions, and designs used in a lot of FOSS?
It literally can't do it "in a non-infringing way", as it wasn't made to do it that way.
People were able to get copy-pasted code verbatim. That means it does not know whether what it produces infringes the GPL or not.
Let's say you find a human who never knew anything about copyright, you show him a bunch of Disney movies and ask him to make you a movie, and he literally copies one of their movies. Does that make it non-infringing? (The funny thing is, even people aware of copyright infringe it... so yeah, it's hard to say whether even a machine could make only non-infringing content.)
The solution would be to at least make him aware of copyright and have him work with that, but first, is that even possible, and second, would it even be enough...
Sadly nothing will ever be done, at least not until we feed it Disney movies and it starts to affect their bottom line.
> On a tangential note, I always find the discussions surrounding FOSS licenses and copyright rather amusing in a sad way. There's a certain kind of entitlement a lot of people feel towards FOSS that they certainly do not express towards proprietary software and I imagine this a great source of the resentment and burn-out FOSS maintainers feel.
Definitely. Many of my acquaintances complaining about Github Copilot without trying it themselves regularly pirate movies, shows and music. They also always cheer if there is some court ruling against Facebook or Google, no matter what the actual case is even about.
> The question is ultimately going to come down to - "Is Copilot the same as a human programmer reading a lot of GPL code and rehashing, in a non-infringing way, the algorithms, functions, and designs used in a lot of FOSS? Or is Copilot performing more of a copy and paste pastiche of code that is protected by intellectual property law?"
It seems to me that the regurgitation only happens if you post the first half of the code, expecting the second half. I imagine that the software sees how several hundred repositories (which are all forks) have a very similar pattern and tells you the best fitting approximation of how they continue, which is again very similar.
In the future I can definitely see Github updating their license and some kind of exodus by FOSSers towards GitLab. But I believe that many open source projects will just put up with it, similar to how Youtubers and Twitch streamers want to stay on the premier platform.