Conspiracy against programmers: Microsoft, GitHub and OpenAI accused of concealing the copying of licensed code
Copilot, a smart development assistant, is alleged to have been violating copyright law for a long time.
According to an amended lawsuit, Microsoft's GitHub platform deliberately configured Copilot to make small changes to the code it suggests to developers, so that the output is not flagged as a direct copy of existing software.
The lawsuit, originally filed in November last year on behalf of four unidentified plaintiffs, alleges that Copilot, a tool that helps developers write code, built on OpenAI's Codex model and commercialized by Microsoft's GitHub, was trained on publicly available code in violation of copyright law and software licensing requirements, and that the tool presents other people's code as its own.
The companies have tried to get the case dismissed, but so far have only managed to have some of the claims thrown out. The judge left the underlying copyright and licensing issues intact and allowed the plaintiffs to re-file several claims with more detail.
The amended lawsuit now covers eight counts instead of twelve, retaining the claims of copyright infringement, open source license violations, unjust enrichment and unfair competition. In addition, several new allegations replace those that were sent back for revision: selling licensed materials in violation of GitHub's policies and intentional interference with prospective economic relations.
The complaint also includes code samples written by the plaintiffs, which Copilot allegedly reproduced verbatim. The judge overseeing the case allowed the plaintiffs to remain anonymous in court filings because of credible threats of violence against their attorney, so the plaintiffs' sample licensed code was altered to make them harder to identify. However, the plaintiffs' identities are very likely already known to the defendants.
The updated lawsuit also alleges that in July 2022, in response to public criticism of Copilot, GitHub introduced a user-adjustable filter called “Suggestions matching public code” so that users could avoid seeing code suggestions that duplicate other people's work.
“When the filter is enabled, GitHub Copilot compares code suggestions that are around 150 characters long against the public code on GitHub. If a match is found, no suggestion will be shown to the user,” the GitHub documentation explains.
However, the complaint claims that the filter is essentially useless because it only checks for exact matches and does nothing to detect output that has been slightly modified. In fact, the plaintiffs allege that GitHub is trying to sidestep copyright and licensing claims by having Copilot alter its own output so that it does not appear to be an exact copy.
“In the hands of GitHub, the propensity for small cosmetic changes to Copilot output is a feature, not a bug. These small cosmetic changes mean that GitHub can provide Copilot customers with an unlimited number of modified copies of the licensed material, without ever triggering the verbatim code filter,” reads the amended complaint.
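To illustrate the limitation the plaintiffs describe, here is a minimal Python sketch. It is not GitHub's implementation: the function names, the tiny "public corpus" and the normalization scheme are all hypothetical, chosen only to show why a purely verbatim check misses a copy whose identifiers have been renamed, while a check on normalized code still catches it.

```python
import keyword
import re

# Hypothetical stand-in for indexed public code; not real GitHub data.
PUBLIC_CORPUS = [
    "def total_price(items):\n"
    "    total = 0\n"
    "    for item in items:\n"
    "        total += item.price\n"
    "    return total\n",
]


def is_exact_match(suggestion, corpus):
    """Verbatim check: block only if the text appears character for character."""
    return any(suggestion.strip() in snippet for snippet in corpus)


def normalize(code):
    """Crude normalization: rename identifiers in order of first appearance
    (id0, id1, ...), keep Python keywords, and collapse whitespace."""
    names = {}

    def rename(match):
        word = match.group(0)
        if keyword.iskeyword(word):
            return word
        return names.setdefault(word, f"id{len(names)}")

    renamed = re.sub(r"[A-Za-z_]\w*", rename, code)
    return re.sub(r"\s+", " ", renamed).strip()


def is_normalized_match(suggestion, corpus):
    """Block if the suggestion equals a corpus snippet after normalization."""
    target = normalize(suggestion)
    return any(target == normalize(snippet) for snippet in corpus)


# Same logic as the corpus snippet, with cosmetic renaming only.
MODIFIED_COPY = (
    "def sum_cost(products):\n"
    "    acc = 0\n"
    "    for p in products:\n"
    "        acc += p.price\n"
    "    return acc\n"
)

if __name__ == "__main__":
    print("verbatim copy, exact check:   ", is_exact_match(PUBLIC_CORPUS[0], PUBLIC_CORPUS))
    print("renamed copy, exact check:    ", is_exact_match(MODIFIED_COPY, PUBLIC_CORPUS))
    print("renamed copy, normalized check:", is_normalized_match(MODIFIED_COPY, PUBLIC_CORPUS))
```

In this toy setup the renamed copy passes the verbatim check but is caught once identifiers are normalized. Real near-duplicate detection is considerably more involved; the sketch only shows the gap between the two kinds of comparison.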
The lawsuit also alleges that machine learning models such as Copilot have a parameter that controls quite precisely how much the output varies. “Copilot is the original method of software piracy,” the plaintiffs concluded in their class action complaint.
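The passage above does not name the parameter. In many generative models a setting of this kind is called "temperature", and the sketch below assumes that is what is meant. It is a generic Python illustration of temperature-scaled sampling probabilities, not Copilot's actual code; the function name and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities.
    Lower temperature concentrates probability on the top-scoring option;
    higher temperature spreads it out, so sampled output varies more."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
LOGITS = [2.0, 1.0, 0.5]

LOW_T = softmax_with_temperature(LOGITS, 0.1)   # nearly deterministic
HIGH_T = softmax_with_temperature(LOGITS, 2.0)  # noticeably more varied

if __name__ == "__main__":
    print("temperature 0.1:", [round(p, 4) for p in LOW_T])
    print("temperature 2.0:", [round(p, 4) for p in HIGH_T])
```

With a low temperature almost all of the probability lands on the highest-scoring token, so the model tends to reproduce its most likely continuation. Raising the temperature makes alternative tokens more likely, which is one way the output can be made to differ slightly from run to run.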
Microsoft representatives, for their part, deny these allegations and say they are doing everything they can to simplify programming and make developers happier: “We are confident that Copilot adheres to applicable laws, as we have been committed to the responsible implementation of innovations in Copilot from the very beginning. We will continue to invest in the tool and champion the AI-powered development experience.”
This case shows once again how the development and widespread adoption of neural networks trained on public data provokes open discontent among the public.
An earlier high-profile example of such “legal piracy” involved Stability AI, which allegedly trained its Stable Diffusion neural network on tens of thousands of images from the paid stock photo service Getty Images. This came to light quickly because the service's watermark appeared on images generated by the neural network, after which Getty Images sued Stability AI.
Only time will tell whether Microsoft, GitHub and OpenAI can get out of a similar situation with Copilot more gracefully, and in a way that suits everyone.