Thursday, July 18, 2024

Copyright Claim Fails in GitHub Case: A Turning Point for Generative AI?


Janpha Thadphoothon

I must confess that I am not a software developer, a programmer, or a computer scientist. I am simply someone interested in using AI for the betterment of society.

What is at the heart of the issue? Generative AI is shaking up the traditional intellectual property (IP) framework by challenging the idea that only human-created works are protected by copyright laws. With AI's ability to create original content, the line between human and machine creativity is becoming blurred, disrupting established views on authorship and ownership.



Before diving into the court case, let's explore the landscape of AI and intellectual property (IP). The use of AI in creative works presents both pros and cons.


Generative AI has significant pros in terms of innovation and creativity. It can create original content, pushing the boundaries of traditional creativity. Additionally, AI's ability to process vast amounts of data enables the production of unique and innovative works.


There are potential legal adjustments to consider. Some jurisdictions are contemplating amendments to text and data mining exemptions, as well as fair use or fair dealing exceptions, to cover AI training, potentially easing copyright concerns. These exemptions could allow activities that might otherwise be seen as copyright infringement, thereby fostering innovation.

The cons of generative AI begin with copyright infringement. Training AI often involves copyright-protected content, so developers who use third-party content to train models risk facing copyright claims. There is also significant legal ambiguity, which affects developers, content creators, and copyright owners alike, and varying approaches across jurisdictions create an uneven regulatory landscape. Protecting content creators' rights is a further challenge. New legislation may be needed to balance these rights with the need to foster innovation; without clear legal guidance, content creators may feel their rights are inadequately protected against unauthorized use of their work by AI systems.

There are several considerations to bear in mind. AI's learning process is often compared to human learning, where new creations are influenced by previous knowledge and experiences. Legislators need to find a balance that incentivizes innovation while ensuring content creators' rights are respected. The ongoing legal uncertainty highlights the need for clear guidelines to address the intersection of AI innovation and copyright law.

While generative AI presents significant opportunities for innovation and creativity, it also poses complex legal challenges that require careful consideration and potential legislative adjustments to ensure a fair balance between fostering technological advancement and protecting intellectual property rights.

As far as I know, the intersection of artificial intelligence and copyright law is a complex and rapidly evolving battlefield. Recently, a significant decision was made in the ongoing legal tussle involving GitHub, Microsoft, and OpenAI. A U.S. federal judge dismissed crucial claims in a class-action lawsuit brought by developers against these tech giants, marking the first decision in a series of court actions related to generative AI.

What’s New

The case, filed in November 2022 by programmer Matthew Butterick and the Joseph Saveri Law Firm, claimed that GitHub Copilot, powered by OpenAI Codex, had generated unauthorized copies of open-source code hosted on GitHub. The plaintiffs argued that this constituted copyright infringement. However, after several attempts by the defendants to get the lawsuit thrown out, the judge dismissed the claims of copyright infringement and unfair profit.


The Case Details

The lawsuit targeted GitHub Copilot for allegedly copying public code without proper attribution. Initially, in May 2023, the judge dismissed some claims but allowed the plaintiffs to revise their arguments. The plaintiffs then focused on GitHub Copilot’s duplication detection filter, which revises output that matches public code on GitHub. They argued that this feature demonstrated Copilot’s ability to copy code from OpenAI Codex’s training set. However, the judge found that the plaintiffs failed to present concrete evidence that Copilot could generate substantial copies of code and dismissed the copyright claim with prejudice, meaning it cannot be refiled.

Additionally, the judge dismissed the claim that GitHub profited unjustly by charging for access to GitHub Copilot. Under California law, unjust enrichment requires proof of enrichment through “mistake, fraud, coercion, or request,” which the plaintiffs could not demonstrate.

Yes, But...

While this lawsuit has been significantly reduced, it isn’t over. A breach-of-contract claim remains, focusing on whether OpenAI and GitHub used open-source code without proper attribution, violating open-source licenses. The plaintiffs also plan to refile their unjust-enrichment claim.


Behind the News

This lawsuit is one of several testing the copyright implications of training AI systems. Other notable cases involve Getty Images, the Authors Guild, The New York Times, and a consortium of music-industry giants. These cases hinge on the argument that copying copyrighted works for training AI models violates the law, a claim that the plaintiffs in the GitHub case failed to substantiate.


Why It Matters

This case specifically concerns code written by open-source developers, but its implications are far-reaching. A final verdict could shape how code can be used and how developers utilize generative AI in their work. Although this dismissal is not a final verdict, it supports the notion that AI developers have broad rights to use data for training models, even if that data is protected by copyright.

My Personal Opinion

Even though I am no computer expert, I would like to offer my personal view regarding the issues. In my view, AI should be allowed to do with data, including open-source code, anything that humans can legally and ethically do, such as studying and learning. I believe that the judge’s decision provides much-needed clarity for AI developers on how they can use training data. Moreover, it could establish that it is ethical to use code-completion tools trained on open-source code.

For me, this decision is a crucial step in defining the legal landscape for AI and its interaction with copyright law. It’s a topic worth watching closely, as the outcomes of these cases will undoubtedly shape the future of AI development and its ethical use of data.


What are my opinions on these?

Personally, I could be wrong, but I think AI agents should be treated like humans in terms of how they curate and share learned information. This perspective encourages a more robust and diverse exchange of ideas, fostering innovation and creativity. By allowing AI to curate and share content, we can democratize information, making it more accessible and beneficial for education and research. This approach aligns with human learning processes, where we read, discuss, and critique ideas, integrating AI more intuitively into society.

However, this idea raises significant legal, ethical, and practical concerns. Unlike humans, AI lacks personal responsibility and ethical judgment, which could lead to copyright infringements, misinformation, or the spread of biased or harmful content. Additionally, humans can critically evaluate information, while AI might struggle with quality control. There are also questions about accountability—who is responsible if AI disseminates incorrect or harmful information? Content creators might feel threatened by AI repurposing their work, leading to potential financial losses and a disincentive for original creation. Balancing innovation with the protection of intellectual property rights, alongside establishing mechanisms for accountability and quality control, is crucial.


What is your perspective on this?


References

The Batch (17 July 2024). DeepLearning.AI

World Economic Forum (13 January 2024). Will copyright law enable or inhibit generative AI? https://www.weforum.org/agenda/2024/01/cracking-the-code-generative-ai-and-intellectual-property/




Janpha Thadphoothon is an assistant professor of ELT at the International College, Dhurakij Pundit University in Bangkok, Thailand. Janpha Thadphoothon also holds a certificate of Generative AI with Large Language Models issued by DeepLearning.AI.
