Abstract: In response to the repeated calls for copyright reform to address the issue of generative AI, this blog post explores the idea of introducing a statutory license for machine learning purposes for generative AI as a compromise solution to secure a vibrant environment for AI development while preserving the central role played by human creators. It summarizes the main findings of the article "The Forgotten Creator: Towards a Statutory Remuneration Right for Machine Learning of Generative AI", forthcoming in the Computer Law and Security Review (available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4594873).
Generative AI is disrupting the creative process(es) of intellectual works on an unparalleled scale. More and more AI systems offer services that push users' capacity to produce new literary and artistic works beyond previously unforeseen barriers. Algorithmic tools are gradually colonizing every creative sector, from generating text (e.g., ChatGPT, Smodin), to performing music (e.g., AIVA, Beatoven, Soundful), to drawing images (e.g., Dall-e, Midjourney, DreamStudio), to shooting movies (e.g., Deepbrain AI, Veed.io). Apart from revolutionizing the creative markets, the ability to obtain new artworks with an increasing marginalization of human contribution has inevitably tested the fitness of copyright legislation all over the world to deal with so-called "artificial intelligence" ('AI').
In a nutshell, generative AI raises two main copyright issues that branch off into further sub-problems which in turn intersect with (if not collide with) some fundamental rights, especially freedom of artistic expression, freedom of art and science and the right to science and culture (Arts. 11 and 13 EUCFR, Arts. 19 and 27.1 UDHR, Art. 15.1(a) and (b) ICESCR) and the right to the protection of the moral and material interests of creators (Art. 17.2 EUCFR, Art. 27.2 UDHR, and Art. 15.1(c) ICESCR).
From the input side, it is questionable whether the training of AI through the extraction and mining of copyrighted works constitutes a copyright infringement or falls within an exceptional regime, which varies between Europe, the United States and other parts of the world (Japan in particular has an interesting copyright limitation which might apply). Indeed, human creators seek compensation for this novel use of their intellectual efforts, while AI firms aim to maximize the free harvesting of data (including copyright-protected materials) for training their algorithms. From the output side, it is hotly debated whether the content produced by a generative AI satisfies the protectability requirements under copyright law and thus triggers exclusive protection.
Courts are already dealing with the first question, as some content creators and licensees have filed copyright infringement lawsuits against providers of generative-AI services (namely OpenAI, Meta, Stability AI, and Midjourney). These litigations may have convinced the European legislator to address the issue in the proposal for a regulation laying down harmonized rules on Artificial Intelligence ('Artificial Intelligence Act' or 'AIA'), belatedly introducing a provision on transparency with regard to the works used in the machine learning process. Very recently, IP offices and the judiciary have started to decide on the copyrightability of AI-generated outputs.
In 2019, the U.S. Copyright Office (USCO) denied copyright protection to a painting titled "A Recent Entrance to Paradise", allegedly realized by the AI system named "Creativity Machine", because the work lacked human authorship. The decision was confirmed by the Review Board of the Copyright Office in February 2022, as well as by the recent decision of the U.S. District Court for the District of Columbia of 18 August 2023, No. 22-1564, specifying that 'human authorship is a bedrock requirement of copyright'. On 21 February 2023, the USCO reviewed the registration of the comic book "Zarya of the Dawn" (Registration No. VAu001480196), excluding copyright protection for the images produced by the AI Midjourney on the grounds that the changes made by the alleged author were "too minor and imperceptible to supply the necessary creativity for copyright protection".
Moreover, the Italian Supreme Court, in decision No. 1107 of 16 January 2023, acknowledged copyright protection for a digital flower created with the aid of software because the human contribution of the author was still identifiable. As a software-implemented creation, it was not in the public domain, and the company wishing to exploit the work had to clear the right of reproduction.
In China, the Beijing Internet Court denied copyright protection to an AI-generated work because of the lack of human involvement in the creative process. However, in another judgment of 24 December 2019, the Nanshan District Court of Shenzhen awarded copyright to an AI-generated text since it complied with the formal requirements of a written work.
In sum, despite the different constitutional frameworks and copyright legislation in force on both sides of the Atlantic, there is a common trend of rejecting algorithmic authorship based on the historically anthropocentric approach to copyright law. It is very likely that many more cases will be brought before the courts in the near future.
This post focuses on the input side of the challenges raised by generative AI. Drawing on a previous paper (see Geiger) and in line with some recent proposals advanced in IP literature (see below), it suggests exploring the idea of introducing a statutory license for machine learning purposes as a compromise solution to ensure an attractive environment for artificial creativity without marginalizing the role played by human authors. This remuneration proposal is rooted in a fundamental rights analysis that balances the competing interests at stake.
In its original version, the AIA did not address copyright aspects. It aimed at striking a balance between enhancing innovation and safeguarding fundamental rights by adopting a risk-based approach that was (quite surprisingly) totally agnostic to intellectual property rights. However, the outlined tensions between providers of generative AI and copyright holders led the European Parliament to include some limited considerations with regard to the copyright aspects of machine learning.
Firstly, Amendment 399 to Art. 28b offers a notion of generative AI before national legislators engage in their own defining attempts. Indeed, the endemically cross-border applications of this technology make a fragmented approach highly undesirable. The European Parliament proposed to define generative AI as the service provided through 'foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video'. The illustrative list of intellectual works is to be welcomed because it enhances the adaptability of the provision to the growing production capacities of generative AI, which in the future may cover any kind of creative segment.
Secondly, para. 4(c) of the mentioned amendment imposes on providers of foundation models the obligation to train, design and develop the model in compliance with Union and national legislation on copyright before making their service available on the market. To this end, providers should publish a sufficiently detailed summary of the use of training data protected under copyright law.
The transparency provision seems to require providers of foundation models to disclose a comprehensive list of the copyrighted content used for training their algorithm(s), accompanied by a precise identification of rightsholders. It can be presumed that these transparency rules were introduced to allow rightsholders to exercise more effectively the opt-out right from the text and data mining exception established by Art. 4.3 of the CDSM Directive. It could also be a first step towards establishing an obligation to obtain a license for the ML uses in question, should these uses be considered to fall within the exclusive right (this seems to be the purpose of several lawsuits brought against AI system producers in the US, claiming that these uses are not fair and therefore not covered by the fair-use exception of US copyright law). Great pressure from rightsholders can surely be expected in this regard on this side of the Atlantic as well. A good example of a maximalist approach is the recent draft bill introduced in France on September 12, 2023, which proposes to submit the machine learning process to the exclusive control of the rightsholders whose works are used, and to attribute the authorship of works generated by AI to the authors of the works used in the machine learning process. Further, it obliges providers to label the generated output "AI generated work" and to list the names of all authors whose works have been used in the training process. Such an overreaching solution, no matter how well intended, would be very detrimental to the development of AI systems and would make any jurisdiction adopting it very unattractive for these innovative sectors.
Furthermore, it has rightly been stressed that the fulfillment of the transparency obligation with regard to the works used appears quite unfeasible because of the low and still inhomogeneous threshold of originality, the fragmentation of copyright across various jurisdictions and its multiple ownership, the absence of a mandatory registration process, and the generally inadequate state of ownership metadata (see Quintais). Also, the technical feasibility remains to be confirmed, as algorithms can be trained on an immense variety of sources and it might not always be easy to determine precisely which sources have been used.
It thus becomes crucial to elucidate the specific content of these proposals in the AI Act during the trilogue negotiations. The rationale behind the latest amendment to the AIA is quite clear: ensuring collaboration between providers of generative AI services and copyright holders with regard to this new form of exploitation of creative works. The great divergence of interests and the high transaction costs of a potential licensing solution make it unlikely, however, that agreed solutions can be elaborated without future legislative intervention. Nor is it desirable that this crucial question for the future of creativity in the digital environment be left solely to the self-regulation of the various market players.
Moreover, the amendment seems to provide an effective enforcement mechanism for the opt-out right set forth by Art. 4.3 of the CDSM Directive. In the absence of a report listing all the copyrighted works mined and extracted for machine learning purposes, it would be nearly impossible for rightsholders to discover that their work has been fed into the software, except in blatant cases where the initial work is recognizable in the AI output or where there are other clear indications, such as the image produced by Stability AI that showed two football players with a watermark very similar to that of Getty Images.
However, the provision may produce an unintended – or at least undesirable – consequence: a sharp cut in the datasets available for algorithmic training resulting from a massive exercise of the opt-out right. This would in turn affect the quality of the AI-produced outputs, according to the old adage in information systems, "garbage in, garbage out". The narrative on biases in AI is rich in examples of the nexus between flawed inputs and flawed outputs, such as the stereotyped representation of the female nurse vs. the male doctor.
It is a delicate balancing exercise because the introduction of excessive (administrative and/or financial) burdens on AI providers may limit the input datasets, with consequences for the advancement of AI systems. Indeed, the value of generative AI services in supporting creative activities should not be underestimated. Nor should it be forgotten that generative AI can also be used for scientific purposes, which might call for differentiated approaches depending on the purpose of the ML in question (see Love), as the fundamental right to research calls for a privileged treatment of research over copyright claims (see Geiger & Jütte). The main challenge is to lay down a legal framework in which AI-based tools remain instrumental to human creativity rather than becoming a stronger substitute for it. In addition, some doubts remain as to how AI companies will operationalize the reporting obligation under the belated copyright provision.
Various solutions are emerging with the common aim of reconciling AI developers' need for massive and accurate data with copyright holders' demand for equitable royalties. Some authors advocate the establishment of data-sharing agreements with data providers or royalty-based compensation models in direct contact with content creators (see Lucchi), perhaps accompanied by tools able to verify the training through a sort of reverse engineering mechanism (see Strowel).
However, the gigantic transaction costs of one-to-one negotiations and the risk that incomplete datasets inhibit the development of performant AI systems lead to envisaging other solutions that reconcile more efficiently the interests of AI developers and the remuneration interests of creators, such as the introduction of a remunerated copyright limitation that imposes a general payment obligation on providers of generative AI systems for the use of copyrighted works for machine learning purposes (see Christophe Geiger, "When the Robots (try to) Take Over: Of Artificial Intelligence, Authors, Creativity and Copyright Protection", in: F. Thouvenin, A. Peukert, T. Jäger and C. Geiger (eds.), "Innovation – Creation – Markets, Festschrift Reto M. Hilty", Springer, Berlin/Heidelberg, 2024 (forthcoming), on file with the author; for similar proposals see also Frosio and, linked to the AI-generated output, Senftleben). In this perspective, it can be argued that the introduction of a limitation-based remuneration right would preserve the anthropocentric approach to copyright law, preventing an upside-down scenario where human creativity becomes instrumental to artificial creativity.
It is worth recalling that statutory licenses are limitations of the exclusive power arising from copyright protection, justified by the objective of securing derivative creativity in specific sectors that would otherwise be likely to stagnate (see Geiger). The U.S. copyright system has developed this exceptional "permitted-but-paid" (see Ginsburg) regime, which curtails the exclusive rights of copyright owners for specific aims, namely preventing monopolies in the music sector and reducing transaction costs for the licensing of sound recordings and television programs.
In the EU, Art. 5.2(b) of the InfoSoc Directive introduced a remunerated private copying exception or limitation, a similar mechanism that Member States could implement to compensate the prejudice that rightsholders suffer from the private reproduction of their works. As in the case of generative AI, content creators were not able to monitor and enforce their rights while their works were used without remuneration. This scenario led the German legislator to address the problem of the fair remuneration of copyright holders in uncontrollable environments by establishing the first levy system in the 1965 Copyright Act (see Kretschmer). The payment takes the form of a levy imposed on the price of a physical medium or device that allows the user to duplicate copyright-protected works. The collected levies are then distributed among content creators and rightsholders according to criteria that differ across the Member States that have implemented Art. 5.2(b). Applied to machine learning (ML), the remuneration would result from the machine learning uses of copyright-protected works by an algorithm for commercial generative AI purposes.
Interestingly, the Italian legal system also offers an example of such "permitted-but-paid" use with regard to engineering projects that provide original solutions to technical problems. Those works are protected by a 20-year neighboring right running from the filing of the work. In particular, Art. 99 of the Italian Copyright Law entitles the author of such works to fair compensation for the unauthorized for-profit implementation of the technical project. The justification behind this limitation of the author's power lies in the interest in fostering technical progress in the engineering sector. Hence, the Italian legislator decided to remove obstacles that could hinder innovation in this field.
However, more compelling justifications for introducing a statutory remuneration right for commercial machine learning purposes in place of the opt-out right under Art. 4.3 of the CDSM Directive can be found in the fundamental rights framework. Indeed, reframed from a digital constitutionalist perspective (see De Gregorio), the question of generative AI vs. authors’ remuneration opens the way to statutory licenses as they provide a compelling balance of the different fundamental rights involved.
This exceptional regulatory instrument is rooted in Arts. 11 and 13 EUCFR, Arts. 19 and 27.1 UDHR, and Art. 15.1(a) and (b) ICESCR, considering that generative AI training is essential for human beings to explore new avenues of artistic expression that are still unknown. Indeed, some outputs produced by generative AI have the potential to be used by human authors in new forms of art and cultural expression that benefit society at large. Access to comprehensive data, even if protected by copyright, is therefore a precondition for the correct operation and advancement of AI systems. It should be made clear that freedom of artistic expression concerns human beings exclusively, considering that, at least under the current state of the law, AI does not enjoy the mentioned constitutional right. This implies that the interest in the flourishing of the generative AI industry remains instrumental to the end objective of increasing human artistic freedom of expression.
The functioning of statutory licenses makes it possible to maximize the amount of copyrighted content exploitable for machine learning purposes while taking into account the interest of authors in being remunerated for the commercial use of their intellectual efforts, as protected by Art. 17.2 EUCFR, Art. 27.2 UDHR, and Art. 15.1(c) ICESCR. AI developers should thus share their revenues with the authors whose works are used to train the algorithms. A right to fair remuneration, backed by transparency obligations, keeps human beings encouraged to produce new works (possibly with the expanded creative ability that generative AI models can provide) while securing that the use of their works by AI systems generates a fair return. The human creator thus remains at the centre of the copyright system.
Such a mechanism risks primarily benefiting the superstars of every creative sector, considering that a substantial portion of requests may concern the imitation of the style of well-known celebrities. To assess the need for recalibration of the system in place, empirical studies on the effective risk of discrimination between artists based on their popularity would be very useful; redistribution systems can be imagined to secure that remuneration also flows to niche creators. In general terms, however, it is reasonable to believe that this policy option, if combined with mandatory collective rights management, could improve the living and working conditions of human authors (on file with the author).
The hype around generative AI comes with some legitimate concerns about the fair remuneration of authors when their protected works serve as input for machine learning purposes. Copyright law is no stranger to coping with unprecedented forms of exploitation of intellectual works resulting from market practices or technological developments. Where the existing legislation failed to address new dysfunctionalities affecting a creative segment, authors experienced a sense of inequality and frustration, deprived of the revenues arising from the new exploitation of the fruits of their labor. The EU has taken action to correct these market failures, for instance by introducing the rental right, the private copying exception, or the contractual protection of authors vis-à-vis publishers to ensure an appropriate share of the revenues obtained from information society service providers. However, European copyright law still appears to be in its nascent phase as regards statutory licenses and their potential role in solving the conflict between exclusive rights and other concurring interests (see Geiger).
The right to culture and science and freedom of artistic expression, as enshrined in Art. 11 EUCFR, Arts. 19 and 27.1 UDHR, and Art. 15.1(a) and (b) ICESCR, can justify the introduction of a statutory license that would grant an appealing environment for AI creativity without jeopardizing the right of human authors to be remunerated for the commercial use of their works, as established by Art. 17.2 EUCFR, Art. 27.2 UDHR and Art. 15.1(c) ICESCR. This compromise solution within the copyright system is the result of a balancing exercise that is inherent to any conflict between divergent fundamental rights. Despite some difficulties in reaching the optimal equilibrium, fundamental rights should remain the compass for navigating the seas of the still unexplored digital ecosystems and frontier technologies (see Geiger). In this case, where the exploitation of copyright-protected works to train generative AI models challenges the anthropocentric nature and function of the copyright system, fundamental rights can be valid allies in reconciling machine-generated outputs and human creativity.
This blog post was originally published on the blog "The Digital Constitutionalist" at the following link.