On The Importance Of Teaching Dissent To Legal Large Language Models

Machine learning from legal precedent requires curating a dataset composed of court decisions, judicial analysis, and legal briefs in a particular field, which is then used to train an algorithm to apply the essence of those decisions to real-world scenarios. This process must include dissenting opinions, minority views, and asymmetrical rulings to achieve near-human legal reasoning and just outcomes.

tl;dr
Machine learning continues to extend the capabilities of AI systems in the legal field. Training data is the cornerstone of usable machine learning results. Unfortunately, when it comes to judicial decisions, the AI is at times fed only the majority opinions and not the dissenting views (or is ill-prepared to handle both). We should neither want nor tolerate AI legal reasoning that is shaped so one-sidedly.

Make sure to read the full paper titled Significance Of Dissenting Court Opinions For AI Machine Learning In The Law by Dr. Lance B. Eliot at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998250


When AI researchers and developers conceive of legal large language models that are expected to produce legal outcomes, it is crucial to include conflicting data and dissenting opinions. The author argues for a balanced, comprehensive training dataset that includes both judicial majority and minority views. Current court opinions tend to highlight the outcome, or the views of the majority, and neglect close examination of dissenting opinions and minority views. This can result in unjust outcomes, missed legal nuances, or bland judicial arguments. His main argument centers on a simple observation: justice is fundamentally born of a process of cognitive complexity. In other words, a straightforward ruling with unanimous views contributes little to learning in, or evolving, a given area of the law; considering trade-offs, and carefully weighing different ideas and values against each other, does.

This open-source legal large language model with an integrated external knowledge base exemplifies two key considerations representative of the status quo: (1) training data is compiled by crawling and scraping legally relevant information and key judicial text that extends beyond a single specialty and is not limited to supporting views; (2) because the training data is compiled at scale and holistically, it can be argued that majority views are overrepresented in the model input, since minority views often receive less attention, discussion, or reflection beyond the initial period after a legal decision. In addition, there may be complex circumstances in which a judge is split on a specific legal outcome. These quiet moments of legal reasoning rooted in cognitive complexity hardly ever make it into a written majority or minority opinion, and are therefore unlikely to be used for training purposes.
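The overrepresentation problem can be made concrete with a simple audit step during corpus assembly: tally how each opinion type is represented before training. A minimal sketch in Python (the record fields, corpus contents, and the 10% threshold are all hypothetical, chosen only for illustration):

```python
from collections import Counter

# Hypothetical scraped corpus: each record is one judicial opinion,
# tagged with its type. Field names are illustrative, not from any real dataset.
corpus = [
    {"case": "A v. B", "opinion_type": "majority"},
    {"case": "A v. B", "opinion_type": "dissent"},
    {"case": "C v. D", "opinion_type": "majority"},
    {"case": "E v. F", "opinion_type": "majority"},
    {"case": "E v. F", "opinion_type": "concurrence"},
]

def opinion_balance(records):
    """Return each opinion type's share of the training corpus."""
    counts = Counter(r["opinion_type"] for r in records)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}

balance = opinion_balance(corpus)

# Flag corpora in which dissents fall below an (arbitrary) minimum share.
if balance.get("dissent", 0.0) < 0.10:
    print(f"warning: dissents are only {balance.get('dissent', 0.0):.0%} of the corpus")
```

An audit like this does not fix the imbalance, but it makes the one-sidedness the author warns about visible before the model is ever trained, so curators can deliberately seek out dissents rather than inherit whatever the crawler happened to find.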

Another interesting consideration is access to dissenting opinions and minority views. While this type of judicial writing may be publicly available at the highest levels, a dissenting view in a less public case at a lower level might not be afforded the same access. Gatekeepers such as Westlaw restrict the audience for these documents and their interpretations. Arguments for a fair-learning exemption for large language models are arising in various corners of the legal profession and are currently being litigated by the trailblazers of the AI boom.

A recent and insightful essay by Seán Fobbes cautions against excitement about legal large language models and their capability to produce legally and ethically accurate, as well as just, outcomes. From my cursory review, achieving that will require much more fine-tuning and quality review than the mere inclusion of dissenting opinions and minority views can provide. Food for thought that I shall devour in a follow-up post.

Legislative Considerations for Generative Artificial Intelligence and Copyright Law

Who, if anyone, may claim copyright ownership of new content generated by a technology without direct human input? Who is or should be liable if content created with generative artificial intelligence infringes existing copyrights?

tl;dr
Innovations in artificial intelligence (AI) are raising new questions about how copyright law principles such as authorship, infringement, and fair use will apply to content created or used by AI. So-called “generative AI” computer programs—such as OpenAI’s DALL-E and ChatGPT programs, Stability AI’s Stable Diffusion program, and Midjourney’s self-titled program—are able to generate new images, texts, and other content (or “outputs”) in response to a user’s textual prompts (or “inputs”). These generative AI programs are trained to generate such outputs partly by exposing them to large quantities of existing works such as writings, photos, paintings, and other artworks. This Legal Sidebar explores questions that courts and the U.S. Copyright Office have begun to confront regarding whether generative AI outputs may be copyrighted and how generative AI might infringe copyrights in other works.

Make sure to read the full paper titled Generative Artificial Intelligence and Copyright Law by Christopher T. Zirpoli for the Congressional Research Service at https://crsreports.congress.gov/product/pdf/LSB/LSB10922/5


The increasing use of generative AI challenges existing legal frameworks around content creation, ownership, and attribution. It reminds me of the time streaming began to challenge the then-common practice of downloading copyrighted and user-generated content. How should legislators and lawmakers view generative AI when passing new regulations?

Copyright, in simple terms, is a type of legal monopoly afforded to the creator or author. It is designed to allow creators to monetize their original works of authorship, so they can sustain a living and continue to create, on the assumption that original works of authorship further society and expand our culture’s knowledge. The current text of the Copyright Act does not explicitly define who or what can be an author. However, both the U.S. Copyright Office and the judiciary have afforded copyright only to original works created by a human being. In line with this narrow interpretation of the legislative background, courts have denied copyright in selfie photos taken by a monkey, reasoning that only humans need copyright as a creative incentive.


This argument implies that human creativity is linked to the possibility of reaping economic benefits. In an excellent paper titled “The Concept of Authorship in Comparative Copyright Law”, Jane C. Ginsburg, faculty director of Columbia’s Kernochan Center for Law, Media, and the Arts, refutes this position as a mere byproduct of necessity. Arguably, a legislative scope centered on compensation for creating original works of authorship fails to incentivize those creators and authors who, for example, seek intellectual freedom and cultural liberty instead. This leaves us with the conclusion that the creator or author of a copyrightable work can only be a human.

Perhaps generative AI could be considered a collaborative partner used to create original works through an iterative process, producing a work of authorship that could be copyrighted by the human prompting the machine. Yet such cases would still fall outside current copyright law and not be eligible for protection. The crucial argument is that the expressive elements of a creative work must be determined and generated by a human, not an algorithm. In other words, merely coming up with clever prompts to make generative AI perform an action, iterating the result with more clever prompts, and claiming copyright in the end result has no legal basis, because the expressive elements were within the control of the generative AI model rather than the human. How control over the expressive elements of a creative work should be interpreted in the context of machine learning and autonomous, generative AI is an ongoing debate, and it will likely see further clarification from the legislative and judicial systems.

To further play out this “Gedankenexperiment” of authorship of content created by generative AI, who would (or should) own such rights? Is the individual writing the prompts, who essentially defines and limits the parameters within which the generative AI system performs its task, eligible to claim copyright in the generated result? Is the software engineer overseeing the underlying algorithm eligible to claim copyright? Is the company owning the code eligible to claim copyright? Based on the earlier view about expressive elements, it is feasible to see mere “prompting” as an action ineligible for a copyright claim. Likewise, an engineer writing software code performs a specific task to solve a technical problem; here, that task is an algorithm leveraging training data to create similar, new works. The engineer is not involved in an individual’s use of the product, and cannot be credited with its results, to an extent that would allow the engineer to exert creative control. Companies may be able to clarify copyright ownership through their terms of service or contractual agreements. However, a lack of judicial and legal commentary on this specific issue leaves it unresolved, or with little clear guidance, as of October 2023.

The most contentious element of generative AI and copyrighted works is liability for infringement. OpenAI is facing multiple class-action lawsuits over its allegedly unlicensed use of copyrighted works to train its generative models. Meta Platforms, the owner of Facebook, Instagram, and WhatsApp, is facing multiple class-action lawsuits over the training data used for its large language model “LLaMA”. Much like the author of this paper, I couldn’t possibly shed light on this complex issue in a simple blog post, but lawmakers can take meaningful action.

Considerations and takeaways for lawmakers, and for professionals overseeing company policies that govern generative AI and creative works, are: (1) clearly define whether generative AI can create copyrightable works, (2) provide clarity over authorship and ownership of the generated result, and (3) outline the licensing requirements, if any, for proprietary training data used in neural networks and generative models.

The author looks at one example in particular: the viral AI song “Heart On My Sleeve”, published by TikTok user ghostwriter977. The song uses generative AI to emulate the style, sound, and likeness of pop stars Drake and The Weeknd so as to appear real and authentic. The music industry is understandably on guard against revenue-generating content created within seconds. I couldn’t make up my mind about it, so here, listen for yourself.