On The Importance Of Teaching Dissent To Legal Large Language Models

Machine learning from legal precedent requires curating a dataset composed of court decisions, judicial analyses, and legal briefs in a particular field, which is then used to train an algorithm to apply the essence of those decisions to a real-world scenario. This process must include dissenting opinions, minority views, and asymmetrical rulings to achieve near-human legal reasoning and just outcomes.

The use of machine learning continues to extend the capabilities of AI systems in the legal field. Training data is the cornerstone for producing usable machine learning results. Unfortunately, when it comes to judicial decisions, the AI is at times fed only the majority opinions and not the dissenting views (or is ill-prepared to handle both). We should neither want nor tolerate AI legal reasoning that is shaped so one-sidedly.

Make sure to read the full paper titled Significance Of Dissenting Court Opinions For AI Machine Learning In The Law by Dr. Lance B. Eliot at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998250

(Source: Mapendo 2022)

When AI researchers and developers conceive of legal large language models that are expected to produce legal outcomes, it is crucial to include conflicting data and dissenting opinions. The author argues for a balanced, comprehensive training dataset inclusive of both judicial majority and minority views. Current court opinions tend to highlight the outcome, or the views of the majority, and neglect close examination of dissenting opinions and minority views. This can result in unjust outcomes, missed legal nuances, or bland judicial arguments. His main argument centers on a simple observation: justice is fundamentally born through a process of cognitive complexity. In other words, a straightforward ruling with unanimous views contributes little to learning or evolving a given area of the law; considering trade-offs, and carefully weighing different ideas and values against each other, does.

This open-source legal large language model with an integrated external knowledge base exemplifies two key considerations representative of the status quo: (1) training data is compiled by crawling and scraping legally relevant information and key judicial text that extends beyond a single area of law and is not limited to supporting views; (2) because the training data is compiled at scale and holistically, it can be argued that majority views stand to be overrepresented in the model input, considering that minority views often receive less attention, discussion, or reflection beyond the initial period after a legal decision. In addition, there might be complex circumstances in which a judge is split on a specific legal outcome. These often quiet moments of legal reasoning rooted in cognitive complexity hardly ever make it into a written majority or minority opinion, and are therefore unlikely to be used for training purposes.
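
To make the overrepresentation concern concrete, one common mitigation is to reweight training examples by opinion type so that rarer dissenting opinions contribute proportionally more per document. The snippet below is a minimal sketch under assumed labels ("majority", "dissent"); the corpus counts and the inverse-frequency weighting scheme are illustrative, not drawn from any real dataset or from the paper.

```python
from collections import Counter

def opinion_sample_weights(opinion_types):
    """Inverse-frequency weights so rarer opinion types (e.g. dissents)
    count as much in aggregate as the dominant majority opinions."""
    counts = Counter(opinion_types)
    n_types = len(counts)
    total = len(opinion_types)
    # Each opinion type's documents share an equal slice of total weight.
    return [total / (n_types * counts[t]) for t in opinion_types]

# Illustrative corpus: majority opinions dominate the crawl 8-to-2.
corpus_labels = ["majority"] * 8 + ["dissent"] * 2
weights = opinion_sample_weights(corpus_labels)
# Each majority document gets weight 0.625, each dissent 2.5, so both
# opinion types contribute an aggregate weight of 5.0 to training.
```

Such weights could then be passed to a loss function or sampler during training; the point is simply that balance must be engineered in, because the raw crawl will not provide it.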

Another interesting consideration is access to dissenting opinions and minority views. While this type of judicial writing may be available to the public at the highest levels of the judiciary, a dissenting view in a less public case at a lower level might not afford the same access. Gatekeepers such as WestLaw restrict the audience for these documents and their interpretations. Arguments for a fair-learning exemption for large language models are arising in various corners of the legal profession and are currently being litigated by the trailblazers of the AI boom.

A recent and insightful essay by Seán Fobbes cautions against excitement when it comes to legal large language models and their capability to produce legally and ethically accurate, as well as just, outcomes. From my cursory review, achieving that will require much more fine-tuning and quality review than a mere assurance of dissenting opinions and minority views can provide. Food for thought that I shall devour in a follow-up post.

Forecasting Legal Outcomes With Generative AI

Imagine a futuristic society where lawsuits are adjudicated within minutes. Accurately predicting the outcome of a legal action will change the way we adhere to rules and regulations. 

Lawyers are steeped in making predictions. A closely studied area of the law is known as Legal Judgment Prediction (LJP) and entails using computer models to aid in making legal-oriented predictions. These capabilities will be fueled and amplified via the advent of AI in the law.

Make sure to read the full paper titled Legal Judgment Predictions and AI by Dr. Lance B. Eliot at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3954615

We are in Mega-City One in the year 2099 AD. The judiciary and law enforcement are one unit. Legal violations, disputes, and infringements of social norms are enforced by street judges with a mandate to summarily arrest, convict, sentence, and execute criminals. Of course, this is the plot of Judge Dredd, but the technology of the year 2023 AD is already on its way to making this dystopian vision a reality.

Forecasting the legal outcome of a proceeding is a matter of data analytics, access to information, and the absence of process-disrupting events. In our current time, this is a job for counsel and legal professionals. As representatives of the courts, lawyers are experts in reading a situation and introducing some predictability to it by adopting a clear legal strategy. Ambiguity and human error, however, make this process hardly repeatable – let alone reliable for future legal action. 

Recent developments in the field of computer science, specifically around large language models (LLMs), natural language processing (NLP), retrieval-augmented generation (RAG), and reinforcement learning from human feedback (RLHF), have introduced technical capabilities that increase the quality of forecasting legal outcomes. These capabilities can be summarized as generative artificial intelligence (genAI). Cross-functional efforts between computer science and legal academia coined this area of study “Legal Judgment Prediction” (LJP).

The litigation analytics platform “Pre/Dicta” exemplifies the progress of LJP by achieving a prediction accuracy of 86%. In other words, the platform can forecast the decision of a judge in nearly 9 out of 10 cases. As impressive as this result is, the author points out that sentient behavior remains far-fetched for current technologies, which are largely statistical models with access to vast amounts of data. The quality of the data, the methods leveraged to train the model, and the application determine the accuracy and quality of the prediction. Moreover, the author makes a case for incorporating and focusing on forecasting milestones, rather than attempting to predict the final result of a judicial proceeding, which depends on factors that are challenging to quantify in statistical models. For example, research from 2011 established the “Hungry Judge Effect”: a judge’s ruling tends to be conservative if it happens before the judge has had a meal (or on an empty stomach near the end of a court session), whereas the same case would see a more favorable verdict if the decision took place after the judge’s hunger had been satisfied and their mental fatigue mitigated.

Other factors that pose pitfalls for achieving near-100% prediction accuracy include semantic alignment on the term “legal outcome”. In other words, what specifically is forecasted? The verdict of the district judge? The verdict of a district judge that will be challenged on appeal? Or perhaps the verdict and the sentencing procedure? Or something adjacent to the actual court proceedings? It might seem pedantic, but clarity around “what success looks like” is paramount in legal forecasting.

While Mega-City One might still be a futuristic vision, our current technology is inching closer and closer to a “Minority Report” type of scenario where powerful technologies, sentient or not, churn through vast amounts of intelligence and behavioral data to forecast and supplement human decision-making. The two real questions for us as a human collective beyond borders will be: (1) how much control are we willing to delegate to machines? and (2) how do we rectify injustices once we lose control over the judiciary?

About Black-Box Medicine

Healthcare in the United States is a complex and controversial subject. Approximately 30 million Americans are uninsured and at risk of financial ruin if they become ill or injured. Advanced science and technology could ease some of the challenges around access, diagnosis, and treatment if legal and policy frameworks allow innovation while balancing it against patient protection.

Artificial intelligence (AI) is rapidly moving to change the healthcare system. Driven by the juxtaposition of big data and powerful machine learning techniques, innovators have begun to develop tools to improve the process of clinical care, to advance medical research, and to improve efficiency. These tools rely on algorithms, programs created from health-care data that can make predictions or recommendations. However, the algorithms themselves are often too complex for their reasoning to be understood or even stated explicitly. Such algorithms may be best described as “black-box.” This article briefly describes the concept of AI in medicine, including several possible applications, then considers its legal implications in four areas of law: regulation, tort, intellectual property, and privacy.

Make sure to read the full article titled Artificial Intelligence in Health Care: Applications and Legal Issues by William Nicholson Price II, JD/PhD at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3078704

When you describe your well-being to ChatGPT, it takes about 10 seconds for the machine to present a list of possible conditions you could suffer from, common causes, and treatments. It offers unprecedented access to medical knowledge. When you ask how ChatGPT arrived at your assessment, the machine is less responsive. In fact, the algorithms that power ChatGPT and other AI assistants derive their knowledge from a complex neural network trained on billions of data points: information (some publicly available, some licensed and trained into the model’s predictive capabilities) that is rarely analyzed for accuracy and applicability to the individual circumstances of each user at the time of their request. OpenAI, the current market leader for this type of technology, uses reinforcement learning from human feedback and proximal policy optimization to achieve a level of accuracy that has the potential to upend modern medicine by making healthcare assessments available to those who cannot afford them.

Interestingly, the assessment is something of a black box for both medical professionals and patients. Transparency efforts and insights into the algorithmic structure of the machine learning models that power these chat interfaces still seem insufficient to explain how a specific recommendation came to be, and whether the prediction is tailored to the user’s medical needs or derived from statistical generalization. The author paints a vivid picture by breaking down the current state of medicine/healthcare and artificial intelligence and characterizing it with the “three V’s”:

  1. Volume: large quantities of data – both public and personal, identifiable (health) information that is used to train ever-voracious large language models. Never before in history has mankind collected more health-related data through personal fitness trackers, doctor appointments, and treatment plans than it does today.  
  2. Variety: heterogeneity of data and access beyond identity, borders, languages, or cultural references. Our health data comes from a wealth of different sources: while wearables track our specific wellbeing, location and travel data may indicate our actual wellbeing.
  3. Velocity: fast access to data – in some instances processing within seconds medical data that otherwise would have taken weeks to analyze. Arguably, we have come a long way since WebMD broke down the velocity barrier.

The “three V’s” allow for quick results, but those results usually lack the why and how of the conclusion reached. The author coined the term “Black-Box Medicine” for this. While it creates some uncertainty, it also creates many opportunities for ancillary medical functions, e.g. prognostics, diagnostics, image analysis, and treatment recommendations. Furthermore, it raises interesting legal questions: how does society ensure black-box medicine is safe and effective, and how can it protect patients and patient privacy throughout the process?

Almost immediately, the question of oversight comes to mind. The Food and Drug Administration (FDA) does not regulate “the practice of medicine” but could be tasked with overseeing the deployment of medical devices. Is an algorithm that is trained with patient and healthcare data a medical device? Perhaps the U.S. Department of Health and Human Services or state medical boards can claim oversight, but the author argues disputes will certainly arise over this point. Assuming the FDA were to oversee algorithms and subject them to traditional methods of testing medical devices, it would likely subject them to clinical trials that couldn’t produce scientific results, because an artificial intelligence, by its very nature, changes over time and adapts to new patient circumstances. Hence the author sees innovation at risk of slowing down if the healthcare industry is not quick to adopt “sandbox environments” that allow safe testing of the technology without compromising progress.

Another interesting question is who is responsible when things go wrong. Medical malpractice commonly traces back to the doctor or medical professional in charge of the treatment. If medical assessment is reduced to a user and a keyboard, will the software engineer who manages the codebase be held liable for ill-conceived advice? Perhaps the company that employs the engineer(s)? Or the owner of the model and training data? If a doctor leverages artificial intelligence for image analysis, does that impose a stricter duty of care on the doctor? The author doesn’t provide a conclusive answer, and courts have yet to develop case law on this emerging topic in healthcare.

While this article was first published in 2017, I find it accurate and relevant today, as it raises intriguing questions about governance, liability, privacy, and intellectual property rights concerning healthcare in the context of artificial intelligence, and medical devices in particular. The author leaves it to the reader to answer the question: “Does entity-centered privacy regulation make sense in a world where giant data agglomerations are necessary and useful?”

The Problem With Too Much Privacy

The debate around data protection and privacy is often portrayed as a race towards complete secrecy. The author of this research paper argues that instead, we need to strike a balance between protection against harmful surveillance and doxing on one side and safety, health, access, and freedom of expression on the other side.

Privacy rights are fundamental rights to protect individuals against harmful surveillance and public disclosure of personal information. We rightfully fear surveillance when it is designed to use our personal information in harmful ways. Yet a default assumption that data collection is harmful is simply misguided. Moreover, privacy—and its pervasive offshoot, the NDA—has also at times evolved to shield the powerful and rich against the public’s right to know. Law and policy should focus on regulating misuse and uneven collection and data sharing rather than wholesale bans on collection. Privacy is just one of our democratic society’s many values, and prohibiting safe and equitable data collection can conflict with other equally valuable social goals. While we have always faced difficult choices between competing values—safety, health, access, freedom of expression and equality—advances in technology may also include pathways to better balance individual interests with the public good. Privileging privacy, instead of openly acknowledging the need to balance privacy with fuller and representative data collection, obscures the many ways in which data is a public good. Too much privacy—just like too little privacy—can undermine the ways we can use information for progressive change. Even now, with regard to the right to abortion, the legal debates around reproductive justice reveal privacy’s weakness. A more positive discourse about equality, health, bodily integrity, economic rights, and self-determination would move us beyond the limited and sometimes distorted debates about how technological advances threaten individual privacy rights.

Make sure to read the full paper titled The Problem With Too Much Data Privacy by Orly Lobel at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4578023

The United States has historically shied away from enacting privacy laws or recognizing individual privacy rights. Arguably, this allowed the United States to innovate ruthlessly and progress society relentlessly. The Fourth Amendment and landmark Supreme Court cases shaped contemporary privacy rights in the U.S. over the past half-century. In the aftermath of the September 11, 2001 terrorist attacks, however, privacy rights were hollowed out when H.R. 3162, aka the Patriot Act, was passed, drastically expanding the government’s surveillance authority. In 2013, whistleblower Edward Snowden released top-secret NSA documents to raise public awareness of the scope of surveillance and the invasion of privacy inflicted on American citizens and the citizens of the world at large. In 2016, the European Union adopted Regulation (EU) 2016/679, aka the General Data Protection Regulation (GDPR). Academic experts who participated in the formulation of the GDPR wrote that the law “is the most consequential regulatory development in information policy in a generation. The GDPR brings personal data into a complex and protective regulatory regime.” This kickstarted a mass adoption of privacy laws across different states, from the California Consumer Privacy Act of 2018 (CCPA) to Virginia’s Consumer Data Protection Act of 2021 (VCDPA).

History, with all its legislative back-and-forth, illustrates the struggle to balance data privacy with data access. Against this backdrop, the author argues that data is information and information is a public good. Too much privacy restricts, hampers, and harms access to information, and therefore innovation. And while society has always faced difficult choices between competing values, modern technology has the capability to effectively anonymize and securely process data, which can uphold individual privacy rights while supporting progressive change.

Structuring Technology Law

Techlaw, which studies how law and technology interact, needs an overarching framework that can address the common challenges posed by novel technologies. Generative artificial intelligence seems to be such a novel technology, introducing a plethora of legal uncertainties. This chapter excerpt examines the legal response options available to lawmakers, legislators, and other legal actors in the face of techlaw uncertainties, and inspires a structured approach to creating an overarching framework.

By creating new items, empowering new actors, and enabling new activities or rendering them newly easy, technological development upends legal assumptions and raises a host of questions. How do legal systems resolve them? This chapter reviews the two main approaches to resolving techlaw uncertainties. The first is looking back and using analogy to stretch existing law to new situations; the second is looking forward and crafting new laws or reassessing the regulatory regime.

Make sure to read the full paper titled Legal Responses to Techlaw Uncertainties by Rebecca Crootof and BJ Ard at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4545013

(Source: Tierney/Adobe Stock Photography via law.com)

New technologies often expand human interactions and transactions beyond previously codified regimes. This creates dissonance between historic legal decision-making and present or future adjudication of conflict. The authors argue that technology challenges laws in three distinct ways: (1) application, i.e. how existing laws apply; (2) normative, i.e. creating an undesired result or legal loophole; and (3) institutional, i.e. which body should regulate and oversee a specific technology. To illustrate with a simplified, practical example: with ChatGPT, OpenAI released a program that creates content based on its depth of access to training data and level of quality control. Does the act of creating content from training data that originates from thousands of human creators constitute a copyright violation? Is copyright law applicable? If so, does a judicial order contradict the purpose of intellectual property rights? Or, to take it a step further, should property rights be applicable to artificial intelligence in the first place?

The authors offer a two-pronged approach to overcoming these challenges: (1) a “looking back” and (2) a “looking forward” mindset when interpreting and resolving legal uncertainties. They discuss these approaches as binary to emphasize the distinctions between them, but they exist on a continuum. Looking back means using analogies to extend existing law to new situations. Looking forward means creating new laws or reevaluating the regulatory framework. The authors argue that technology law needs a shared methodology and overarching framework that can address the common challenges posed by novel technologies.

Without diving into the backwards approach, which is commonly taught in law school, let’s skip to future-proofing new laws. Lawmakers represent a cross-section of society with all its traditional and modern challenges. They have to balance the ease of amending a law against its scope. For a number of reasons, they often prefer stability over flexibility and flexibility over precision. Actual decision-making comes down to passing tech-neutral or tech-specific laws. Tech-neutral laws imply a broad and adaptable set of regulations applicable to various technologies, offering flexibility and reducing the need for frequent updates when new tech emerges. However, they can be vague and overly inclusive, potentially interfering with desirable behaviors and enforcement. Tech-specific laws, on the other hand, are commonly clear in language and tailored to specific issues, making compliance easier while still promoting innovation. Yet they may become outdated and create legal loopholes or grey areas if not regularly updated, and crafting them requires technical expertise. Technical expertise, in particular, is hard to convey to an ever-aging body of political representatives and lawmakers.

Structuring technology law seems to favor a high level of flexibility and adaptability over system stability. However, the nuances and intricacies of technology and its impact on society can’t be quantified or summarized in a brief chapter. This excerpt builds upon content Crootof and Ard originally published in 2021. You can read the full paper titled “Structuring Techlaw” at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3664124 to gain a full perspective on lawmakers’ legal response options to techlaw uncertainties.

AI in Legal Research and Decision Making

Traditionally, legal research and judicial decisions are performed by legally certified, skilled humans. Artificial intelligence is supporting and enhancing these processes by introducing text analysis tools, case comparison, document review at scale, and accurate case predictions among other things. Where are the technical and ethical boundaries of machine-enhanced judicial decision-making? And, how long until large language models interpret and explain laws and societal norms better than legally certified, skilled humans do? 


This paper examines the evolving role of Artificial Intelligence (AI) technology in the field of law, specifically focusing on legal research and decision-making. AI has emerged as a transformative tool in various industries, and the legal profession is no exception. The paper explores the potential benefits of AI technology in legal research, such as enhanced efficiency and comprehensive results. It also highlights the role of AI in document analysis, predictive analytics, and legal decision-making, emphasizing the need for human oversight. However, the paper also acknowledges the challenges and ethical considerations associated with AI implementation, including transparency, bias, and privacy concerns. By understanding these dynamics, the legal profession can leverage AI technology effectively while ensuring responsible and ethical use.

Make sure to read the full paper titled The Role of AI Technology for Legal Research and Decision Making by Md Shahin Kabir and Mohammad Nazmul Alam at https://www.researchgate.net/publication/372790308_The_Role_of_AI_Technology_for_Legal_Research_and_Decision_Making

I want to limit this post to the two most interesting facets of this paper: (1) machine learning as a means to conduct legal research, and (2) expert systems to execute judicial decisions.

The first part refers to the umbrella term machine learning, which in the legal profession comes down to predictive or statistical analysis. In other words, ML is a method to ingest vast amounts of legal and regulatory language and to analyze, classify, and label it against a set of signals. For example, think about all the laws and court decisions concerning defamation that were ever handed down. Feed them into your ML system as training data and deploy it against a standard intake of text to identify (legally) critical language. Of course, this is an exaggerated example, but perhaps not as far-fetched as it seems.
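
A minimal sketch of that idea, assuming scikit-learn and a tiny hand-labeled toy corpus (the sentences and labels below are invented for illustration, not real case language): a TF-IDF representation feeding a linear classifier that flags potentially critical, defamation-adjacent language.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: sentences labeled 1 if they contain potentially
# defamatory (legally critical) language, 0 otherwise. Invented examples.
texts = [
    "the defendant knowingly published false statements about the plaintiff",
    "the article falsely accused her of fraud and ruined her reputation",
    "he spread a malicious lie that the company bribed officials",
    "the parties met to discuss the quarterly budget",
    "the court scheduled the hearing for next month",
    "the weather was pleasant during the conference",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each sentence into a weighted term vector; logistic
# regression learns which terms signal legally critical language.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# A new intake sentence is scored against the learned signals; with
# this toy corpus the prediction is plausible, not authoritative.
flagged = model.predict(["the blog post falsely accused him of bribery"])
```

A production system would of course train on the full corpus of statutes and rulings rather than six sentences, but the pipeline shape (vectorize, fit, classify intake text) is the same.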

The second part refers to the creation of decision support systems, which – as far as we understand the author’s intent here – are designed to be the product of the aforementioned ML engagement, tailored to the situation and, ideally, executed autonomously. Such a system helps humans identify potential legal risks. It helps shorten the time required to review an entire, complex case. If configured and deployed accurately, these decision support systems could become automated ticket systems upholding the rule of law. That is a big if.
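
One way such a system might make its "big if" explicit is to gate autonomous action on model confidence and route everything else to a human. The sketch below is hypothetical; the thresholds, function name, and routing labels are illustrative assumptions, not taken from the paper.

```python
def route_case(risk_score, auto_threshold=0.95, review_threshold=0.6):
    """Route a case based on a model's risk score in [0, 1].

    Only very confident assessments are handled automatically; the grey
    zone goes to a human lawyer, making the review gate explicit."""
    if risk_score >= auto_threshold:
        return "auto-flag: draft response for counsel sign-off"
    if risk_score >= review_threshold:
        return "human review: potential legal risk"
    return "no action"

# Confident score is auto-flagged; a borderline one is escalated to a human.
print(route_case(0.97))
print(route_case(0.70))
print(route_case(0.10))
```

The design choice here is that the system never issues a final decision on its own: even the high-confidence path terminates in counsel sign-off, keeping a human accountable for the outcome.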

One of the challenges for this legal technology is algorithmic hallucination or, simply put, a rogue response. These appear to take place without warning or noticeable correlation. They are system errors that can magnify identity or cultural biases. This raises ethical questions and questions of liability for machine mistakes. Furthermore, it raises questions of accountability and the longevity of agreed-upon social norms. Will a democratic society allow its norms, judicial review, and decision-making to be delegated to algorithms?

For some reason, this paper is labeled August 2023 when in fact it was first published in 2018. I only discovered this after I started writing. ROSS Intelligence has been out of business since 2021. Their farewell post “Enough” illustrates another challenging aspect of AI, legal research, and decision-making: access.     

W36Y23 Weekly Review: X Corp. v. California, Maryland v. Instagram/TikTok, and Government Takedown Requests

+++X Corporation Challenges California Law for Transparency in Content Moderation 
+++Maryland School District sues Instagram, TikTok, YouTube and others over Mental Health
+++Appeals Court Limits Government Power to Censor Social Media Content
+++California Lawmakers Wrestle with Social Media Companies over Youth Protection Laws

X Corporation Challenges California Law for Transparency in Content Moderation 

California’s AB 587 law, which demands that social media platforms reveal how they moderate content related to hate speech, racism, extremism, disinformation, harassment, and foreign political interference, is being challenged by X, the company that runs Twitter. X says that the law infringes on its constitutional right to free speech by making it use politically charged terms and express opinions on controversial issues. The lawsuit is part of a larger conflict between California and the tech industry over privacy, consumer protection, and regulation.

Read the full report on techcrunch.
Read the full text of Assembly Bill 587.
Read the case X Corporation v. Robert A. Bonta, Attorney General of California, U.S. District Court, Eastern District of California, No. 2:23-at-00903.

Maryland School District sues Instagram, TikTok, YouTube and others over Mental Health

A school district in Anne Arundel County, Maryland is taking legal action against major social media companies, including Meta, Google, Snapchat, YouTube, and TikTok. The district accuses these companies of causing a mental health crisis among young people by using algorithms that keep them hooked on their platforms, exposing young users to harmful content and making them spend too much time on screens. It demands that these platforms change their algorithms and practices to safeguard children’s well-being, and it also seeks to recover the money it has spent addressing student mental health issues.

Read the full report on WBALTV.
Read the case Board of Education of Anne Arundel County v. Meta Platforms Inc. et al., U.S. District Court, Maryland, No. 1:23-cv-2327.

Appeals Court Limits Government Power to Censor Social Media Content

A federal appeals court has narrowed a previous court order that limited the Biden administration’s engagement with social media companies regarding contentious content. The original order, issued by a Louisiana judge on July 4th, prevented various government agencies and officials from communicating with platforms like Facebook and X (formerly Twitter) to encourage the removal of content considered problematic by the government. The appeals court found the initial order too broad and vague, upholding only the part preventing the administration from threatening social media platforms with antitrust action or changes to liability protection for user-generated content. Some agencies were also removed from the order. The Biden administration can seek a Supreme Court review within ten days. 

Read the full report on the Associated Press.
Read the case Missouri v. Biden, U.S. District Court for the Western District of Louisiana, No. 3:22-CV-1213.

California Lawmakers Wrestle with Social Media Companies over Youth Protection Laws

A bill to make social media platforms responsible for harmful content died in a California committee. Sen. Nancy Skinner (D-Berkeley) authored SB 680, which targeted content related to eating disorders, self-harm, and drugs. Tech companies, including Meta, Snap, and TikTok, opposed the bill, saying it violated federal law and the First Amendment. Lawmakers said social media platforms could do more to prevent harm. Another bill, AB 1394, which deals with child sexual abuse material, passed to the Senate floor. It would require platforms to let California users report such material, with fines for non-compliance.

Read the full report on the Los Angeles Times.
Read the full text of Senate Bill 680.
Read the full text of Assembly Bill 1394.

More Headlines

  • Copyright Law: “Sam Smith Beats Copyright Lawsuit Over ‘Dancing With a Stranger’” (by Bloomberg Law)
  • Copyright Law: “Copyright Office Denies Registration to Award-Winning Work Made with Midjourney” (by IP Watchdog)
  • Cryptocurrency: “Who’s Afraid Of (Suing) DeFi Entities?” (by Forbes)
  • Privacy: “Meta Platforms must face medical privacy class action” (by Reuters)
  • Social Media: “Meta-Backed Diversity Program Accused of Anti-White Hiring Bias” (by Bloomberg)
  • Personal Injury: “New York man was killed ‘instantly’ by Peloton bike, his family says in lawsuit” (by CNBC)
  • Social Media: “Fired Twitter employee says he’s owed millions in lawsuit” (by SF Examiner)
  • Social Media: “Georgetown County School District joining lawsuit against Meta, TikTok, Big Tech” (by Post and Courier)
  • Defamation: “Elon Musk to sue ADL for accusing him, X of antisemitism” (by TechCrunch)

In-Depth Reads

  • Surveillance Capitalism: “A Radical Proposal for Protecting Privacy: Halt Industry’s Use of ‘Non-Content’” (via Lawfare)

In Other News (or publications you should read)

This post originated from my publication Codifying Chaos.

Machine Learning from Legal Precedent

When training a machine learning (ML) model on court decisions and judicial opinions, the rulings themselves become the training data used to optimize the algorithm that determines an outcome. As lawyers, we take the result of these rulings as final. In some cases, however, the law must change when rulings become antiquated or conflict with a shift in regulations. This cursory report explores the level of detail needed when training an ML model with court decisions and judicial opinions.


Much of the time, attorneys know that the law is relatively stable and predictable. This makes things easier for all concerned. At the same time, attorneys also know and anticipate that cases will be overturned. What would happen if we trained AI but failed to point out that rulings are at times overruled? That is the mess that some practitioners of machine learning are starting to appreciate.

Make sure to read the full paper titled Overturned Legal Rulings Are Pivotal In Using Machine Learning And The Law by Dr. Lance B. Eliot at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998249

(Source: Mapendo 2022)

Fail, fail fast, and learn from the failure. That could be an accurate summary of a computational system. In law, judicial principles demand a slower pace of change. Under the common law principle of stare decisis, courts are bound by precedent, which can operate as either a vertical or a horizontal rule. Vertical stare decisis means that lower courts are bound by the rulings of higher courts, whereas horizontal stare decisis means that an appellate court’s decision can become a guiding ruling, but only for similar or related cases at the same level. In essence, stare decisis is meant to instill respect for prior rulings and to ensure legal consistency and predictability.
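The vertical/horizontal distinction can be captured in a toy rule. The sketch below assumes a simplified three-tier hierarchy (district, appellate, supreme); the names and the function are illustrative only, and real doctrine is far more nuanced than this:

```python
# Toy model of stare decisis: is a precedent binding on a given court?
# Assumes a simplified three-tier hierarchy for illustration.
HIERARCHY = {"district": 0, "appellate": 1, "supreme": 2}

def is_binding(precedent_court: str, deciding_court: str) -> str:
    p, d = HIERARCHY[precedent_court], HIERARCHY[deciding_court]
    if p > d:
        return "vertical"    # a higher court's ruling binds a lower court
    if p == d and precedent_court == "appellate":
        return "horizontal"  # same-level appellate precedent guides similar cases
    return "none"            # lower courts do not bind higher courts

print(is_binding("supreme", "district"))     # vertical
print(is_binding("appellate", "appellate"))  # horizontal
print(is_binding("district", "supreme"))     # none
```

The point of the toy is only to show that "bindingness" is a relation between courts, not a property of a single ruling, which is exactly the kind of structure a training dataset would need to encode.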

In contrast, the judicial process would grind to a halt if prior decisions could never be overturned, or if judges were never able to deviate and interpret a case outside the dogma of stare decisis. Needless to say, overturning precedent is the exception rather than the rule. According to a case study of 25,544 rulings of the Supreme Court of the United States from 1789 to 2020, the court overturned itself in only about 145 instances, or 0.56%. While this number might be considered marginal, it has a trickle-down effect on future court rulings at lower levels.

A high-level description of current ML training procedures could include the curation of a dataset comprised of court decisions, judicial analysis, and legal briefs in a particular field, used to train an algorithm to weigh the essence of these court decisions against a real-world scenario. On its face, one could argue for excluding overturned, outdated, or dissenting rulings. That becomes increasingly difficult for legal precedent that is no longer fully applicable yet is still recognized by some of the judiciary. Exclusion, moreover, would lead to a patchwork of curated data that is neither robust nor capable of supporting legal reasoning of high quality. Without considering erroneous or overturned decisions, a judge, like an ML system, could not develop the pattern-recognition signal needed to adjudicate cases adequately. On the other hand, mindlessly training an ML model on everything available could lead the algorithm to amplify erroneous aspects while ranking current precedents lower in a controversial case.
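A minimal sketch of that middle path is to label each opinion with its precedential status and attach a sample weight, rather than excluding anything outright. The field names and the specific weights below are assumptions for illustration, not anything prescribed by the paper:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    text: str
    kind: str         # "majority", "dissent", "concurrence"
    overturned: bool  # has the ruling since been overruled?

def curate(opinions):
    """Keep overturned and dissenting material, but attach a sample
    weight that a downstream trainer can use to damp its influence."""
    examples = []
    for op in opinions:
        weight = 1.0
        if op.overturned:
            weight *= 0.5  # still informative, but no longer good law
        if op.kind == "dissent":
            weight *= 0.7  # minority view: keep the signal, damp the label
        examples.append({"text": op.text, "kind": op.kind,
                         "overturned": op.overturned, "weight": weight})
    return examples

corpus = [
    Opinion("Separate is inherently unequal...", "majority", False),
    Opinion("Plessy majority opinion...", "majority", True),
    Opinion("Harlan's Plessy dissent...", "dissent", False),
]
for ex in curate(corpus):
    print(ex["kind"], ex["overturned"], ex["weight"])
```

The design choice here is the essential one the author raises: the overturned opinion stays in the corpus so the model can learn what an erroneous ruling looks like, while the weight keeps it from being treated as current precedent.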

This paper offers a number of insightful takeaways for anyone building an ML legal reasoning model. Most notably, there is a need for active curation of legal precedent that includes overturned, historic content. Court decisions and judicial opinions must be analyzed for the intellectual footprint that explains the rationale of each decision. Once this rationale is identified, it must be parsed against possible conflicts and dissent to create a robust and just system.
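One way that last step might be realized is to pair each case's majority rationale with its dissent, so a training example carries the conflict rather than just the outcome. This is a sketch under the assumption that opinions are grouped by a case identifier; nothing in it comes from the paper itself:

```python
from collections import defaultdict

def pair_rationales(opinions):
    """Group opinions by case and emit (majority, dissent) pairs,
    so each training example captures both sides of the argument."""
    by_case = defaultdict(lambda: {"majority": [], "dissent": []})
    for op in opinions:
        if op["kind"] in ("majority", "dissent"):
            by_case[op["case_id"]][op["kind"]].append(op["text"])
    pairs = []
    for case_id, sides in by_case.items():
        for maj in sides["majority"]:
            for dis in sides["dissent"]:
                pairs.append({"case": case_id, "majority": maj, "dissent": dis})
    return pairs

ops = [
    {"case_id": "plessy", "kind": "majority", "text": "Separate but equal..."},
    {"case_id": "plessy", "kind": "dissent", "text": "Our Constitution is color-blind..."},
    {"case_id": "brown", "kind": "majority", "text": "Separate is inherently unequal..."},
]
print(len(pair_rationales(ops)))  # 1 pair: only Plessy has a recorded dissent here
```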