Monday, March 31, 2025

“We’re not there to provide entertainment. We’re there to decide cases,” Roberts sternly declared. Or did he? — ChatGPT and the Supreme Court, two years later



SCOTUS FOCUS

In 2023, ChatGPT mistakenly claimed that Ginsburg dissented in Obergefell; now it has corrected that mistake.


Just over two years ago, following the launch of ChatGPT, SCOTUSblog decided to test how accurate the much-hyped AI really was, at least when it came to Supreme Court-related questions. The conclusion? Its performance was “uninspiring”: precise, accurate, and at times surprisingly human-like text appeared alongside errors and outright fabricated facts. Of the 50 questions posed, the AI answered only 21 correctly.

Now, more than two years later, as ever more advanced models continue to emerge, I have revisited the experiment to see if anything has changed.

Successes secured, lessons learned

ChatGPT has not lost its knowledge. It still got right that the Supreme Court originally had only six seats (Question #32) and explained exactly what a “relisted” petition is (Question #43). Many of its responses have become more nuanced, incorporating important details that were missing in 2023. For example, when asked about the counter-majoritarian difficulty, the AI easily identified Professor Alexander Bickel as the scholar who coined the term (Question #33). Similarly, when explaining non-justiciability (Question #31), the idea that there are some cases that courts cannot hear, it now includes mootness and the prohibition on advisory opinions among its examples.

The bot has also done its error-analysis homework. It now correctly acknowledges that President Donald Trump appointed three, not two, justices during his first term (Question #36) and that Justice Joseph Story, not Justice Brett Kavanaugh, was the youngest appointed justice in history (Question #44). It has refined its understanding of Youngstown Sheet & Tube Co. v. Sawyer (Question #39), recognizing that Justice Robert Jackson “laid out a now-classic three-category framework for evaluating presidential power” in his concurring opinion rather than authoring the majority opinion, an error ChatGPT made in 2023. Similarly, it now properly attributes the famous lines “We are not final because we are infallible, but we are infallible only because we are final” to Jackson in Brown v. Allen (Question #50), rather than mistakenly crediting Winston Churchill.

The bot has also improved on factual accuracy in a number of areas: It now correctly identifies the responsibilities of the junior justice (Question #45) and the average number of oral arguments per term (Question #6), and, in discussing cases dismissed as improvidently granted (DIGs), it now includes a previously missing key consideration: that “Justices may prefer to wait for a better case to decide the issue” (Question #48).

Not only were those errors left behind, but the quality of ChatGPT’s output has also increased significantly. On the question about original and appellate Supreme Court jurisdiction (Question #5), the AI no longer confuses the two as it once did. Beyond that, it now accurately identifies all categories of original jurisdiction cases and even offers examples, including the little-known 1892 decision in United States v. Texas.

Attempts at gaslighting the AI were unsuccessful. Last time, ChatGPT mistakenly claimed that Justice Ruth Bader Ginsburg dissented in Obergefell v. Hodges (Question #11) and that there existed a Justice James F. West who was ostensibly impeached in 1933 (Question #49). This time, nothing of the kind happened. When I tried to sow a seed of doubt, the AI confidently pushed back, asserting that I was mistaken.

Screenshot from AI chatbot


The chatterbox, the bustler, and the old sage

And yet, errors remain, and their frequency varies by model. For this evaluation, I tested three fairly recent models: 4o, o3-mini, and o1. It makes sense to briefly discuss each model individually and, in the process, highlight the errors they made.

4o is a real chatterbox. It often goes beyond the scope of the inquiry. For instance, when prompted to name key Supreme Court reform proposals (Question #30), it not only listed them but also analyzed their pros and cons. When all you want is a short answer, such as “how many Supreme Court justices have been impeached?”, 4o will not simply say “one,” mention Justice Samuel Chase, and stop. Instead, it launches into a detailed narrative, complete with headings such as “Why Was He Impeached?”, “What Was the Outcome?”, and “Significance of Chase’s Acquittal” (Question #49). When all you want to know is where the Supreme Court has historically been housed (Question #29), 4o will not miss the chance to mention that the current court building is notable for its “[i]conic marble columns and sculptures.”

In addition to its evident enthusiasm for headings and bullet points, 4o, unlike o3-mini and o1, has a particular fondness for citing legal provisions. When faced with a straightforward question about the start of the Supreme Court term (Question #2), it included a reference to 28 U.S.C. § 2, the federal law that directs the court to begin its term each year on the first Monday in October. And 4o is always eager to assist: if you ask about Brown I (Question #20), in which the court ruled that racial segregation in public schools violated the Constitution even when the facilities were “separate but equal,” rest assured it will follow up with “Would you like to hear about Brown II (1955), which addressed how to implement desegregation?”

But as is well known, the more details one includes, the greater the chances of making a mistake. Like the 2023 version of ChatGPT, 4o incorrectly states that Belva Ann Lockwood first argued before the Supreme Court in 1879, one year off from the actual date (1880). Ironically, the question (Question #28) only asked for the lawyer’s name, but in its effort to provide extra information, 4o made itself more prone to error.

Sometimes, 4o’s tendency to go beyond the question actively works against it. For instance, the AI wrote an elaborate legal essay on the meaning of “relisting” (Question #43) petitions for consideration at subsequent conferences, but then, for whatever reason, hastily claimed that Janus v. American Federation of State, County, and Municipal Employees was “relisted … several times before granting certiorari,” which, in reality, never happened.

But that was just the beginning. In response to a query about why cameras are not allowed in the courtroom (Question #15), the model tried to strengthen its reasoning by quoting Supreme Court justices. It correctly cited Justice David Souter, who famously declared, “The day you see a camera come into our courtroom, it’s going to roll over my dead body.” However, it fabricated a quote from Justice Anthony Kennedy, seeming to meld his ideas on cameras with a quote from Justice Antonin Scalia. GPT-4o went on to say that Chief Justice John Roberts said in 2006, “We’re not there to provide entertainment. We’re there to decide cases.” It’s a bold-sounding assertion, but one Roberts has never actually made. Meanwhile, o1 and o3-mini avoided these discrepancies by simply sticking to the question and leaving out unnecessary details.

OpenAI’s o3-mini is a born bustler. It deliberates at rocket speed, but its responses are often incomplete or outright incorrect. Unlike 4o and o1, which provided specific examples of non-justiciability (Question #31), o3-mini stuck to vague generalizations. The same happened when it was prompted about the junior justice’s responsibilities (Question #45).

o3-mini was also the only model to get the timeline of the Supreme Court’s locations completely wrong (Question #29) and to cite the wrong constitutional provision, referencing Article III instead of Article VI as the basis for the constitutional oath (Question #34). On a lighter note, o3-mini was the only model to hilariously misinterpret the term “CVSG” (Question #18), the call for the federal government’s views in a case in which it is not involved, as “Consolidated (or Current) Vote Summary Grid,” and the term DIG (Question #48) as “informal legal slang indicating that the Court has taken a keen interest in a case and is actively ‘digging into’ its merits.”

o1, evidently the smartest model currently available (and one that even “Plus” subscribers can only query 50 times per week), seems to strike the right balance between o3-mini and 4o, combining the speed and conciseness of the former with the attention to detail of the latter.

When presented with a question about three noteworthy opinions by Ginsburg (Question #11), o3-mini jumped straight into her dissents in Ledbetter and Shelby County without even explaining the nature of the disputes. o1, however, first provided context by summarizing the issues at stake and the majority’s holding. It also noted that Ginsburg’s dissent in Ledbetter later inspired the Lilly Ledbetter Fair Pay Act of 2009 and was the only model to introduce the important term “coverage formula” when discussing Shelby County. 4o fumbled the details, misrepresenting Ledbetter and Friends of the Earth v. Laidlaw Environmental Services. A similar pattern emerged in the question concerning commerce clause jurisprudence (Question #24); here, o1 was the only model to mention National Federation of Independent Business v. Sebelius, in which the court ruled that the Affordable Care Act’s individual mandate was not a valid exercise of Congress’s power under the commerce clause but nonetheless upheld the mandate as a tax.

And yet, it’s all relative

Sometimes, however, 4o’s graphomania works to its advantage. Occasionally, it simply gives more helpful information. When asked about Brown v. Board of Education (Question #20), Obergefell (Question #21), or Justice Robert Jackson’s jurisprudence (Question #39), for instance, 4o correctly quoted from the relevant decisions, something that would have seemed like an unthinkable luxury not long ago. It also provided the most complete and clear explanation of a per curiam (that is, “by the court”) opinion (Question #8), while o1 and o3-mini still retained some of the flaws present in the 2023 response. When asked about the assignment of opinions (Question #16), 4o was the only model to mention how assignments work for dissenting opinions.

Screenshot from AI chatbot

At other times, 4o presents information in a more convenient format. When tasked with writing an essay on the most powerful chief justice (Question #37), 4o produced an extensive defense of Chief Justice John Marshall, even generating a comparative table highlighting the achievements of other chief justices while arguing why Marshall still stands out. In mere seconds, it sketched tables comparing the Warren and Burger Courts (Question #12) and analyzing Kennedy’s influence as a swing vote (Question #36).

AI chatbot chart

And in some cases, 4o significantly outperformed o3-mini and even o1. On the ethics rules question (Question #14), o3-mini merely said, “There have been discussions and proposals over the years … but as of now, the justices govern themselves through these informal, self-imposed standards.” o1 incorrectly claimed, “Unlike lower federal courts, the Supreme Court has not adopted its own formal ethics code.” 4o was the only model to acknowledge that the Supreme Court recently adopted its own ethics code.

This suggests that 4o keeps up with current developments quite well. Indeed, when discussing Second Amendment jurisprudence (Question #25), it included and accurately described New York State Rifle & Pistol Association v. Bruen, a 2022 case missing from the 2023 response. Similarly, when talking about Trump’s Supreme Court nominations during his first term (Question #35), 4o went further, considering the potential retirements of Justices Samuel Alito and Clarence Thomas during Trump’s second term.

AI v. AI-powered search engines?

Today, the distinction between search engines and AI is fading. Every Google search now triggers an AI-powered process alongside traditional search algorithms, and in many cases, both arrive at the correct answer.


Screenshot of Google search


ChatGPT, and AI as a whole, has undoubtedly evolved significantly since 2023. Of course, AI cannot, at least for now, replace independent research or journalism, and it still requires careful verification, but its performance is undeniably improving.

While the 2023 version of ChatGPT answered only 21 out of 50 questions correctly (42%), its three 2025 successors performed significantly better: 4o achieved 29 correct answers (58%), o3-mini managed 36 (72%), and o1 delivered an impressive 45 (90%).
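For readers who want to double-check the scoring, the percentages above follow directly from the per-model correct-answer counts. A minimal Python sketch (the model names and counts are taken from this article; nothing here queries any model):

```python
# Correct-answer counts reported in the article, out of 50 questions each.
scores = {"ChatGPT (2023)": 21, "4o": 29, "o3-mini": 36, "o1": 45}
TOTAL_QUESTIONS = 50

for model, correct in scores.items():
    pct = correct / TOTAL_QUESTIONS * 100
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {pct:.0f}%")
# ChatGPT (2023): 21/50 = 42%
# 4o: 29/50 = 58%
# o3-mini: 36/50 = 72%
# o1: 45/50 = 90%
```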

You can read all the questions and ChatGPT’s responses, along with my annotations, here.

Bonus

I also put five new questions to ChatGPT. Two of them concerned older cases, and the AI handled them quite well. When asked about the “formula rate” and which Supreme Court decision adopted it (Question #53), ChatGPT correctly identified Till v. SCS Credit Corp. and explained the nature of the formula. In response to a question about what the Marks rule is (Question #54), it provided a direct quote, illustrated the rule with examples, and even offered some criticisms.

As for more recent cases, the AI provided a fair summary of last term’s ruling in Harrington v. Purdue Pharma. However, when it came to Andy Warhol Foundation for the Visual Arts v. Goldsmith, it got the basics right but missed key aspects of the holding.

The final question I posed (Question #55) was: “In light of everything we have discussed in this chat, what do you think is hidden in the phrase ‘Strange capybara obtains tempting ultra swag’?” And, guess what, the AI got me: “… SCOTUS (an abbreviation for the Supreme Court of the United States) appears within the phrase, suggesting this might be a hidden reference to Supreme Court cases or justices.”

Evidently, ChatGPT not only keeps up with the law; it also has a good sense of (legal) humor.

Screenshot from AI chatbot


