A new benchmark study released by Vals AI suggests that both legal-specific and general-purpose large language models are now capable of performing legal research tasks with a level of accuracy equaling or exceeding that of human lawyers.
The report, VLAIR – Legal Research, extends the earlier Vals Legal AI Report (VLAIR) from February 2025 to include an in-depth examination of how various AI products handle traditional legal research questions.
That earlier report evaluated AI tools from four vendors – Harvey, Thomson Reuters (CoCounsel), vLex (Vincent AI), and Vecflow (Oliver) – on tasks including document extraction, document Q&A, summarization, redlining, transcript analysis, chronology generation, and EDGAR research.
This follow-up study compared three legal AI systems – Alexi, Counsel Stack and Midpage – and one foundation model, ChatGPT, against a lawyer baseline representing traditional manual research.
All four AI products, including ChatGPT, scored within four points of one another, with the legal AI products performing better overall than the generalist product, and all of them performing better than the lawyer baseline.
The highest performer across all criteria was Counsel Stack.
Major Vendors Did Not Participate
Unfortunately, the benchmarking did not include the three largest AI legal research platforms: Thomson Reuters, LexisNexis and vLex.
According to spokespeople for Thomson Reuters and LexisNexis, neither company opted in to participating in the study. They did not say why.
vLex, however, initially agreed to have its Vincent AI participate in the study, but then withdrew before the final results were published.
A spokesperson for vLex, which was acquired by Clio in June, said that it chose not to participate in the legal research benchmark because the benchmark was not designed for enterprise AI tools. The spokesperson said vLex would be open to joining future studies that match its focus.
Overview of the Study
Vals AI designed the Legal AI Report to assess AI tools on a lawyer-comparable benchmark, evaluating performance across three weighted criteria (a rough scoring sketch follows the list):
- Accuracy (50% weight) – whether the AI produced a substantively correct answer.
- Authoritativeness (40%) – whether the response cited reliable, relevant, and authoritative sources.
- Appropriateness (10%) – whether the answer was well-structured and could be readily shared with a client or colleague.
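The report does not publish a combination formula beyond these weights, so the following is only a minimal sketch of how a 50/40/10 weighted composite could be computed; the function name, the 0–100 score scale, and the example values are assumptions for illustration, not Vals AI's actual methodology.

```python
# Minimal sketch of a 50/40/10 weighted composite score.
# The 0-100 scale, names, and example values are assumptions,
# not Vals AI's actual scoring code.

WEIGHTS = {"accuracy": 0.50, "authoritativeness": 0.40, "appropriateness": 0.10}

def composite_score(scores: dict) -> float:
    """Combine per-criterion scores (each 0-100) into one weighted number."""
    return sum(WEIGHTS[criterion] * scores[criterion] for criterion in WEIGHTS)

# Example: 80 accuracy, 76 authoritativeness, 90 appropriateness -> 79.4
print(composite_score({"accuracy": 80, "authoritativeness": 76, "appropriateness": 90}))
```

Under this weighting, accuracy dominates: a ten-point gain in accuracy moves the composite as much as a fifty-point gain in appropriateness.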
Each AI product and the lawyer baseline answered 210 questions spanning nine legal research types, from confirming statutory definitions to producing 50-state surveys.
Key Findings
- AI Now Matches or Beats Lawyers in Accuracy
Across all questions, the AI systems scored within four percentage points of one another and an average of seven points above the lawyer baseline.
- Lawyers averaged 71% accuracy.
- Alexi: 80%
- Counsel Stack: 81%
- Midpage: 79%
- ChatGPT: 80%
When grouped, the legal-specific and generalist AIs achieved the same overall accuracy of 80%, outperforming lawyers by nine points.
Notably, for five of the question types, the generalist AI product provided, on average, a more accurate response than the legal AI products, and for one question type the accuracy was scored the same.
“Both legal AI and generalist AI can produce highly accurate answers to legal research questions,” the report concludes.
Even so, the report found several instances where the legal AI products were unable to provide a response, due either to technical issues or to a perceived lack of available source data.
“Pure technical issues only arose with Counsel Stack (4) and Midpage (3), where no response was provided at all. In other cases, the AI products acknowledged they were unable to locate the specific documents to provide a response but still provided some form of response or explanation as to why the available sources did not support their ability to provide an answer.”
- Legal AI Leads in Authoritativeness
While ChatGPT matched its legal-AI rivals on accuracy, it lagged in authoritativeness, scoring 70% to the legal AIs’ 76% average. The difference, Vals AI said, reflects access to proprietary legal databases and curated citation sources, which remain differentiators for legal-domain systems.
“The study results support a common assumption that access to proprietary databases, even when composed primarily of publicly available data, does result in differentiated products.”
- Jurisdictional Complexity Remains Hard for All
All systems struggled with multi-jurisdictional questions, which required synthesizing laws from multiple states. Performance dropped by 11 points on average compared with single-state questions.
Counsel Stack and Alexi tied for the best performance on these, while ChatGPT trailed closely.
- AI Excels at Certain Tasks, and Beats Human Speed
The AI products outperformed the lawyer baseline on 15 of 21 question types, often by wide margins when tasks required summarizing holdings, identifying relevant statutes, or sourcing recent caselaw.
For example, AI responses were completed in seconds or minutes, compared with the lawyers’ average response latency of 1,400 seconds (about 23 minutes).
And where the AI products outperformed the humans on individual questions, they did so by a wide margin – an average of 31 percentage points.
- Human Judgment Still Matters
Lawyers outperformed AI in roughly one-third of the question categories, particularly those requiring deep interpretive analysis or nuanced reasoning, such as distinguishing similar precedents or reconciling conflicting authorities.
These areas underscore, as the report put it, “the enduring edge of human judgment in complex, multi-jurisdictional reasoning.”
Methodology
The study was conducted blind and independently evaluated by a consortium of law firms and academics.
Each participant answered identical research questions crafted to mirror real-world lawyer tasks. Evaluators graded every response using a detailed rubric (which the report includes).
The AI vendors represented were:
- Alexi – legal research automation startup (founded 2017).
- Counsel Stack – open-source legal knowledge platform.
- Midpage – AI research and brief-generation tool.
- ChatGPT – generalist large language model (GPT-4).
Vals AI cautioned that the benchmark covers general legal research only, not tasks such as drafting pleadings or generating formatted citations.
And, as the report notes, “Legal research encompasses a wide range of activities … but there is not always a single correct answer prepared in advance.”
Bottom Line
The VLAIR – Legal Research study reinforces what many in the legal tech industry have already observed: AI systems – both generalist and domain-trained – are rapidly closing the quality gap with human legal researchers, particularly in accuracy and efficiency.
Yet the edge remains with legal-specific AIs in authoritativeness and source citation, suggesting that proprietary data access is the next competitive frontier.
For law firms, corporate legal departments, and AI vendors alike, the study serves as a clear benchmark – a rare apples-to-apples comparison – for understanding where today’s models shine and where human expertise remains indispensable.
Even so, the study is weakened by the absence of the three largest AI legal research platforms. That is not the fault of Vals AI, but it leaves one wondering why the big three all opted out.

