Go read a terrific new book on AI in Legal Education

A review of Korin Munsterman, GenAI in Legal Education: A Practical Guide for Professors and Students (CALI eLangdell Press, 2026)

TL;DR. Go download the book. Right now. Read it. Consider it critically with every human and AI tool at your disposal.

There was, until a week ago, no comprehensive guide to generative AI in legal education. The absence was not for lack of interest. The subject had produced a flood of commentary and a patchwork of school policies, but no one had gathered the whole problem between two covers: the architecture of the models, the platforms worth using, the redesign of assessment, the integrity questions, the implications for scholarship. Korin Munsterman, a Professor of Practice at the University of North Texas, has now done it, and done it well. Three cheers as well for the Center for Computer-Aided Legal Instruction (CALI), which took on the project and served as publisher. The result is a free, downloadable comprehensive work ready for classes that begin right now.

The task here was forbidding, for reasons I take up below: a complete account of a field in which no one can keep up anymore is wrong somewhere the moment it prints. Professor Munsterman wrote the specifics anyway.

I should state my interest at the outset. I teach a course on large language models for lawyers, I write about AI and legal education, and I came to this book as a working participant in the same argument it joins. I also came to it skeptical, because the genre is crowded with manifestos that gesture at transformation and never tell a professor what to do. This book is the rare one that contains intelligent, contemporary recipes grounded in solid theory. I am going to assign parts of it to my students. That sentence is the most honest review I can give, and the rest of this essay is an attempt to earn it rather than assert it.

The assignment Munsterman set herself is close to impossible. She is drawing the map while the territory moves at a pace where it is now impossible to keep up. Any of the criticisms within this review must thus be automatically prefaced with that recognition. She is writing a practical guide to a technology that changes faster than her chapters can be edited. By her own account, law schools that drafted policies in early 2023 were governing GPT-3.5, and by the time those policies cleared faculty governance the model they described had been superseded twice over.[1] A guide to a moving target will always be wrong somewhere by the time it prints, and Munsterman knows it; she quotes Ethan Mollick’s line that “today’s AI is the worst AI you will ever use,” and builds a late chapter around the proposition that her own book has a shelf life.[2] To commit a prompt framework, a platform recommendation, a rubric, and an assignment to print, knowing some of it will age, is a brave and useful choice. The alternative, a book of timeless principles that never names a tool, would have aged better yet helped few. The reader should keep that bravery in view through every paragraph that follows, including the critical ones, because most of what I will fault is the unavoidable cost of having been specific and early rather than vague and safe.

What follows, then, is a review of a book I admire, by someone who agrees with most but not all of it. My disagreements cluster at the end of this essay because they are the smaller part of the story. The big picture is this: the book is very, very good, it is needed, and it should be read by – yes, really – everyone connected to legal education. Today, before it ages.

A book measured against its own goals

The fairest way to judge a practical guide is by the promise it makes to its reader, and Munsterman makes hers plainly in the first chapter. She wants to move a law professor from “how do I ban this?” to “how do I use this?” She wants to help the curious-but-stuck faculty member — you know, the one who tried ChatGPT in 2023, found it underwhelming, and never went back. She wants to serve students directly as well as their teachers. She wants to bind academic integrity to professional responsibility. She wants faculty to lead that integration, on the evidence that instructor-designed use helps students where do-it-yourself experimentation does not. She wants to redesign assessment so that it measures something a chatbot cannot hand over. She wants the profession to actually gather evidence about whether any of this works. And she wants to model the thoughtful, critical engagement she hopes her graduates will carry into practice, harnessing the benefits of these tools while mitigating their risks.[3]

Measured against that list of sensible yet ambitious desires, the book mostly delivers. The “where do I start” promise is kept in the how-to chapters and, more impressively, in roughly twenty-nine thousand words of appendices, about a fifth of the book, that supply a runnable comparative assignment, a grading rubric for AI-disclosure, a full diagnostic-and-retest protocol, CustomGPT templates, and two platform walkthroughs.[4] This is the part of the book a faculty member can act on without further translation; it is where the guide is most fully a guide rather than an argument. The binding of integrity to professional responsibility is sustained across the whole book and is a major contribution. The call to gather evidence she answers in the only way a single author can, by identifying the gap and supplying a concrete protocol to start closing it. Given the complexities and costs of that venture, the answer is more exhortation than delivery, but it is an admirable exhortation, and so I will count it as a promise kept. Two of the nine goals come under real strain: the “use, don’t ban” reframe and the “mitigate the risks” promise both buckle a little under the cumulative weight of the book’s own cautions. I will spend time on that strain later. But a book that fully achieves seven of nine genuinely hard goals, and gets part credit on the other two, is doing pretty darned well.

What the book gets right

The author used the right method: dogfoodism. She used the very tools she writes about to produce the book. She drafted it in collaboration with AI (notably Claude) and discloses that fact, as she should, without a hint of shame. She used Claude Cowork to analyze a folder of her own research documents concerning academic integrity and AI policies. She relied on Claude to help draft the comprehensive "Model Law School GenAI Tenure & Promotion Policy" provided in Chapter 12. A non-coder, she used Claude Artifacts to design live, functional web applications that readers can access via links in the book. She used Gemini Nano Banana Pro to generate the visual figures throughout the book that illustrate complex AI concepts, such as Context Windows, AI Alignment, and how Large Language Models use parameters and RLHF. She developed "CustomGPTs" for a variety of purposes. All of this real-world experience enhances the quality and credibility of the book.

The technical primer that knows when to stop

Chapter 2 is a primer on how large language models work, and it threads a needle that defeats most lay explanations of this material. It stays shallow where depth would drown the reader and goes deep where a lawyer actually needs it. The architecture of large language models gets the light touch it probably deserves: she names the transformer and self-attention, attributes them to the 2017 Google paper “Attention Is All You Need,” and moves on, sparing the reader the key-value caches and softmax functions that belong in a different book.[5] Then, where it counts for a professional who needs to understand why these systems behave as they do, she goes further than most popular accounts dare. She addresses reinforcement learning from human feedback and Anthropic’s Constitutional AI (hint, it isn't really about the US Constitution), and she discusses the helpfulness-versus-harmlessness tension through the sober observation that crowdworker preferences can train a model toward uselessly cautious refusals. The orientation is right for the audience. A lawyer absolutely needs to understand hallucination, deception, and refusal. Much as it creates agony in my mathematical soul to say so, a lawyer probably does not need to understand backpropagation’s chain rule, the importance of residual connections, or the role of masking in the attention layer. (I still might teach them anyway. Spinach.)

One image from the technical material deserves singling out, because it is the kind of insight that earns a book a second reading. Describing the opacity of a neural network, Munsterman writes that observing its reasoning “is like trying to figure out what’s going on in the jury room: we see what goes in, we see what comes out, and everything in between is mysterious.”[6] The analogy is better than she lets on. We have built an entire institution around the principle that the jury’s black box is sacred; we forbid inquiry into deliberations, and we celebrate the opacity as a feature. Yet the identical opacity in a language model provokes alarm and demands for explainability. Munsterman does not draw out the irony, and a future edition could make a meal of it, but the seed is there, and it is a good one.

Platform advice a working professor can use

Chapter 3 is the most directly useful chapter in the book, and it earns that distinction by attending to the consideration most comparison pieces ignore: deployment. Her headline recommendation is to run two complementary tools, ChatGPT and Claude, and her reasoning is practical rather than benchmark-driven. Use Claude to draft and to write the instructions for custom assistants, because it produces cleaner system prompts; use ChatGPT to deploy and share those assistants with students, because students already have ChatGPT and you will not lose half a class explaining how to reach a tool built somewhere else.[7] That second point is the kind of hard-won, unglamorous wisdom that tells you the author has actually stood in front of a room and watched a clever assignment collapse because the students could not log in. The recommendation is sensible, and a working professor can adopt it before lunch. (Of course, it might be out of date by dinner, but that is hardly Munsterman's fault. I discovered this morning, for example, that Gemini's writing capabilities had substantially improved since the last time I seriously examined them.)

The chapter’s treatment of NotebookLM is equally sound and equally practical. Munsterman calls it the first tool to put in a beginner’s hands and the right answer to the equity problem: it is free, it cites its sources, and its design discourages the passive summarizing she warns against everywhere else. I could not agree more: it remains the single best tool. Recommending the free, source-grounded tool as the on-ramp is both the equitable choice and the pedagogically safe one, and she reaches it for the right reasons. She will be all the more correct if rumored upgrades to NotebookLM, such as the ability to generate synthetic lectures, transpire, and if my prayer comes true that various features now reserved for the more expensive subscriptions migrate down to free users.

Prompting advice

The book also provides useful and structured information on how AI users can prompt. Munsterman gives her framework an acronym, CORRECT. The letters stand for Context, Objective, Role, Restrictions, Expectations, Clarify, and Tone: supply the background and the jurisdiction, name the goal, assign the model a persona, fence off what it must not do, specify the format, restate the task in one clean sentence at the end, and set the voice. As mnemonics go it is a good one, and two of its seven letters earn their place beyond mere tidiness. Restrictions — telling the model what not to do, such as inventing citations or wandering outside a jurisdiction — is the move novices never think to make, and it is where a great deal of bad legal output gets headed off before it starts. Clarify — restating the core instruction at the end of a long prompt — exploits a real property of these systems, which weight the most recent tokens heavily. It is the prompt-engineering equivalent of the prayer for relief at the end of a complaint: the line that tells the court what, after all the preamble, you actually want.

Of course, in 2026, CORRECT is most useful as a checklist for recognizing a good prompt, not a ritual to perform from scratch every time — because the faster move is to let the model draft the prompt and judge it against the checklist, and the book's own pages on meta-prompting point straight at it. But as a way to get a nervous beginner from a Google-style keyword search to a genuine instruction, it works, and it is exactly the concrete, usable scaffolding the book delivers at its best.
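To make the checklist use concrete, here is a minimal Python sketch of CORRECT as a data structure. It is an illustration of the seven elements as this review summarizes them, not anything taken from the book: the class name, field names, and section labels are hypothetical, and placing the Clarify restatement last in the rendered prompt is my reading of "restate the task in one clean sentence at the end."

```python
from dataclasses import dataclass, fields


@dataclass
class CorrectPrompt:
    """Hypothetical container for the seven CORRECT elements."""
    context: str = ""       # background facts and jurisdiction
    objective: str = ""     # the goal of the task
    role: str = ""          # persona assigned to the model
    restrictions: str = ""  # what the model must not do
    expectations: str = ""  # desired output format
    clarify: str = ""       # one-sentence restatement of the task
    tone: str = ""          # voice

    def missing(self) -> list[str]:
        """Checklist use: names of any elements left blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Assemble labeled sections, with the Clarify line placed last."""
        order = ["context", "objective", "role", "restrictions",
                 "expectations", "tone", "clarify"]
        parts = []
        for name in order:
            value = getattr(self, name).strip()
            if value:
                parts.append(f"{name.capitalize()}: {value}")
        return "\n".join(parts)


draft = CorrectPrompt(
    context="First-year Contracts course; apply Texas law.",
    objective="Explain the mailbox rule to a struggling student.",
    role="You are a patient law school tutor.",
    restrictions="Do not invent citations or cite law from other states.",
    clarify="In short: explain the mailbox rule under Texas law.",
)
print(draft.missing())  # the elements the draft still lacks
print(draft.render())
```

The `missing()` method is the checklist in miniature: it works equally well on a prompt a person wrote and on one a model drafted, which is the use suggested above.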

The book’s spine: integrity as professional formation

If the book has one idea that organizes all the others, it is the claim that a student’s choices about AI are already a rehearsal for practice. The habits a student builds in deciding when to lean on a chatbot, how far to trust its output, and whether to disclose its use are the same habits that will later govern a candor obligation to a court and a competence obligation to a client. Munsterman runs this through the Model Rules with real care. She looks at competence under Rule 1.1, candor under Rule 3.3, and supervision of non-lawyer assistance under Rule 5.3, and she grounds it all in the cautionary literature of lawyers already sanctioned for filing hallucinated citations.[8] The move reframes the whole integrity conversation. A blanket prohibition teaches a student to hide a tool; a well-built disclosure requirement teaches her to account for her work to a supervising authority, which is exactly what practice will demand. The line she draws between those two outcomes is one of the most valuable things the book has to say, and it recurs, productively, in every chapter that touches student conduct.

Design over surveillance, and the courage to say detection fails

The book is at its strongest, and at its most useful to a faculty reader, when it argues that the answer to AI-enabled cheating is better assignment design rather than better policing. Her view is far more consistent with the recent memo of University of Texas Law dean Robert Chesney and far better than the shortsighted policy adopted by Berkeley Law. Munsterman’s demolition of AI-detection software is thorough, well-sourced, and correct:[9] the tools have unacceptable false positive rates, they fall hardest on non-native English speakers and neurodivergent writers, and they invite due-process disasters of the kind that have already begun to surface in litigation.[10] Her one-sentence statement of the principle is the best in the book: “Law schools should not build disciplinary policies on tools that students can defeat with a single additional prompt.”[11] Exactly! I would add only that the book does not always meet that standard itself, the hinge of my main critique, to which I will return.

Equity, and the democratizing case

Munsterman is clear-eyed and persuasive about AI as a leveling force, and this is one of the places where her moral instincts and her practical advice align without friction. She points out that a ban on AI quietly privileges the students who already have lawyers in the family and money for tutors, and disadvantages the first-generation student who has neither.[12] The argument deserves the prominence she gives it. It lands harder as the economics of legal education worsen and the option of borrowing one’s way through school narrows. A free, capable tutor available at two in the morning is worth most to the student who cannot afford the alternatives. A policy that forbids it falls hardest on the very students legal education says it most wants to help.

Intellectual seriousness where the stakes are highest

Two chapters show the author thinking hardest, and both reward the attention. Chapter 12 on scholarship, tenure, and promotion is the most intellectually serious in the book. It separates three problems the debate usually mashes together — copyright, plagiarism, and the broader question of scholarly integrity — and shows that they call for different responses. It lands on “absolute authorial accountability” as the governing principle, with disclosure demoted to a means rather than a substitute for quality.[13] And it raises, almost in passing, a worry that ought to travel further than it does in the book: that if scholarship routes itself through a handful of models trained on overlapping data, the literature will drift toward the well-cited center and stop surprising us. The concern is not hers alone; a fast-growing literature outside law has begun to document exactly this homogenizing pull.[14] But she deserves credit for being among the first to name it for legal scholarship, and the point deserves a paper of its own.

The book is also willing to ask questions it cannot answer and to admit as much. A hypothetical but very real student in one of Munsterman’s examples asks why she should learn legal analysis if the model already analyzes Supreme Court opinions better than she can, and whether firms will want new associates to do research or to supervise machines that do it. Munsterman’s answer is that we are still figuring it out, and that this generation of students will settle the questions through the choices it makes.[15] That is probably the right answer, delivered in the right register of honest uncertainty, though I do wonder if the market and pressure from clients to get the right answer as inexpensively as possible will have a large say as well.

The small, concrete gifts

Finally, the book is richly sprinkled with specific, usable ideas that justify its existence as a practical guide. I know I intend to benefit from quite a few. Examples include (1) have the model explain a hard doctrine through three non-legal analogies pitched at someone with no background in the field, which forces a clarity that pure exposition rarely produces; (2) feed a model a batch of anonymized student answers to a single exam question, ask it to identify the most common conceptual errors, and have it build a short corrective lesson: an exercise that turns student mistakes into teaching material and, not incidentally, surfaces the places your own teaching fell short; (3) use “funnel” prompting to narrow from a broad question to a precise one in stages. And the book provides me and others a very practical tip that needs to be followed all the time instead of intermittently: rename your chat conversations evocatively so you can find them later. The naming conventions of the models are decent but often too generic and the search functions in tools such as ChatGPT and Claude are peculiarly poor. Those who fail to follow this recommendation risk finding a semester’s worth of unnamed chats becoming an unsearchable heap.[16]

Some criticisms

The criticisms that follow are real, and I hold to them, but I want to frame them honestly. None of them describes a failure of care or intelligence. Each describes a place where the field’s collective understanding has not yet caught up to its own tools, and where Munsterman, writing early and fast, inherited the lag along with everyone else. I offer them as the agenda for a second edition, which this book will earn and should have.

The risk model is calibrated to last year’s chatbot

The book’s cautions — verify every citation, guard every byte of student data, assume the model will confabulate — are mostly correct; a guide that omitted them would be malpractice. But they are calibrated to a bare chatbot working from training data, and the reader of 2026 increasingly works with something else: a model wired to real sources through connectors, able to run code, grounded in documents the user supplies. The single risk the book worries about most, the hallucinated citation, is precisely the one that grounding most reduces, because a model reading the actual case through a research connector is not guessing at it. I've discussed the virtues of connecting frontier AI to legal databases recently in this blog.

What stands out is that Munsterman knows this. She describes the grounded, tool-using system in detail: Claude’s agentic Cowork mode, the connector ecosystem that reaches Westlaw and other services, an appendix that walks through a real document-synthesis session against a folder of files.[17] The capability is in the book. What does not happen is propagation: the grounded system stays quarantined in the platform chapter and a back appendix, and it fails to permeate the chapters on integrity, assessment, and scholarship, where its existence would lighten the very cautions those chapters pile up. The result is a book whose risk intuitions describe a weaker tool than the one its own later pages demonstrate. The right fix keeps the safeguards, most of which are the genuine floor of professional practice, and updates the risk model to assume the grounded system the reader actually uses. Make that single adjustment and a good deal of the book’s anxiety drains on its own.

A slightly overstated FERPA warning

A second concern is the book’s treatment of student-data privacy, although I will confess that some extensive (AI-assisted) FERPA research the book inspired shows that Munsterman is more nearly right than my first reading credited. Still, I don't think she has it entirely right, and her recommendation carries a cost worth naming. At more than one point Munsterman suggests that FERPA concerns may require a professor to forgo every platform except an institutionally licensed Copilot.[18] The core of her advice is sound, and a good deal sounder than I first allowed: strip identifying information before any work goes to a general-purpose tool, lean on the Canvas anonymizer (if you have one) that does this in a click, and treat the platform as non-fungible, because an enterprise system bound by an institutional agreement stands on genuinely different FERPA footing than a personal account. That is not fear; that is the floor. The overreach is in the inference the chapter invites: that the safe course is to abandon every capable tool for an institutionally licensed Copilot. Neither the statute nor even its flawed regulations require nearly so much. The compliant ground is far wider than the recommendation lets on.

De-identification, which the book already endorses, can be pushed further. Ask AI to write a tiny little program that detaches the contents of a student answer in some typical format from any student ID number. Give it a few examples of what you want, and I am confident AI will have it solved before you can reread the FERPA regs. Uploading a student's musings on the dormant commerce clause or federal regulatory overreach does not violate FERPA. Alternatively, although FERPA apparently bans disclosure of student permanent ID numbers – though conceivably not student exam numbers for a semester – along (idiotically) with cryptographic hashes derived therefrom, faculty who want to associate long form exam content with a brief identification string can generate their own reproducible FERPA-compliant and unique identifiers. To do so, they just need to hash the presumably unique contents of a student exam. I've created a free little app (text-hasher) that does this for you. Moreover, synthetic or hypothetical student work, written by you or generated to the shape of the thing you grade, lets a professor build and test a rubric with no student data in the loop at all. The AI-assisted rubric can be compared to a student answer. And where a professor genuinely must process identifying work at scale because they somehow can't detach the FERPA-fatal personally identifiable information from the contents of the exam, there is always the option of using a local model – they are getting decent, particularly with tools – to evaluate student submissions against an AI-assisted rubric. They certainly compare to the last version of Copilot I tried.[19]
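The two moves in that paragraph, detaching an ID from an answer and labeling the answer with a hash of its own contents, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the text-hasher app mentioned above (whose internals are not described here) and not a compliance tool: the "Student ID:" header format, the function names, and the 12-character label length are all hypothetical, and an ID embedded elsewhere in the text would not be caught.

```python
import hashlib
import re

# ASSUMPTION: each answer file carries the ID on its own header line,
# e.g. "Student ID: 12345678". Real exports will differ.
ID_LINE = re.compile(r"^[ \t]*Student ID:[ \t]*\S+[ \t]*$",
                     re.IGNORECASE | re.MULTILINE)


def strip_student_id(raw: str) -> str:
    """Remove ID header lines and return the remaining answer text."""
    return ID_LINE.sub("", raw).strip()


def content_identifier(answer: str, length: int = 12) -> str:
    """Derive a reproducible label from the answer text alone.

    Whitespace is collapsed first so a reflowed copy of the same
    answer yields the same label. Uses SHA-256 from the standard library.
    """
    normalized = " ".join(answer.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


raw = ("Student ID: 12345678\n"
       "The dormant commerce clause limits state power.\n")
clean = strip_student_id(raw)
label = content_identifier(clean)
print(label)   # a 12-character hex label derived only from the text
print(clean)   # the answer with the ID line removed
```

Because the label depends only on the answer text, the professor can regenerate it later from the original submission to match a graded answer back to its author, without the ID ever leaving the local machine.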

A broader discussion of alternatives matters because the default the book steers toward is not free. At my own institution the serious version of the licensed Copilot costs $220 per year, which the faculty member carries personally absent departmental reimbursement. Moreover, it is widely thought to be the least capable of the frontier models. And by Munsterman’s own account it can take two months and a pilot program to get it approved. A recommendation that routes faculty to the slowest, weakest, and costliest option in order to honor a requirement the statute does not impose is, however well-meant, a counsel of fear, and it sits awkwardly beside the book’s larger and better argument that prohibition mistakes a study aid for a threat. I do not fault Munsterman for declining to recommend measures that could theoretically cause a university to lose its federal funding, or for declining to turn a practical guide into a brief for reform. FERPA is a paper-era disclosure statute pressed onto ephemeral, non-retained, prompt-based interactions it was never written to govern. Saying so properly is the work of a different book – or maybe a co-authored law review article? But a second edition can do the smaller and more useful thing: tell faculty plainly where the law does not reach, so caution tracks the regulatory framework rather than its longest shadow.

Finally, students should understand that FERPA binds institutions that receive federal funds, not students.[19] A student’s own account, used on her own work, raises no FERPA question whatever, so much of what the book frames as a faculty compliance burden can be handed back to students as their own choice. The compliant ground is far wider than the recommendation lets on, and most of the book’s best uses of AI already live on it.

Unrealistic workarounds for the cheating/brain-rot problem

Munsterman’s remedy for AI-enabled cheating is process verification. That is, instead of focusing exclusively on the output, examine artifacts of the process the student went through to see the extent of their reliance on AI. I agree with the diagnosis behind the recommendation: detection fails, and the answer is to design work that rewards genuine engagement. My doubt is about the remedies themselves, and whether they are realistic.

Consider the proposals. Submit a handwritten issue tree “showing initial legal problem decomposition before research begins.”[20] Submit a revision history from a tracked document. Submit a process portfolio. Submit a “GenAI appendix” containing every prompt and output. Each of these treats a produced artifact as proof of the process that supposedly generated it, and each fails the same way. The artifact and the process are not actually welded together. A student can generate a polished analysis with a model, read the finished product, and copy it backward into a handwritten chart in fifteen minutes; the chart will look like forethought but will be a transcription. Handwriting guarantees nothing about when the thinking happened, or whether it happened. A revision history can be typed in stages or simply faked. And the disclosure appendix is worse than fakeable; if you use these tools the way a fluent user does, moving among models and threads, the honest appendix is a forty-page tangle no one will read, least of all a professor already drowning in the substantive submissions. The book half-concedes this elsewhere — it admits a research log “can be faked with GenAI” — without letting the concession spread to the neighboring artifacts that are no less fakeable.

The deepest version of the problem is the book’s faith in reflection as a non-falsifiable demonstration of human involvement. Munsterman writes that a model “can’t generate authentic reflection on the experience of doing analysis.”[21] I do not think that is true. A current model, handed the assignment and the finished memo, will produce a fluent first-person account of which arguments were hard and which paths were abandoned. It may not be authentic in some philosophical sense. It may not be true. But if the professor cannot tell the difference, the philosophical sense does not pay the rent. The metacognitive reflection that recurs across these chapters as the un-fakeable backstop is, I am afraid, fakeable, and the only method in the book that genuinely resists a determined student, the live oral defense, is the one Munsterman concedes does not scale easily to a typical law school class of eighty.

None of this shows that Munsterman is wrong about integrity; her central instinct, that design beats surveillance, is correct, and I share it. It shows that her proposed process-verification cures are defeatable by the same single additional prompt that defeats the detectors she rightly rejects. This raises the question the book never quite poses in the open, the question I think sits at the center of this entire subject: some students will misuse the tool no matter how well we teach them, and the measures aimed at catching them tax the honest majority and the exhausted faculty far more than they constrain the determined few. So does the potential for abuse justify ruining the tool for everyone else? I do not have a clean answer. But I notice that the book’s own best answers point away from policing: the student-facing chapter that simply persuades students it is in their interest to do the work, and the assignments that make the human contribution the visible object of assessment. I wish those answers, rather than the forensic ones, had been given the last word.

Potentially excessive optimism about the ongoing need for human lawyers and faculty

My last reservation is the deepest, and it is, for now, a minority view: most of the field shares Munsterman’s cautious optimism. Running through GenAI in Legal Education, as through most writing in this genre, is a reassurance: that whatever these systems come to do, some core of lawyerly judgment remains ours, and that this core is what legal education exists to cultivate. The book tells students that “if you can’t write a persuasive memo without GenAI assistance now, you won’t be able to do it competently for a client later.”[22] It tells them that law is about judgment rather than retrieval, that a model “can’t tell a client whether to settle or go to trial.”[23]

Yes, GenAI still does astonishingly stupid things on occasion. The number of my follow-up prompts that begin "NO, NO, NO!" is evidence of that. But I have doubts about the size and persistence of that core of human lawyerly superiority. Already the share of difficult tasks that a highly skilled human lawyer can perform end to end without error, but on which a model fails, has shrunk from 93% in early 2026 to 87% with the (brief) release of Claude Fable in June of 2026. And while those figures suggest the core is still large, the benchmark on which they are based is challenging: the agent must act as an associate on a client matter, discovering and using only the closed-universe documents and materials provided in the matter files to produce reviewable legal work product that satisfies expert grading rubrics on format, facts, and analysis. When machines score 100% on the LSAT and, even several versions ago, score in the 90th percentile on the bar exam, that is a sign they have many skills that the average attorney does not.

The evidence Professor Munsterman presents in support of her “large core” theory does not impress me. Take her settlement example, which she offers as something a machine cannot do. First, she somewhat undercuts her argument in the same sentence by noting that models are already advising pro se parties to settle or go to trial. More importantly, however, a model can in fact make a settlement recommendation, and in many cases it may make it better than a human lawyer would, because (a) unlike many attorneys it can do the sophisticated math that is sometimes needed (particularly with connectors) and (b) it is not subject to the lawyer’s overconfidence, fatigue, anchoring, or the multitude of documented biases that human judgment carries into exactly these decisions. The claim that judgment is our redoubt rests on an unstated premise that human judgment is good, and the evidence for that premise is thinner than the profession likes to believe. Worse, human judgment is mostly non-falsifiable in practice: we rarely see the counterfactual in which the lawyer advised differently, so we flatter ourselves that the advice we gave was sound. A machine that matches our judgment while shedding our biases might not need to be particularly brilliant to surpass us.

The pattern is general, and the book reproduces it faithfully: each time legal educators name the thing only humans can do, the named thing turns out to be eroding. “It can’t explain its reasoning,” the book says of the model[24] — but reasoning models now expose their intermediate steps, and a platform like Bloomberg Law will show a user the path it traveled to a conclusion. The very same Claude for Word that helped me draft this article contains a “Reasoning” dropdown after each suggestion.

These visible traces may not be a fully faithful account of the model’s internal mechanics, and that caveat is worth keeping; the interpretability problem from Chapter 2 is real. But the flat claim that the machine cannot explain itself, while the human can, gets the comparison backward. Humans are themselves notoriously unable to reconstruct how they reached their judgments.

Also questionable is the evidence Professor Munsterman cites on the importance of law faculty training students how to use AI properly rather than letting them just "run wild." It would be an inconvenient truth indeed were such not the case. But let's consider the issue critically. First, the studies showing the most dramatic upside of guided AI examine no one resembling a law student. The doubled gains come from intro physics undergraduates; the floor-raising miracles from Italian, Turkish, and Nigerian secondary pupils; the dependency findings from high-school math and first-semester programming. All are first-exposure learners in domains with right answers. Munsterman is candid that her flagship example carries that limit: the Harvard tutor's results, she notes, "may not hold for complex synthesis and higher-order critical thinking." But that is not a corner of legal education; it is the whole of it. The evidence is strongest for the kind of learning law school does least.

Second, the cleanest experiment proves less than the book needs. The Wharton math study's guardrailed tutor neutralized the harm of unguided use — students ended up no worse than students who used nothing. The guardrails prevented a loss; they did not manufacture a gain. "Don't let them run wild" and "only a professor can make this technology teach" are different claims, and the data underwrites the first far better than the second.

Third, the evidence captures a moment already closing. These were students meeting GenAI early in its lifespan and perhaps for the first time. First contact happens once. The book concedes it: entering students now arrive with experience, and Munsterman cites Schrepel's two-year experiment, where structured training beat unstructured exposure in year one and then saw "the advantage diminish" in year two as students showed up fluent. The case for heavy, law-professor-imposed scaffolding is strongest in precisely the years that are ending.

And finally, the line between instructor-led success and unguided failure is one the book itself diminishes. The claim that students need live and expensive faculty to build their guardrails assumes they have nowhere else to turn — but the book in my hands is the counterexample, with two chapters teaching students to self-regulate: the CORRECT framework, the Socratic partner that refuses to answer, the model that tests their reasoning only after they have struggled. A student who has learned to build her own guardrail no longer needs a professor to build it for her.

None of this makes guardrails worthless; the crutch effect is real, and keeping students from sprinting into dependency is reason enough to design with care. And I guess if I really believed my own cynicism I might pack up shop and stop teaching my Large Language Models for Lawyers class. But the evidence shows that unguided use can harm. It does not show that faculty-built guidance is the only road to genuine learning — least of all in law, for students who arrive already fluent and equipped, by this very book, to guide themselves.

I do not raise all this to predict the obsolescence of lawyers or law professors. I raise it because the book makes a wager on human irreplaceability and presents it as a settled premise. The honest posture is the one Munsterman herself strikes when a student asks the hard question and she answers that we are still figuring it out: nobody knows. And the comfortable answer is comfortable precisely because it also happens to justify our continued employment. The book is at its best when it sits in that uncertainty. It is at its weakest when it resolves the uncertainty in our favor and moves on.

Errata and omissions

A few small things, which I record in the spirit of a careful read rather than a complaint. The book states that Claude Code is billed only by API token; in fact, run from within the Claude desktop app, Claude Code draws against a monthly subscription (at least for now).[25] The book defines a “skill” as a markdown file. That can be true but it is not invariably so. A skill is frequently a packaged bundle: a compressed archive carrying instructions together with code and other resources.[26] The book's enthusiasm for chain-of-thought prompting reads as a half-step behind the current models, which do that reasoning internally and often do it worse when a user insists on staging it by hand. (See also here for a survey.) And its suggestion that one can ask a model to flag where it is uncertain overestimates how well today’s systems know their own minds. There is progress being made here, but I would not yet bank on the AI's response.

The omissions are more interesting than the errors and a tribute to the dynamism of the AI market. The book gives a full profile to Poe, a multi-model aggregator that by 2026 does not get much mindshare, while the legal-vertical tools a law-trained reader would most want assessed (the research assistants built into Westlaw and Lexis, the CoCounsels and Protégés of the world) appear mainly as cautionary tales in a hallucination study rather than as candidates for recommendation.[27] These tools were genuinely weak not long ago, and silence may have seemed kinder than a bad review; but they have improved, and a 2026 guide for legal educators that profiles Poe and skips them has its proportions inverted. The book also says nothing about low-cost Chinese models such as Qwen, MiniMax or GLM that are getting quite good and that faculty outside of public Texas universities can use without fear of beheading. It likewise says little about structured-output prompting or the “tabular review” technique now central to many modern proprietary tools such as Legora and open-source tools such as MikeOSS or the eponymous Claude skill. None of these gaps is disqualifying; you will note I have not had the courage to suggest what Professor Munsterman might cut in order to accommodate these fine points or to insist that she draft a yet larger book. Together they mark the book’s vantage as somewhat consumer-facing and somewhat Anthropic-centric, an honest reflection of where its author does her own work, and an easy thing to widen in a second pass. Plus, it’s hard not to cheer on an author who implicitly recognizes that, in June of 2026, Anthropic’s models work best for serious legal work. And that’s without taking into account their Fable model, which had its 10 minutes of fame before the Trump administration decided, after a meme-worthy suggestion from Amazon, that it was a security risk.

The publication decision

Let me end with a word about Professor Munsterman's choice of how to publish this work. I don't know how the prestige/promotion/reward structure works at the University of North Texas. I do know, however, that at many schools she would receive less credit for e-publishing, even with a fully reputable organization such as CALI, than if she had tried, probably successfully, to place the book with a first- or even second-tier traditional print outlet. The reason she did not, however, was that those publishers are not equipped to move at the speed needed to cover AI. Whatever benefits she would receive from a tonier editorial staff or a red, brown or blue cover would be dissipated by the book being out of date before it hit the ground. I particularly sympathize because I face the same dilemma and am making the same choice with this review. I could try to place some slightly more scholarly version of this 8,000-word book review with a traditional law review or even an online variant. In fact, this review started life this way (witness the endnotes at the bottom). Were I to succeed, it might look better on my annual report. But by the time it was released it too would be out of date. Perhaps (and I am not fully ready to concede the point) in some fields the benefits of classically vetted publication forms still outweigh those of covering an issue while it is still live, but in the field of AI and legal education Professor Munsterman and I (through this post and this blog more generally) have made the right choice.

It is precisely because the book is relatively up to date in June of 2026 that I can confidently prepare to assign significant parts of it in that distant August course, Large Language Models for Lawyers, that lies ahead. I want my students to encounter a model of how a thoughtful lawyer reasons about this technology. This book fits that role perfectly.

After all this, here is the TL;DR. Go download the book. Right now. Read it. Consider it critically with every human and AI tool at your disposal.

[1]Munsterman, ch. 15 (“Dealing with Uncertainty”), noting that policies drafted against GPT-3.5 in early 2023 were obsolete before faculty governance approved them.

[2]Id., ch. 16 (“The Path Forward”), quoting Ethan Mollick, Co-Intelligence: Living and Working with AI.

[3]Munsterman, ch. 1 (“Why This Book, Why Now”).

[4]Munsterman, Appendices A–J.

[5]Munsterman, ch. 2 (“A Primer on LLMs”). A note on the difficulty of the thing she attempts. Explaining how a large language model works to a room of non-technical lawyers was never easy, and it has gotten harder, because the object keeps acquiring new parts. The model a reader meets in 2026 is no longer the clean deep-transformer of the GPT-2 era that a patient teacher could draw on a whiteboard. It is the visible surface of a stack, and an honest account now has to gesture at all three of its layers. There is the training layer, where reinforcement learning (first from human feedback, then from AI feedback) turns a raw next-word predictor into something helpful, cautious, and occasionally sycophantic. There is the architecture and engineering layer, where the tricks that make the thing fast and affordable are exactly the parts most resistant to a clean analogy: mixture-of-experts routing that fires only a sliver of the network for any given token, the sparse and grouped attention schemes that keep the math from exploding, the position and memory engineering that stretches a context window from a few thousand tokens to a million. And there is the newer serving layer, where a reasoning model spends compute "thinking" before it answers, where a small distilled model apes a large one so it can run cheaply, and where the system the user actually touches is wrapped in tools, retrieval, and guardrails that shape its behavior more than its weights do.

I say this as someone with a stake in the admission. I used to think a reasonably technical amateur (winner, after all, of the inaugural Innovation Award from Wolfram Research and author of a 350-page book on data science) was up to the challenge of teaching this. Not anymore. And that is the most generous frame I can offer for Munsterman's second chapter, and the fairest one. That she threads a clean explanatory line through this material at all is an achievement; that she leaves a few of the newer threads out (distillation, the mixture-of-experts plumbing, the inference-time compute behind the reasoning models) is the cost of writing about a horizon that recedes as you walk toward it.

[6]Id. (the analogy appears in the book’s discussion of model opacity; see also ch. 15).

[7]Munsterman, ch. 3 (“Choosing Your Platform”), describing “Common Combination #1: ChatGPT + Claude.”

[8]Munsterman, ch. 5 (“Guardrails and Gatekeeping”), discussing Model Rules 1.1, 3.3, and 5.3 and citing Mata v. Avianca, Inc.

[9] I’d be curious whether my readers can tell which portions of this article, if any, were written by me versus AI. In truth, as with most of my writing, it is an evolving blend, with a usual final stage of my work being to purge the prose of the worst AI tropes even if in fact they were written by me.

[10]Id. (surveying detection-tool error rates, the ESL and neurodivergent penalty, and recent litigation including Newby v. Adelphi University).

[11]Id.

[12]Munsterman, ch. 5; see also ch. 8 (quoting Prof. Dyane O’Leary on prohibition as a proxy for prior advantage).

[13]Munsterman, ch. 12 (“GenAI Policies for Scholarship, Tenure & Promotion”), drawing on Frazier & Rozenshtein, Large Language Scholarship.

[14] The concern extends well beyond law. See Lisa Messeri & M. J. Crockett, Artificial Intelligence and Illusions of Understanding in Scientific Research, 627 Nature 49 (2024) (warning of “monocultures of knowing”); Anil R. Doshi & Oliver P. Hauser, Generative AI Enhances Individual Creativity but Reduces the Collective Diversity of Novel Content, 10 Sci. Adv. eadn5290 (2024); Ilia Shumailov et al., AI Models Collapse When Trained on Recursively Generated Data, 631 Nature 755 (2024) (the “model collapse” by which models trained on AI-generated text lose the tails of the distribution). I have argued that fears of model collapse are sometimes overstated. Seth J. Chandler, Compared to What?: The Paramount Question in the Regulation of Medical AI, 24(1) Hous. J. Health L. & Pol’y 1, 51–52 (2025).

[15]Munsterman, ch. 13 (“For Students: How to Learn Law with GenAI”).

[16]Prompting techniques and tips appear in ch. 4 (“The Art and Science of Effective Prompting”).

[17]Munsterman, ch. 3 (Claude Cowork, the connector ecosystem, Claude Legal Skills) and Appendix C (“Cowork in Practice: A Research Library Analysis”).

[18]Munsterman, ch. 10 (and the FERPA discussion in ch. 7); see also Appendix B (“Microsoft Copilot in Office 365”).

[19] FERPA binds “educational agencies and institutions” receiving federal funds, 20 U.S.C. § 1232g; de-identified records may be released without consent, 34 C.F.R. § 99.31(b); the “school official” exception, reaching outsourced functions under the institution’s direct control, appears at § 99.31(a)(1)(i); disclosure with prior written consent at § 99.30.

[20]Munsterman, ch. 5 (the “Process Portfolio,” including the handwritten issue-tree requirement).

[21]Id.

[22]Id.

[23]Munsterman, ch. 15.

[24]Munsterman, ch. 2 / ch. 15 (model opacity; “we can’t peer inside and see the ‘reasoning’ happening”).

[25]Munsterman, ch. 3 (Claude Code). Run from the Claude desktop application, Claude Code usage is drawn against the user’s subscription plan rather than billed per API token.

[26]Munsterman, ch. 2 (defining a “Skill”). In current practice a skill is commonly distributed as a packaged archive containing a markdown instruction file plus supporting code and resources.

[27]Munsterman, ch. 3 (full Poe profile); the legal-vertical research tools appear principally in the Stanford hallucination study discussed in ch. 15.