
The old version of a bad citation was boring. A page number was wrong. A journal title was abbreviated inconsistently. A researcher copied a reference from another paper and preserved someone else’s mistake. The error was annoying, but it usually belonged to the clerical world of academic messiness.
The new version is different.
The new bad citation has a plausible title, a believable author list, a credible publication year, and a journal name that sounds exactly like the sort of journal where such a paper might have appeared. It may even point toward real scholars in the field.
It has the style of evidence without the existence of evidence. It does not look like a typo. It looks like scholarship.
That is what makes the current wave of AI-hallucinated references so dangerous. The problem is not merely that large language models can invent sources. Everyone who has used these systems seriously should know that by now. The problem is that invented sources are beginning to pass through the machinery that turns research into the permanent record.
A citation is not just a footnote. It is a claim about intellectual ancestry. It tells the reader where an assertion came from, what work supports it, and what body of knowledge the author believes they are joining. In scientific publishing, citations are part of the load-bearing structure. They connect trials to reviews, reviews to guidelines, guidelines to practice, and practice to policy. When citations are fabricated, the failure does not stay inside one paper. It enters the chain of reliance.
That is why the recent finding is not simply another story about AI making things up. A large-scale 2026 study that audited 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central estimated that 146,932 hallucinated citations appeared in 2025 alone. That number is large enough to move the discussion out of the category of embarrassing anecdotes and into the domain of infrastructure failure.
The mistake has changed scale. The governance model has not.
The first wave of public AI hallucination stories trained people to look for ridiculous answers. A chatbot invented a legal case. A customer service bot promised something the company never authorized. An AI assistant summarized a document incorrectly. These incidents were often dramatic because they were visible. Someone noticed the output, challenged it, and turned it into a story.
Citation hallucinations are more insidious because they are quiet. They often sit in the most trusted part of a document, the section readers are least likely to inspect line by line. A paper may be legitimate. The research may be real. The author may not be trying to commit fraud. One or two references may simply have been inserted by an AI writing tool, citation assistant, summarization workflow, or careless drafting process. The rest of the article may look sound enough for reviewers to focus on methodology, argument, novelty, or formatting.
That is precisely the danger. The fabricated reference can hide inside otherwise legitimate work.
Once published, the fake citation changes status. It is no longer just an author’s unverified output. It becomes part of a paper. The paper becomes part of a repository. The repository becomes part of a search system, a citation graph, an academic database, a systematic review pipeline, or a future training corpus.
Even when the original mistake is later discovered, the downstream copies may remain. The error can be duplicated, extracted, summarized, cited, scraped, and absorbed into systems whose users will never see the original failure.
This is what makes the story structural. The hallucination is not only a false sentence generated by a model. It is a piece of synthetic misinformation entering systems designed to preserve and distribute knowledge.
Scientific publishing was built around the assumption that bad references exist but remain relatively containable. Editors, reviewers, librarians, indexers, and readers all play partial roles in maintaining integrity, but the system was not designed for a world in which citation-shaped fiction can be generated instantly, repeatedly, and at professional quality by tools used inside ordinary writing workflows.
A hallucinated citation is therefore not just bad content. It is a contamination vector.
The public instinct is to ask why peer review did not catch this. That question is understandable, but it misunderstands what peer review is built to do.
Peer review is already overloaded. Reviewers are typically unpaid, time-constrained, and asked to evaluate research design, methodology, novelty, statistical reasoning, field relevance, and clarity. They do not usually perform forensic verification of every source in every reference list.
In many fields, especially where papers can cite dozens or hundreds of sources, comprehensive reference validation would turn peer review into a different institution entirely.
That does not excuse the failure. It explains why the failure scales.
If a fake citation looks plausible, appears in a familiar format, and names real researchers or real journals, it can pass through the process because the process is not optimized to detect it. The reviewer may assume the citation exists. The editor may assume the reviewer would have noticed. The publisher may assume its production workflow is sufficient. The author may assume the AI tool did not invent anything because the surrounding text looked good.
Every participant has a plausible reason not to be the one who checks.
That is the governance gap. The system has many points of responsibility, but few hard verification gates. A citation list can appear formal without being validated. A source can look academic without being real. A model can produce scholarly texture without scholarly grounding. When the output enters a publication process that relies on distributed trust, the error can move forward because no single actor is structurally required to stop it.
This is why the solution cannot be a scolding campaign about careful authorship. Authors should verify sources, of course. But a problem that appears at repository scale will not be solved by telling individuals to be more careful after the infrastructure has already absorbed their mistakes. The question is where verification belongs in the workflow, what should happen before publication, and who is accountable when false references survive.
The issue becomes more serious when it enters medicine.
A separate audit published in The Lancet examined biomedical papers and found thousands of fabricated references across peer-reviewed literature. The numbers are smaller than the broad cross-repository estimate, but the context is more consequential. Biomedical research is not only a knowledge archive. It feeds clinical reviews, treatment guidelines, hospital protocols, insurance reasoning, regulatory decisions, and professional education.
A fake citation in a biomedical paper is not automatically a fake clinical conclusion. The presence of fabricated references does not mean every affected article is worthless, nor does it prove that AI generated every false source.
Precision matters here. But the risk is still obvious. Medicine depends on evidence chains. If a false reference enters the bottom of that chain, later readers may not know where the evidence ends and the invented scaffolding begins.
This is where the phrase “permanent record” becomes less metaphorical. Scientific literature is durable by design. Papers are indexed so they can be found. Citations are preserved so knowledge can be traced. Databases are built to help future researchers move quickly across the record. The same features that make science cumulative also make contamination difficult to clean once it spreads.
Retraction is a blunt instrument. Correction notices are slow. Database cleanup is uneven. Search systems may retain traces. PDF copies circulate. Institutional repositories preserve versions. Secondary sources quote or paraphrase the affected material. AI systems trained or fine-tuned on scholarly corpora may ingest the residue. The literature does not have a single off switch.
That is why the current problem should not be treated as an academic housekeeping issue. It is a knowledge-supply-chain issue.
There is also an incentive problem.
Academic publishing rewards volume, speed, and visibility. Researchers are pressured to publish. Institutions measure output. Journals compete for submissions, citations, rankings, fees, and prestige. Preprint repositories accelerate circulation. AI tools promise to reduce the friction of writing, editing, summarizing, translating, formatting, and sourcing.
Inside that environment, citation hallucination is not an alien intrusion. It is a predictable failure mode of a system that rewards faster production of scholarly-looking work.
The author under pressure uses AI to polish a manuscript. The tool suggests references. The references look plausible. The author is exhausted, inexperienced, overconfident, or simply careless. The paper moves. Reviewers are busy. Editors are busy. Production systems are uneven. The citation survives.
This is not a story about one bad actor. It is a story about an industrial knowledge system meeting a technology that can generate the appearance of verification faster than the institution can verify.
The deeper problem is that AI does not merely increase the number of mistakes. It changes the economics of mistake production. A human can fabricate citations, but doing so requires some effort. A model can fabricate them fluently and repeatedly as a side effect of ordinary use. The cost of producing plausible false evidence falls dramatically. The cost of checking it does not fall at the same rate unless verification is engineered into the system.
That imbalance is the strategic issue.
The most uncomfortable consequence is recursive.
Scientific literature is not only read by humans. It is also processed by machines. Search engines parse it. Databases structure it. Recommendation systems rank it. Literature-review tools summarize it. Retrieval systems surface it. Future models may train on it or use it as grounding material through retrieval-augmented systems.
If hallucinated citations enter the record, they may later become part of the material used to generate new scholarly output. The model that invented a fake source can help place that source into the literature. The literature can then be scraped, indexed, or retrieved. A future system may encounter the fake citation not as a hallucination but as an apparently external artifact. The error acquires institutional camouflage.
That is the real long-range risk. Synthetic errors can become environmental facts for other synthetic systems.
A human reader may notice that a citation does not resolve. A poorly governed AI workflow may not.
It may simply detect that a reference appears in a paper, appears in a database, appears in a citation context, or appears near a set of credible terms. The fake reference can gain weight because it has been embedded in the record. The hallucination becomes part of the signal environment.
This is not science fiction. It is the predictable result of connecting generative systems to archives that are no longer clean.
For years, AI risk discussions have focused on model outputs. The more important question may be what happens when those outputs are written back into the world. Once synthetic content enters trusted repositories, the distinction between model error and institutional record begins to blur. The model did not merely answer incorrectly. It helped alter the evidence environment from which later answers may be drawn.
AI literacy matters, but it is not enough.
Telling researchers that chatbots hallucinate is useful only to a point. Many sophisticated users already know that. Fortune's reporting is especially useful here because it includes the case of Maxim Topaz, an AI researcher who said an AI tool silently inserted a fabricated reference into his own work. That detail matters because it removes the comforting assumption that this is only a novice problem. Expertise reduces risk, but it does not eliminate workflow failure.
The issue is not whether a user understands AI in the abstract. The issue is whether the workflow forces verification before false material enters a consequential system.
That distinction matters far beyond academia. Enterprises face the same problem when AI-generated claims enter board decks, compliance memos, diligence files, legal summaries, audit documents, procurement analyses, or investor materials. The danger is not only that an AI system says something wrong. The danger is that the wrong thing becomes part of an internal record, then gets reused because it now appears to have organizational provenance.
Academic citation hallucinations are a clean example because the object is verifiable. A citation either points to a real source or it does not. But the broader governance lesson applies to any knowledge workflow where AI output can become institutional input.
The organization that treats verification as a personal virtue will lose to the scale of the problem. The organization that treats verification as infrastructure has a chance.
The obvious technical answer is automated reference checking. Journals, repositories, and publishers should not accept reference lists as decorative text. Citations can be parsed, matched against authoritative databases, flagged for mismatch, and returned to authors before publication. This will not catch every problem, and it will produce edge cases, but it would immediately move the system away from reliance on reviewer intuition.
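What such a check might look like can be sketched in a few lines. The example below is a minimal illustration, not a description of any publisher's actual system: the `Reference` type, the `check_reference` function, the 0.9 similarity threshold, and the in-memory `index` standing in for an authoritative database are all assumptions made for the sketch, and the sample titles are placeholders rather than real papers. A production checker would query a real bibliographic service and handle far more edge cases.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Reference:
    # Hypothetical, simplified shape of a parsed reference-list entry.
    title: str
    year: int
    first_author: str


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so formatting differences do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def check_reference(ref: Reference, index: List[Reference], threshold: float = 0.9) -> str:
    """Classify one cited reference as 'verified', 'mismatch', or 'unresolved'.

    `index` stands in for an authoritative database (an assumption of this sketch).
    """
    best: Optional[Reference] = None
    best_score = 0.0
    for known in index:
        score = SequenceMatcher(None, normalize(ref.title), normalize(known.title)).ratio()
        if score > best_score:
            best, best_score = known, score
    if best is None or best_score < threshold:
        # No sufficiently similar title exists: the citation may not be real.
        return "unresolved"
    same_year = ref.year == best.year
    same_author = normalize(ref.first_author) == normalize(best.first_author)
    # A matching title with wrong metadata is flagged rather than silently accepted.
    return "verified" if same_year and same_author else "mismatch"


# Placeholder records, not real publications.
index = [Reference("A Placeholder Study of Widget Calibration", 2019, "Doe")]
submitted = [
    Reference("A placeholder study of widget calibration.", 2019, "Doe"),
    Reference("A Placeholder Study of Widget Calibration", 2021, "Doe"),
    Reference("Imaginary Outcomes in Trials That Never Happened", 2023, "Roe"),
]
report = [check_reference(ref, index) for ref in submitted]
```

The design choice worth noting is the three-way result. A reference that resolves to nothing and a reference whose title matches but whose year or author does not are different failures, and both should go back to the author before publication rather than on to a reviewer's intuition.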
The deeper answer is workflow governance.
AI-assisted writing should require source provenance when factual claims are generated or revised. Citation tools should distinguish between sources retrieved from verified databases and sources inferred by a language model. Manuscript systems should require unresolved references to be corrected before acceptance. Publishers should preserve machine-readable audit trails for reference validation. Research institutions should treat fabricated citations as a publication-integrity issue rather than an embarrassing copy-editing defect.
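Those requirements can be made concrete with a small data model. The sketch below is one hypothetical way to do it, assuming a manuscript system that records where each citation came from and whether it has been resolved; the names `Provenance`, `CitationRecord`, `acceptance_blockers`, and `audit_trail`, along with the field layout, are invented for illustration and do not describe any existing platform.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List, Optional


class Provenance(Enum):
    RETRIEVED = "retrieved"            # returned by a lookup against a verified database
    MODEL_INFERRED = "model_inferred"  # proposed by a language model, not yet grounded


@dataclass
class CitationRecord:
    # Hypothetical per-citation record kept by the manuscript system.
    key: str
    provenance: Provenance
    resolved: bool                     # True once matched to a real bibliographic record
    checked_by: Optional[str] = None   # person or tool that signed off, if any


def acceptance_blockers(records: List[CitationRecord]) -> List[str]:
    """Return keys of citations that must be fixed before acceptance.

    A citation blocks acceptance if it is unresolved, or if it was inferred
    by a model and nobody is recorded as having checked it.
    """
    blockers = []
    for record in records:
        unchecked_inference = (
            record.provenance is Provenance.MODEL_INFERRED and record.checked_by is None
        )
        if not record.resolved or unchecked_inference:
            blockers.append(record.key)
    return blockers


def audit_trail(records: List[CitationRecord]) -> str:
    """Serialize the validation state as machine-readable JSON."""
    rows = [{**asdict(r), "provenance": r.provenance.value} for r in records]
    return json.dumps(rows, indent=2)


records = [
    CitationRecord("ref1", Provenance.RETRIEVED, resolved=True, checked_by="lookup-tool"),
    CitationRecord("ref2", Provenance.MODEL_INFERRED, resolved=False),
    CitationRecord("ref3", Provenance.MODEL_INFERRED, resolved=True, checked_by="author"),
]
blockers = acceptance_blockers(records)
trail = audit_trail(records)
```

The point of the sketch is that provenance and sign-off are stored as data rather than assumed. A citation inferred by a model is not forbidden, but it cannot pass the gate until someone is on record as having verified it.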
The goal is not to ban AI from research writing. That would be unrealistic and, in some cases, counterproductive. AI can help researchers translate, organize, edit, summarize, and search. The issue is whether AI-assisted workflows are allowed to smuggle unverified evidence into systems that depend on traceability.
A mature governance model would not settle for a vague disclosure statement saying that AI was used. It would ask which parts of the paper were AI-assisted, which claims were affected, how references were generated or retrieved, what verification occurred, and who signed off before submission. Disclosure without operational detail is too weak for this problem.
The citation has to become a controlled object.
The scientific record has survived fraud, error, publication bias, paper mills, replication failures, and predatory publishing. It will survive AI hallucinations too. But survival is not the same as integrity. The question is how much contamination the system absorbs before it adapts.
The danger is not that every paper becomes suspect. The danger is that trust becomes more expensive. Researchers will have to check more. Reviewers will have to doubt more. Readers will have to verify more. Databases will have to clean more. Institutions will have to spend more effort distinguishing knowledge from knowledge-shaped residue.
That is a strategic shift. Trust in the record used to be produced partly by the scarcity and friction of publication. It took work to create a paper, submit it, review it, index it, and cite it. AI weakens that friction at the front end while leaving much of the verification burden unchanged. When the cost of producing plausible scholarly material falls, the cost of maintaining credibility rises.
The permanent record was built to remember. It was not built to forget at machine speed.
That is why hallucinated citations matter. They reveal a larger transition from content errors to infrastructure contamination. The chatbot did not merely make something up. The system allowed the invented thing to acquire scholarly form, institutional placement, and downstream persistence.
The next phase of AI governance will not be won by better warnings under text boxes. It will be won by hard controls at the points where synthetic output becomes institutional evidence.
For scientific publishing, that means citation verification before acceptance, repository-level screening, publisher accountability, database cleanup, and clear consequences for unverified AI-assisted sourcing. For everyone else, it means recognizing the same pattern in their own records.
The record is no longer protected by its formality. A fake citation can wear the costume of evidence perfectly well.
The question is whether the institutions that depend on knowledge can still tell the difference before the costume becomes part of the archive.