Source Attribution in LLM Outputs

Large language models present a unique attribution problem: their weights compress vast amounts of training text into statistical patterns that erase provenance, so a base model cannot reliably say where any given claim came from. Major systems work around this with retrieval layers: Perplexity inlines numbered citations to live web results, Google's AI Overviews append footnote-style source links, Anthropic's Citations API grounds answers in user-supplied documents, and ChatGPT cites only when its browse tool is active. Meanwhile, research into training-data attribution, prompt provenance, and citation hallucination tries to close the gap between cited and uncited generation.

Attribution in large language models is harder than it looks because the model's weights are not a citation index. During training, billions of documents are reduced to gradients that adjust parameters; the original sources are not stored in any retrievable form. When a base model produces a sentence, there is no internal record of which training documents most influenced it, and asking the model to cite its source typically produces a plausible-looking reference that may or may not exist. This is the training-data attribution problem: weights compress provenance away, and post-hoc methods to recover it (influence functions, gradient-based attribution, parameter-group probes) are computationally expensive and only approximate.

The practical workaround across deployed systems is to bolt a retrieval layer onto the model, turning generation into retrieval-augmented generation (RAG), where the cited content is whatever the retriever fetched. Perplexity AI runs a multi-stage pipeline that breaks each query into sub-searches, retrieves around ten candidate pages via hybrid lexical-and-embedding search, reranks them, and inlines numbered citations into the synthesized answer; typically three to four sources end up cited. Google AI Overviews, launched in U.S. Search in May 2024 and expanded to 200+ countries and 40+ languages by mid-2025, append footnote-style links to roughly eight sources per overview, heavily skewed toward already top-ranked pages. ChatGPT cites only when its browse or search tool is active; with browsing off, any reference-shaped text in the response is pattern-matched from training data and may be entirely fabricated. Anthropic introduced a Citations API for Claude in 2025 that chunks user-supplied documents into sentences and returns answers grounded in specific spans, reporting roughly 15% better recall than roll-your-own implementations.
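The retrieve-then-cite pattern described above can be illustrated with a minimal sketch. It is not any vendor's pipeline: a toy term-count scorer stands in for hybrid search and reranking, the "synthesis" step simply quotes the first sentence of each retrieved page, and the names Page, retrieve, and answer_with_citations are hypothetical. The point it shows is that every citation marker maps to something the retriever actually fetched, rather than to anything recalled from weights.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """A retrieved document (hypothetical structure for this sketch)."""
    url: str
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_score(query, page):
    """Toy lexical score: total occurrences of distinct query terms in the page."""
    counts = Counter(tokenize(page.text))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query, corpus, k=10):
    """Return up to k pages with a non-zero score, best first."""
    scored = [(lexical_score(query, page), page) for page in corpus]
    scored = [(score, page) for score, page in scored if score > 0]
    # Sort on the score only; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:k]]


def answer_with_citations(query, corpus, max_sources=3):
    """Build an answer whose every sentence carries a numbered citation.

    A real system would have a model synthesize the text; here each claim
    is just the first sentence of a retrieved page (an assumption made to
    keep the sketch self-contained).
    """
    sources = retrieve(query, corpus)[:max_sources]
    claims = []
    for number, page in enumerate(sources, start=1):
        first_sentence = page.text.split(". ")[0].rstrip(".")
        claims.append(f"{first_sentence}. [{number}]")
    references = [f"[{n}] {page.url}" for n, page in enumerate(sources, start=1)]
    return " ".join(claims), references
```

Calling answer_with_citations with a query and a list of Page objects returns the answer text plus a reference list in which marker [n] always points at the n-th retrieved URL. Pages that score zero never appear, so nothing can be cited that was not fetched.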
The failure mode that has drawn the most attention is fabricated citations from base-model output, most visibly in legal filings. In the 2023 case Mata v. Avianca, New York attorneys were sanctioned $5,000 after they filed ChatGPT-generated briefs citing nonexistent decisions, and a public database tracking such incidents has logged hundreds of further sanctions across U.S. courts, with similar rulings in California, Massachusetts, and federal courts through 2025. Surveys of citation hallucination find rates ranging from about 11% to over 90% depending on model, domain, and prompt, with author names and venues the most error-prone fields. Research is converging on two threads: applying W3C PROV-style provenance models to prompts, retrievals, and completions so that downstream consumers can audit a chain of generation; and improving training-data attribution so that even uncited base-model output carries a probabilistic pointer back to its likely influences. Neither solves the underlying compression problem, but together they push toward outputs whose origin can at least be inspected rather than guessed.
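To make the provenance thread concrete, the following is a minimal sketch of a generation record loosely modeled on W3C PROV's core concepts (entities, an activity, and relations such as used, wasGeneratedBy, and wasDerivedFrom). It is an illustration only and is not claimed to be valid PROV-JSON; the identifier scheme, the field names inside each relation, and the functions entity_id, build_provenance, and sources_for are all assumptions made for this example.

```python
import hashlib
import json


def entity_id(kind, content):
    """Derive a short content-addressed identifier (hypothetical scheme)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    return f"{kind}:{digest}"


def build_provenance(prompt, retrieved, completion, model_name):
    """Record which prompt and sources a completion was generated from.

    retrieved maps each source URL to the text that was fetched from it.
    """
    prompt_id = entity_id("prompt", prompt)
    completion_id = entity_id("completion", completion)
    activity_id = entity_id("generation", prompt + completion)

    entities = {
        prompt_id: {"type": "prompt"},
        completion_id: {"type": "completion"},
    }
    source_ids = []
    for url, text in retrieved.items():
        source_id = entity_id("source", text)
        entities[source_id] = {"type": "source", "url": url}
        source_ids.append(source_id)

    return {
        "entity": entities,
        "activity": {activity_id: {"type": "generation", "model": model_name}},
        # The generation activity consumed the prompt and every retrieved source.
        "used": [
            {"activity": activity_id, "entity": used_id}
            for used_id in [prompt_id, *source_ids]
        ],
        "wasGeneratedBy": [{"entity": completion_id, "activity": activity_id}],
        "wasDerivedFrom": [
            {"generatedEntity": completion_id, "usedEntity": source_id}
            for source_id in source_ids
        ],
    }


def sources_for(record, completion_id):
    """Audit step: list the URLs a given completion was derived from."""
    derived = [
        link["usedEntity"]
        for link in record["wasDerivedFrom"]
        if link["generatedEntity"] == completion_id
    ]
    return [record["entity"][source_id]["url"] for source_id in derived]
```

A downstream consumer holding such a record can serialize it with json.dumps and call sources_for to inspect where a completion came from. A completion produced with no retrieval would have an empty wasDerivedFrom list, which makes the uncited case explicit instead of silent.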


Related Knowledge

Citation Hallucination in LLMs

Mata v. Avianca / Fabricated Citations

Why LLMs Prefer Plausible Over True
