In clinical care, an AI output without a citation is not a recommendation — it is a rumor. A physician cannot evaluate a suggestion they cannot verify. They cannot explain it to a patient. They cannot document it in a chart. And if something goes wrong, they cannot defend it.
The architecture we use
Retrieval before generation. The model never answers from memory alone. Every output starts with a retrieval step that pulls the relevant FHIR resources, guideline sections, or payer policy clauses from a structured index. The generator has access only to what was retrieved.
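A minimal sketch of that contract in Python. The `index.search` call and the `generate` callable are illustrative stand-ins for whatever retriever and constrained generator are actually in place; the point is that the prompt is built exclusively from retrieved spans.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RetrievedSpan:
    source_id: str   # e.g. a FHIR resource ID or a guideline section anchor
    text: str        # the exact text pulled from the structured index

def answer(question: str, index, generate: Callable[[str], str]) -> str:
    # 1. Retrieval step: pull candidate FHIR resources, guideline sections,
    #    or payer policy clauses. `search` is a hypothetical index API.
    spans = index.search(question, top_k=8)
    # 2. The generator sees ONLY the retrieved spans; it never answers
    #    from parametric memory alone.
    context = "\n".join(f"[{s.source_id}] {s.text}" for s in spans)
    prompt = (
        "Answer using ONLY the sources below, citing a [source_id] for "
        "every claim. If the sources do not support a claim, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```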
Span-level attribution. Every claim in the output is annotated with the specific source span that supports it: not just the document, but the exact sentence or data field. When the output says "patient is on lisinopril 20 mg," that claim links to the MedicationRequest resource ID, the date it was recorded, and the ordering practitioner.
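In data-structure terms, a claim is never a bare string. Here is a sketch of what a span-level attribution record can look like; the field names, the example resource IDs, and the `field_path` convention are assumptions for illustration, not a specific FHIR client API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Attribution:
    resource_id: str   # e.g. "MedicationRequest/abc-123" (illustrative ID)
    field_path: str    # the exact field, not just the document
    span_text: str     # the verbatim sentence or value that supports the claim
    recorded_on: date  # when the source data was recorded
    author: str        # the ordering practitioner or document author

@dataclass(frozen=True)
class Claim:
    text: str                              # the generated claim itself
    attributions: tuple[Attribution, ...]  # at least one per claim

claim = Claim(
    text="patient is on lisinopril 20 mg",
    attributions=(
        Attribution(
            resource_id="MedicationRequest/abc-123",
            field_path="medicationCodeableConcept.text",
            span_text="lisinopril 20 mg oral tablet, once daily",
            recorded_on=date(2024, 3, 14),
            author="Practitioner/dr-lee",
        ),
    ),
)
```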
Confidence signaling. When retrieval does not find a source that clearly supports a claim, we do not generate it. We flag the gap explicitly: "no documentation of a prior ACE inhibitor trial found in the chart." A clinician can act on an explicit gap. They cannot safely act on a hallucinated confirmation.
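The rule itself is simple to state in code. This sketch assumes some support-scoring function (an entailment or matching model) exists; both `score_support` and `SUPPORT_THRESHOLD` are placeholders, not our actual scorer or cutoff.

```python
SUPPORT_THRESHOLD = 0.8  # illustrative cutoff for "clearly supports"

def emit_claim(claim_text, spans, score_support):
    # Score the claim against every retrieved span; keep the best match.
    best = max((score_support(claim_text, s) for s in spans), default=0.0)
    if best >= SUPPORT_THRESHOLD:
        return {"type": "claim", "text": claim_text}
    # No source clearly supports the claim: emit an explicit gap
    # instead of generating an unsupported confirmation.
    return {
        "type": "gap",
        "text": f"no documentation supporting '{claim_text}' found in the chart",
    }
```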
Audit log on every output. Every generated output is stored with the retrieval context, citations, model version, and timestamp. If a clinician acts on an AI suggestion and something goes wrong, the audit log shows exactly what the model saw and what it said.
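As a sketch, the audit record can be as plain as an append-only JSONL log. The field names and file-based storage here are assumptions; what matters is that the retrieval context, citations, model version, and timestamp are all captured at generation time.

```python
import json
from datetime import datetime, timezone

def write_audit_record(path, question, retrieved_spans, output, citations,
                       model_version):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,        # e.g. a git SHA or model tag
        "question": question,
        "retrieval_context": retrieved_spans,  # exactly what the model saw
        "output": output,                      # exactly what it said
        "citations": citations,                # the span-level attributions
    }
    with open(path, "a") as f:                 # append-only JSONL log
        f.write(json.dumps(record) + "\n")
```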