6 minute read | November 11, 2025
The U.S. Food and Drug Administration’s (FDA) Digital Health Advisory Committee (DHAC) held a public meeting on “Generative Artificial Intelligence-Enabled Digital Mental Health Medical Devices,” with a particular focus on a hypothetical prescription large language model (LLM) therapy chatbot for adults with major depressive disorder (MDD). The Committee examined benefits, risks and risk mitigations across the total product life cycle and offered recommendations on premarket evidence, postmarket monitoring, labeling, and integration into clinical care. The discussion underscores the FDA’s move toward a structured, risk-based framework tailored to the variability and safety profile of generative AI in mental health.
While the FDA has approved digital mental health solutions in recent years, including apps that deliver cognitive behavioral therapy (CBT), the agency has yet to clear mental health tools that use generative AI. And while generative AI chatbot solutions are already on the market, the DHAC meeting is noteworthy for contemplating a prescription LLM chatbot, similar to prescription digital therapeutics.
The Committee grounded its recommendations in an up-front risk estimate for intended use. Experts emphasized that generative AI’s probabilistic, context-sensitive outputs challenge traditional device evaluation, requiring continuous performance monitoring.
Committee members highlighted the potential to expand access and augment care, especially in underserved settings. However, they warned of risks unique to LLMs, including hallucinations, context failures, model drift, misuse, disparate impact across populations, cybersecurity and privacy vulnerabilities, and usability challenges tied to literacy, language and the digital divide.
Life cycle governance featured prominently: tailored premarket evidence, clear human escalation pathways, accessible labeling and consent, and risk-stratified postmarket surveillance.
The Committee advised the FDA to evaluate benefits relative to a defined risk estimate and intended use.
Potential benefits include earlier access to support, improved triage and care orientation, expanded reach in resource-constrained settings, time-sensitive assistance alongside emergency resources, symptom improvement, and AI-enabled personalization and longitudinal assessment.
Risks requiring explicit controls include missed or exacerbated harms due to miscommunication or undetected deterioration; hallucinations; model drift and off-label use; performance disparities; cybersecurity, privacy and data stewardship issues; accessibility and multilingual performance; cost barriers; and risks associated with non-chatbot modalities such as voice or physiological sensing. The Committee called for robust adverse event definitions and reporting pathways, inclusive datasets and ongoing equity monitoring, and consent materials written at accessible literacy levels.
Premarket evidence should be tailored to the LLM therapy’s risk profile. Clinical evaluation should use validated depression endpoints and patient-reported outcomes; transparently measure false negatives for adverse events; include major safety events such as suicidal ideation and self-injury under a broad adverse event definition; minimize exclusion criteria to reflect real-world risk; and follow a stepwise validation from clinician-supervised to semi-autonomous use as evidence permits. Sponsors should characterize treatment dose and frequency, assess overuse risks, incorporate comparators where feasible, and include functional and behavioral health outcomes with qualitative measures of life experience. Technology performance evidence should demonstrate reliability safeguards, trend tracking, targeted education, controls against negative screen use or addictive engagement, structured feedback loops to patients and clinicians, and premarket testing across representative personas to define capability boundaries. Inclusivity and usability should be validated across literacy levels, cultures, and languages, with documented risks and mitigations.
Integration into care should be anchored by predefined human escalation plans and supporting infrastructure, medical screening for comorbidities prior to engagement, one-tap escalation for urgent needs and automated reminders clarifying role and scope, including time-limited use. Postmarket surveillance should be risk-stratified with metrics aligned to premarket commitments, guardrails to prevent scope creep and privacy-protective data collection permitting reanalysis. It should include longitudinal tracking of engagement and outcomes to assess dose–response and overuse, and mandatory incident and adverse event reporting with clear, multi-destination pathways that enable both patient and clinician reports. Labeling should be audience-specific, use plain language, and be transparent about purpose, indications, limits, prescriber qualifications and infrastructure needs, data practices and costs, level of autonomy and required human oversight, and user interaction guidance to minimize risk.
The FDA may rely on these recommendations in forthcoming guidance and in review of AI-enabled digital mental health devices. The direction of likely FDA requirements is clear: design to an explicit risk estimate and align endpoints to condition-specific standards and real-world risk. Sponsors will likely have to provide evidence of equitable performance across populations and languages. Critical safeguards include built-in human escalation pathways, misuse and overuse controls, accessible explainability, continuous monitoring for drift, safety, and bias, and transparent labeling calibrated to autonomy and scope. Sponsors and health systems should begin operational planning to meet these expectations.
Manufacturers developing LLM-based prescription therapies for MDD should anticipate a risk-based, life cycle regulatory approach. In practice, this means building the premarket evidence, engineered safeguards, escalation infrastructure and postmarket surveillance commitments described above into product planning from the outset.
For providers who may incorporate AI-enabled chatbots into talk therapy workflows (or developers building with clinician-users in mind), it will be critical to track how state professional boards with jurisdiction over behavioral health professionals interpret and apply any forthcoming FDA guidance. Many states, such as Illinois, are also regulating the use of AI chatbots by requiring user-facing disclosures confirming that users understand they are interacting with an AI solution. California recently passed legislation prohibiting the use of names or branding implying that an AI solution may be licensed or otherwise certified to deliver talk therapy (see our write-up here). The Federation of State Medical Boards' 2024 report, Navigating the Responsibility and Ethical Incorporation of Artificial Intelligence into Clinical Practice, provides a strong overview of professional responsibility issues providers should consider, as we discussed here.
DHAC’s November 6 meeting signals the FDA’s intent to adapt oversight of generative AI mental health tools through a total product life cycle lens. For an LLM-based prescription therapy for adult MDD, expect requirements for rigorous, inclusive premarket evidence; engineered safety, equity, and usability controls; clinically integrated escalation; and robust postmarket surveillance. Sponsors should align product design, clinical strategy, governance and labeling with these expectations, as the FDA refines its regulatory posture.
Want to learn more? Reach out to the authors (Georgia Ravitz, Shari Esfahani, Thora Johnson, Stephen Thau, Jeremy Sherer, and Amy Joseph).