đź“° Full Story
A BMJ Open study published in April 2026 found that five widely used AI chatbots frequently produce inaccurate or incomplete medical information.
Researchers probed ChatGPT (OpenAI), Gemini (Google), Grok (xAI), Meta AI (Meta) and DeepSeek (High-Flyer) using 250 prompts across five topics — cancer, vaccines, stem cells, nutrition and athletic performance — designed to reflect common public queries and misinformation tropes.
Overall, 50% of responses were judged problematic (30% somewhat problematic, 20% highly problematic). Open-ended prompts yielded a disproportionate share of highly problematic answers compared with closed questions.
Model behaviour varied: Grok produced significantly more highly problematic responses than the other chatbots, while Gemini produced the fewest.
Reference quality was poor (average completeness ~40%), with instances of fabricated citations; responses were typically expressed with unwarranted certainty and few caveats.
Only two refusals to answer were recorded, both from Meta AI. The researchers noted limitations: the study stress-tested the models with adversarial prompts, sampled only five chatbots as they stood in February 2025, and commercial systems evolve rapidly. They nonetheless concluded that risks remain if chatbots are deployed for public-facing health guidance without stronger safeguards.
đź”— Based On
🤝 Social Media Insights
Social Summary
The study highlights persistent safety issues (hallucinations, poor referencing), but responses stress that its findings are time-bound: because it used older consumer models and small, adversarial prompt samples, the results likely represent a lower bound on current model performance. Updated, larger comparisons with current models and clinicians are needed to assess present-day risk.
🕰️ The Story So Far: An Evolving Timeline
Thursday, April 16, 2026 01:17 UTC
Study: AI chatbots give flawed medical advice
Tuesday, April 14, 2026 06:44 UTC
Study: AI fails differential diagnosis in over 80% of cases
đź’¬ Commentary