Daily Technology
28/01/2026
A recent experiment with ChatGPT's new health analysis capabilities revealed significant inaccuracies when the chatbot processed a decade's worth of personal Apple Watch data. Despite claims of understanding health patterns, the AI assigned an alarmingly low grade for cardiac health, prompting a doctor's visit and an expert review that found the AI's assessments unreliable and potentially misleading.
A user provided ChatGPT Health with 10 years of data from their Apple Watch, including millions of steps and heartbeat measurements. The AI initially assigned a failing grade ('F') for cardiac health, causing significant alarm. This prompted the user to consult their doctor, who confirmed their excellent cardiac health, contradicting the AI's assessment.
Cardiologist Eric Topol also reviewed the AI's analysis, deeming it "baseless" and stating, "This is not ready for any medical advice." The AI's evaluation relied heavily on metrics like estimated VO2 max and heart-rate variability, which are known to have limitations and inaccuracies when derived from wearable devices alone.
Claude for Healthcare, a rival tool from Anthropic, was tested with similar data and assigned a 'C' grade for cardiac health. While less alarming than ChatGPT's 'F', experts found this assessment questionable as well. Both companies acknowledge that their tools are in early testing, cannot replace doctors or provide diagnoses, and often attach disclaimers to that effect. Even so, both willingly produced detailed personal health analyses.
Privacy is another significant concern. While OpenAI states that data used in its Health mode is encrypted and not used for training, these AI tools are not covered by HIPAA, the federal health privacy law. This means user data is not protected by the same stringent privacy standards that apply to traditional healthcare providers.
Further testing revealed inconsistencies in the AI's performance. The user found that repeating the same queries to ChatGPT produced fluctuating grades, ranging from 'F' to 'B'. The AI also occasionally forgot crucial personal details, such as gender and age, and sometimes failed to use all of the provided medical data in its analysis. Experts described this erratic behavior as "totally unacceptable," warning it could cause undue anxiety or a false sense of security.
OpenAI said these variations can occur because the AI weighs different data sources differently, and that it is working to improve response stability. Claude exhibited similar output variations, which Anthropic attributed to the inherent nature of chatbots.
While AI holds immense potential for unlocking medical insights and improving access to care, current applications in personal health analysis are proving unreliable. Experts emphasize that AI models need to be sophisticated enough to account for the noise and weaknesses in wearable data, and to accurately link that data to real health outcomes. The current generation of AI health tools, despite their advanced capabilities, appears to be overselling its ability to provide accurate personalized health assessments, raising questions about readiness for widespread use.