A recent study published in JAMA Network Open examined the accuracy and reliability of nutritional information provided by two versions of the Chat Generative Pre-trained Transformer (ChatGPT) chatbot.
The results suggest that while chatbots may not replace the role of nutritionists, they can improve communication between healthcare professionals and patients if further refined and strengthened.
Many people rely on the internet for access to health, medical, food, and nutrition information. However, studies have shown that nearly half of the nutritional information available on the internet is of poor quality or inaccurate.
AI-powered chatbots have the potential to optimize the way users navigate through a wide range of publicly available scientific knowledge by providing conversational, easily understandable explanations of complex topics.
Prior research has examined how well chatbots can disseminate medical information, but their reliability in providing nutrition information is still relatively unexplored.
About the Study
In this cross-sectional study, the researchers followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. They evaluated the accuracy of nutritional information provided by ChatGPT-3.5 and ChatGPT-4 chatbots for macronutrients (proteins, carbohydrates, and fats) and the energy content of 222 foods in two languages – traditional Chinese and English.
They provided a prompt asking the chatbot to create a table with the nutritional profile of each food in its uncooked form. This search was conducted in September-October 2023.
Each search was performed five times to assess consistency. The coefficient of variation (CV) was calculated over these five measurements for each food.
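As a rough sketch of this consistency measure, the CV over repeated estimates can be computed as follows (the food values and numbers here are hypothetical, not taken from the study):

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV as a percentage: (standard deviation / mean) * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical example: five repeated chatbot estimates of the energy
# content (kcal) of the same food; the values are illustrative only.
estimates = [150, 155, 148, 152, 160]
cv = coefficient_of_variation(estimates)  # roughly 3.1% for these values
```

A low CV across the five repeated queries indicates that the chatbot gives consistent answers to the same prompt; it says nothing by itself about whether those answers are accurate.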
The accuracy of the chatbot’s responses was evaluated by comparing its estimates of energy (in kilocalories) or macronutrients (in grams) with the recommendations of nutrition scientists according to the food composition database managed by the Food and Drug Administration of Taiwan.
A response was considered accurate if the chatbot’s estimate of energy or macronutrient content fell within 10% to 20% of the value provided by the nutrition scientists.
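This accuracy criterion amounts to a simple relative-error check, which can be sketched as follows (the function name and threshold handling are illustrative, not taken from the paper):

```python
def within_tolerance(estimate, reference, tolerance):
    """True if the estimate falls within ±tolerance (a fraction,
    e.g. 0.10 or 0.20) of the reference value."""
    return abs(estimate - reference) <= tolerance * reference

# Illustrative: a 108 kcal estimate against a 100 kcal reference
# passes the 10% threshold, while a 125 kcal estimate fails even
# at the looser 20% threshold.
```

Under this scheme, each chatbot estimate of energy (kcal) or a macronutrient (g) would be checked against the Taiwan FDA database value at both the stricter and the looser tolerance.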
The researchers also assessed whether the chatbots' responses differed significantly from the nutrition scientists' reference values, and whether the two versions of ChatGPT differed from each other.
Study Results
There were no significant differences between the estimates of the chatbots and nutrition scientists regarding the fat, carbohydrate, and energy values of eight adult meals. However, the researchers found that the protein estimates fluctuated significantly. Chatbot responses were considered accurate for energy content in 35–48% of the 222 foods included, with a CV of less than 10%. ChatGPT-4, the newer version, performed better overall than ChatGPT-3.5 but tended to overestimate protein content.
The study shows that chatbot responses align well with the recommendations of nutrition scientists in some respects but may overestimate protein content and also exhibit a high degree of inaccuracy.
Because they are readily available, chatbots could be a useful tool for people who are looking for macronutrient and energy information on common foods and are unsure which resources to consult.
However, the authors emphasize that chatbots are not a substitute for nutritionists. Rather, they can enhance communication between patients and public health professionals by providing additional resources and translating complex medical language into conversational, easily understandable terms.
Limitations
They also note that the foods included in their search may not be commonly consumed, affecting the relevance of their findings.
AI chatbots cannot provide users with personalized nutrition recommendations, accurate portion sizes, or generate specific dietary and nutrition guidelines. Additionally, chatbots may not be able to tailor their responses to the region where the user is located.
Portion sizes and serving units vary significantly from country to country, as well as depending on the type of food and its preparation. Chatbots cannot account for crucial cultural and geographic differences or provide relevant household units for each consumer.
The most significant limitation is that ChatGPT is a general-purpose chatbot, not one specialized in dietetics and nutrition.
The cut-off date for the training dataset was September 2021, meaning that newer research findings would not have been considered. Users should not confuse chatbots with search engines, as their responses are a product of their training datasets and the formulation of the input prompts.
Given the immense popularity of chatbots and other forms of generative AI, future iterations may overcome these limitations and provide more accurate, timely, relevant, and practical information on nutrition and dietetics.
Chen, Y. C., Ho, D. K. N. H., Chiu, W., Cheah, K., Mayasari, N. R., Chang, J., Hoang, Y. N. (2023). Consistency and accuracy of artificial intelligence in providing nutritional information. JAMA Network Open. doi:10.1001/jamanetworkopen.2023.50367. https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2813295