
Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment

by Johanna Charlotte Neubauer, Anna Kaiser, Leon Lettermann, Tobias Volkert, Alexander Häge

Objective

This study evaluates the performance of four large language models—ChatGPT 4o, ChatGPT o1-mini, Gemini 2.0 Flash, and Gemini 1.5 Flash—in answering multiple-choice questions in child and adolescent psychiatry to assess their level of factual knowledge in the field.

Methods

A total of 150 standardized multiple-choice questions from a specialty board review study guide were selected, ensuring a representative distribution across different topics. Each question had five possible answers, with only one correct option. To account for the stochastic nature of large language models, each question was asked 10 times with randomized answer orders to minimize known biases. Accuracy for each question was assessed as the percentage of correct answers across 10 requests. We calculated the mean accuracy for each model and performed statistical comparisons using paired t-tests to evaluate differences between Gemini 2.0 Flash and Gemini 1.5 Flash, as well as between Gemini 2.0 Flash and both ChatGPT 4o and ChatGPT o1-mini. As a post-hoc exploration, we identified questions with an accuracy below 10% across all models to highlight areas of particularly low performance.
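
The scoring procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `ask` is a hypothetical callable standing in for a request to a language model, the function names and seeding are invented for the example, and the t statistic is computed by hand with the standard library rather than with whatever statistics package the study used.

```python
import math
import random
import statistics


def shuffled_options(options, rng):
    """Return the answer options in a random order."""
    order = list(options)
    rng.shuffle(order)
    return order


def question_accuracy(ask, question, options, correct, repeats=10, seed=0):
    """Percentage of correct answers over `repeats` requests.

    `ask(question, options)` is a hypothetical stand-in for a model call;
    it must return the text of the option the model selected.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        answer = ask(question, shuffled_options(options, rng))
        hits += answer == correct
    return 100.0 * hits / repeats


def paired_t_statistic(acc_a, acc_b):
    """t statistic of a paired t-test on per-question accuracies.

    Both lists must cover the same questions in the same order;
    the statistic has len(acc_a) - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

The mean accuracy of a model is then `statistics.mean` over its per-question accuracies, and two models are compared by passing their per-question accuracy lists to `paired_t_statistic`.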

Results

The accuracy of the tested models ranged from 68.3% to 78.9%. Both ChatGPT and Gemini demonstrated generally solid performance in the assessment of child and adolescent psychiatry knowledge, with variations between models and topics. The superior performance of Gemini 2.0 Flash compared with its predecessor, Gemini 1.5 Flash, may reflect advancements in artificial intelligence capabilities. Certain topics, such as psychopharmacology, posed greater challenges than disorders with well-defined diagnostic criteria, such as schizophrenia or eating disorders.

Conclusion

While the results indicate that language models can support knowledge acquisition in child and adolescent psychiatry, limitations remain. Variability in accuracy across different topics, potential biases, and risks of misinterpretation must be carefully considered before implementing these models in clinical decision-making.

Cohort profile: characterisation, determinants, mechanisms and consequences of the long-term effects of COVID-19 - providing the evidence base for health care services (CONVALESCENCE) in the UK

by Jamieson · A. · Saikhan · L. A. · Raman · B. · Alghamdi · L. · Cheetham · N. J. · Conde · P. · Dobson · R. · Fernandez-Sanles · A. · Folarin · A. · Goudswaard · L. J. · Hamill Howes · L. · Jones · S. · Neubauer · S. · Orini · M. · Pierce · I. · Ranjan · Y. · Rapala · A. · Smith · S. M. · S

Purpose

The pathogenesis of the long-lasting symptoms which can follow an infection with the SARS-CoV-2 virus (‘long covid’) is not fully understood. The ‘COroNaVirus post-Acute Long-term EffectS: Constructing an evidENCE base’ (CONVALESCENCE) study was established as part of the Longitudinal Health and Wellbeing COVID-19 UK National Core Study. We performed a deep phenotyping case-control study nested within two cohorts (the Avon Longitudinal Study of Parents and Children and TwinsUK) as part of CONVALESCENCE.

Participants

From September 2021 to May 2023, 349 participants attended the CONVALESCENCE deep phenotyping clinic at University College London. Four categories of participants were recruited. Cases had long covid and evidence of SARS-CoV-2 infection (long covid(+)/SARS-CoV-2(+)). There were three control groups: those with neither long covid symptoms nor evidence of prior COVID-19 (long covid(-)/SARS-CoV-2(-); control group 1), those who self-reported COVID-19 and had evidence of SARS-CoV-2 infection, but did not report long covid (long covid(-)/SARS-CoV-2(+); control group 2) and those who self-reported persistent symptoms attributable to COVID-19 but had no evidence of SARS-CoV-2 infection (long covid(+)/SARS-CoV-2(-); control group 3). Remote wearable measurements were performed up until February 2024.
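
The four recruitment categories form a two-by-two grid over long covid status and evidence of SARS-CoV-2 infection, which can be written as a small lookup. This is only an illustrative sketch: the function name and the two boolean inputs are assumptions, and the abstract does not give the criteria used to decide either status.

```python
def classify_participant(long_covid: bool, sars_cov2_evidence: bool) -> str:
    """Map the two status flags to the group labels used in the abstract.

    Both flags are simplifications for illustration; how each status was
    actually determined is not described in the text above.
    """
    if long_covid and sars_cov2_evidence:
        return "case"              # long covid(+)/SARS-CoV-2(+)
    if not long_covid and not sars_cov2_evidence:
        return "control group 1"   # long covid(-)/SARS-CoV-2(-)
    if not long_covid and sars_cov2_evidence:
        return "control group 2"   # long covid(-)/SARS-CoV-2(+)
    return "control group 3"       # long covid(+)/SARS-CoV-2(-)
```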

Findings to date

This cohort profile describes the baseline characteristics of the CONVALESCENCE cohort. Of the 349 participants, 141 (53±15 years old; 21 (15%) men) were cases, 89 (55±16 years old; 11 (12%) men) were in control group 1, 75 (49±15 years old; 25 (33%) men) were in control group 2 and 44 (55±16 years old; 9 (21%) men) were in control group 3.

Future plans

The study aims to use a multiorgan score calculated as the cumulative total for each of nine domains (ie, lung, vascular, heart, kidney, brain, autonomic function, muscle strength, exercise capacity and physical performance). The availability of data preceding acute COVID-19 infection in cohorts may help identify the consequences of infection independent of pre-existing subclinical disease and also provide evidence of determinants that influence the development of long covid.
