A study recently published in PNAS Nexus describes a machine-learning risk prediction model that forecasts the future spread of newly discovered variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from genomic and epidemiological data.
The study aims to identify novel SARS-CoV-2 strains early by monitoring the emergence of new variants, and thereby to improve pandemic preparedness. However, it remains difficult to pinpoint the mutations that will cause a new wave.
Academic researchers have developed various models to predict the course of a pandemic, but none of these models has been designed to predict variant-specific spread. Current epidemiological models do not incorporate genomic features to reflect infection dynamics.
In this study, the researchers used an AI-based approach to analyze nine million SARS-CoV-2 genome sequences from 30 countries and detect temporal patterns of variants that cause large infection waves. The model drew on Pango lineage designations, sequence data from the Global Initiative on Sharing Avian Influenza Data (GISAID), COVID-19 case counts, vaccination rates, and non-pharmaceutical interventions.
The analysis focused on 30 nations that reported the most SARS-CoV-2 genome sequences by March 2022, accounting for nine million of the 9.5 million genome sequences cataloged in GISAID since the start of the pandemic.
By March 19, 2022, the included countries had collectively identified 1,151 unique variants, with an average of 72 variants identified in each country since the start of the pandemic. The study accounted for all possible changes in a genomic sequence, such as base substitutions, deletions, and insertions. Additionally, the researchers introduced two measures to characterize the diversity of variants over time: variant entropy and heterogeneity.
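This summary does not give the formula behind variant entropy. A minimal sketch of one plausible reading, assuming it is the Shannon entropy of the variant shares among the samples sequenced in a given time window, could look like the following. The function name and the sample data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def variant_entropy(lineages):
    """Shannon entropy (in bits) of the variant mix in one time window.

    `lineages` is a list with one lineage label per sequenced sample.
    Assumption: the study's "variant entropy" may be defined differently.
    """
    counts = Counter(lineages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    shares = (n / total for n in counts.values())
    return sum(-p * math.log2(p) for p in shares)


# Hypothetical week of samples split evenly between two lineages.
week = ["BA.1"] * 50 + ["BA.2"] * 50
print(variant_entropy(week))  # 1.0 bit: two equally common variants
```

Under this reading, a window dominated by a single variant scores close to zero, while a window with many co-circulating variants scores higher.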
The model aimed to identify SARS-CoV-2 variants that caused over 1,000 cases per million people within three months. The study applied 31 predictive factors to estimate the infectivity of the variants using machine learning techniques.
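The wave criterion above can be written as a simple labeling rule. The sketch below is illustrative only: it assumes that "three months" is approximated by 13 weeks and that weekly case counts attributed to a single variant are available. The function and parameter names are hypothetical and do not come from the study.

```python
def causes_wave(weekly_cases, population,
                threshold_per_million=1000.0, horizon_weeks=13):
    """Return True if a variant exceeds the case threshold within the horizon.

    weekly_cases: cases attributed to the variant, one entry per week,
                  starting from the week the variant was first observed.
    population:   population of the country.
    Assumption: three months is approximated by 13 weeks.
    """
    cases = sum(weekly_cases[:horizon_weeks])
    cases_per_million = cases * 1_000_000 / population
    return cases_per_million > threshold_per_million


# Hypothetical variant in a country of 10 million people.
print(causes_wave([5000] * 13, 10_000_000))  # True: 6,500 cases per million
print(causes_wave([100] * 13, 10_000_000))   # False: 130 cases per million
```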
The findings showed that, after one week of observation, the model correctly predicted 73% of the variants that would go on to cause a COVID-19 wave of over 1,000 cases per million people in the following three months; this rose to 80% after two weeks of observation. The model's area under the curve (AUC) was 86% for one-week predictions and 91% for two-week predictions.
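To make the two kinds of figures concrete, the sketch below computes both on toy data. It assumes that the share of wave-causing variants correctly predicted corresponds to sensitivity (recall) and that AUC refers to the area under the ROC curve; the summary does not state either explicitly. The labels and scores are invented for illustration and are unrelated to the study's data.

```python
def sensitivity(y_true, y_pred):
    """Share of true wave-causing variants that were flagged."""
    positives = sum(1 for t in y_true if t)
    if positives == 0:
        return float("nan")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    return hits / positives


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive is scored above a
    random negative, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = variant later caused a wave, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.2, 0.1, 0.05]
flags = [s >= 0.5 for s in scores]  # hypothetical decision threshold

print(round(sensitivity(labels, flags), 3))
print(round(roc_auc(labels, scores), 3))
```

Sensitivity depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures can differ for the same model.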
In conclusion, this study developed a prediction model based on nine million genome sequences from 30 countries to forecast the spread of newly detected SARS-CoV-2 variants. The improved accuracy of the model underscores the need to integrate genetic variables into more sensitive models. The journal reference for this study is "Levi, R., El Ghali, Z. & Shoshy, A. (2024). Predicting the spread of SARS-CoV-2 variants: An artificial intelligence enabled early detection. PNAS Nexus 3(1). doi:10.1093/pnasnexus/pgad424."