A recent UW-Madison study found that using AI in genetic studies can lead to faulty conclusions, highlighting a “pervasive bias” in research that relies on the technology.
Qiongshi Lu, an associate professor in the university’s Department of Biostatistics and Medical Informatics, led the study that was published recently in the scientific journal Nature Genetics.
While AI is being used to help researchers parse many thousands of genetic variations across study participants in hopes of identifying connections between genes and various diseases, the university notes these relationships are “not always straightforward.”
One such effort, the National Institutes of Health’s All of Us project, seeks to leverage huge datasets including genetic profiles and health information. The program gathers data from people across the country with a goal of improving the field of precision medicine.
But some databases have missing data on health conditions targeted by researchers, and scientists are using artificial intelligence to bridge “data gaps,” according to the UW-Madison release.
“It has become very popular in recent years to leverage advances in machine learning, so we now have these advanced machine-learning AI models that researchers use to predict complex traits and disease risks with even limited data,” Lu said in a statement.
But his team found that a machine learning algorithm commonly used in genome-wide association studies “can mistakenly link” multiple genetic variations to the risk of developing Type 2 diabetes. According to the release, this finding about potential false positives applies broadly to AI-assisted studies.
“The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren’t,” Lu said.
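To see the mechanism Lu describes, consider a toy simulation written for this story, not drawn from the study: a made-up “AI” risk score is built from BMI, one hypothetical variant truly raises diabetes risk, and another only raises BMI. Testing variants against the predicted score instead of actual diagnoses flags the BMI-only variant, even though it has no effect on the disease.

```python
# Toy simulation, not the study's data or code: shows how testing variants
# against an AI-predicted risk score (instead of real diagnoses) can flag a
# variant that has no effect on the actual disease. All numbers are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20_000

# Allele counts (0/1/2) for two hypothetical variants.
g_disease = rng.binomial(2, 0.3, n)  # truly raises diabetes risk
g_bmi = rng.binomial(2, 0.3, n)      # raises BMI only; no effect on diabetes here

bmi = 25 + 0.8 * g_bmi + rng.normal(0, 3, n)
liability = 0.4 * g_disease + rng.normal(0, 1, n)
diabetes = (liability > np.quantile(liability, 0.9)).astype(float)  # true diagnoses

# Stand-in "AI" predictor: a noisy risk score driven by BMI, mimicking a model
# trained mostly on non-genetic health-record features.
predicted_risk = 0.05 * (bmi - 25) + rng.normal(0, 0.2, n)

def assoc_p(genotype, outcome):
    """p-value from a simple per-variant linear association test."""
    return stats.linregress(genotype, outcome).pvalue

print("vs. true diagnoses: g_disease p=%.1e  g_bmi p=%.2f"
      % (assoc_p(g_disease, diabetes), assoc_p(g_bmi, diabetes)))
print("vs. predicted risk: g_disease p=%.2f  g_bmi p=%.1e"
      % (assoc_p(g_disease, predicted_risk), assoc_p(g_bmi, predicted_risk)))
```

In this toy example, the predicted score inherits the genetics of whatever features the model leans on, which is the kind of spurious link Lu warns about.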
Lu’s team has also put forth a statistical method to help researchers “guarantee the reliability” of genome-wide association studies that use AI. According to the university, it can eliminate the bias introduced when AI models rely on incomplete data. Lu calls the proposed fix “statistically optimal.”
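The release does not detail the team’s method. Purely as an illustration of the general idea, the sketch below shows one generic way such a bias can be corrected when a subset of participants has both a real diagnosis and an AI-predicted risk score: estimate the bias on that labeled subset and subtract it from the full-sample estimate. This is a common post-prediction adjustment, not necessarily the approach in the Nature Genetics paper, and the function names and inputs are hypothetical.

```python
# Illustrative sketch only (not the paper's method): combine a small "labeled"
# subset, where the true diagnosis is observed, with the full sample, where
# only the AI-predicted risk is available, to correct a per-variant estimate.
import numpy as np
from scipy import stats

def slope(x, y):
    """Per-variant association (regression slope) of outcome y on genotype x."""
    return stats.linregress(x, y).slope

def corrected_effect(genotype, predicted, labeled_idx, true_outcome):
    """
    genotype:     allele counts for everyone (length n)
    predicted:    AI-predicted risk for everyone (length n)
    labeled_idx:  indices of the subset with observed diagnoses
    true_outcome: observed diagnoses for that subset
    """
    # Naive estimate: association with the predicted phenotype, full sample.
    naive = slope(genotype, predicted)
    # On the labeled subset, measure the bias: the gap between the association
    # with the predicted risk and the association with the real diagnosis.
    g_lab = genotype[labeled_idx]
    bias = slope(g_lab, predicted[labeled_idx]) - slope(g_lab, true_outcome)
    # Subtract the estimated bias from the full-sample estimate.
    return naive - bias
```

The design idea in this generic sketch is to use the small, fully observed subset only to measure how far the predicted phenotype drifts from the real one, so the large predicted sample still does most of the statistical work.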
See the release.