Most algorithms, including powerful Deep Learning methods, rely on Big Data. Genomic data is not big; quite the opposite, it is wide, so these approaches do not work.
Big Data: learning simple patterns from a huge number of examples.
Wide Data: learning complex patterns from a small number of examples. This is the challenge with genomic data.
Because genomic data is wide, the number of patients is far smaller than the number of variants.
To find the causes of complex polygenic diseases, current methods need 40 million^2 patients, or 500x the population of the earth.
All current methods indiscriminately test every single genetic variant or combination of variants for a correlation to the disease.
Mathematically, this approach is a dead end, because each variant tested requires at least one patient.
Not every combination of genetic variants needs to be tested.
Our AI algorithms select a handful of complex combinations of genetic variants based on evidence-driven disease models.
We use a knowledge graph, an organized database with information from medical journals and from protein interaction and protein co-expression databases, to turn all scientific evidence into disease models.
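As a rough illustration of the idea, the sketch below builds a toy knowledge graph and derives a "disease model" as the set of genes within a few hops of a disease node. Every node name, edge, and evidence label here is invented, and the hop-based rule is only an assumption made for the example; it is not a description of our production system.

```python
from collections import deque

# Toy knowledge graph: all nodes, edges and evidence labels are invented.
EDGES = [
    ("disease_X", "GENE_A", "literature"),
    ("GENE_A", "GENE_B", "protein_interaction"),
    ("GENE_B", "GENE_C", "co_expression"),
    ("GENE_D", "GENE_E", "protein_interaction"),
]

def build_graph(edges):
    """Turn (node, node, evidence) triples into an undirected adjacency map."""
    graph = {}
    for a, b, _evidence in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def disease_model(graph, disease, max_hops):
    """Return all nodes reachable from `disease` within `max_hops` edges."""
    seen = {disease}
    queue = deque([(disease, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return seen - {disease}

graph = build_graph(EDGES)
print(sorted(disease_model(graph, "disease_X", 2)))
```

Only the genes in the resulting model, rather than every variant in the genome, would then be candidates for testing.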
While a standard knowledge graph-based approach, as used by competitors, is prone to finding only things that are obvious, our AI does not fall into this trap.
Akin to evolutionary algorithms, our AI adds factors without an obvious link to a given disease and can evaluate, on millions of data points, whether a new factor improves our understanding of the disease.
Over years of training, these algorithms have learned what makes a good non-obvious addition to a disease model.
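The propose-and-evaluate loop described above can be sketched in a few lines. This is a deliberately simplified stand-in, assuming a toy scoring function and invented factor names (`factor_1` to `factor_6`); the real evaluation runs on patient data and is not shown here.

```python
import random

# Hypothetical toy setup: which factors truly help is hidden from the search.
INFORMATIVE = {"factor_2", "factor_5"}
CANDIDATES = [f"factor_{i}" for i in range(1, 7)]

def score(model):
    """Toy fitness: reward informative factors, penalise model size."""
    return len(model & INFORMATIVE) - 0.1 * len(model)

def grow_model(seed_model, candidates, rng):
    """Propose additions in random order and keep those that improve the score."""
    model = set(seed_model)
    order = list(candidates)
    rng.shuffle(order)
    for factor in order:
        if factor in model:
            continue
        trial = model | {factor}
        if score(trial) > score(model):
            model = trial  # the non-obvious factor earned its place
    return model

final = grow_model({"factor_1"}, CANDIDATES, random.Random(0))
print(sorted(final))
```

Factors that do not improve the score are discarded, so the model grows only where the data supports the addition.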
Our WIDE DATA algorithms found the complex patterns behind COVID-19 that pharmaceutical companies and consortia missed, leading to the discovery of a novel drug target, CDK6, that previous efforts were unable to link to COVID-19. CDK6 inhibitors for the treatment of critically ill COVID-19 patients are now entering phase 2 clinical trials.