Next generation insights from genomic data


How much insight are you getting from genomic data?

Big data, with billions of observations, has propelled entire industries and enabled innovations such as self-driving cars.

With the completion of the Human Genome Project and the advent of Next-Generation Sequencing, a 'Big Data Revolution' in genomics and in the treatment, cure, and prevention of diseases was anticipated.

Unfortunately, this did not happen. Why not?

Genomic Data = Wide Data

[Figure: number of subjects versus number of features]

Big data: millions of subjects, a manageable amount of information per subject

Wide data: few subjects, an entire genome of information per subject

Genomic data is not big, but wide. With Wide Data, where the sample size (for example, the number of patients in a study) is much smaller than the number of variables (for example, the number of variants in a genome), the statistical problem of multiple testing emerges. Standard statistical models struggle to tell signal from noise, in particular when testing not single variants, but complex patterns.
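The scale of the multiple testing problem can be made concrete with a little arithmetic. The sketch below is illustrative only: the figure of one million variants and the 0.05 significance level are assumptions chosen for the example, not numbers from this page. It counts how many tests arise for single variants and for variant pairs, how many chance hits to expect if no variant had any real effect, and how strict a Bonferroni-corrected threshold would have to be.

```python
import math


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of tests that pass the threshold purely by chance."""
    return num_tests * alpha


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


# Assumed, illustrative number of variants tested per genome.
num_variants = 1_000_000

single_variant_tests = num_variants
pairwise_tests = math.comb(num_variants, 2)  # every unordered pair of variants

for label, num_tests in [("single variants", single_variant_tests),
                         ("variant pairs", pairwise_tests)]:
    print(label)
    print("  tests:", num_tests)
    print("  expected chance hits at 0.05:", expected_false_positives(num_tests))
    print("  Bonferroni threshold:", bonferroni_threshold(num_tests))
```

Moving from single variants to pairs multiplies the number of tests by roughly half a million in this example, and the corrected threshold shrinks by the same factor, which is why small cohorts run out of statistical power so quickly.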


Mass sequencing alone is not the answer. To reliably detect the complex patterns that define polygenic diseases, one would need to sample 500 times the population of Earth.

Our Solution to the Wide Data Problem

Instead of aimlessly testing all of the billions of possible combinations of genetic variants for a correlation with a disease, our AI works as a smart filter that selects hypotheses by mining biomedical databases and the scientific literature.

This results in models of diseases that contain manageable numbers of genetic variants but take into account the complexities of how these variants interact.
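The effect of filtering hypotheses before testing can be sketched in a few lines. This is not the platform's actual method, which this page does not describe in detail. It is a minimal stand-in that assumes a hypothetical mapping from variants to genes and a hypothetical set of genes already linked to the disease by prior knowledge. All identifiers (`v1`, `GENE_A`, and so on) are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy data: which gene each variant falls in.
variant_to_gene = {
    "v1": "GENE_A",
    "v2": "GENE_B",
    "v3": "GENE_C",
    "v4": "GENE_D",
    "v5": "GENE_A",
}

# Hypothetical prior knowledge: genes that databases or literature
# already connect to the disease under study.
prior_linked_genes = {"GENE_A", "GENE_B"}


def candidate_pairs(variant_to_gene, prior_genes):
    """Return variant pairs in which both variants lie in a prior-linked gene."""
    kept = [v for v, gene in sorted(variant_to_gene.items()) if gene in prior_genes]
    return list(combinations(kept, 2))


all_pairs = list(combinations(sorted(variant_to_gene), 2))
filtered_pairs = candidate_pairs(variant_to_gene, prior_linked_genes)

print("pairs without filtering:", len(all_pairs))
print("pairs after filtering:", len(filtered_pairs))
print(filtered_pairs)
```

Fewer hypotheses mean a less severe multiple testing correction, so the same cohort can support tests of interactions that would be hopeless in an exhaustive search.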

Our models enable better disease prediction, higher quality recommendations for drug targets, and more accurate patient stratification than competing methods.


Current approaches to estimating genetic predispositions (for example, by computing Polygenic Risk Scores) detect associations of single gene variants with diseases, but fail to detect interactions between variants (epistasis). Our approach identifies previously undetectable gene interactions.
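Why single-variant tests can miss an interaction is easiest to see in a deliberately extreme toy case. The sketch below builds a synthetic cohort, an assumption made purely for illustration, in which disease occurs exactly when a person carries one of two variants but not both. Each variant on its own then shows no association at all, while the pair of variants determines the outcome completely.

```python
# Synthetic cohort: (variant_a, variant_b, disease), 25 people per genotype.
# Disease is present when exactly one of the two variants is carried (XOR).
samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1) for _ in range(25)]


def disease_rate(rows):
    """Fraction of the given rows that have the disease."""
    return sum(row[2] for row in rows) / len(rows)


def marginal_effect(rows, index):
    """Disease rate in carriers minus disease rate in non-carriers of one variant."""
    carriers = [row for row in rows if row[index] == 1]
    non_carriers = [row for row in rows if row[index] == 0]
    return disease_rate(carriers) - disease_rate(non_carriers)


def joint_rates(rows):
    """Disease rate for each combination of the two variants."""
    return {
        (a, b): disease_rate([row for row in rows if row[0] == a and row[1] == b])
        for a in (0, 1)
        for b in (0, 1)
    }


print("effect of variant A alone:", marginal_effect(samples, 0))
print("effect of variant B alone:", marginal_effect(samples, 1))
print("disease rate by genotype pair:", joint_rates(samples))
```

Real interactions are rarely this clean, but the example shows the principle: a score that adds up per-variant effects has nothing to add when the signal lives only in the combination.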

[Figure legend: known genes, new genes]

The resulting deeper understanding of disease genomics leads to 20% better prediction and diagnostics and reveals previously undetected drug target linkages.



Information from over ten unstructured biomedical data sources, structured and connected in a knowledge graph for direct insights.

For example, for a given gene variant, our library provides the gene containing the variant, the protein it encodes, whether it is druggable, and if so, drugs (and candidates) available to target it, and which diseases are linked to it. The nature of knowledge graphs allows discovery of related drugs, diseases, genes and pathways.
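The variant lookup described above can be sketched as a small graph of subject, relation, object triples. The library's real schema and API are not shown on this page, so everything here is an assumption for illustration: the relation names (`located_in`, `encodes`, `targets`, `linked_to`), the placeholder identifiers, and the function `describe_variant` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object).
TRIPLES = [
    ("variant:V1", "located_in", "gene:G1"),
    ("gene:G1", "encodes", "protein:P1"),
    ("drug:D1", "targets", "protein:P1"),
    ("gene:G1", "linked_to", "disease:X"),
    ("gene:G2", "linked_to", "disease:X"),
]


def build_index(triples):
    """Index the triples in both directions for fast lookups."""
    forward = defaultdict(set)   # (subject, relation) -> objects
    backward = defaultdict(set)  # (object, relation) -> subjects
    for subject, relation, obj in triples:
        forward[(subject, relation)].add(obj)
        backward[(obj, relation)].add(subject)
    return forward, backward


def follow(index, nodes, relation):
    """Collect every node reachable from `nodes` over one relation."""
    found = set()
    for node in nodes:
        found |= index.get((node, relation), set())
    return found


def describe_variant(variant, forward, backward):
    """Walk from a variant to its gene, protein, drugs, and diseases."""
    genes = follow(forward, {variant}, "located_in")
    proteins = follow(forward, genes, "encodes")
    drugs = follow(backward, proteins, "targets")
    diseases = follow(forward, genes, "linked_to")
    related_genes = follow(backward, diseases, "linked_to") - genes
    return {
        "genes": sorted(genes),
        "proteins": sorted(proteins),
        "druggable": bool(drugs),
        "drugs": sorted(drugs),
        "diseases": sorted(diseases),
        "related_genes": sorted(related_genes),
    }


forward, backward = build_index(TRIPLES)
summary = describe_variant("variant:V1", forward, backward)
print(summary)
```

Because every fact is an edge, the same traversal that answers the direct question also surfaces neighbors, such as other genes linked to the same disease, without any extra modeling.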

Drug Target Linkage

Drug target linkages establish that the target addressed by a drug is central to a given disease mechanism. Clinical trials with genetic evidence for drug target linkage are twice as likely to succeed.

Our platform checks for drug target linkage and predicts the success of clinical trials.


Highly accurate polygenic diagnostics identify previously undiagnosed patients and take diagnostics into the age of next-generation sequencing, allowing us to leave the current break-and-fix approach behind in favor of the new predict-and-prevent paradigm.



Jörn Klinger


Marco Schmidt


Charles Ravarani


Christian Hebenstreit


Alex Schwinges

Data Scientist

Justin Cope

Data Scientist

Margaretha Lamparter

Data Scientist

Hannes Baukmann

Data Scientist

Radi Hilaneh

Creative Developer

Mark Laurie

Research Assistant


