GenoGra: an enabling technology for AI in genetics

In a previous blog post, we explored the vast potential of artificial intelligence (AI) in genetics, extending from advanced diagnostics and tailored therapies to basic research. However, integrating AI into this field has its challenges. Two main obstacles are the quality and quantity of genetic data: datasets must be vast, accurate, and free of bias to train AI algorithms effectively. In addition, preparing these massive genetic datasets demands considerable computational power and specialized expertise, and can therefore slow down the entire flow of genomic analysis.

The stages of genomic analysis

This flow can be divided into three main phases: primary, secondary, and tertiary. The primary phase focuses on DNA collection and sequencing. During this process, the sample's genetic material is digitized into sub-sequences of the original genome called reads. The secondary phase then converts this raw data into intelligible, interpretable information, enabling the search for specific nucleotide sequences in the sample and the identification of variants.

This step requires considerable computational power and technical expertise involving sequence alignment and correction. Finally, the tertiary stage exploits the information obtained to perform complex analysis and find a correlation between the variants found and particular conditions or traits. At this stage, AI becomes crucial, as it enables efficient and in-depth analysis of complex genetic patterns.
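To make the alignment step concrete, here is a minimal sketch of what the secondary phase does: each read is placed at the position of the reference sequence where it fits best, and disagreements with the reference hint at variants. The sequences and the naive exact-scan strategy below are purely illustrative; production aligners rely on far more sophisticated indexed algorithms.

```python
# Illustrative sketch of read alignment: slide each read along the
# reference and keep the placement with the fewest mismatches.
# All sequences here are invented toy data, not real genomic data.

def align(read: str, reference: str) -> tuple[int, int]:
    """Return (best_position, mismatches) for the best ungapped placement."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm

reference = "ACGTACGTTGCA"
reads = ["ACGTAC", "CGTTGC", "ACGAAC"]  # the last read carries a T->A change

for read in reads:
    pos, mm = align(read, reference)
    print(f"{read}: position {pos}, {mm} mismatch(es)")
```

The third read aligns with one mismatch against the reference: a placement like this, supported by enough reads, is what variant calling turns into a reported variant.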

Secondary analysis: the real bottleneck

A key aspect of secondary analysis is its strong dependence on a reference genome, i.e., the average (consensus) genomic sequence generated from a set of genomes. The reference genome is used as a template to compare and interpret new genetic sequences, but it can incorporate significant biases related to gender and ethnicity. This is because many reference genomes have been compiled from data drawn predominantly from specific populations, often under-representing overall genetic diversity. As a result, genetic data from populations other than those underlying the reference may be misinterpreted or incomplete, with potential implications for the accuracy of genetic diagnostics and the development of personalized therapies.

In addition, although technically advanced, the tools available for secondary analysis can be very complex in their practical application. This complexity not only increases the risk of human error but also requires specialized skills, making the analysis less accessible to a broad spectrum of researchers and clinicians. Redundancy in tool functionality and their limited ability to scale effectively with the size of genetic datasets add further challenges, leading to inefficiencies that can significantly slow down the analysis process.

Secondary analysis has therefore become a bottleneck in the entire genomic analysis flow. This obstacle limits the speed with which results can be obtained and can also degrade the quality and reliability of the conclusions drawn, directly impacting subsequent analysis steps and clinical applications. Improving and optimizing these tools and methodologies is thus critical to ensure that genomic analysis can fully realize its revolutionary potential in research and personalized medicine.


GenoGra's solution: the pangenome

GenoGra tackles this challenge head-on with innovative solutions. Our technology overcomes traditional limitations by offering optimized tools for pangenomic analysis, which considers all the genetic variability in a population and merges it into a pangenome. Specifically, our solutions make it possible to build affordable, personalized pangenomes that reflect the genetic diversity of all contributing samples. This is achieved using genome graphs, which represent variability between individuals more effectively and compactly. Such structures simplify analysis and reduce complexity, human error, and data redundancy.

With a graph-based structure, we can represent multiple genomes in a single network of interconnected information where areas common to multiple genomes are merged within shared nodes while variations generate branches. This structure thus allows us to collect all population information and consider even the rarest variations.
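The following toy sketch illustrates this idea (node labels and sequences are invented for illustration): two genomes share prefix and suffix nodes, a single-nucleotide difference branches into two alternative nodes, and each genome is recovered by spelling out one path through the graph.

```python
# Toy genome graph: shared sequence lives in common nodes, and a
# variant site branches into alternative nodes. Nodes hold sequence
# fragments; edges connect them into paths. Illustrative data only.

nodes = {
    1: "ACGT",   # prefix shared by both genomes
    2: "A",      # allele carried by genome 1
    3: "G",      # allele carried by genome 2 (a SNP at this site)
    4: "TTGC",   # suffix shared by both genomes
}
edges = {1: [2, 3], 2: [4], 3: [4]}  # the branch at node 1 encodes the variant

def spell(path):
    """Concatenate node sequences along a path to recover one genome."""
    return "".join(nodes[n] for n in path)

genome_1 = spell([1, 2, 4])
genome_2 = spell([1, 3, 4])
print(genome_1, genome_2)
```

The shared nodes are stored only once, so the cost of adding a new genome grows with its differences from the graph rather than with its full length, which is what makes the representation compact.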

Changing the genome representation paradigm thus makes the analysis much more accurate, and the benefits do not stop there: greater accuracy at this step also unlocks improvements downstream of the alignment, allowing us to build analysis pipelines with far fewer tools than traditional methods. In this way, we make the work of technicians and clinicians simpler, enabling easier and faster analyses.

However, introducing graph structures poses a significant challenge in terms of the algorithmic complexity of the analysis, potentially leading to analysis times that are too long for real-world applications. To solve this problem, GenoGra’s solutions leverage commonly available hardware (i.e., GPUs) to accelerate computation in a way that is completely transparent to the user. This lets our customers focus on the analysis without worrying about technical complexities.

GenoGra is driving a fundamental change in how we think about genomic data. Our technology opens new frontiers in genetic research, making analysis more accessible, accurate, and free of bias, thus accelerating the path to truly personalized medicine.