GenoGra on ANSA: solving the CNV bottleneck in pangenome graphs

Jun 4, 2026


ANSA, Italy's national news agency, dedicated a piece to GenoGra's latest research. Their science desk covered Panphorte, a new computational method we developed together with the Department of Electronics, Information and Bioengineering at Politecnico di Milano, published in Frontiers in Bioinformatics.

Here's what it's about

The bottleneck that graph-based genomic analysis has been living with for years

One of the most stubborn bottlenecks in pangenome graph analysis has been the handling of Copy Number Variations (CNVs) and Variable Number Tandem Repeats (VNTRs): repetitive DNA sequences that may occur 2 times in one genome and 40 times in another. Standard graph-based pangenomic workflows have consistently misrepresented or discarded these variants. The result: incomplete analyses, degraded alignment behavior, and blind spots in regions directly linked to disease.
In graph-based pangenome references, CNVs are typically represented as alternative acyclic paths, so each copy-number allele becomes its own branch carrying a duplicate of the repeat sequence.
This creates three concrete problems for anyone doing population genomics or pangenome graph analysis:

Downstream analysis is hindered — acyclic misrepresentation distorts variant calling and comparative workflows
Alignment behavior degrades — reads misalign or fail to map in repeat-rich regions
Graph visualizations become uninterpretable — repeated loci appear as ambiguous branching paths rather than biologically meaningful structures

These aren't edge cases. CNVs and VNTRs are present across all species and are directly associated with the onset of several diseases, making their exclusion a real scientific cost, not just a technical limitation.
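
To make the acyclic problem concrete, here is a minimal Python sketch comparing two encodings of a single tandem repeat locus. It is a toy model, not GenoGra's data structure: the repeat unit, the haplotype names, and the copy numbers (2 and 40, echoing the example above) are all hypothetical, and a real pangenome graph would also carry flanking sequence and many more haplotypes.

```python
# Toy comparison of two encodings of one tandem repeat locus.
# UNIT and COPIES are hypothetical values chosen for illustration.
UNIT = "CAGGT"
COPIES = {"hapA": 2, "hapB": 40}

# Acyclic encoding: one alternative branch per haplotype, each holding
# the fully expanded repeat, so the unit is stored once per copy.
acyclic_branches = {hap: UNIT * n for hap, n in COPIES.items()}
acyclic_bases = sum(len(seq) for seq in acyclic_branches.values())

# Cyclic encoding: a single node holds the unit, a self-loop edge allows
# repeated traversal, and each haplotype path records how many times it
# walks the node.
cyclic_nodes = {"unit": UNIT}
cyclic_edges = [("unit", "unit")]
cyclic_paths = {hap: ["unit"] * n for hap, n in COPIES.items()}
cyclic_bases = sum(len(seq) for seq in cyclic_nodes.values())


def spell(path, nodes):
    """Concatenate node sequences along a path."""
    return "".join(nodes[node_id] for node_id in path)


# Both encodings spell exactly the same haplotype sequences.
for hap in COPIES:
    assert spell(cyclic_paths[hap], cyclic_nodes) == acyclic_branches[hap]

print(f"acyclic: {acyclic_bases} bases across {len(acyclic_branches)} branches")
print(f"cyclic:  {cyclic_bases} bases in {len(cyclic_nodes)} node, {len(cyclic_edges)} loop edge")
```

In this toy case the acyclic form stores 210 bases and the cyclic form stores 5, while both spell identical haplotypes. The numbers only illustrate why duplicated branches inflate a graph; they are unrelated to the benchmark figures reported for Panphorte.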

What Panphorte does, and how it works

Panphorte is a topology-optimization methodology for pangenome graphs. Given an annotated pangenome graph with haplotype paths, it:

Detects repeat-driven misrepresentations within superbubbles
Isolates shared repeat sequences across distinct subpaths
Refactors the graph by splitting nodes and introducing explicit cycles, encoding CNVs and VNTRs faithfully without losing information
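
The three steps above can be sketched on a toy bubble. The Python below is an illustrative simplification, not Panphorte's actual algorithm or API: it assumes each haplotype's allele inside the bubble is a non-empty, exact whole-number repetition of one shared unit, and the function names (smallest_unit, collapse_bubble) and the node id "r" are hypothetical. Superbubble detection, partial or interrupted repeats, and flanking sequence are all out of scope here.

```python
def smallest_unit(seq: str) -> str:
    """Return the shortest u such that seq == u * k for some k >= 1.

    Assumes seq is non-empty.
    """
    n = len(seq)
    for p in range(1, n + 1):
        if n % p == 0 and seq[:p] * (n // p) == seq:
            return seq[:p]
    return seq


def collapse_bubble(alleles: dict[str, str]):
    """Replace alternative repeat alleles with one node and a self-loop.

    alleles maps a haplotype name to the sequence it spells inside the
    bubble. Returns None when the alleles do not share a repeat unit.
    """
    # Detect: find the repeat unit behind each allele.
    units = {smallest_unit(seq) for seq in alleles.values()}
    # Isolate: proceed only if every allele is built from the same unit.
    if len(units) != 1:
        return None
    unit = units.pop()
    # Refactor: one node for the unit, an explicit cycle, and paths that
    # record the copy number as repeated visits to that node.
    return {
        "nodes": {"r": unit},
        "edges": [("r", "r")],
        "paths": {
            hap: ["r"] * (len(seq) // len(unit))
            for hap, seq in alleles.items()
        },
    }


bubble = {"hapA": "CAG" * 2, "hapB": "CAG" * 5}
graph = collapse_bubble(bubble)

# No information is lost: every haplotype can be re-spelled from its path.
for hap, path in graph["paths"].items():
    assert "".join(graph["nodes"][n] for n in path) == bubble[hap]
```

The final loop checks the property the text describes as encoding repeats "without losing information": each original haplotype sequence is recoverable from the cyclic graph and its path.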

The refactored graph more accurately reflects the underlying biology, with reported improvements in memory use, read alignment, and visualization:

Up to 71.69% reduction in memory footprint
Up to 34.4% improvement in exact read matches
Substantially clearer visual identification of repeated loci

A C++ command-line implementation is available, along with a complementary pipeline combining Panphorte with GFAffix for further redundancy reduction in non-repeat regions.

Why this matters, for the field and for GenoGra

Pangenomics is built on a fundamental promise: map the full genetic variability of a species, not just what fits a single reference genome. That promise breaks down the moment large classes of variation, like CNVs, are excluded or misrepresented.
Panphorte closes a gap that has been acknowledged in population genomics and graph-based genomic analysis for years but has remained unsolved at the graph-structure level.
For GenoGra, it represents both a peer-reviewed scientific result and a direct building block of our pangenomic software platform, designed to make genomic analysis software scalable, complete, and production-ready for organizations working across human genomics, agriculture, animal breeding, and industrial microbiology.

"We essentially always discarded this information. Now we don't have to." — Mirko Coggi, Chief Research Officer of GenoGra.

Read the full ANSA coverage (Italian) →

Read the peer-reviewed paper on Frontiers in Bioinformatics →