MIRERC 063/2025: The Kenyan Reference Genome Initiative: Whole genome sequencing of 1,000 healthy adults
Abstract
Background
Despite significant advances in genomics and precision medicine, African populations remain largely underrepresented in global genomic datasets, accounting for less than 3% of publicly available data. This underrepresentation has major implications for clinical care, as therapeutics and diagnostics developed in non-African populations may not work as effectively or safely in African settings. Africa is home to the most genetically diverse populations in the world, offering immense potential to uncover novel variants critical to understanding disease risk, treatment response, and human biology. However, this potential remains largely untapped due to limited investment in large-scale sequencing efforts. Kenya, with over 40 distinct ethnic and linguistic communities and growing biomedical infrastructure, is well positioned to address this gap.
Aim
This study aims to establish the first nationally representative whole genome dataset from healthy, unrelated Kenyan adults to support pharmacogenomic discovery, ancestry research, and population-specific precision medicine. The Kenyan Reference Genome Initiative (KRGI) will sequence the genomes of 1,000 individuals drawn from all 47 counties, representing Kenya’s rich ethnic and geographic diversity.
Methods
We propose a cross-sectional, population-based study targeting healthy, unrelated adults aged 18–35 years enrolled in Kenyan universities and colleges. Participants will undergo clinical screening, and eligible individuals will provide blood samples for DNA extraction. Whole genome sequencing will be performed at 30x coverage on the Illumina NovaSeq 6000 platform. Sequencing data will undergo comprehensive bioinformatics analysis to identify single nucleotide variants, insertions and deletions (INDELs), and structural variants, as well as to perform ancestry deconvolution and pharmacogenomic annotation. De-identified demographic, clinical, and genomic data will be securely stored within a Kenya-hosted genomic database, with controlled access for licensed academic and industry use.
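To give a sense of scale for the sequencing described above, the following is a rough back-of-envelope sketch of the raw yield implied by 30x coverage across 1,000 genomes. The genome size (about 3.1 billion bases) and the paired-end 2 x 150 bp read layout are illustrative assumptions and are not specified in this protocol; the function names are hypothetical.

```python
# Back-of-envelope sequencing yield for the cohort.
# Values marked "assumption" are illustrative and not taken from the protocol.

GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate human genome length
COVERAGE = 30                   # from the Methods: 30x coverage
N_PARTICIPANTS = 1_000          # from the Aim: 1,000 individuals
READ_LENGTH_BP = 150            # assumption: paired-end 2 x 150 bp reads


def bases_required(genome_size_bp: int, coverage: int) -> int:
    """Total sequenced bases needed to reach the target coverage."""
    return genome_size_bp * coverage


def read_pairs_required(genome_size_bp: int, coverage: int, read_length_bp: int) -> float:
    """Number of read pairs needed, with two reads of read_length_bp per pair."""
    return bases_required(genome_size_bp, coverage) / (2 * read_length_bp)


per_sample_bases = bases_required(GENOME_SIZE_BP, COVERAGE)
per_sample_pairs = read_pairs_required(GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP)
cohort_bases = per_sample_bases * N_PARTICIPANTS

print(f"Per sample: {per_sample_bases / 1e9:.0f} Gb, "
      f"about {per_sample_pairs / 1e6:.0f} million read pairs")
print(f"Cohort: {cohort_bases / 1e12:.0f} Tb")
```

Under these assumptions, each genome needs roughly 93 gigabases of sequence (about 310 million read pairs), and the full cohort needs roughly 93 terabases. These figures count sequenced bases, not file sizes on disk, and do not allow for duplicate or low-quality reads.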
Expected outcome
The Kenyan Reference Genome Initiative will generate a high-quality whole genome dataset from 1,000 healthy, unrelated adults across all 47 counties. This resource will reflect Kenya’s ethnic and geographic diversity and enable the identification of SNPs, INDELs, structural variants, ancestry markers, and pharmacogenomic alleles. These findings will improve understanding of population structure and drug response, supporting more accurate polygenic risk scoring, targeted treatment, and inclusive clinical research. The study will also establish a secure, Kenya-hosted genomic database with restricted access for approved research and industry partners, ensuring data sovereignty and ethical oversight. Findings will be published to inform future research, health policy, and the development of genomics-informed care in Africa. Collectively, these outcomes will position Kenya as a regional leader in population genomics and personalized medicine.