Projects

These case studies highlight selected research projects and methods I have developed in the context of Alzheimer’s disease genetics and computational genomics.

Genetic risk architecture of Alzheimer’s disease

Problem:
Alzheimer’s disease risk is influenced by many genetic variants acting across biological pathways, but interpreting how these variants contribute to disease mechanisms remains challenging.

My role:
Developing computational approaches to investigate how different forms of genetic variation, from common variants to structural variation, contribute to Alzheimer’s disease risk.

Approach:
Combining statistical genetics and long-read sequencing approaches to study genetic variation across the genome and within complex loci. This includes pathway-based polygenic risk score analyses and long-read sequencing of complement system genes to resolve structural variation.

Evidence:
- Ongoing research in Alzheimer’s disease genetics
- Results contributing to multiple collaborative research projects

Long-read sequencing of complement genes in Alzheimer’s disease

Problem:
Several complement system genes associated with Alzheimer’s disease contain complex structural variation and segmental duplications that are difficult to resolve using short-read sequencing.

My role:
Leading the bioinformatics analysis of PacBio long-read sequencing datasets in Alzheimer’s disease research at the UK Dementia Research Institute.

Approach:
Developing analysis workflows for variant detection and structural variant discovery using long-read sequencing data, integrating multiple variant callers and performing quality control and benchmarking across datasets.

Evidence:
- Analysis workflows under development
- Results contributing to ongoing Alzheimer’s disease genetics research

Structural variation at the CR1 locus in Alzheimer’s disease

Problem:
The CR1 locus is one of the strongest genetic risk loci for Alzheimer’s disease, but its structure includes large segmental duplications that make copy number variation difficult to resolve using short-read sequencing.

My role:
Designing and implementing the bioinformatics analysis to characterise structural variation at the CR1 locus using long-read sequencing data from Alzheimer’s disease cases and controls.

Approach:
Extracting targeted genomic regions from PacBio HiFi alignments and analysing copy number variation and structural variants using long-read variant callers. The analysis integrates multiple tools and reference assemblies to resolve complex genomic architecture at the CR1 locus.
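To illustrate the region-extraction step, the sketch below selects alignments that overlap a target interval, and separately those that span it end to end, using plain Python records. The `Alignment` class, the read IDs, and the coordinates are hypothetical placeholders, not the real CR1 coordinates or the project's actual code. In practice this step is usually run on an indexed BAM file with a tool such as `samtools view` and a region string.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alignment:
    """Minimal stand-in for an aligned read (hypothetical record type)."""
    read_id: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(aln: Alignment, chrom: str, start: int, end: int) -> bool:
    """True if the alignment shares at least one base with [start, end)."""
    return aln.chrom == chrom and aln.start < end and aln.end > start


def extract_region(alns: Iterable[Alignment], chrom: str,
                   start: int, end: int) -> List[Alignment]:
    """Keep alignments overlapping the target region."""
    return [a for a in alns if overlaps(a, chrom, start, end)]


def spanning_reads(alns: Iterable[Alignment], chrom: str,
                   start: int, end: int) -> List[Alignment]:
    """Keep alignments that cover the whole region end to end."""
    return [a for a in alns
            if a.chrom == chrom and a.start <= start and a.end >= end]


# Hypothetical reads and a placeholder target interval.
reads = [
    Alignment("r1", "chr1", 900, 2_500),    # overlaps the left edge
    Alignment("r2", "chr1", 500, 12_000),   # spans the whole region
    Alignment("r3", "chr2", 1_000, 9_000),  # wrong chromosome
    Alignment("r4", "chr1", 10_000, 15_000) # starts exactly at region end
]
selected = extract_region(reads, "chr1", 1_000, 10_000)
spanning = spanning_reads(reads, "chr1", 1_000, 10_000)
```

Reads that span an entire duplicated segment are particularly informative for copy number, which is one reason to separate them from reads that merely overlap the region.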

Evidence:
- Ongoing analysis of long-read sequencing datasets
- Results informing ongoing Alzheimer’s disease genetics research

Pathway-based polygenic risk score analysis for Alzheimer’s disease

Problem:
Polygenic risk scores are often calculated genome-wide, making it difficult to interpret how specific biological pathways contribute to disease risk.

My role:
Developing and applying pathway-based PRS approaches to explore how biological systems such as endocytic pathways contribute to Alzheimer’s disease susceptibility.

Approach:
Using GWAS summary statistics and curated pathway gene sets (KEGG, Reactome, GO) to compute pathway-specific polygenic risk scores and evaluate their association with plasma biomarkers.
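As a rough illustration of the idea, a pathway-specific score can be viewed as a weighted allele count restricted to variants mapped to genes in one pathway. The sketch below assumes hypothetical variant IDs, effect sizes, gene annotations, and pathway names. It is not the project's pipeline, and it omits steps such as LD clumping and p-value thresholding that a real analysis typically includes.

```python
from typing import Dict, Set, Tuple

# Hypothetical summary statistics: variant -> (mapped gene, effect size per effect allele)
SUMSTATS: Dict[str, Tuple[str, float]] = {
    "rs1": ("GENE_A", 0.20),
    "rs2": ("GENE_B", -0.10),
    "rs3": ("GENE_C", 0.05),
}

# Hypothetical gene sets standing in for curated pathway annotations.
PATHWAYS: Dict[str, Set[str]] = {
    "endocytosis": {"GENE_A", "GENE_B"},
    "other": {"GENE_C"},
}


def pathway_prs(dosages: Dict[str, float],
                sumstats: Dict[str, Tuple[str, float]],
                gene_set: Set[str]) -> float:
    """Sum effect size x effect-allele dosage over variants in the gene set."""
    score = 0.0
    for variant, (gene, beta) in sumstats.items():
        if gene in gene_set:
            # Variants missing from the genotype data contribute nothing here.
            score += beta * dosages.get(variant, 0.0)
    return score


# One hypothetical individual: effect-allele dosage (0, 1, or 2) per variant.
person = {"rs1": 2, "rs2": 1, "rs3": 0}

scores = {name: pathway_prs(person, SUMSTATS, genes)
          for name, genes in PATHWAYS.items()}
```

Scores computed this way for each individual could then be tested for association with a biomarker, for example in a regression model with appropriate covariates.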

Evidence:
- Ongoing research project
- Results contributing to a manuscript in preparation

Benchmarking variant calling approaches for long-read sequencing

Problem:
Variant detection in long-read sequencing data can vary substantially depending on the choice of tools and parameters.

My role:
Evaluating and comparing multiple variant calling tools to identify robust strategies for analysing PacBio HiFi sequencing datasets.

Approach:
Benchmarking variant callers including Longshot, Clair3, and bcftools for SNVs and small indels, and Sniffles, SVIM, and cuteSV for structural variants.
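One common way to compare call sets is to score each caller against a truth set using precision, recall, and F1. The sketch below shows that calculation for exact-match variants. The caller names, variants, and truth set are hypothetical, and exact matching is a simplifying assumption: it suits normalised SNVs and small indels, whereas structural variants usually need tolerance on breakpoints and size.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Precision, recall and F1 of a call set against a truth set (exact match)."""
    tp = len(calls & truth)   # called and in truth
    fp = len(calls - truth)   # called but not in truth
    fn = len(truth - calls)   # in truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical truth set and call sets for two unnamed callers.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "GA"),
    ("chr1", 400, "TC", "T"),
}
call_sets = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                 ("chr1", 300, "G", "GA"), ("chr1", 999, "A", "C")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
}

results = {name: benchmark(calls, truth) for name, calls in call_sets.items()}
```

Here the first hypothetical caller trades one false positive for higher recall, while the second is perfectly precise but misses half the truth set. Reporting both metrics makes that trade-off visible.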

Evidence:
- Comparative analyses supporting long-read sequencing workflows

Computational workflows for genomics analysis

Problem:
Bioinformatics projects often lack clear documentation and reproducible workflows, making analyses difficult to reproduce or extend.

My role:
Designing project structures and documentation frameworks to support reproducible and collaborative bioinformatics research.

Approach:
Developing workflows using GitHub, Quarto, and high-performance computing environments to organise analyses, track results, and communicate methods clearly.

Evidence:
- Public GitHub repositories
- Reproducible documentation and analysis reports

Multi-omics analysis of macrophage lipid metabolism

Problem:
Macrophage lipid metabolism plays a key role in immune regulation, but the transcriptional mechanisms controlling lipid composition in tissue-resident macrophages were not well understood.

My role:
Led computational and experimental analysis integrating transcriptomics and lipidomics data to investigate the role of the transcription factor GATA6 in regulating macrophage lipid metabolism.

Approach:
Combined LC-MS lipidomics with gene expression analysis and multivariate statistical methods to characterise lipid metabolic pathways regulated by GATA6.

Evidence:
- Publications in Cellular and Molecular Life Sciences and related journals
- Conference presentations at international lipidomics meetings