Research

Our lab studies neurological and neurodegenerative disease, transcriptomics, and basic questions about the human genome using a combination of novel bioinformatic methodology, high-throughput sequencing data, high-performance computing, and machine learning techniques.

Neurodegenerative Disease

Chronic Traumatic Encephalopathy Transcriptomics

Chronic Traumatic Encephalopathy (CTE) is a progressive neurodegenerative disease caused by repetitive head impacts, as experienced by some military personnel, victims of domestic abuse, and players of violent sports. CTE pathology resembles that observed in Alzheimer's Disease, but has a different spatial distribution and molecular signature. This area of research uses high-throughput transcriptomic data generated from brains donated to the BU CTE Center Brain Bank to characterize and understand disease mechanisms.

Central Nervous System Microbiome

Multiple lines of evidence suggest microbes play a direct role in central nervous system (CNS) disorders. Beyond the so-called “gut-brain” axis, which posits that interactions between gut microflora and the CNS influence neurological health, numerous studies have detected microbes directly in brain tissue from individuals with neurodegenerative diseases, including Parkinson’s Disease, Alzheimer’s Disease, ALS, and multiple sclerosis. This area of research investigates the evidence of microbial signals found in transcriptomic data generated from CNS tissues across a variety of conditions.

The Pan-Neurodegenerative Disease Phenotype

Although the specific brain regions affected and the pathology vary among neurodegenerative diseases, gene expression studies in post-mortem brain tissue show similar activation of neuroimmune and neuroinflammatory pathways. This area of research involves large-scale comparison of post-mortem brain gene expression data across many different neurodegenerative diseases to identify common and unique signals.

Genomics Methods Development

Annotation-free Transcriptome Analysis

Although only ~10% of the human genome codes for genes, >85% of the genome is transcribed to RNA. RNA-seq data, especially ribo-depleted data, contain substantial numbers of reads that originate from intergenic regions and are typically ignored in gene-centric analyses that rely on gene annotations. Our lab has developed methods that examine gene expression data for patterns ab initio, without filtering reads based on annotated regions. These methods have identified genomic loci whose transcripts are associated with biologically and clinically relevant features in Huntington's Disease and that have gone unnoticed in annotation-restricted analyses.
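As a rough illustration of what counting reads without reference to annotation can look like, the sketch below tallies aligned reads in fixed-width genomic bins. It is not the lab's method: the fixed-width binning, the function name `bin_read_counts`, the `bin_size` parameter, and the example coordinates are all assumptions made for illustration.

```python
from collections import Counter


def bin_read_counts(alignments, bin_size=1000):
    """Count aligned reads per fixed-width genomic bin.

    `alignments` is an iterable of (chromosome, 0-based start) pairs.
    No gene annotation is consulted, so intergenic reads are kept.
    """
    counts = Counter()
    for chrom, pos in alignments:
        counts[(chrom, pos // bin_size)] += 1
    return counts


# Hypothetical read start positions, for illustration only.
reads = [("chr4", 150), ("chr4", 980), ("chr4", 1020), ("chr17", 45_900_123)]
bin_counts = bin_read_counts(reads, bin_size=1000)
```

The resulting per-bin counts could then be tested for association with a phenotype in the same way gene-level counts are, with the bin standing in for the gene.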

Human Sequence Variation in Complex Genomic Loci

Large-scale public human genome sequencing projects have generated individual-specific genome assemblies. These data provide exciting opportunities to identify and catalogue human variation in specific loci, especially those in complex, repetitive, or disease-associated parts of the genome. Our lab develops sequence analysis methodologies that investigate interesting regions of the human genome, including rDNA arrays, the MAPT region, and telomeres.

Metatranscriptomic Taxonomic and Functional Characterization

Samples containing mixtures of organisms, including environmental samples, tissue from host microbiome sites, and symbiotic holobiont systems, contain a complex and diverse set of RNA transcripts. Because organism abundance and transcript copy number vary dynamically, and orthology among genomes is complex, identifying which organisms and genomes are present, and in what proportion, is challenging. Our lab develops methods that resolve ambiguity in mapping RNA sequencing reads from mixed samples to taxonomic distributions and functional signatures, with the goal of identifying associations with biological, environmental, and clinical variables of interest.
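One common way to handle reads that map equally well to several organisms is expectation-maximization (EM), which splits each ambiguous read among its candidate taxa in proportion to the current abundance estimates. The sketch below is a generic illustration of that idea, not the lab's method; the function name `em_abundances`, the input layout, and the taxon labels are hypothetical, and it assumes every read has at least one candidate taxon.

```python
def em_abundances(read_hits, n_iter=50):
    """Estimate relative taxon abundances from ambiguously mapped reads.

    `read_hits` is a list with one entry per read; each entry is the list
    of taxa that read maps to (assumed non-empty).
    """
    taxa = sorted({t for hits in read_hits for t in hits})
    theta = {t: 1.0 / len(taxa) for t in taxa}  # start from uniform abundances
    n_reads = len(read_hits)
    for _ in range(n_iter):
        totals = {t: 0.0 for t in taxa}
        # E-step: split each read across its candidate taxa.
        for hits in read_hits:
            denom = sum(theta[t] for t in hits)
            for t in hits:
                totals[t] += theta[t] / denom
        # M-step: renormalize the fractional counts into abundances.
        theta = {t: totals[t] / n_reads for t in taxa}
    return theta


# Hypothetical example: two reads unique to taxon A, one unique to B,
# and one read that maps to both.
theta = em_abundances([["A"], ["A"], ["A", "B"], ["B"]])
```

In this toy example the shared read is assigned mostly to taxon A, because the uniquely mapped reads already suggest A is the more abundant organism.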

Sequence Graph Representations of Transcription

Sequence graphs are data structures used to represent and encode variation within sets of related genomes. Our lab is interested in graph-based representations of families of transcripts that can encode and reveal complex expression patterns associated with quantities of interest.
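To make the idea concrete, the following minimal sketch represents a small family of transcripts as a directed acyclic sequence graph: nodes carry sequence segments, edges record which segments may follow one another, and each path from the first to the last node spells one transcript. The node names, the sequences, and the `spell_paths` helper are hypothetical and chosen only for illustration.

```python
# Nodes carry sequence segments (for example, exons); toy sequences only.
nodes = {"e1": "ATG", "e2": "GCC", "e3": "TAA"}
# Edges list which segments may directly follow each node.
# The e1 -> e3 edge models a transcript that skips e2.
edges = {"e1": ["e2", "e3"], "e2": ["e3"], "e3": []}


def spell_paths(nodes, edges, start, end):
    """Return the sequence spelled by every path from `start` to `end`.

    Assumes the graph is acyclic, so the traversal always terminates.
    """
    sequences = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            sequences.append("".join(nodes[n] for n in path))
            continue
        for nxt in edges[node]:
            stack.append((nxt, path + [nxt]))
    return sequences


transcripts = spell_paths(nodes, edges, "e1", "e3")
```

Here the two paths correspond to a full-length transcript and an isoform that skips the middle segment, so a single graph encodes both without duplicating the shared sequence.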