Projects


Natural language and data science

Project ESPAÑOL is an application that leverages high-throughput analysis of classical Spanish-language poems to create custom lesson plans and guide discovery of new texts for practice and study. The dataset contains 10,000 Spanish-language poems in the public domain from 200 authors covering 500 years, totaling 2.5 million words. An unsupervised clustering algorithm in Python grouped the poems into 4 difficulty levels by analyzing the frequencies of 10,000+ verb forms conjugated across 18 grammatical tenses for the 550+ most common verbs. The full texts, metadata, and grammatical statistics can be accessed in an interactive Plotly Dash application.
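The clustering step can be sketched roughly as follows. The text does not name the clustering algorithm, so this sketch assumes k-means (with a deterministic farthest-first initialization) over per-poem tense-frequency profiles. It also assumes each poem's verbs have already been tagged by tense upstream. The tense labels, the six-tense subset (the project tracks 18 tenses), and the hypothetical `TENSE_WEIGHT` scores used to order the clusters from easiest to hardest are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical six-tense subset; the project itself tracks 18 tenses.
TENSES = ["presente", "preterito", "imperfecto", "futuro",
          "subj_presente", "subj_imperfecto"]

# Hypothetical difficulty weight per tense, used only to rank clusters.
TENSE_WEIGHT = {"presente": 0, "preterito": 1, "imperfecto": 1,
                "futuro": 1, "subj_presente": 2, "subj_imperfecto": 3}


def tense_profile(tagged_verbs):
    """Relative frequency of each tense among a poem's tense-tagged verbs."""
    counts = Counter(tagged_verbs)
    total = sum(counts.values()) or 1  # avoid dividing by zero for verb-less poems
    return [counts[t] / total for t in TENSES]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-first initialization: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def difficulty_levels(profiles, k=4):
    """Cluster poems, then number clusters 1..k by weighted tense usage."""
    labels, centroids = kmeans(profiles, k)
    scores = [sum(f * TENSE_WEIGHT[t] for f, t in zip(c, TENSES)) for c in centroids]
    ranked = sorted(range(k), key=lambda c: scores[c])
    level_of = {c: rank + 1 for rank, c in enumerate(ranked)}
    return [level_of[lab] for lab in labels]


if __name__ == "__main__":
    # Toy input: each poem is a list of tense tags for its conjugated verbs.
    poems = [
        ["presente"] * 9 + ["preterito"],
        ["presente"] * 10,
        ["preterito"] * 8 + ["presente"] * 2,
        ["preterito"] * 6 + ["imperfecto"] * 4,
        ["subj_presente"] * 7 + ["presente"] * 3,
        ["subj_presente"] * 9 + ["futuro"],
        ["subj_imperfecto"] * 8 + ["subj_presente"] * 2,
        ["subj_imperfecto"] * 10,
    ]
    print(difficulty_levels([tense_profile(p) for p in poems]))
    # prints [1, 1, 2, 2, 3, 3, 4, 4]
```

Normalizing counts into frequencies keeps long and short poems comparable. Ranking centroids afterwards is needed because k-means cluster numbers carry no inherent order.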

Biomarkers and genomic medicine

at Parexel International

At a clinical research organization, working for a major pharmaceutical client, I performed data ingestion and developed ETL pipelines for genomic datasets (GWAS & xQTL) from clinical study populations to identify specific variations in the genome that may relate to disease. As part of the biomarkers and genomic medicine team, I bridged the gap between scientific analysts, who require efficient management of this data at scale, and data scientists/engineers, who lack familiarity with bioinformatics methods and resources.

Examples of tools created for internal client use:

  • Lift over genomic regions from datasets on human genome build 37 to build 38
  • Update genomic coordinates for rsIDs from older builds of dbSNP
  • Generate custom entity flat files for xQTL studies
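The client tools themselves are not shown here, but the core idea behind the first one, build-to-build liftover, can be sketched. This sketch assumes the UCSC chain-file approach used by tools such as liftOver (for example, `hg19ToHg38.over.chain.gz` for build 37 to build 38). It handles only a single plus-strand chain and uses a toy chain record invented for illustration. A production tool would also need to handle minus-strand chains, multiple chains per chromosome, and partially mapping regions (compare liftOver's minMatch option).

```python
import bisect
from dataclasses import dataclass


@dataclass
class Block:
    src_start: int  # 0-based start on the old build (the chain's "target" side)
    dst_start: int  # 0-based start on the new build (the chain's "query" side)
    size: int       # length of the ungapped aligned block


def parse_chain(lines):
    """Parse one UCSC chain record into ungapped aligned blocks.

    Header fields: chain score tName tSize tStrand tStart tEnd
                   qName qSize qStrand qStart qEnd id
    Data lines: "size dt dq" (the last line is just "size").
    """
    header = lines[0].split()
    t_chrom, t_start = header[2], int(header[5])
    q_chrom, q_strand, q_start = header[7], header[9], int(header[10])
    if q_strand != "+":
        raise NotImplementedError("this sketch handles plus-strand chains only")
    blocks, t, q = [], t_start, q_start
    for line in lines[1:]:
        fields = [int(x) for x in line.split()]
        blocks.append(Block(t, q, fields[0]))
        t += fields[0]
        q += fields[0]
        if len(fields) == 3:  # gap before the next block on each build
            t += fields[1]
            q += fields[2]
    return t_chrom, q_chrom, blocks


def lift_position(pos, blocks):
    """Map a 0-based old-build position; None if it falls outside aligned blocks."""
    starts = [b.src_start for b in blocks]
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0:
        return None
    b = blocks[i]
    if pos < b.src_start + b.size:
        return b.dst_start + (pos - b.src_start)
    return None  # position lies in an alignment gap (no counterpart on the new build)


def lift_region(start, end, blocks):
    """Map a half-open [start, end) region; None unless both ends lift."""
    new_start = lift_position(start, blocks)
    new_last = lift_position(end - 1, blocks)
    if new_start is None or new_last is None:
        return None
    return new_start, new_last + 1


if __name__ == "__main__":
    # Hypothetical toy chain; real chain files are far larger.
    toy_chain = [
        "chain 1000 chr1 1000 + 100 400 chr1 1200 + 150 450 1",
        "100 10 30",
        "50 20 0",
        "120",
    ]
    src, dst, blocks = parse_chain(toy_chain)
    print(src, dst, lift_region(300, 310, blocks))  # prints chr1 chr1 (350, 360)
```

Binary search over block starts keeps each lookup logarithmic in the number of aligned blocks. This matters when lifting millions of GWAS variants.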

Neural cell types and circuits for vocal learning

at the University of Texas Southwestern Medical Center in the Roberts and Konopka labs

My postdoctoral research used comparative high-throughput transcriptomics to understand how the brain produces complex, learned behaviors like speech and language. These projects were the first to implement single-cell RNA sequencing in songbirds, establishing a template for the field that spans both molecular and computational methods. The results have broad implications for understanding the genetic toolkits that neurons and circuits use to perform advanced computations.

Open-access links to publications:

Neuromodulators of motivation and reward in vocal communication

at the University of Wisconsin-Madison in the Riters lab

My doctoral dissertation examined the neural control of vocal communication across contexts in songbirds. I identified neurotensin, a neuropeptide involved in motivation and reward that strongly interacts with dopamine, as a potential modulator of context-specific vocalizations. By establishing links between neurotensin and vocal communication for the first time, this research also contributed to a better understanding of neurotensin’s role in the regulation of social behavior more generally.

Open-access links to publications: