Projects
Natural language and data science
Project ESPAÑOL is an application that uses high-throughput analysis of classical Spanish-language poems to create custom lesson plans and guide the discovery of new texts for practice and study. The dataset contains 10,000 public-domain Spanish-language poems by 200 authors, spanning 500 years and totaling 2.5 million words. An unsupervised clustering algorithm implemented in Python grouped the poems into 4 difficulty levels by analyzing the frequencies of 10,000+ verb forms, conjugated across 18 grammatical tenses, for the 550+ most common verbs. The full texts, metadata, and grammatical statistics can be explored in an interactive Plotly Dash application.
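The idea behind the clustering step can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the lookup table `VERB_FORMS`, the helper names, and the toy poems are all hypothetical, and k-means is assumed here only because the specific clustering algorithm is not named above. Each poem is reduced to the share of its matched verb forms that fall in each tense, and those profiles are then grouped.

```python
import re
from collections import Counter

# Hypothetical, tiny lookup of conjugated forms -> tense label.
# The real project covers 10,000+ forms across 18 tenses.
VERB_FORMS = {
    "soy": "present", "es": "present", "tengo": "present", "canta": "present",
    "fui": "preterite", "tuvo": "preterite", "cantó": "preterite",
    "fuera": "imperfect_subjunctive", "tuviese": "imperfect_subjunctive",
    "cantara": "imperfect_subjunctive",
}
TENSES = sorted(set(VERB_FORMS.values()))


def tense_profile(text):
    """Return the share of matched verb forms per tense, in TENSES order."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(VERB_FORMS[t] for t in tokens if t in VERB_FORMS)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TENSES]


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy poems: two lean on the present tense, two on the imperfect subjunctive.
poems = [
    "Soy el que canta y tengo voz",
    "Si fuera ave, cantara; si tuviese alas",
    "Es tarde y canta el río",
    "Quién tuviese paz, quién cantara",
]
profiles = [tense_profile(p) for p in poems]
labels = kmeans(profiles, k=2)  # the project uses 4 levels; 2 suits this toy data
```

With this toy input, the two present-tense poems share one label and the two subjunctive-heavy poems share the other. On a real corpus, a library implementation with a better initialization would be the more sensible choice.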
Biomarkers and genomic medicine
At a clinical research organization for a major pharmaceutical client, I performed data ingestion and developed ETL pipelines for genomic sequencing datasets (GWAS and xQTL) from clinical study populations, with the goal of identifying specific variations in the genome that may relate to disease. As part of the biomarkers and genomic medicine team, I served as a bridge between the scientific analysts, who require efficient management of this data at scale, and the data scientists and engineers, who are less familiar with bioinformatics methods and resources.
Examples of tools created for internal client use:
- Lift over genomic regions from datasets on human genome build 37 to build 38
- Update genomic coordinates for rsIDs from older builds of dbSNP
- Generate custom entity flat files for xQTL studies
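To give a sense of what lifting over a genomic region involves, here is a simplified sketch. It is not the internal tool: the block coordinates in `BLOCKS` are invented, the function names are hypothetical, and real conversions between build 37 and build 38 rely on published chain files that also encode strand and chromosome changes, which this example ignores. The core idea is that each aligned block carries an offset, and any position falling in a gap between blocks has no counterpart in the new build.

```python
from bisect import bisect_right

# Hypothetical aligned blocks for one chromosome: (old_start, old_end, new_start).
# Intervals are 0-based, half-open, and sorted by old_start.
BLOCKS = [
    (0, 1000, 0),
    (1000, 5000, 1200),   # 200 bp were inserted in the new build before this block
    (5300, 9000, 5500),   # old 5000-5300 has no counterpart in the new build
]
STARTS = [block[0] for block in BLOCKS]


def find_block(pos):
    """Return the block containing pos, or None if pos falls in a gap."""
    i = bisect_right(STARTS, pos) - 1
    if i < 0:
        return None
    old_start, old_end, new_start = BLOCKS[i]
    if pos >= old_end:
        return None
    return old_start, old_end, new_start


def lift_position(pos):
    """Map a single position to the new build, or None if it is unmapped."""
    block = find_block(pos)
    if block is None:
        return None
    old_start, _, new_start = block
    return new_start + (pos - old_start)


def lift_region(start, end):
    """Lift a half-open region only when it lies entirely within one block."""
    block = find_block(start)
    if block is None or end > block[1]:
        return None
    offset = block[2] - block[0]
    return start + offset, end + offset


print(lift_position(1500))      # position inside the second block
print(lift_region(1000, 2000))  # region fully inside the second block
print(lift_region(4900, 5400))  # region spanning a gap, so it is not lifted
```

Refusing to lift a region that spans a gap is a deliberately conservative choice for this sketch; production tools typically expose a threshold for how much of a region must map successfully.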
Neural cell types and circuits for vocal learning
at University of Texas Southwestern Medical Center in the Roberts lab and Konopka lab
My postdoctoral research used comparative high-throughput transcriptomics to understand how the brain produces complex, learned behaviors like speech and language. These projects were the first to implement single-cell RNA sequencing in songbirds, establishing a template for the field that spans molecular and computational components. The results have broad implications for understanding the genetic toolkits that neurons and circuits use to perform advanced computations.
Open-access links to publications:
Neuromodulators of motivation and reward in vocal communication
at University of Wisconsin-Madison in the Riters lab
My doctoral dissertation examined the neural control of vocal communication across contexts in songbirds. I identified neurotensin, a neuropeptide involved in motivation and reward that strongly interacts with dopamine, as a potential modulator of context-specific vocalizations. By establishing links between neurotensin and vocal communication for the first time, this research also contributed to a better understanding of neurotensin’s role in the regulation of social behavior more generally.
Open-access links to publications:
- (2018) Neurotensin and neurotensin receptor 1 mRNA expression in song-control regions changes during development in male zebra finches
- (2018) Co-localization patterns of neurotensin receptor 1 and tyrosine hydroxylase in brain regions involved in motivation and social behavior in male European starlings
- (2016) Song in an affiliative context relates to the neural expression of dopamine- and neurotensin-related genes in male European starlings
- (2015) Neurotensin neural mRNA expression correlates with vocal communication and other highly-motivated social behaviors in male European starlings
- (2015) Neurotensin immunolabeling relates to sexually-motivated song and other social behaviors in male European starlings (Sturnus vulgaris)