Nicholas Eagles

Bio

I’m currently a data scientist and bioinformatician in the team led by Leonardo Collado-Torres at LIBD.

Here, I design data pipelines, apply statistics and machine learning to better understand genomic data, and develop BI visualizations to communicate insights.

I also help install and maintain software for LIBD and collaborators, mentor and share my knowledge through data science guidance sessions, and direct the R Stats Club. Example topics I’ve taught there include implementing machine-learning models in Python, leveraging LLMs for data science, and running PyTorch-based software on GPUs in HPC environments.

I use technologies like R (tidyverse + Bioconductor), Python (pandas + scikit-learn), and git daily. I’ve also developed expertise with SQL, Databricks, Airflow, dbt, and Tableau.

Portfolio

Data Science Snippets

Here I showcase actual code I’ve developed for data-science tasks throughout my career.

Building machine-learning models: Here I use scikit-learn to train and test several candidate cell-type-classification models.
Data wrangling, cleaning, and visualization: Beginning with several large, messy datasets, I use dplyr in R to clean and integrate them. Along the way, I visualize key patterns with ggplot2.
Deep learning: In a personal project to build a neural-network-powered chess engine, I use Keras/TensorFlow to build and fit a CNN-based model.

Data Engineering Projects

These personal projects are heavier on the SQL, ETL/ELT pipelining, and data-warehouse-interaction side of things.

ELT pipeline + data warehouse: Here I designed a data warehouse using the medallion architecture and implemented an ELT pipeline using dbt and Airflow to populate it with cleaned, BI-ready tables.
Titanic Databricks project: Here I engineer a Databricks workflow/ETL pipeline with the well-known Titanic dataset, touching PySpark, Databricks notebooks + dashboards, Unity Catalog, and more. As a bonus, I recreated the final visualizations with interactivity in Tableau here.

Bioinformatics Pipelines

These are Nextflow-based pipelines for processing genomic data where I was the lead developer.

SPEAQeasy: bulk RNA-seq preprocessing workflow, quantifying features like genes into SummarizedExperiment R objects (code | documentation | paper)
BiocMAP: preprocessing workflow for bisulfite sequencing data, quantifying methylation in bsseq R objects (code | documentation | paper)

Education

IBM | Online

Data Science Professional Certification | October 2024 - January 2025

University of Maryland, Baltimore County | Baltimore, MD

B.S. in Mathematics | September 2013 - May 2018

Experience

Lieber Institute for Brain Development | Research Associate II - Data Science | November 2018 - Present

Publications

Integrating gene expression and imaging data across Visium capture areas with visiumStitched

Eagles et al., BMC Genomics, 2024.

BiocMAP: a Bioconductor-friendly, GPU-accelerated pipeline for bisulfite-sequencing data

Eagles et al., BMC Bioinformatics, 2023.

SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses.

Eagles et al., BMC Bioinformatics, 2021.

My full list of publications is available at my ORCID profile.

Public Presentations

Bioconductor | Bioconductor | 2023

Presented a workshop: A Bioconductor-style differential expression analysis powered by SPEAQeasy (recorded as a video)
Presented a short talk: Spot Deconvolution in the Post-Mortem Human DLPFC (recorded as a video)

Biological Data Science | Cold Spring Harbor Laboratory | 2022

Presented a poster: Benchmarking spot-level cell-type deconvolution methods using Visium immunofluorescence benchmark data on the human dorsolateral prefrontal cortex.

European Bioconductor | Bioconductor | 2020

Presented my SPEAQeasy paper.
