Bio

I’m currently a data scientist and bioinformatician in the team led by Leonardo Collado-Torres at the Lieber Institute for Brain Development (LIBD).

Here, I use R, Python, shell scripting, and more to develop computational pipelines for processing genomic data, explore and implement machine-learning methods, and publish research papers showcasing open-source software and biological findings. My work in genomics spans topics such as bulk RNA sequencing (for which I published SPEAQeasy), whole-genome bisulfite sequencing (see BiocMAP), spatial transcriptomics (see visiumStitched), and cell-type deconvolution.

I also help install and maintain software for LIBD and collaborators, and mentor and share my knowledge through data science guidance sessions and R Stats Club presentations. Example topics I’ve taught include implementing machine-learning models in Python, system-wide software installations on high-performance computing clusters with Lmod, and running PyTorch-based software on GPUs in HPC environments.

Portfolio

Data Science Snippets

Here I showcase actual code I’ve developed for data-science tasks throughout my career.

  • Building machine-learning models: Here I use scikit-learn to train and test some candidate cell-type-classification models.

  • Data wrangling, cleaning, and visualization: Beginning with several messy, large datasets, I use dplyr in R to clean and integrate them. Along the way, I visualize key patterns with ggplot2.

  • Deep learning: In a personal project to build a neural-network-powered chess engine, I use Keras/TensorFlow to build and fit a CNN-based model.

Bioinformatics Pipelines

These are Nextflow-based pipelines for processing genomic data where I was the lead developer.

  • SPEAQeasy: bulk RNA-seq preprocessing workflow, quantifying features like genes into SummarizedExperiment R objects (code | documentation | paper)

  • BiocMAP: preprocessing workflow for bisulfite sequencing data, quantifying methylation in bsseq R objects (code | documentation | paper)

Education

IBM | Online

Data Science Professional Certification | October 2024 - January 2025

University of Maryland, Baltimore County | Baltimore, MD

B.S. in Mathematics | September 2013 - May 2018

Experience

Lieber Institute for Brain Development | Research Associate II - Data Science | November 2018 - Present

Public Presentations

Bioconductor | Bioconductor | 2023

Biological Data Science | Cold Spring Harbor Laboratory | 2022

European Bioconductor | Bioconductor | 2020

Nicholas Eagles

