I'm a data scientist and AI researcher with a PhD in mathematics.

skills.

I am a data scientist and AI researcher with a PhD in Mathematics. Building on 14 years of research and academic experience, I have spent recent years shifting my focus to applications of machine learning and artificial intelligence. I have had several amazing opportunities to contribute to exciting projects, work closely with teams handling real-world data, and develop code that is used in production. Along the way, I have built a strong foundation in many concepts, tools, and technologies, including:

  • Python programming, SQL, and Git
  • Agents: LangGraph, LangChain, Pydantic, LLMs, Agentic Frameworks
  • Machine Learning packages: TensorFlow, Keras, scikit-learn
  • Data Manipulation and Visualization packages: Pandas, Polars, Matplotlib, Seaborn, LanceDB
  • Scientific Computing packages: SciPy, NumPy
  • Machine Learning Algorithms and Frameworks: Supervised Learning, Neural Networks, Unsupervised Learning, Time Series Forecasting

When I join a project that is new to me, I learn complex systems quickly. I love to learn, and I'm really, really good at it. I'm always eager to hear about a fun, hard problem that a motivated team is working to solve.

current.

Quome (2024 - Present)

I am a Senior Data Scientist at Quome. I work on the research team developing a multi-agent system for generating and deploying apps on our secure cloud. Our focus is designing and implementing experiments with agentic patterns, prompt engineering, and LLMs.

Skills and Tools: Research, Python, Git, LangGraph, LangChain, Pydantic, LLMs, Vector Databases

CMU (2018 - Present)

I am an Assistant Professor of Mathematics at Colorado Mesa University. I teach many upper-division math courses and mentor senior capstone projects.

Skills and Tools: Teaching, Research, Project Management, Professional Technical Writing, Data Visualization

recent projects and certs.

Project: Advanced Retrieval Augmented Generation (RAG)

Awarded Top Project - Erdős Institute Deep Learning Boot Camp - Spring 2024

The Erdős Institute Deep Learning Boot Camp is a selective program designed as a follow-up to the Data Science Boot Camp (see below). Projects are evaluated by the course coordinator as well as industry partners.

The parameters of this project were set by industry partners at Aware, who acted as the primary stakeholders. They provided a real-world dataset (a massive collection of Reddit posts), along with guidance and feedback throughout the project.

objectives

Implement the retrieval component of a RAG pipeline such that it is able to:
  • Retrieve highly relevant results
  • Retrieve results quickly (less than a second)

methods

  • Processed and stored documents in a database using Polars and LanceDB.
  • Used existing metadata to impose some structure on the data, and engineered new metadata that leveraged the inherent relationships between documents.
  • Generated text embeddings to build a vector database, then added an approximate nearest neighbor (ANN) index to speed up queries, using LangChain, HuggingFace, and LanceDB.
  • Developed or adapted three metrics (Mean Reciprocal Rank, Extended Mean Reciprocal Rank, and Normalized Discounted Cumulative Gain) for evaluating model performance and improvement, and manually labeled a set of sample queries and results to establish a baseline.
  • Tested more than 150 retrieval configurations, varying pre-embedding parameters such as chunk size and embedding model, as well as pre-retrieval filters implemented using SQL queries, and then re-ranking using engineered metadata.
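
For readers unfamiliar with the two standard metrics named above, the following is a minimal sketch of Mean Reciprocal Rank and Normalized Discounted Cumulative Gain. It is an illustration, not the project code. The graded labels (0 = not relevant, 1 = related, 2 = relevant), the `min_grade` threshold, and all function names are assumptions made for this example. The project-specific Extended Mean Reciprocal Rank is not shown because its definition is not given here.

```python
import math
from typing import Sequence


def reciprocal_rank(labels: Sequence[int], min_grade: int = 2) -> float:
    """Return 1/rank of the first result graded >= min_grade, or 0.0 if none."""
    for rank, grade in enumerate(labels, start=1):
        if grade >= min_grade:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[int]], min_grade: int = 2) -> float:
    """Average the reciprocal rank over a set of labeled queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q, min_grade) for q in queries) / len(queries)


def dcg(labels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        grade / math.log2(rank + 1)
        for rank, grade in enumerate(labels[:k], start=1)
    )


def ndcg(labels: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the best possible ordering of the same labels.

    Simplification (assumption): the ideal ordering is built only from the
    labels of the retrieved results, not from every document in the corpus.
    """
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Each inner list holds the manual labels of one query's ranked results.
    labeled = [[1, 2, 0], [2, 0, 1], [0, 0, 0]]
    print("MRR:", mean_reciprocal_rank(labeled))
    print("NDCG@3:", [round(ndcg(q, 3), 3) for q in labeled])
```

Both metrics reward placing relevant results near the top of the ranking, which is why they pair naturally with a manually labeled set of sample queries.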

results

  • Achieved significant improvement over baseline across all three metrics, including a 40% improvement in Extended Mean Reciprocal Rank over baseline configurations.
  • In the best performing configurations, the top result was 2X as likely to be relevant to the query (as opposed to just related, or not relevant).
  • Retrieval times were kept below ~300 ms when querying the entire dataset (more than 5.5 million documents).

Project: Groundwater Forecasting

Awarded First Place - Erdős Institute Data Science Boot Camp - Fall 2023

The Erdős Institute Data Science Boot Camp is a selective program designed to prepare PhD holders and PhD students for careers in data science and machine learning. Projects are evaluated and ranked by industry experts from a variety of disciplines.

objectives

  • Build a model that is able to forecast groundwater levels using weather and surface water data.
  • Present results to stakeholders in a format that is interactive and easy to utilize.

methods

  • Collected, cleaned, and compiled data from a variety of government and commercial sources.
  • Selected and engineered features for model training.
  • Trained and evaluated several supervised machine learning models.
  • Wrapped Keras models in custom scikit-learn transformers/estimators to improve training and tuning efficiency.
  • Selected and fine-tuned a convolutional neural network (CNN) model, using a long short-term memory (LSTM) framework, for our final model.
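
The wrapping idea mentioned above can be sketched in a few lines. This is a hedged illustration, not the project code: `MeanModel`, `build_model`, `WrappedRegressor`, and the `scale` hyperparameter are all hypothetical, and a trivial stand-in replaces the Keras network so the example runs with the standard library alone. The wrapper follows the `get_params` / `set_params` / `fit` / `predict` conventions used by scikit-learn estimators; in practice one would typically subclass `sklearn.base.BaseEstimator` rather than write these methods by hand.

```python
from typing import Any, Callable, Dict, List, Sequence


class MeanModel:
    """Hypothetical stand-in for a compiled Keras model: predicts scale * mean(y)."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.mean = 0.0

    def fit(self, X: Sequence[Any], y: Sequence[float], epochs: int = 1) -> None:
        # A real network would train for `epochs`; the stand-in ignores it.
        self.mean = sum(y) / len(y)

    def predict(self, X: Sequence[Any]) -> List[float]:
        return [self.scale * self.mean for _ in X]


def build_model(scale: float) -> MeanModel:
    """Hypothetical factory; a real one would build and compile a network."""
    return MeanModel(scale)


class WrappedRegressor:
    """Estimator-style wrapper: hyperparameters live on the wrapper, not the model."""

    def __init__(self, build_fn: Callable[..., Any] = build_model,
                 epochs: int = 10, scale: float = 1.0) -> None:
        self.build_fn = build_fn
        self.epochs = epochs
        self.scale = scale

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"build_fn": self.build_fn, "epochs": self.epochs, "scale": self.scale}

    def set_params(self, **params: Any) -> "WrappedRegressor":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X: Sequence[Any], y: Sequence[float]) -> "WrappedRegressor":
        # Build a fresh model on every fit so repeated fits share no state.
        self.model_ = self.build_fn(scale=self.scale)
        self.model_.fit(X, y, epochs=self.epochs)
        return self

    def predict(self, X: Sequence[Any]) -> List[float]:
        return self.model_.predict(X)


if __name__ == "__main__":
    X, y = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
    est = WrappedRegressor()
    for scale in (0.5, 1.0, 2.0):  # a tiny manual hyperparameter sweep
        preds = est.set_params(scale=scale).fit(X, y).predict(X)
        error = sum(abs(p - t) for p, t in zip(preds, y)) / len(y)
        print(scale, round(error, 3))
```

Keeping hyperparameters on the wrapper is what lets generic tuning utilities rebuild and refit the underlying model for each candidate configuration.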

results

  • Our final model produced a robust forecast that was capable of making accurate (within ~10%) predictions as far as 7 years beyond the training data, a 67% improvement over baseline models.
  • Results were summarized and made available via an interactive Streamlit web app.

Certificate: Machine Learning Specialization

Coursera Specialization Certificate presented by DeepLearning.AI and Stanford University - Summer 2023

This certification includes courses on topics such as:
  • Building ML models with NumPy and scikit-learn, and building and training supervised models for prediction and binary classification tasks (linear and logistic regression).
  • Building and training neural networks with TensorFlow to perform multi-class classification, and building and using decision trees and tree ensemble methods.
  • Applying best practices for ML development, and using unsupervised learning techniques, including clustering and anomaly detection.
  • Building recommender systems with a collaborative filtering approach and content-based deep learning methods, and building deep reinforcement learning models.
While earning this certificate, I earned several supporting certificates in the following:

hire me.

I'm open to opportunities! Please contact me if you would like me to meet with you or your team.