Full Stack Data Scientist/Engineer

Raw Source -> ETL -> Warehousing -> Processing -> AI/ML -> Staging -> Visualization
(Everything Automated)

PROFESSIONAL SUMMARY

For the past decade, I have been building enterprise cloud data products and agile R&D teams in the retail, grocery, and restaurant spaces. My management and product development experience arose out of necessity while working for growing companies. I thrive in autonomy and have fostered the same positive ethic in my teams. I am a champion for standard practices and low tech debt, while devising novel solutions to odd problems. Here are some tech stacks that I have worked with:

  • ETL (Python-based): APIs (REST, SOAP), web/email scraping, flat files (csv, xml, json, text, etc.), IoT data (wearables, microcontrollers), government/remote sensing data (satellite imagery, NOAA, EPA, NWS, NASA, etc.)

  • Infrastructure/Automation: Docker, AWS (Beanstalk, EC2, Fargate, Lambda, EventBridge), Azure, Self-hosted, MQTT, Cron jobs, Alteryx scheduler, Windows scheduler, Python (Django/Flask/FastAPI)

  • Data Warehousing: SQL Server, PostgreSQL (PostGIS), MySQL, MongoDB, AWS (S3, RDS, EFS, etc.), FTP, NAS

  • AI/ML/Analysis: Python (PyTorch, SciPy, scikit-learn, pandas, NumPy), R, IDL

  • Visualization: Matplotlib, Power BI, Jaspersoft, Staging for Frontend

  • Workflow/Monitoring: Git, Jira, CloudWatch, Datadog, DIY logging, AWS CodeBuild/Commit/Deploy/Pipeline

PROFESSIONAL EXPERIENCE

Taiga Data, Lead Data Scientist/Engineer
Austin, TX — November 2021 - November 2023

SUMMARY

At Taiga, I lead our data science strategy. We build full-stack, CI/CD data science products. These automated tools are built to help clients understand their CPG/KPI data at each of their physical locations. The output of each pipeline below is staged for consumption by a client-facing front end. Here are a few notable projects (most in Python, SQL, Bash):

  • Anomaly detection (Python, SQL, AWS)

    • Created an automated system to detect stockouts and examine customer behavior when their product of choice is out of stock (i.e. substitute selections)

  • Time series forecasting (Python, SQL, AWS)

    • Created an automated forecasting system based on live data for CPG sales, company/store/product KPIs, customer traffic, etc.

    • The cadence of the forecasts is configurable, and the system is agnostic to the type of time series analysis (i.e. the method of forecasting can be swapped for any other supported method).

    • Used internal data and external factors like historical/forecasted weather, events, holidays, political/economic trends.

  • Price Optimization/Elasticity (Python, SQL, AWS)

    • Created a modeling system for price optimization of any product in inventory with requisite data.

    • Takes into account promotions/coupons/discounts

  • Customer Segmentation (Python, SQL, AWS)

    • Created a clustering model to segment customers based on in-store behavior.

    • Applied analysis for frequency, baskets, LTV, etc.

  • Large Language Model Integration (Python, SQL, AWS)

    • Created a wrapper for large language models so that clients can ask plain-language questions of their data. The wrapper returns the code necessary to transform the raw data to get an answer to a prompt.

  • Weather/Ocean/Atmospheric Engine (Python, SQL)

    • Collected and analyzed atmospheric and oceanographic remote sensing data for use in modeling through automated pipelines.

eSite Analytics, Data Analyst/Scientist & Software Developer
Charleston, SC — February 2018 - November 2021

SUMMARY

At eSite Analytics, I use my background in data software architecture/development and statistical processes to lead the development and maintenance of data pipelines that create machine learning models for client site selections. My role at eSite has evolved into one that designs and builds automated data processes.

  • Site Selection Analysis (R, Python, Alteryx, SQL, Azure)

    • Created Forest and Regression models with thousands of initial variables to determine the optimal locations for clients to place a new brick-and-mortar store.

  • ETL Automation and Standardization (R, Python, Alteryx, SQL)

    • Created a company framework that ingests, processes, and distributes data

    • Automated data extraction and cleaning from a variety of client-provided sources - FTP, flat files, DBs, APIs, etc.

    • Accessed remote sensing and government datasets to inform modeling.

  • Misc Product Development

    • Weather Forecasting and Severe Weather/Hurricane Tracking

    • Web Scraping

    • Traffic analytics

  • Internal Utilities (Python, Alteryx, Azure)

    • Created internal utilities that help our data analysts become more efficient by automating redundant tasks.

  • Misc Duties where I wear a lot of hats

    • Report Design (Alteryx Server, XML, MS Power BI)

    • Creating Data SOPs

    • I get pulled in a lot of different directions, building things for every department

OneDataSource, Software Developer
Charleston, SC — December 2015 - February 2018

SUMMARY

At OneDataSource, I gather financial, transactional, employee labor, and inventory data in its raw form; then through scripting, reporting tools, and statistics I take this data and transform it into curated reports that contain actionable insights for the client.

  • ETL (Perl, Python, MySQL, AWS)

    • Automated extraction of data from multiple sources via REST, SOAP, IMAP, JSON, XML, .xls, .csv, FTP, SFTP, etc. into a single consistent environment which can be queried to engage with front end applications.

    • These automated processes serve as the backbone of an enterprise business analytics and reporting service for some of the largest privately owned franchisee groups in the country.

  • Predictive Analytics (Python, Perl, MySQL, AWS)

    • Created a process to predict time-series metrics for our clients such as sales, speed of service, labor costs, etc.

    • The process uses multiplicative decomposition to analyze historical data provided by the client, weather history, and event history. It then builds a forecast model that can be dynamically and rapidly applied and displayed from a database table. This project is object-oriented.

  • Data Migration (Perl, MySQL, AWS)

    • Wrote and edited scripts and database files to migrate all of our business processes from physical servers to cloud servers.

  • Internal Utilities (Python, Perl, Jaspersoft)

    • Created internal reports to visually display the status of polling processes for customer service troubleshooting.

    • Created testing scripts to QA parser scripts, aggregator scripts, databases, and data.

College of Charleston, Astrophysics Researcher
Charleston, SC — August 2013 - May 2015

SUMMARY

Researched with several CofC faculty (Dr. Jon Hakkila; Dr. George Chartas) and research teams aiding in the creation of data pipelines and analysis software (written in Python and IDL) for studying phenomena ranging from gamma-ray bursts to the search for intermediate mass black holes. Here I learned a lot about working with raw data from remote sensing satellites.

  • Scientific Code Development (IDL, Python)

    • Created routines to statistically analyze time-series gamma-ray burst data

EDUCATION

College of Charleston
Charleston, SC
August 2020

M.S. Data Science and Analytics (3.9 GPA)
Thesis: A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering

College of Charleston
Charleston, SC
May 2015

B.S. Physics - Focus on Computational Physics
B.S. Astrophysics - Focus on High-Energy Computational Astronomy
Minor: Mathematics/Statistics

Gardner-Webb University
Boiling Springs, NC
May 2011

B.A. Philosophy and Theology - Focus on Logic
Minor: Spanish

PUBLICATIONS & PRESENTATIONS

PUBLICATIONS

  • Hakkila et al., Smoke and Mirrors: Signal-to-Noise and Time-Reversed Structures in Gamma-Ray Burst Pulse Light Curves, Astrophysical Journal, 2018, https://arxiv.org/abs/1804.10130

  • Cannon, A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering, 2020, ProQuest

PRESENTATIONS & POSTERS

  • A Structure-Fitting Process for Gamma-Ray Burst Light Curves

  • A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering

  • A Preliminary Analysis of Complex Gamma-Ray Burst Pulses

  • Searching for Emission Episode Self Consistency in Gamma-Ray Burst Light Curves

PROJECTS & HOBBIES

YouTube & World Records

I know YouTube is a weird thing to put on a resume, but I have some fun projects on there. Namely, one where I calculated the optimal route through all 50 states with an MCMC technique and then actually drove it. We ended up smashing the world record by over half a day. The channel is called NerdStoke.

Personal Biometric Data Aggregation and Modelling

I have been training for ultra marathons and I figured, why not use my skills in analysis and programming to help me out? I have written several integrations with a large suite of apps and devices that passively track dozens of metrics on my health daily. These metrics include body composition, nutrition, exercise metrics, sleep quality, location and altitude, everything I read, and much more. This data is all collected automatically through data pipelines I set up in my home lab. I then created a convolutional neural network to approximate a solution to an ODE whose parameters contain information about the health of my cardiovascular system and current recovery. So far it has been a ton of fun seeing exactly what levers I can pull to make the biggest impact on my health.

Science News Website

getscienced.com was a website created to deliver up-to-date content from the most recent academic articles to the non-scientist consumer through a collection of academics translating the scientific jargon into layman’s terms. It was when I created this site that I learned the power of collaboration, marketing, and lead generation. The site’s intellectual property has since been acquired by Pfizer, which, last I checked, operates it under different branding at getscience.com.

Exoplanet Detection with Bayesian Blocks

I used a technique called Bayesian Blocks to develop a new method for detecting exoplanets. The method searches a time-series starlight signal and uses Bayesian Blocks to identify significant changes in the star’s light. If several such changes show up in a Fourier analysis, then a planet may exist.

Lemons Race Car

That is Lemons, not Le Mans. It is a hilarious racing circuit where the maximum that you can spend on a car is $500 (except for safety equipment). We purchased an early-2000s Toyota Camry for $480 and have been getting it ready to pass the strict safety inspections. Now we just have to figure out what to do with the remaining $20.

HOBBIES

On spare afternoons and weekends, I can normally be found tinkering in my garage on home-made science and technology projects and building out my home lab. To get myself moving, I frequently compete in professional surfing competitions up and down the East Coast and Gulf. I recently got into competing in ultra running, which is painful. I also enjoy other endurance/action sports, sailing, and windsurfing.