Full Stack Data Scientist/Engineer
Raw Source -> ETL -> Warehousing -> Processing -> AI/ML -> Staging -> Visualization
(Everything Automated)
PROFESSIONAL SUMMARY
For the past decade, I have been building enterprise cloud data products and agile R&D teams in the retail, grocery, and restaurant spaces. My management and product development experience arose out of necessity while working for growing companies. I thrive in autonomy and have fostered the same positive ethic in my teams. I am a champion for standard practices and low tech debt, while devising novel solutions to odd problems. Here are some tech stacks that I have worked with:
ETL (Python-based): APIs (REST, SOAP), web/email scraping, flat files (csv, xml, json, text, etc.), IoT data (wearables, microcontrollers), government/remote sensing data (satellite imagery, NOAA, EPA, NWS, NASA, etc.)
Infrastructure/Automation: Docker, AWS (Beanstalk, EC2, Fargate, Lambda, EventBridge), Azure, Self-hosted, MQTT, Cron jobs, Alteryx scheduler, Windows scheduler, Python (Django/Flask/FastAPI)
Data Warehousing: SQL Server, PostgreSQL (PostGIS), MySQL, MongoDB, AWS (S3, RDS, EFS, etc.), FTP, NAS
AI/ML/Analysis: Python (PyTorch, SciPy, scikit-learn, Pandas, NumPy), R, IDL
Visualization: Matplotlib, Power BI, Jaspersoft, Staging for Frontend
Workflow/Monitoring: Git, JIRA, CloudWatch, Datadog, DIY logging, AWS CodeBuild/Commit/Deploy/Pipeline
PROFESSIONAL EXPERIENCE
Applied Geographic, Senior Data Scientist/Engineer
Austin, TX (Remote) — January 2024 - Present
At AGS, I lead the development of the company’s next generation of data pipelines, processing, and delivery. We aggregate geospatial data from around the web to create demographic, psychographic, and land-use data products for our clients. These products are shipped at any geospatial granularity, in bulk and on demand, and in dozens of file formats. Here are a few notable projects (Python, Postgres, Bash, AWS, etc.):
Parcel and land-use (Python, Postgres, AWS, Git)
Created an automated pipeline to scrape geospatial data from multiple government and private agencies to inform a land-use model.
Created a dissolved polygon version of the land use model.
Wrote an API and export logic for on-demand and bulk data delivery.
Deployed the entire infrastructure on AWS with CI/CD, automated processing, and server autoscaling using RDS, CodePipeline, Beanstalk, EC2, S3, etc.
Crime risk pipeline (Python, Git)
Aggregated incident-level crime data from departments across the country to inform a crime risk model. This codebase scrapes department sites and their GIS portals and aggregates the data into a standard schema.
Taiga Data, Lead Data Scientist/Engineer
Austin, TX (Remote) — November 2021 - November 2023
At Taiga, I led our data science strategy. We built full-stack, CI/CD data science products: automated tools that help clients understand their CPG/KPI data at each of their physical locations. The output of each pipeline below is staged for consumption by a client-facing front end. Here are a few notable projects (most in Python, SQL, and Bash):
Anomaly detection (Python, SQL, AWS)
Created an automated system to detect stockouts and examine customer behavior when their product of choice is out of stock (e.g., substitute selections).
Time series forecasting (Python, SQL, AWS)
Created an automated forecasting system based on live data for CPG sales, company/store/product KPIs, customer traffic, etc.
The cadence of the forecasts is configurable, and the system is agnostic to the type of time series analysis (i.e., the forecasting method can be swapped for any other supported method; see the sketch below).
Used internal data and external factors such as historical/forecasted weather, events, holidays, and political/economic trends.
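For illustration, a minimal sketch of the kind of swappable forecasting interface described above, assuming pandas series as inputs; the class and function names here are hypothetical, not the production code.

from dataclasses import dataclass
from typing import Protocol
import numpy as np
import pandas as pd

class Forecaster(Protocol):
    """Any forecasting method can be swapped in by implementing fit/predict."""
    def fit(self, history: pd.Series) -> "Forecaster": ...
    def predict(self, periods: int) -> pd.Series: ...

@dataclass
class NaiveSeasonalForecaster:
    """Baseline method: repeat the most recent season of observations."""
    season_length: int = 7  # e.g. weekly seasonality for daily data

    def fit(self, history: pd.Series) -> "NaiveSeasonalForecaster":
        self._last_season = history.to_numpy()[-self.season_length:]
        return self

    def predict(self, periods: int) -> pd.Series:
        reps = -(-periods // self.season_length)       # ceiling division
        return pd.Series(np.tile(self._last_season, reps)[:periods])

def run_forecast(history: pd.Series, model: Forecaster, periods: int) -> pd.Series:
    """The pipeline depends only on the protocol, so methods are interchangeable."""
    return model.fit(history).predict(periods)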
Price Optimization/Elasticity (Python, SQL, AWS)
Created a modeling system for price optimization of any product in inventory with requisite data.
Takes promotions, coupons, and discounts into account.
Customer Segmentation (Python, SQL, AWS)
Created a clustering model to segment customers based on in-store behavior.
Applied analysis for frequency, baskets, LTV, etc.
Large Language Model Integration (Python, SQL, AWS)
Created a wrapper for large language models so that clients can ask plain-language questions of their data. The wrapper returns the code necessary to transform the raw data into an answer to the prompt; a rough sketch of the pattern is below.
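A rough sketch of that pattern, assuming a vendor-agnostic completion function is supplied by the caller; the prompt template and helper names are illustrative, not the production implementation.

# Hypothetical sketch of an "ask your data" LLM wrapper: the model sees the table
# schema and a plain-language question, and returns pandas code rather than an answer.
from typing import Callable
import pandas as pd

PROMPT_TEMPLATE = """You are given a pandas DataFrame named df with columns: {columns}.
Write Python code that assigns the answer to the question below to a variable named result.
Question: {question}
Return only code, no explanation."""

def question_to_code(df: pd.DataFrame, question: str,
                     ask_llm: Callable[[str], str]) -> str:
    """ask_llm is any completion function (vendor-agnostic); returns generated pandas code."""
    prompt = PROMPT_TEMPLATE.format(columns=", ".join(df.columns), question=question)
    return ask_llm(prompt)

def answer(df: pd.DataFrame, question: str, ask_llm: Callable[[str], str]):
    """Run the generated code in a limited namespace and return `result`.
    (A production version would sandbox and validate the code before executing it.)"""
    code = question_to_code(df, question, ask_llm)
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)
    return namespace.get("result")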
Weather/Ocean/Atmospheric Engine (Python, SQL)
Collected and analyzed atmospheric and oceanographic remote sensing data for use in modeling through automated pipelines.
eSite Analytics, Data Analyst/Scientist & Software Developer
Charleston, SC — February 2018 - November 2021
At eSite Analytics, I used my background in data software architecture/development and statistical processes to lead the development and maintenance of data pipelines that create machine learning models for client site selection. My role at eSite evolved into one that designs and builds automated data processes.
Site Selection Analysis (R, Python, Alteryx, SQL, Azure)
Created random forest and regression models with thousands of initial variables to determine the optimal locations for clients to place new brick-and-mortar stores.
ETL Automation and Standardization (R, Python, Alteryx, SQL)
Created a company framework that ingests, processes, and distributes data
Automated data extraction and cleaning from a variety of client-provided sources - FTP, flat files, DBs, APIs, etc.
Accessed remote sensing and government datasets to inform modeling.
Misc Product Development
Weather Forecasting and Severe Weather/Hurricane Tracking
Web Scraping
Traffic analytics
Internal Utilities (Python, Alteryx, Azure)
Created internal utilities that helped our data analysts become more efficient by automating redundant tasks.
Misc Duties (where I wore a lot of hats)
Report Design (Alteryx Server, XML, MS Power BI)
Creating Data SOPs
I was pulled in a lot of different directions, building things for every department.
OneDataSource, Software Developer
Charleston, SC — December 2015 - February 2018
At OneDataSource, I gathered financial, transactional, employee labor, and inventory data in its raw form; then, through scripting, reporting tools, and statistics, I transformed this data into curated reports containing actionable insights for the client.
ETL (Perl, Python, MySQL, AWS)
Automated extraction of data from multiple sources via REST, SOAP, IMAP, JSON, XML, .xls, .csv, FTP, SFTP, etc. into a single consistent environment that front-end applications can query.
These automated processes served as the backbone of an enterprise business analytics and reporting service for some of the largest privately owned franchisee groups in the country.
Predictive Analytics (Python, Perl, MySQL, AWS)
Created a process to predict time-series metrics for our clients such as sales, speed of service, labor costs, etc.
The process uses multiplicative decomposition to analyze historical data provided by the client, along with weather and event history. It then builds a forecast model that can be dynamically and rapidly applied and displayed from a database table. The codebase is object-oriented.
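A simplified sketch of the multiplicative-decomposition step on synthetic data, using statsmodels purely for illustration; the trend, seasonality, and horizon below are placeholders, not the actual client model.

# Multiplicative decomposition: series = trend * seasonal * residual.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Fake two years of daily sales with weekly seasonality, for demonstration only.
idx = pd.date_range("2022-01-01", periods=730, freq="D")
sales = pd.Series((100 + 0.05 * np.arange(730)) *
                  (1 + 0.2 * np.sin(2 * np.pi * np.arange(730) / 7)), index=idx)

parts = seasonal_decompose(sales, model="multiplicative", period=7)

# Extrapolate the trend linearly and repeat the weekly seasonal factors.
trend = parts.trend.dropna()
slope, intercept = np.polyfit(np.arange(len(trend)), trend.to_numpy(), 1)
horizon = 28
future_trend = intercept + slope * (len(trend) + np.arange(horizon))
future_seasonal = np.tile(parts.seasonal.iloc[-7:].to_numpy(), horizon // 7)
forecast = pd.Series(future_trend * future_seasonal,
                     index=pd.date_range(idx[-1] + pd.Timedelta("1D"), periods=horizon))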
Data Migration (Perl, MySQL, AWS)
Wrote and edited scripts and database files for migration of our entire business processes from physical servers to cloud servers.
Internal Utilities (Python, Perl, Jaspersoft)
Created internal reports to visually display the status of polling processes for customer service troubleshooting.
Created testing scripts to QA parser scripts, aggregator scripts, databases, and data.
College of Charleston, Astrophysics Researcher
Charleston, SC — August 2013 - May 2015
Worked with several CofC faculty (Dr. Jon Hakkila, Dr. George Chartas) and their research teams, aiding in the creation of data pipelines and analysis software (written in Python and IDL) for studying phenomena ranging from gamma-ray bursts to the search for intermediate-mass black holes. Here I learned a lot about working with raw data from remote sensing satellites.
Scientific Code Development (IDL, Python)
Created routines to statistically analyze time-series gamma-ray burst data
EDUCATION
College of Charleston
Charleston, SC
August 2020
M.S. Data Science and Analytics (3.9 GPA)
Thesis: A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering
College of Charleston
Charleston, SC
May 2015
B.S. Physics - Focus on Computational Physics
B.S. Astrophysics - Focus on High-Energy Computational Astronomy
Minor: Mathematics/Statistics
Gardner-Webb University
Boiling Springs, NC
May 2011
B.A. Philosophy and Theology - Focus on Logic
Minor: Spanish
PUBLICATIONS & PRESENTATIONS
PUBLICATIONS
Hakkila et al., Smoke and Mirrors: Signal-to-Noise and Time-Reversed Structures in Gamma-Ray Burst Pulse Light Curves, Astrophysical Journal, 2018, https://arxiv.org/abs/1804.10130
Cannon, A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering, 2020, ProQuest
PRESENTATIONS & POSTERS
A Structure-Fitting Process for Gamma-Ray Burst Light Curves
A Search for Self-Similarities in BATSE Gamma-Ray Burst Emissions Using Agglomerative Clustering
A Preliminary Analysis of Complex Gamma-Ray Burst Pulses
Searching for Emission Episode Self Consistency in Gamma-Ray Burst Light Curves
PROJECTS & HOBBIES
YouTube & World Records
I know YouTube is a weird thing to put on a resume, but I have some fun projects on there. Namely, one where I calculated the optimal route through all 50 states with an MCMC technique and then actually drove it. We ended up smashing the world record by over half a day. The channel is called NerdStoke.
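For the curious, a toy version of an MCMC-style route search (simulated annealing with a Metropolis acceptance rule); the coordinates, cooling schedule, and step count are placeholders rather than the actual routing setup.

# Toy simulated-annealing route search over a random distance matrix.
# The real project used actual road distances; these values are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # one "stop" per state
points = rng.random((n, 2))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)

def tour_length(order):
    return dist[order, np.roll(order, -1)].sum()

order = rng.permutation(n)
best, best_len = order.copy(), tour_length(order)
temp = 1.0
for step in range(200_000):
    i, j = sorted(rng.integers(0, n, size=2))
    candidate = order.copy()
    candidate[i:j + 1] = candidate[i:j + 1][::-1]          # 2-opt style reversal
    delta = tour_length(candidate) - tour_length(order)
    if delta < 0 or rng.random() < np.exp(-delta / temp):  # Metropolis acceptance
        order = candidate
        if tour_length(order) < best_len:
            best, best_len = order.copy(), tour_length(order)
    temp *= 0.99995                                        # geometric cooling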
Personal Biometric Data Aggregation and Modeling
I have been training for ultra marathons and figured, why not use my skills in analysis and programming to help me out? I have written several integrations with a large suite of apps and devices that passively track dozens of metrics on my health daily. These metrics include body composition, nutrition, exercise metrics, sleep quality, location and altitude, everything I read, and much more. This data is all collected automatically through data pipelines I set up in my home lab. I then created a convolutional neural network to approximate a solution to an ODE whose parameters contain information about the health of my cardiovascular system and current recovery. So far it has been a ton of fun seeing exactly which levers I can pull to make the biggest impact on my health.
Science News Website
getscienced.com was a website created to deliver up-to-date content from the most recent academic articles to the non-scientist consumer through a collection of academics translating the scientific jargon into layman’s terms. It was when I created this site that I learned the power of collaboration, marketing, and lead generation. The site’s intellectual property has since been acquired by Pfizer, which, last I checked, operates under different branding at getscience.com.
Exoplanet Detection with Bayesian Blocks
I used a technique called Bayesian Blocks to develop a new method for detecting exoplanets. The method searches a time-series starlight signal and uses Bayesian Blocks to identify statistically significant changes in the star’s light. If several significant changes recur at a regular period in a Fourier analysis, a planet may exist.
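A simplified sketch of the idea on a synthetic light curve, using astropy's bayesian_blocks and a Lomb-Scargle periodogram standing in for the Fourier step; the transit depth, period, and noise level are made up for illustration.

# Bayesian Blocks segments the flux series; a periodogram checks whether the
# dips it finds recur at a regular transit period.
import numpy as np
from astropy.stats import bayesian_blocks
from astropy.timeseries import LombScargle

rng = np.random.default_rng(1)
t = np.linspace(0, 90, 4000)                          # days
flux = 1.0 + 0.001 * rng.standard_normal(t.size)      # normalized starlight
in_transit = (t % 10.0) < 0.2                         # fake 10-day transits
flux[in_transit] -= 0.01                              # 1% transit depth

# Segment the light curve; edges mark statistically significant flux changes.
edges = bayesian_blocks(t, flux, sigma=0.001, fitness="measures")

# If the dips are periodic, the periodogram shows a strong peak near the true period.
frequency, power = LombScargle(t, flux).autopower()
best_period = 1.0 / frequency[np.argmax(power)]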
Lemons Race Car
That is Lemons, not Le Mans. It is a hilarious racing circuit where the maximum that you can spend on a car is $500 (except for safety equipment). We purchased an early-2000s Toyota Camry for $480 and have been getting it ready to pass the strict safety inspections. Now we just have to figure out what to do with the remaining $20.
HOBBIES
On spare afternoons and weekends, I can normally be found tinkering in my garage on home-made science and technology projects and building out my home lab. To get myself moving, I frequently compete in professional surfing competitions up and down the East Coast and Gulf. I recently got into competing in ultra running, which is painful. I also enjoy other endurance/action sports, sailing, and windsurfing.