Career Profile
Experienced data professional, always looking to learn new and creative ways to operate. The last couple of years have seen a transition from heavy Python use for statistical work to data engineering, all to help round out my knowledge and capabilities.
Experience
First data science function within our data group, making sense of our unstructured data alongside structured data to better support client needs and tap into small, incremental improvements in performance.
- … more to come
Our team is responsible for quantifying the user experience across our self-service product portals.
SQL · HiveQL · Presto · AWS · Terraform · Git
- Calculate the user experience on Charter's account portals by measuring API responses, page loads, and other custom metrics (terms), depending on the feature whose performance we're measuring. These terms are scaled between 0 and 1, then weight-averaged & summed, giving product developers & owners a concise, relatable number to compare directly against past performance & other features (see the scoring sketch after this list).
- Charter captures >160M events per day, ~140M of which are relevant to our pipelines, requiring forethought on appropriate index field usage and other efficient data operations: temp tables, CTEs, AWS Athena (Presto), or Hive (ad hoc or standalone clusters); see the query sketch after this list.
- Maintain ETL pipelines that run HiveQL on AWS EMR clusters via scheduled AWS coordinators, ensuring our pre-aggregations complete before further aggregations begin.
- Pipeline AWS infrastructure is defined in Terraform modules, letting us worry only about job-specific alterations while best practices for tagging, permissions, fleet management, etc. stay up to date through inheritance of common modules managed by our AWS platform team.
- Templated our pipelines to allow single-day runs or backfilling with the addition of one extra parameter; this required updates to the GitLab CI/CD pipeline, a shell script, and the HiveQL. The enhancement saves ~20-40 minutes on average every time a backfill for reprocessing is required (see the backfill sketch after this list).
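For illustration, a minimal sketch of the scoring scheme from the first bullet above; the metric names, bounds, and weights here are hypothetical stand-ins for the internal terms:

```python
# Minimal sketch of a weighted user-experience score; the metric names,
# scaling bounds, and weights are hypothetical.

def scale(value, worst, best):
    """Clamp a raw metric onto [0, 1], where 1 is the best bound."""
    scaled = (value - worst) / (best - worst)
    return max(0.0, min(1.0, scaled))

# term -> (raw value, worst bound, best bound, weight); weights sum to 1
terms = {
    "api_success_rate": (0.992, 0.90, 1.00, 0.5),
    "page_load_seconds": (2.1, 8.0, 0.5, 0.3),   # lower is better: bounds reversed
    "feature_error_rate": (0.004, 0.05, 0.0, 0.2),
}

score = sum(scale(v, worst, best) * w for v, worst, best, w in terms.values())
print(f"experience score: {score:.3f}")  # one number, comparable across releases
```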
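For the event-volume bullet, an illustrative PySpark sketch of the CTE-plus-partition-filter pattern that keeps scans cheap at that scale; the table and column names are hypothetical:

```python
# Illustrative sketch of partition pruning plus a CTE, run through
# PySpark's SQL interface; table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

daily_events = spark.sql("""
    WITH portal_events AS (
        SELECT visit_id, event_name, response_ms
        FROM events.portal_raw
        WHERE partition_date = '2021-06-01'   -- prune to one partition, not a full scan
          AND event_name IN ('apiCall', 'pageView')
    )
    SELECT event_name,
           COUNT(*)         AS event_count,
           AVG(response_ms) AS avg_response_ms
    FROM portal_events
    GROUP BY event_name
""")
daily_events.write.mode("overwrite").saveAsTable("events.portal_daily_agg")
```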
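And a sketch of the single-parameter backfill idea: one required run date, plus an optional end date that turns the same job into a backfill loop. The parameter names and the submit step are hypothetical:

```python
# Sketch of the single-parameter backfill pattern: one required run date,
# one optional end date; names and the submit step are hypothetical.
import argparse
from datetime import date, timedelta

def date_range(start: date, end: date):
    """Yield every date from start through end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

parser = argparse.ArgumentParser()
parser.add_argument("--run-date", type=date.fromisoformat, required=True)
parser.add_argument("--backfill-until", type=date.fromisoformat, default=None,
                    help="optional: re-run every day from run-date through this date")
args = parser.parse_args()

end = args.backfill_until or args.run_date  # absent -> normal single-day run
for day in date_range(args.run_date, end):
    # In the real pipeline this would template the HiveQL and submit to EMR.
    print(f"submitting job for partition_date={day.isoformat()}")
```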
Time off to focus on career development & earn the AWS Solutions Architect certification.
Our team was the analytical and technical support for our client success teams. Our work aimed to provide automated, multifaceted campaign results via a multitude of tools, including SQL, Spark, Python, and Apache Airflow.
Python · PySpark · Airflow · Presto · Git · test/holdout audiences
- Authored a PySpark/Airflow ETL pipeline to calculate program results, storing them in S3 for nightly ingestion into our Snowflake datastore, which feeds Looker dashboards & explores (see the ETL sketch after this list).
- Technical lead on our A/B testing pipeline, built with object-oriented PySpark and Airflow, to measure the statistical effectiveness of the Ibotta platform (see the A/B sketch after this list).
- Query-optimization liaison, consulting with our team on efficient SQL structure and operations to help reduce run time and compute resources.
- Automated the power analysis that determines optimal control sizes, stratifying the groups to ensure similar behavior across treatment & control (see the power-analysis sketch after this list).
- Ported repeated data analysis scripts into our internal Python package for reproducibility and version control.
- Addressed recurring support requests by adding functionality to our internal Python module for the rest of the team to use.
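An illustrative skeleton of that kind of nightly PySpark job from the first bullet; the bucket, paths, and columns are hypothetical:

```python
# Skeleton of a nightly PySpark aggregation landing results in S3;
# paths, table names, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("program_results").getOrCreate()

redemptions = spark.read.parquet("s3://example-bucket/redemptions/ds=2020-05-01/")

results = (
    redemptions
    .groupBy("program_id")
    .agg(
        F.countDistinct("user_id").alias("redeemers"),
        F.sum("redemption_amount").alias("total_redeemed"),
    )
)

# Parquet in S3 is the handoff point for the nightly Snowflake ingestion.
results.write.mode("overwrite").parquet("s3://example-bucket/program_results/ds=2020-05-01/")
```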
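For the A/B pipeline, the core comparison reduces to a two-sample test between treatment and holdout; a minimal SciPy sketch with synthetic data standing in for real metrics:

```python
# Minimal sketch of a treatment-vs-holdout comparison with a Welch t-test;
# the data here is synthetic and the metric is hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=5.30, scale=2.0, size=10_000)  # e.g., spend per user
holdout = rng.normal(loc=5.00, scale=2.0, size=10_000)

t_stat, p_value = stats.ttest_ind(treatment, holdout, equal_var=False)
lift = treatment.mean() / holdout.mean() - 1.0

print(f"lift: {lift:+.2%}, p-value: {p_value:.4f}")
```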
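The power analysis boils down to solving for the group size needed to detect a minimum effect; a sketch with statsmodels, where the effect size, alpha, and power targets are assumptions:

```python
# Sketch of solving for control-group size with statsmodels' power tools;
# the effect size, alpha, and power targets here are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.05,  # minimum detectable effect (Cohen's d) we care about
    alpha=0.05,        # false-positive tolerance
    power=0.80,        # chance of detecting a true effect of that size
)
print(f"need ~{n_per_group:,.0f} users per group")
```

Stratified sampling on behavioral features then keeps the treatment and control groups comparable before the test begins.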
I provided a technical backbone for our data projects with SQL support, advanced Tableau, and Python on our team within the finance department. Our team operated in close contact with the CFO, acting as the central analytics hub and providing advanced data & analytics support to Johns Manville finance.
SQL · Tableau · Python · requirements gathering
- Introduced Python to our team with ETL pipelines to feed our MSSQL database.
- Built a model with Keras (TensorFlow backend) in Python to estimate tax liabilities, bringing potential errors to the attention of our tax department and reducing the possibility of incurring consultant fees (see the sketch after this list).
- Managed analytics projects to completion, supporting everything from commercial sales analytics to internal tax reconciliation.
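A compact sketch of the kind of Keras regressor described above; the features, layer sizes, and synthetic training data are illustrative only:

```python
# Compact sketch of a Keras regressor for estimating tax liabilities;
# features, layer sizes, and the synthetic data are illustrative only.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))  # e.g., scaled invoice/jurisdiction features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1_000)  # stand-in liabilities

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),  # regression output: estimated liability
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)

# Large |actual - estimated| gaps get flagged for the tax department to review.
estimates = model.predict(X, verbose=0)
```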
Part of a team that provided multifaceted analytic support for multiple channels of a large, national telecom client. We were on top of our descriptive analytics game and worked to apply a more systematic approach to analysis, extracting deeper information from our data.
SQL · R · Tableau · PowerPivot · requirements gathering · data communication
- Mined our hosted blog and ran sentiment analyses with R to shed light on open-ended participant feedback.
- Developed a logistic regression model to determine the probability of referrals selling, and experimented with clustering on our program population to delve into promotion results and answer why some groups perform better than others (see the sketch after this list).
- Integrated R into some of our analyses for both internal and external stakeholders, with final reporting rendered in HTML via the knitr and flexdashboard packages, allowing a great deal of interactivity between the stakeholder and the data for those without Tableau licenses.
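The referral model is standard logistic regression; a minimal sketch, shown here in Python rather than the original R, with hypothetical features and synthetic data:

```python
# Minimal sketch of the referral-probability model, shown in Python
# (the original work was in R); features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(5_000, 3))  # e.g., tenure, prior referrals, engagement
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2]
y = rng.random(5_000) < 1 / (1 + np.exp(-logit))  # 1 = referral sold

model = LogisticRegression().fit(X, y)
p_sell = model.predict_proba(X)[:, 1]  # probability each referral sells
print(f"mean predicted sell probability: {p_sell.mean():.2%}")
```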
Our team ran the incentive programs for numerous automobile clients through an in-house-developed web app, back-end databases and procedures, reporting, and client support.
- Learned SQL on the job and ported all of our Excel reports to SQL, saving our Monday mornings.
- Supported Business Analysts by fielding participant inquiries in a timely manner.
- Responsible for development and execution of day-to-day sales reporting distributed to the client.
- Completed documentation of database and website procedures.
- Designed, developed, implemented, and tested new processes and dashboards that improved data analysis and tracking, reduced cost, and improved process effectiveness and accuracy.
- Directly oversaw 10+ of these analysis processes, which monitor our performance on compliance contracts valued at millions in quarterly revenue.
- Consulted with end users on their requirements to drive my report design, development, and testing.
- Communicated with our pharmaceutical buyers on actionable steps toward maximizing our inventory position while minimizing penalties associated with erratic purchasing activity.
- Conducted mathematical analysis of inventory/service-level data to ensure maximum reward payment from our contracted suppliers.
Projects
Some side projects worked on in my free time.
- Used numpy to predict bike-sharing demand.