Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions
Mine and analyze data from client databases to optimize and improve product development, marketing techniques, and business strategies
Build, optimize and maintain machine learning models
Assess the “health” of new data sources and the soundness of the data-gathering techniques employed by the client
Process, clean, and verify the integrity of data used for analysis
Perform ad hoc analyses and present the results clearly
Experience in one or more of the following programming languages: Python, R, MATLAB, Julia
Experience wrangling messy datasets using pandas or dplyr
Experience in exploratory data analysis
Experience in data visualization using one or more of the following packages/tools: seaborn, matplotlib, plotly, ggplot, Tableau
Knowledge of machine learning techniques and algorithms such as k-nearest neighbors, naive Bayes, support vector machines, random forests, logistic regression, etc.
Awareness of machine learning concepts such as overfitting and underfitting, the difference between bias and variance, how well a prediction model generalizes to unseen data, feature engineering, etc.
Excellent written and verbal communication skills for coordinating across teams
A drive to learn and master new technologies and techniques
Bonus Points
A master's or PhD degree in a relevant discipline
Data engineering experience (e.g., SQL, Hadoop, Spark, cloud computing)
Competitive programming experience (e.g., ACM, Topcoder, Codeforces)
Experience participating in machine learning competitions (e.g., Kaggle, HackerEarth)
Strong statistics background
An up-to-date portfolio (e.g., on GitHub) showing your experience in all of the above!
