Phase 1: Define ML use case
Identify and define the ML use case and problems to be solved
Define hypothesis
“Hypothesis” = potential pattern we expect to see in data
Define experiment(s) to validate hypothesis
Identify data source(s)
Agree on metrics to evaluate experiment(s)
Phase 2: Explore data
Describe data
Determine quality and cleanliness
Explore data through queries and visualization
Identify patterns, outliers in data
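The describe and outlier steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed method: the helper names `describe` and `iqr_outliers`, the sample values, and the choice of Tukey's 1.5 x IQR rule are all assumptions made for the example.

```python
import statistics


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: daily order counts containing one suspicious spike.
sample = [12, 15, 14, 13, 16, 15, 14, 95, 13, 12]
summary = describe(sample)
outliers = iqr_outliers(sample)
print(summary)
print("outliers:", outliers)
```

A flagged value is only a candidate for review; whether it is an error or a genuine pattern is a judgment for the data quality step.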
Phase 3: Select algorithm
Research existing strategies and white papers
Select an algorithm based on hypothesis, type of features, patterns in data
Classification vs. regression
Supervised vs. unsupervised learning
Univariate vs. multivariate
Time series
DNN vs. non-DNN
Algorithm cheat sheet: show algorithms for different use cases
Document selection and reason
Link farm to research papers and relevant external resources
Deliverables: document decisions related to algorithms
Phase 4: Do feature engineering
Use domain knowledge to identify features
Transform raw data into features
Craft new features as needed
Remove redundant/duplicate features
Remove highly correlated features
Reduce dimensionality as required
Check for class imbalance
Check for data leakage
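Two of the checks above, removing highly correlated features and checking for class imbalance, can be sketched as follows. The feature names, the sample values, the 0.95 correlation threshold, and the helper names are hypothetical choices for illustration; a real project would pick thresholds from its own data.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def class_ratio(labels):
    """Ratio of the rarest class count to the most common class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Hypothetical features: height_in is height_cm in other units, so it is redundant.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34.0, 22.0, 45.0, 31.0, 28.0],
}
labels = [0, 0, 0, 0, 1]

kept = drop_correlated(features)
ratio = class_ratio(labels)
print("kept features:", kept)
print("minority/majority ratio:", ratio)
```

A ratio far below 1 suggests that accuracy alone will be a misleading metric and that resampling or class weighting may be worth considering.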
Phase 5: Build ML model
Select the training and test datasets
Write code for experiment
Build a model
Determine duration and the amount of data for the initial experiment
Determine whether the model meets ROI requirements and risk requirements
Tools: TensorFlow, Python libraries
Deliverable: TF code/trained model
Code template for use cases
Reference architecture for IaaS solution
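The split-then-build steps above can be sketched as a small initial experiment. The outline names TensorFlow as the tool; this sketch deliberately swaps in a one-feature least-squares line written in plain Python so that it runs with no dependencies. The synthetic data (y = 2x + 1), the 20% test fraction, the seed, and the helper names are assumptions for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows reproducibly and hold out a test portion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, noise-free data for illustration only: y = 2x + 1.
rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]

train, test = train_test_split(rows)
slope, intercept = fit_line(train)

# Score on the held-out rows only, never on the rows used for fitting.
test_mae = sum(abs(slope * x + intercept - y) for x, y in test) / len(test)
print(f"slope={slope:.3f} intercept={intercept:.3f} test MAE={test_mae:.3g}")
```

Fixing the seed keeps the split reproducible, so later iterations are compared on the same held-out rows.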
Phase 6: Iterate to improve model performance
Evaluate the model result
Visualize the model result
Iterate and improve the result
Troubleshooting guide for performance and testing techniques
TensorBoard internal asset
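Evaluating the model result can be sketched for a binary classifier with a confusion matrix and the metrics derived from it. The labels and predictions below are made up for illustration, and the helper names are hypothetical; the metrics actually used should be the ones agreed on in Phase 1.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def precision_recall_f1(counts):
    """Derive precision, recall, and F1; each is 0.0 when its denominator is zero."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
print(counts)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recomputing the same metrics after every change gives each iteration a like-for-like comparison.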
Phase 7: Present results, tell a story from the data
Present results: use data + visualization + narrative to tell a story
Tools: Slides, TensorBoard
Deliverables: Results report
Phase 8: Plan for deployment
Make a prediction on production data and build a business case for operationalizing it
Prepare performance and scale requirements for production
Prepare operationalization requirements for training and scoring
Prepare architecture for model training and retraining
Prepare architecture for prediction
Prepare work breakdown structure
Develop proposed timelines for training and retraining the model
Prepare a plan for rollout and the success criteria for increasing traffic
Tools: Dataflow, BigQuery (BQ), Cloud Storage (GCS)
Deliverables: Architecture Design doc, WBS
Deployment plan
Testing scripts, guide
Phase 9: Deploy and operationalize the model
Convert the model into an API
Build dataset training and scoring architecture
Consume the model in business application(s)
Build an automated test
Build the feedback loop
Deliverables: Operations guide
Configuration scripts for API
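Converting the model into an API can be sketched as a framework-independent request handler: validate the JSON payload, call the model, return a status code and a JSON body. The weights, the `features` field name, and the function names are hypothetical stand-ins for a trained model and its real contract; wiring the handler to an actual HTTP server is left out.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one example with the stand-in linear model."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(body):
    """Validate a JSON request body and return (status_code, json_response)."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
        features = [float(x) for x in features]
    except (ValueError, KeyError, TypeError):
        message = f"expected a JSON object with a list of {len(WEIGHTS)} numeric 'features'"
        return 400, json.dumps({"error": message})
    return 200, json.dumps({"prediction": predict(features)})


status, response = handle_request('{"features": [1.0, 2.0]}')
bad_status, bad_response = handle_request('{"inputs": [1.0]}')
print(status, response)
print(bad_status, bad_response)
```

Keeping validation and scoring in plain functions makes them easy to cover in the automated test before any serving framework is chosen.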
Phase 10: Integrate with business, and monitor
Business processes rely on the ML model
Data analysis and feedback loop
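One possible shape for the monitoring side of the feedback loop is a rolling accuracy check over the most recent predictions whose true outcomes have arrived. The class name, window size, and alert threshold below are assumptions for illustration, not part of the original plan.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over a sliding window of recent labeled predictions."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return windowed accuracy, or None before any feedback arrives."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True when windowed accuracy has dropped below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


# Hypothetical feedback stream of (predicted, actual) pairs.
monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)

print("rolling accuracy:", monitor.accuracy())
print("needs attention:", monitor.needs_attention())
```

A drop below the threshold would be a natural trigger for the retraining timeline prepared in Phase 8.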