Phase 1: Define ML use case
- Identify and define the ML use case and the problems to be solved
- Define the hypothesis
  - “Hypothesis” = a potential pattern we expect to see in the data
- Define experiment(s) to validate the hypothesis
- Identify data source(s)
- Agree on metrics to evaluate the experiment(s)
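The outcome of this phase can be captured in a small structured record so that the hypothesis, data sources, and agreed metric are written down before any modeling starts. The following is a minimal sketch only: the class name `ExperimentSpec`, its fields, and the churn example values are hypothetical and do not come from the outline itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical record of what was agreed in Phase 1."""
    use_case: str
    hypothesis: str             # the pattern we expect to see in the data
    data_sources: tuple         # where the data will come from
    metric: str                 # the agreed evaluation metric
    success_threshold: float    # value the metric must reach
    higher_is_better: bool = True

    def is_validated(self, observed: float) -> bool:
        """Return True if the observed metric meets the agreed threshold."""
        if self.higher_is_better:
            return observed >= self.success_threshold
        return observed <= self.success_threshold


# Illustrative values only; replace with the real use case.
spec = ExperimentSpec(
    use_case="customer churn prediction",
    hypothesis="customers with falling weekly usage are more likely to churn",
    data_sources=("usage_logs", "billing_records"),
    metric="recall",
    success_threshold=0.80,
)
```

Fixing the threshold before the experiment runs makes it harder to move the goalposts once results come in.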
Phase 2: Explore data
- Describe data
- Determine quality and cleanliness
- Explore data through queries and visualization
- Identify patterns and outliers in data
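As a rough illustration of describing a column and flagging outliers, the sketch below uses only the Python standard library. The `ages` sample is invented, and Tukey's 1.5 × IQR rule is just one common outlier heuristic, chosen here as an assumption rather than something the outline prescribes.

```python
import statistics


def describe(values):
    """Summarize a numeric column, counting missing (None) entries separately."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


# Invented sample: two missing entries and one implausible age.
ages = [34, 29, 41, None, 38, 36, 33, 120, 31, None, 35]
summary = describe(ages)
outliers = iqr_outliers(ages)
```

The missing count speaks to data cleanliness, while the flagged values are candidates for a closer look rather than automatic removal.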
Phase 3: Select algorithm
- Research existing strategies and white papers
- Select an algorithm based on hypothesis, type of features, patterns in data
  - Classification vs. regression
  - Supervised vs. unsupervised learning
  - Univariate vs. multivariate
  - Time series
  - DNN vs. non-DNN
- Assets:
  - Algorithm cheat sheet: show algorithms for different use cases
  - Document selection and reason
  - Link farm to research papers and relevant external resources
- Deliverables: document decisions related to algorithms
Phase 4: Do feature engineering
- Use domain knowledge to identify features
- Transform raw data into features
- Craft new features as needed
- Remove redundant/duplicate features
- Remove highly correlated features
- Reduce dimensionality as required
- Check for class imbalance
- Check for data leakage
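Two of the checks above, removing highly correlated features and checking for class imbalance, can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the 0.95 correlation threshold, the greedy keep-first strategy, and the sample feature names are all hypothetical choices, not recommendations from the outline.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Greedy filter: keep a feature only if |r| with every kept feature is below threshold.

    The result depends on iteration order; the first of a correlated pair wins.
    """
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


def class_ratio(labels):
    """Ratio of the rarest class count to the most common class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Invented sample data.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same signal as height_cm
    "weekly_visits": [3.0, 1.0, 4.0, 1.0, 5.0],
}
labels = [0, 0, 0, 0, 1]

selected = drop_correlated(features)
imbalance = class_ratio(labels)
```

A low `class_ratio` suggests that accuracy alone may be a misleading metric and that resampling or class weighting might be worth considering.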
Phase 5: Build ML model
- Select the training dataset and the test set
- Write code for experiment
- Build a model
- Determine duration and the amount of data for the initial experiment
- Determine whether the model meets ROI requirements and risk requirements
- Tools: TensorFlow, Python libraries
- Deliverable: TF code/trained model
- Assets:
  - Code template for use cases
  - Reference architecture for IaaS solution
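The first step in this phase, separating training data from test data, might look like the sketch below. It is framework-agnostic and uses only the standard library; the function name `train_test_split`, the 20% test fraction, and the seed are assumptions made for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a held-out test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))                        # stand-in for real records
train_rows, test_rows = train_test_split(rows)
```

A random split like this is unsuitable for time-ordered data, where a chronological split is usually preferred so that the model is never tested on records older than its training data.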
Phase 6: Iterate to improve model performance
- Evaluate the model result
- Visualize the model result
- Iterate and improve the result
- Assets:
  - Troubleshooting guide for performance and testing techniques
  - TensorBoard internal asset
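For a binary classifier, evaluating the model result often starts from confusion counts. The sketch below assumes a binary classification task and invented labels; precision, recall, and F1 are shown as examples of metrics, not as the metrics the outline mandates.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Recomputing the same metrics after each change to the features or the model gives a consistent basis for deciding whether an iteration actually improved the result.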
Phase 7: Present results, tell a story from the data
- Present result: use data + visualization + narrative to tell a story
- Tools: Slides, TensorBoard
- Deliverables: Results report
Phase 8: Plan for deployment
- Make a prediction on production data and build a business case for operationalizing it
- Prepare performance and scale requirements for production
- Prepare operationalization requirements for training and scoring
- Prepare architecture for model training and retraining
- Prepare architecture for prediction
- Prepare work breakdown structure (WBS)
- Develop proposed timelines for training and retraining the model
- Prepare a plan for rollout and the success criteria for increasing traffic
- Tools: Dataflow, BigQuery (BQ), Cloud Storage (GCS)
- Deliverables: Architecture design doc, WBS
- Assets:
  - Deployment plan
  - Testing scripts, guide
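The rollout plan and its success criteria for increasing traffic can be expressed as a simple gate. The following is a hypothetical sketch: the traffic stages, the function name `next_stage`, and the hold-on-failure policy are assumptions chosen for illustration, and a real plan might also include rollback rules.

```python
# Hypothetical traffic shares for a staged rollout: 1%, 10%, 50%, 100%.
STAGES = (0.01, 0.10, 0.50, 1.00)


def next_stage(current_index, observed_metric, success_threshold, stages=STAGES):
    """Return the index of the stage to run next.

    Traffic increases only when the observed metric meets the agreed
    success criterion; otherwise the rollout holds at the current stage.
    """
    if observed_metric < success_threshold:
        return current_index                       # hold: criterion not met
    return min(current_index + 1, len(stages) - 1)  # advance, capped at full traffic


# Example: at the 1% stage with a metric of 0.90 against a 0.80 criterion.
stage_index = next_stage(0, observed_metric=0.90, success_threshold=0.80)
traffic_share = STAGES[stage_index]
```

Writing the criterion down as code, even in this simplified form, forces the team to agree on a measurable threshold before traffic is increased.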
Phase 9: Deploy and operationalize the model
- Convert the model into an API
- Build dataset training and scoring architecture
- Consume the model in business application(s)
- Build an automated test
- Build the feedback loop
- Assets:
  - Operations Guide
  - Configuration scripts for API
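Converting the model into an API can be sketched with Python's built-in `http.server`. This is a minimal illustration, not a production design: the `/predict` path, the JSON request shape, and the placeholder `predict` function with fixed weights are all assumptions, and a real deployment would call the trained model and typically use a dedicated serving stack.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: a fixed linear score. Swap in the trained model here."""
    weights = [0.4, 0.6]  # hypothetical weights, for illustration only
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Handle POST /predict with a JSON body such as {"features": [1.0, 2.0]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "invalid request body")
            return
        body = json.dumps({"score": score}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; a real service should log requests


def make_server(port=0):
    """Create the server on localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

To try it locally, call `make_server(8080).serve_forever()` and POST a JSON body to `http://127.0.0.1:8080/predict`. Keeping `predict` separate from the HTTP handler also makes the automated test listed above easier to write, since the scoring logic can be tested without a running server.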
Phase 10: Integrate with business, and monitor
- Business processes rely on the ML model
- Data analysis and feedback loop