The Ideal Phases of Machine Learning Projects

Phase 1: Define ML use case

  • Identify and define the ML use case and problems to be solved
  • Define hypothesis
    • “Hypothesis” = potential pattern we expect to see in data
  • Define experiment(s) to validate hypothesis
  • Identify data source(s)
  • Agree on metrics to evaluate experiment(s)


Phase 2: Explore data

  • Describe data
  • Determine quality and cleanliness
  • Explore data through queries and visualization
  • Identify patterns, outliers in data
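
As a rough illustration of describing data and spotting outliers, the sketch below computes summary statistics and flags values with Tukey's IQR rule. The function names, the `orders` sample, and the `k=1.5` multiplier are assumptions made for this example, not part of the original process; a real project would more likely use a dataframe library and plots.

```python
import statistics

def describe(values):
    """Return basic summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [q1 - k*IQR, q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: daily order counts with one suspicious spike.
orders = [12, 15, 14, 13, 16, 15, 14, 95, 13, 12]
summary = describe(orders)
outliers = iqr_outliers(orders)
print(summary)
print("outliers:", outliers)
```

A flagged value is only a candidate for review; whether it is an error or a genuine event is a question for domain knowledge.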


Phase 3: Select algorithm

  • Research existing strategies and white papers
  • Select an algorithm based on hypothesis, type of features, patterns in data
    • Classification vs. regression
    • Supervised vs. unsupervised learning
    • Univariate vs. multivariate
    • Time series
    • DNN vs. non-DNN
  • Assets: 
    • Algorithm cheat sheet: show algorithms for different use cases
    • Document selection and reason
    • Link farm to research papers and relevant external resources
  • Deliverables: document decisions related to algorithms 


Phase 4: Do feature engineering

  • Use domain knowledge to identify features
  • Transform raw data into features
  • Craft new features as needed
  • Remove redundant/duplicate features
  • Remove highly correlated features
  • Reduce dimensionality as required
  • Check for class imbalance
  • Check for data leakage
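
Two of the checks above, removing highly correlated features and checking for class imbalance, can be sketched in a few lines. This is a minimal example under stated assumptions: the 0.95 correlation threshold, the greedy keep-first strategy, and the sample features are all illustrative choices, not recommendations from the original list.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (a constant list has zero variance).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def drop_correlated(features, threshold=0.95):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def class_ratios(labels):
    """Return the share of each class label."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

# Hypothetical feature table: height_in is a unit conversion of height_cm.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34.0, 21.0, 45.0, 29.0, 52.0],
}
labels = [0, 0, 0, 0, 1]

kept = drop_correlated(features)
ratios = class_ratios(labels)
print("kept features:", kept)
print("class ratios:", ratios)
```

The greedy pass keeps whichever feature of a correlated pair appears first, so the order of the features matters.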


Phase 5: Build ML model

  • Select the datasets for training and testing
  • Write code for experiment
  • Build a model
  • Determine duration and the amount of data for the initial experiment
  • Determine whether the model meets ROI requirements and risk requirements
  • Tools: TensorFlow, Python libraries
  • Deliverable: TensorFlow code/trained model
  • Assets:
    • Code template for use cases
    • Reference architecture for IaaS solution
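
Selecting the training and test sets often comes down to a reproducible shuffle and split. The sketch below is one way to do it with the standard library; the function name, the 20% test fraction, and the fixed seed are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: (feature, label) pairs.
rows = [(i, i % 2) for i in range(10)]
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```

A random shuffle is not appropriate for time series data, where the test set should normally come from a later period than the training set.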


Phase 6: Iterate to improve model performance

  • Evaluate the model result
  • Visualize the model result
  • Iterate and improve the result
  • Assets:
    • Troubleshooting guide for performance and testing techniques
    • TensorBoard internal asset
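
To make "evaluate the model result" concrete for a binary classifier, the sketch below computes precision, recall, and F1 from true and predicted labels. Which metrics matter is whatever was agreed in Phase 1; these three, and the sample labels, are only an assumed example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1), using 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels from a held-out test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print("tp, fp, fn, tn =", counts)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Tracking the same numbers across iterations gives a simple way to tell whether a change actually improved the model.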


Phase 7: Present results, tell a story from the data

  • Present result: use data + visualization + narrative to tell a story
  • Tools: Slides, TensorBoard
  • Deliverables: Results report


Phase 8: Plan for deployment

  • Make a prediction on production data and build a business case for operationalizing it
  • Prepare performance and scale requirements for production
  • Prepare operationalization requirements for training and scoring
  • Prepare architecture for model training and retraining
  • Prepare architecture for prediction
  • Prepare work breakdown structure
  • Develop proposed timelines for training and retraining the model
  • Prepare a plan for rollout and the success criteria for increasing traffic
  • Tools: Dataflow, BigQuery (BQ), Cloud Storage (GCS)
  • Deliverables: Architecture Design doc, WBS
  • Assets:
    • Deployment plan
    • Testing scripts, guide


Phase 9: Deploy and operationalize the model

  • Convert the model into an API
  • Build dataset training and scoring architecture
  • Consume the model in business application(s)
  • Build an automated test
  • Build the feedback loop
  • Assets:
    • Operations Guide
    • Configuration scripts for API
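
One way to picture "convert the model into an API" is a small HTTP handler that wraps a prediction function. The sketch below uses only the Python standard library. The `predict` function is a hypothetical stand-in for a trained model, and the `/predict` path and the request shape (`{"features": [...]}`) are assumptions for illustration, not a prescribed interface.

```python
import json
from http.server import BaseHTTPRequestHandler

def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))

def handle_predict(body):
    """Turn a JSON request body into (status_code, JSON response body)."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or len(features) != 2:
            raise ValueError("expected a list of 2 numbers")
        score = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing key, or non-numeric input is a client error.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"score": score})

class PredictHandler(BaseHTTPRequestHandler):
    """Serve POST /predict by delegating to handle_predict."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = response.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()`, where the address and port are arbitrary. Keeping the request logic in `handle_predict` separates it from the transport, which makes the automated test in the list above easier to write.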


Phase 10: Integrate with business, and monitor

  • Business processes rely on the ML model
  • Data analysis and feedback loop
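
A feedback loop needs some way to compare predictions with the outcomes that arrive later. The sketch below tracks accuracy over a sliding window and signals when it falls below a threshold. The class name, the window size, and the threshold are assumptions chosen for this example; real monitoring would depend on the metrics agreed in Phase 1.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # oldest entries drop off automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return windowed accuracy, or None if nothing has been recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True when windowed accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy

# Hypothetical stream of (predicted, actual) pairs from production feedback.
monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    monitor.record(predicted, actual)
before = monitor.needs_attention()

monitor.record(0, 1)  # another miss pushes the oldest correct prediction out
after = monitor.needs_attention()
print(before, after, monitor.accuracy())
```

A signal like this could feed the retraining timeline prepared in Phase 8.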

