The Ideal Phases of Machine Learning Projects

Phase 1: Define ML use case

Identify and define the ML use case and problems to be solved
Define hypothesis
- “Hypothesis” = potential pattern we expect to see in data
Define experiment(s) to validate hypothesis
Identify data source(s)
Agree on metrics to evaluate experiment(s)
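
One way to keep the Phase 1 decisions in a single place is a small record like the sketch below. The class name `ExperimentPlan`, its fields, the churn use case, and the target values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExperimentPlan:
    """Hypothetical record of the Phase 1 decisions for one ML use case."""
    use_case: str
    hypothesis: str             # the pattern we expect to see in the data
    data_sources: List[str]
    metrics: Dict[str, float]   # metric name -> agreed target value

    def is_validated(self, observed: Dict[str, float]) -> bool:
        """True only if every agreed metric meets or exceeds its target."""
        return all(
            observed.get(name, float("-inf")) >= target
            for name, target in self.metrics.items()
        )


# Illustrative values only; none of these come from a real project.
plan = ExperimentPlan(
    use_case="Predict customer churn",
    hypothesis="Customers with falling weekly logins churn more often",
    data_sources=["crm_accounts", "login_events"],
    metrics={"recall": 0.70, "precision": 0.60},
)
```

Calling `plan.is_validated(...)` with the metrics observed in an experiment then gives a yes/no answer against the targets agreed up front. A metric missing from the observed results counts as a failure.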

Phase 2: Explore data

Describe data
Determine quality and cleanliness
Explore data through queries and visualization
Identify patterns, outliers in data
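
The description, quality check, and outlier steps above can be sketched with the Python standard library alone. The helper names `describe` and `iqr_outliers` and the sample column are assumptions made for this example; on a real project a dataframe library would likely do the same job.

```python
import statistics


def describe(values):
    """Summarize a numeric column: size, missing values, mean, median, stdev."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


# Made-up column with one missing value and one obvious outlier.
order_values = [12.0, 15.5, 14.2, None, 13.8, 16.1, 250.0, 15.0]
summary = describe(order_values)
outliers = iqr_outliers(order_values)
```

The interquartile-range rule is only one possible outlier definition. Whether a flagged value is an error or a genuine pattern is still a judgment call that needs domain knowledge.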

Phase 3: Select algorithm

Research existing strategies and white papers
Select an algorithm based on hypothesis, type of features, patterns in data
- Classification vs. regression
- Supervised vs. unsupervised learning
- Univariate or multivariate
- Time series
- Deep neural network (DNN) vs. non-DNN
Assets:
- Algorithm cheat sheet: show algorithms for different use cases
- Document selection and reason
- Collection of links to research papers and relevant external resources
Deliverables: document decisions related to algorithms
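
An algorithm cheat sheet can start as a simple lookup over the questions listed above. The sketch below is an assumption about what such a sheet might contain, not the asset itself. The function name `suggest_algorithms` and the short candidate lists are illustrative and far from exhaustive.

```python
def suggest_algorithms(labeled, target_type=None, time_series=False):
    """Return candidate algorithm families for a coarse problem description.

    labeled      -- True for supervised problems, False for unsupervised ones
    target_type  -- "categorical" (classification) or "numeric" (regression)
    time_series  -- True when the ordering of observations in time matters
    """
    if time_series:
        return ["ARIMA", "recurrent neural network"]
    if not labeled:
        return ["k-means clustering", "principal component analysis"]
    if target_type == "categorical":
        return ["logistic regression", "random forest", "DNN classifier"]
    if target_type == "numeric":
        return ["linear regression", "gradient-boosted trees", "DNN regressor"]
    raise ValueError("supervised problems need target_type 'categorical' or 'numeric'")


candidates = suggest_algorithms(labeled=True, target_type="categorical")
```

Recording the inputs passed to such a lookup alongside the final choice covers much of the "document selection and reason" asset.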

Phase 4: Do feature engineering

Use domain knowledge to identify features
Transform raw data into features
Craft new features as needed
Remove redundant/duplicate features
Remove highly correlated features
Reduce dimensionality as required
Check for class imbalance
Check for data leakage
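
Two of the checks above, removing highly correlated features and checking for class imbalance, can be sketched as follows. The 0.95 threshold, the helper names, and the sample columns are assumptions for illustration. The code also assumes no column is constant, since a constant column would make the correlation undefined.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature unless it correlates above the threshold with one already kept."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Made-up columns: height_in is a unit conversion of height_cm.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age_years": [41.0, 23.0, 35.0, 52.0, 29.0],
}
selected = drop_correlated(features)
ratio = imbalance_ratio([0, 0, 0, 0, 1])
```

Which feature of a correlated pair survives depends on iteration order here. In practice domain knowledge should decide which one to keep.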

Phase 5: Build ML model

Select the datasets for training and testing
Write code for experiment
Build a model
Determine duration and the amount of data for the initial experiment
Determine whether the model meets ROI requirements and risk requirements
Tools: TensorFlow, Python libraries
Deliverable: TF code/trained model
Assets:
- Code template for use cases
- Reference architecture for IaaS solution
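
Selecting the training and test sets usually means a reproducible random split. A minimal sketch, assuming an 80/20 split and a fixed seed (both arbitrary choices for this example):

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows reproducibly and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for real records
train, test = train_test_split(rows)
```

A purely random split is not appropriate for every dataset. Time-ordered data, for example, is normally split by date so that the test set lies strictly after the training set, which also guards against data leakage.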

Phase 6: Iterate to improve model performance

Evaluate the model result
Visualize the model result
Iterate and improve the result
Assets:
- Troubleshooting guide for performance and testing techniques
- TensorBoard internal asset
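
Evaluating the model result means computing the metrics agreed in Phase 1 on the held-out test set. The sketch below assumes a binary classifier and computes precision, recall, and F1 by hand. The function name `binary_metrics` and the sample labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions for eight test examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_metrics(y_true, y_pred)
```

Tracking the same report after every iteration makes it clear whether a change actually improved the model.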

Phase 7: Present results, tell a story from the data

Present results: use data + visualization + narrative to tell a story
Tools: Slides, TensorBoard
Deliverables: Results report

Phase 8: Plan for deployment

Make a prediction on production data and build a business case for operationalizing it
Prepare performance and scale requirements for production
Prepare operationalization requirements for training and scoring
Prepare architecture for model training and retraining
Prepare architecture for prediction
Prepare work breakdown structure
Develop proposed timelines for training and retraining the model
Prepare a plan for rollout and the success criteria for increasing traffic
Tools: Dataflow, BigQuery (BQ), Cloud Storage (GCS)
Deliverables: Architecture Design doc, WBS
Assets:
- Deployment plan
- Testing scripts, guide
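
A rollout plan with success criteria for increasing traffic can be written down as a small rule. The sketch below is one possible shape, assuming a staged rollout gated on error rate. The stage percentages and the 2% error ceiling are hypothetical placeholders, not recommendations.

```python
# Hypothetical shares of production traffic served by the new model.
ROLLOUT_STAGES = [0.01, 0.10, 0.50, 1.00]


def next_traffic_share(current, error_rate, max_error_rate=0.02):
    """Advance one stage if the success criterion holds, otherwise roll back."""
    if error_rate > max_error_rate:
        return 0.0  # criterion failed: send no traffic to the new model
    later = [share for share in ROLLOUT_STAGES if share > current]
    return later[0] if later else current  # stay at full traffic once reached


share_after_good_week = next_traffic_share(0.01, error_rate=0.005)
share_after_bad_week = next_traffic_share(0.50, error_rate=0.05)
```

Real success criteria usually combine several signals, such as latency and business metrics, but the same pattern of explicit stages and explicit gates applies.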

Phase 9: Deploy and operationalize the model

Convert the model into an API
Build dataset training and scoring architecture
Consume the model in business application(s)
Build an automated test
Build the feedback loop
Assets:
- Operations Guide
- Configuration scripts for API
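
Converting the model into an API comes down to wrapping the prediction call in a request handler that validates input and returns a structured response. The sketch below is independent of any web framework. The `predict` function is a placeholder with hypothetical weights standing in for a trained model, and the JSON field names are assumptions.

```python
import json


def predict(features):
    """Placeholder for a trained model: a fixed linear score (hypothetical weights)."""
    weights = {"logins_per_week": -0.3, "support_tickets": 0.5}
    return sum(weights[name] * features[name] for name in weights)


def handle_request(body):
    """Validate a JSON request body and return (status_code, JSON response)."""
    try:
        payload = json.loads(body)
        score = predict(payload["features"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return 400, json.dumps({"error": f"bad request: {exc!r}"})
    return 200, json.dumps({"score": score})


status, response = handle_request(
    '{"features": {"logins_per_week": 2, "support_tickets": 1}}'
)
bad_status, _ = handle_request("not json")
```

Keeping the handler a plain function makes the automated test straightforward: it can be called directly with good and bad request bodies, without starting a server.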

Phase 10: Integrate with business, and monitor

Business processes rely on the ML model
Data analysis and feedback loop
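
One common input to the monitoring feedback loop is a drift check that compares the distribution of live model scores with the distribution seen at training time. The sketch below uses the population stability index (PSI) as one such measure. The bin edges and sample scores are made up, and any alert threshold would need to be agreed per use case.

```python
import math


def psi(expected, actual, edges):
    """Population stability index between two samples over shared bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges the value has reached.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    exp_shares, act_shares = shares(expected), shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares)
    )


# Made-up model scores at training time and in production.
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
live_scores = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]

drift = psi(training_scores, live_scores, edges)
no_drift = psi(training_scores, training_scores, edges)
```

A PSI of zero means the two binned distributions are identical, and larger values mean larger shifts. A sustained rise is a signal to investigate the data and possibly trigger the retraining planned in Phase 8.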

03.06.2020

