4 Takeaways from ‘How Google Does Machine Learning’ course

dekoupled


Today, Machine Learning (ML) technology is simplified and abstracted down to an API call, so you can solve data-intensive pattern-matching problems easily. Google's Move Mirror is a great example.

While creating a standalone ML-based consumer app is reasonably straightforward, it can be quite challenging to infuse ML at scale into a mission-critical, enterprise-class cloud platform. Enterprise apps have to consider various steps in the machine learning life cycle, including data cleansing, integration, and production deployment. Operationalizing ML is a topic by itself, and I'll share more on that in a future post.

While researching how to operationalize ML, I found an online course titled How Google Does Machine Learning. Intrigued by the title, I took the course to understand how leveraging ML in consumer web apps compares to enterprise apps. The course gave me fresh perspectives on ML and validated several of my views on bringing ML to life.

In this post, I'll share four key learnings from the course.

1. Machine Learning (ML) effort allocation — Expectation vs. Reality

From a business process automation project standpoint, the Machine Learning component is just one small piece of the total effort. The diagram below compares the expected effort with reality across the various steps that make the project successful, and this fairly matches my recent experience.

Source: How Google does Machine Learning, Coursera

My takeaway is that ML model generation and parameter tuning are important, but they can't be the only focus of an ML project. Data gathering and cleansing, integrating ML predictions and the feedback loop, and the infrastructure for deploying ML models in production are some of the critical steps that consume more effort.

Check Google’s blog post on ML surprise for more on this point.

Product managers leading an ML project should consider the above effort allocation in prioritization and planning in order to deliver the value of ML to the end customer.

2. ML Strategy is first and foremost a data strategy

This seems obvious at first, but Google suggests big data and analytics as a prerequisite for doing any machine learning. In other words, ML automates the insight-generation process (of analytics) at scale with a learning feedback loop. The bigger point is that if you don't have the data to run analytics, you can't do ML.

Simple ML and more data are better than fancy ML and small data. Quality and quantity of data should take precedence over fancy models and algorithms.

Next is support for batch and streaming data. Unless the same ML system handles both batch data and streaming data, training-serving skew becomes an issue. This often happens when data scientists and the production team use different sets of coding tools. Apache Beam, or Google Cloud Dataflow, solves this by processing batch and streaming data pipelines in the same way.
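To make that concrete, here is a minimal Beam sketch (my own illustration, not from the course): the same pipeline code can read from a bounded file source for batch data or from Pub/Sub for streaming, so the transformation logic stays identical for both. The bucket path, topic name, and parse_event helper are hypothetical.

```python
import csv
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(line):
    # Hypothetical parser: turn a CSV line (str or bytes) into a feature dict.
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    fields = next(csv.reader([line]))
    return {"user_id": fields[0], "value": float(fields[1])}


def build_pipeline(source):
    # The same transforms run for batch and streaming; only the source changes.
    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | "Read" >> source
         | "Parse" >> beam.Map(parse_event)
         | "Filter" >> beam.Filter(lambda e: e["value"] >= 0)
         | "Write" >> beam.io.WriteToText("gs://my-bucket/clean/events"))


# Batch: bounded file source (hypothetical path).
build_pipeline(beam.io.ReadFromText("gs://my-bucket/raw/events*.csv"))

# Streaming: unbounded Pub/Sub source (hypothetical topic), same transforms.
# build_pipeline(beam.io.ReadFromPubSub(topic="projects/my-project/topics/events"))
```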

3. Multiple models for a business problem

For models to be effective in prediction, it is important to break the problem into smaller tasks and apply a specific model to each task, as opposed to using a single monolithic model.

For example, forecasting stock-outs can be divided into predicting product demand, predicting inventory, and predicting restocking time, and each one can use a different model. The model could also vary by product category, since the forecasting logic differs for an electronics item and a kitchen item.
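Here is a rough sketch of that decomposition (my own illustration with synthetic data and made-up features, not the course's example): three separate regressors feed one stock-out decision.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # hypothetical features, e.g. price, promo, seasonality, category

# One model per sub-problem instead of a single monolithic stock-out model.
demand_model = GradientBoostingRegressor().fit(X, rng.poisson(20, 500))      # daily units sold
inventory_model = GradientBoostingRegressor().fit(X, rng.poisson(100, 500))  # units on hand
restock_model = GradientBoostingRegressor().fit(X, rng.uniform(1, 14, 500))  # lead time in days


def predict_stockout(features):
    """Flag a stock-out when projected demand over the restock window exceeds inventory."""
    daily_demand = demand_model.predict(features)
    on_hand = inventory_model.predict(features)
    lead_time_days = restock_model.predict(features)
    return daily_demand * lead_time_days > on_hand


print(predict_stockout(X[:5]))
```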

On a related note, Google has over 4,000 ML models in production that are transforming Google to become an AI-first company.

4. Equality of opportunity

This important ML concept gives individuals an equal chance at the desired outcome, i.e., when you move a sample from one sub-group to another, the outcome remains the same. Depending on the situation, the model can minimize either the false positive rate (Type I error) or the false negative rate (Type II error), and tuning the trade-off between precision and recall improves inclusiveness.
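As a rough illustration (synthetic data, not from the course), the sketch below shows how moving the decision threshold trades the false positive rate against the false negative rate; choosing the operating point with sub-groups in mind is one way this feeds into equality of opportunity.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification problem.
X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    tn, fp, fn, tp = confusion_matrix(y_test, probs >= threshold).ravel()
    fpr = fp / (fp + tn)  # Type I error rate
    fnr = fn / (fn + tp)  # Type II error rate
    print(f"threshold={threshold}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```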

To ensure equality of opportunity, the data should be fair and representative, as bias in the data leads to bias in the model.

Data problems like unbalanced distributions, unexpected feature values, missing values, and distribution skew between data sets (e.g., train and test sets) can bias the model. Facets, an open-source visualization tool, helps you understand the data, discover anomalies, and create more inclusive ML models.
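Beyond Facets, a few of these checks can be run directly in pandas. The sketch below assumes hypothetical train.csv and test.csv files with a label column and numeric features; it is only meant to show the kinds of questions to ask.

```python
import pandas as pd

# Hypothetical files with a 'label' column and numeric features.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Unbalanced distribution: how skewed are the classes?
print(train["label"].value_counts(normalize=True))

# Missing values: which columns are mostly empty?
print(train.isna().mean().sort_values(ascending=False).head())

# Unexpected feature values: eyeball the ranges.
print(train.describe().loc[["min", "max"]])

# Distribution skew between train and test: compare per-feature means.
skew = (train.mean(numeric_only=True) - test.mean(numeric_only=True)).abs()
print(skew.sort_values(ascending=False).head())
```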

ML in action

Earlier, I wrote that ML technology is simplified for easier consumption, and the course labs prove that statement. In the lab exercises, I got my hands dirty with Python, Cloud Datalab, and the BigQuery columnar database running on Google Cloud Platform using VMs and serverless managed services.
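The lab code itself isn't reproduced here, but a minimal sketch of querying a large table from Python looks like this; it assumes the google-cloud-bigquery client, configured GCP credentials, and uses a public sample dataset purely for illustration (the lab used a different dataset).

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes a GCP project and credentials are configured

# Aggregate over a large public table (illustrative only).
query = """
    SELECT year, COUNT(*) AS births
    FROM `bigquery-public-data.samples.natality`
    GROUP BY year
    ORDER BY year
"""
for row in client.query(query).result():
    print(row.year, row.births)
```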

Running an analysis on a large dataset with 70M rows using BigQuery in less than three seconds is impressive. Finally, pre-trained models for vision, video, speech, and Natural Language Processing (NLP) are easily accessible via APIs, so you don't have to write models for common use cases from scratch.
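For example, a sentiment call to the Cloud Natural Language API is only a few lines; this is a minimal sketch assuming the google-cloud-language client and configured credentials.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()  # assumes GCP credentials are set up

document = language_v1.Document(
    content="The course gave me fresh perspectives on ML.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")
```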

That wraps up my key learnings and takeaways from Google's introductory machine learning course.
