Book review: Effective Machine Learning Teams

19 May, 2024

Marginal Gains

General details

Book name: Effective Machine Learning Teams

Authors: David Tan, Ada Leung, David Colls

Publisher: O'Reilly Media, Inc.


Organization

Effective Machine Learning Teams consists of three parts that serve as umbrellas for the various topics covered.

Product

In this part of the book, the authors give an overview of the generic product lifecycle, along with tools and techniques for ML product discovery. They also cover work planning (inception phase) activities and their success criteria, as well as delivery activities and the metrics used to track them.

Engineering

This part outlines principles and tips for Python dependency management, at both the application and the OS level. It also includes an example of training and serving an ML model.

Another useful aspect of the Engineering part is automated testing: its benefits and obstacles, test strategy and best practices. I really liked the approach to automated testing of ML models (including LLMs); it is not widely covered in other books.

There are also chapters on different types of monitoring in production, IDE configuration tricks, refactoring principles, system health assessment and even the basics of MLOps and CD4ML (continuous delivery for machine learning).

Teams

This part contains chapters about the typical challenges ML teams face and what sets effective ML teams and ML organizations apart. One of the chapters is dedicated to engineering productivity (flow, cognitive load, etc.).

Target audience

The book is intended mainly for three categories of readers:

ML engineers
Software developers
Product / project managers

As for me, I am a project manager with some previous experience managing ML projects (though without employing top-notch ML engineering practices). I don’t have hands-on software or ML engineering experience.

My impressions

Judging by the book’s title, I initially expected to read about the process and organizational aspects of ML delivery. As you may have noticed, in reality the book is about almost everything, touching on various aspects at different levels of the process. It is well structured, clear and readable; it just covers a broad range of topics. Still, I was surprised that there was little on the generic ML project development life cycle.

Product management is a relatively new subject for me, so I found most of those chapters useful (I already knew some of the material, but it made for a quick and pleasant refresher). I also found the fundamentals of automated testing very valuable, especially the part on testing ML models. The dependency management and IDE chapters, however, were a miss for me.

Personally, I skipped the technical “follow along” parts that required coding or setting up a development environment (this is not my area of specialization). MLOps and CD4ML were new concepts to me, so those chapters were a nice introduction.

The chapters on building effective teams were not eye-opening for me, as I had already studied a lot of material on the subject. The engineering productivity chapter is a good addition, since this concept is seldom covered in management books (at least in my experience). The part on effective ML organizations was also a miss for me, as the topic is outside my current area of responsibility.

Summary

Effective Machine Learning Teams feels a bit random, as it covers a lot of ground in very different areas. Seasoned professionals looking for specific, narrow knowledge may therefore not find every chapter useful. Nor is it a perfect fit for any single category of the intended audience (managers, developers or ML engineers).

Still, I would recommend this book to beginners who don’t have much experience working on software development or ML projects.

I rated this book on Goodreads as 4/5.