Two Types of Full Stack Machine Learning Engineering
In my mind, there are two types of Full Stack Machine Learning Engineering.
Vertical Full Stack ML
The first kind can be viewed as vertical: the end-to-end execution of machine learning, from data collection to deployment. Inside Landing AI, after successfully bringing multiple machine learning solutions into production, we distilled our experience and knowledge into a complete “Machine Learning Lifecycle” methodology. This is the “secret sauce” of how we take our ML projects to production, and we constantly improve it and fill in more content with feedback from more recent successes. An official blog post on this “Machine Learning Lifecycle” will be released soon, but here is a sneak peek at its main components:
Define and create a data collection process: build and fine-tune the necessary data collection pipeline. In computer vision, this can go as far as setting up the camera system and lighting to capture input photos/videos at the desired quality; in a recommender system, it is usually the pipeline that records the user actions that reflect their interests.
Prepare labeled data: this includes defining the labeling schema and labeling the data. We typically spend a lot of effort here developing and iterating on a labeling book until it becomes unambiguous, accurate, and comprehensive. Then we leverage a group of labelers to help us prepare the labeled data, which becomes the bread and butter of our models.
Model iteration: we launch a few model training jobs and conduct a systematic error analysis of the results during evaluation (a small error-analysis sketch follows this list). We use the insights from the error analysis to drive further improvement in either the data labeling or the model training. This is an iterative process, and we do a few rounds until we reach the target performance metrics.
Continuous deployment: we need to deploy models and get them running in production in order to realize their value. Inside Landing AI, our machine learning models typically run on edge devices inside factories. We create a CI/CD pipeline and an over-the-air solution for deploying new inference code and models to staging and production edge devices (a sketch of a simple promotion gate follows this list).
Real-time monitoring: once the software is deployed, we need to constantly watch out for issues like model performance drift, high latency, and GPU overheating. Any of these could cause model failures and downtime in production, so we need comprehensive dashboards and alerting systems to detect them as early as possible (a minimal monitoring sketch follows this list).
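To make the model-iteration step a bit more concrete, here is a minimal sketch of what a slice-based error analysis could look like. The record fields and the “camera” attribute are illustrative assumptions, not our actual evaluation schema.

```python
# A minimal sketch of slice-based error analysis: group evaluation records by
# some metadata attribute and compute recall per slice, so the worst slices
# stand out. The record fields below are hypothetical.
from collections import defaultdict

def recall_by_slice(records, slice_key):
    """Compute per-slice recall over positive ground-truth examples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        if r["label"] == 1:  # only positive ground-truth examples count toward recall
            totals[r[slice_key]] += 1
            if r["prediction"] == 1:
                hits[r[slice_key]] += 1
    return {s: hits[s] / totals[s] for s in totals}

# Example: evaluation records tagged with the camera station that captured each image.
records = [
    {"label": 1, "prediction": 1, "camera": "station_a"},
    {"label": 1, "prediction": 0, "camera": "station_b"},
    {"label": 1, "prediction": 0, "camera": "station_b"},
    {"label": 0, "prediction": 0, "camera": "station_a"},
]
print(recall_by_slice(records, "camera"))  # {'station_a': 1.0, 'station_b': 0.0}
```

A result like this points the investigation at the imaging setup on one station rather than at the model itself.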
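For continuous deployment, one simple way to think about the CI/CD gate is as a function that decides whether a candidate model is good enough to promote. The sketch below is hypothetical, with made-up metric names and thresholds; it is not our actual pipeline.

```python
# A minimal sketch of a quality gate that a CI/CD job could run before promoting
# a new model build to staging or production edge devices. Metric names and
# thresholds are illustrative assumptions.
def should_promote(candidate_metrics, current_metrics,
                   min_recall=0.95, max_latency_ms=100):
    """Promote only if the candidate meets absolute targets and does not
    regress against the model currently in production."""
    meets_targets = (candidate_metrics["recall"] >= min_recall
                     and candidate_metrics["p95_latency_ms"] <= max_latency_ms)
    no_regression = candidate_metrics["recall"] >= current_metrics["recall"]
    return meets_targets and no_regression

# Example gate decision inside a deployment job.
candidate = {"recall": 0.97, "p95_latency_ms": 80}
current = {"recall": 0.96, "p95_latency_ms": 85}
print(should_promote(candidate, current))  # True
```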
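And for real-time monitoring, the checks can start as simply as thresholding a handful of device metrics and raising alerts when any of them is breached. The metrics dictionary and thresholds below are assumptions for illustration; a real system would read them from device telemetry and feed the alerts into dashboards and paging.

```python
# A minimal sketch of the monitoring checks mentioned above: latency, GPU
# temperature, and a crude drift signal. All values are illustrative.
def check_health(metrics, baseline_positive_rate,
                 max_latency_ms=100, max_gpu_temp_c=85, max_drift=0.15):
    """Return a list of alerts for latency, GPU temperature, and output drift."""
    alerts = []
    if metrics["p95_latency_ms"] > max_latency_ms:
        alerts.append(f"High latency: {metrics['p95_latency_ms']} ms")
    if metrics["gpu_temp_c"] > max_gpu_temp_c:
        alerts.append(f"GPU overheating: {metrics['gpu_temp_c']} C")
    # Crude drift signal: the share of positive predictions moving away from
    # what was observed at deployment time.
    drift = abs(metrics["positive_rate"] - baseline_positive_rate)
    if drift > max_drift:
        alerts.append(f"Prediction drift: {drift:.2f}")
    return alerts

# Example reading from an edge device.
print(check_health(
    {"p95_latency_ms": 140, "gpu_temp_c": 78, "positive_rate": 0.32},
    baseline_positive_rate=0.10,
))  # ['High latency: 140 ms', 'Prediction drift: 0.22']
```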
This is vertical full-stack machine learning. You need to get all the components right in order to run machine learning models in production 24/7. They all work closely together and should not operate in silos. For example, when you are iterating on an object detection model and find that its recall is low, the root cause may not be imperfect hyperparameters; instead, it could be that your imaging solution has a very small depth of field, so the target object is out of focus. It helps to have this full-stack mindset and look at the overall pipeline.

As the team grows bigger and bigger, engineers tend to specialize and become accountable for only one part of the pipeline. However, if you only look at your own subject area and ignore the other pieces of the process, you can easily end up in a local minimum despite lots of optimization effort. If you step out and closely inspect every step of the process, you may find that the root cause lies in another component. In the example above, the right thing to do is to redesign your camera system instead of launching hundreds of experiments to tune hyperparameters.
Horizontal Full Stack ML
The other kind of full-stack ML is horizontal, and it is discussed relatively less often: being skillful and knowledgeable across many types of ML, from supervised learning and meta-learning to generative models and graph models, and across multiple domains, from computer vision and NLP to recommender systems and reinforcement learning. As you can see, this perspective of full-stack ML is open-ended. The idea is to stay open-minded, not confine yourself to a small comfort zone, and expand your boundaries as widely as possible.
One of my mentors from the University of Toronto first introduced me to this idea. During his Ph.D., he not only read papers in his own topic area but also spent time looking at good papers from other areas. He believed this effectively expanded his horizons and often provided him with new techniques and fresh ideas to try in his own experiments.
Companies like to hire people with a very specific area of expertise. In the hiring process for an NLP role, you may find that the hiring committee favors candidates with an NLP background over others from computer vision or reinforcement learning. This naturally pushes people to dedicate themselves to just one area and ignore other fields of work. However, this is not healthy for the growth of ML. No discipline of ML is fully independent of the others, and we see lots of breakthroughs happening at the intersections. If all practitioners only looked at their own disciplines, we would miss such breakthroughs and ingenious ideas. At a more practical level, real-world data flows freely in a variety of formats: when we look at a video on YouTube, it involves characters in motion, conversations, sentiment, and aesthetic styles. If we only look at one type of data in the video, we miss all the other critical signals that would help us analyze it. Human intelligence allows us to capture and aggregate all the types of data surrounding us and make educated decisions, and so should machine intelligence.
Inside Landing AI, Andrew always promotes the concept of “lifelong learning”. He encourages us to read and learn about all fields of ML. We keep this idea in mind when running reading groups, tech talks, and hackathons. It is important for us to look at the amazing ideas in all fields of ML, constantly ask “should I bring this to my work?”, and look at our problems with an open mind. We are solving real-world problems in ways that no one else has tried before. There is no existing playbook and there are no best practices, so there are no boundaries on how the solution should be built. We shouldn’t limit our methods to the tiny space we are comfortable and familiar with; instead, we should become horizontally full-stack and find the methods that solve the problem best.