At Landing AI we observed how many projects took an unnecessarily long and painful process to complete. It was due to ambiguous defect definitions or poor labeling quality. In comparison, it will make the life of machine learning engineers much easier, and the whole project lifespan much shorter, by having a dataset with high-quality labels. Therefore, it is very important to invest the time in the project’s early stage to clarify defect definitions and formalize labeling.
We iterated the labeling process described above among our many projects. We formalized the defect definitions and introduced the heuristics from SMEs on how to recognize defects into the defect book. It is an important source of truth to train labelers as well as evaluating model predictions at the model iteration stage.
With defect consensus, we can examine the accuracy and completeness of the defect book. We can identify possible misalignments on the knowledge of defect definitions between labelers and SMEs. In the final labeling step, we have multiple labelers label the same dataset and then only approve images with consistent and unambiguous labels. Once this whole process completes, the data is then ready to be used for model training and evaluation.
If you are interested to learn more, you can read my blog post here: https://landing.ai/data-labeling-of-images-for-supervised-learning/