Saji Thoppil, Founder and CTO, DifiNative Technologies
Saji Thoppil, Founder and CTO, DifiNative Technologies, in an interaction with CIOTechOutlook, explained the role of MLOps in the rapidly evolving field of computer vision. He underscored modular, distributed, and scalable approaches for developing computer vision solutions, and the role of custom pipelines, organized data versioning, and CI/CD implementations in the development and deployment of computer vision models. He emphasized the importance of reproducibility, real-time testing, and resource optimization to reduce complexity for vision workloads in a diverse, distributed environment.
With over three decades of experience in the IT industry, Saji Thoppil is an expert technology leader who has designed and operationalized complex, large-scale distributed systems. He has delivered cloud transformation initiatives, invented service broker architectures, and led the adoption of many advanced technologies such as cloud-native frameworks, AIOps, blockchain, 5G edge computing, and quantum computing.
Machine learning can be utilized in a wide range of domains, including text processing, data analytics, and many varieties of inference. Computer vision differs from those areas: the volume of visual data is significantly larger, making data management a crucial aspect of the machine learning lifecycle. In industrial environments such as factories, each site has its own challenges, and computer vision solutions usually require a specific, hyper-local, custom approach to achieve accuracy. Adopting a modular approach helps manage the complexities of computer vision deployment. For instance, identifying several factories with similar characteristics allows a dedicated processing pipeline to be developed and tailored to their specific requirements. Equally important is a customized approach to data versioning - carefully managing how data is stored, tracked, and maintained for particular use cases.
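As a rough illustration of this kind of per-site data versioning, the sketch below builds a simple content-hash manifest for a curated dataset. The directory names and versioning scheme are assumptions for illustration, not any specific tool in use.

```python
import hashlib
import json
from pathlib import Path

def build_dataset_manifest(data_dir: str, site: str, version: str) -> dict:
    """Record a content hash for every image so a dataset version can be
    reproduced later and tied to the factory/site it was curated for."""
    entries = {}
    for path in sorted(Path(data_dir).rglob("*.jpg")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries[str(path)] = digest
    return {"site": site, "version": version, "files": entries}

# Example: snapshot the images curated for one (hypothetical) factory pipeline.
manifest = build_dataset_manifest("data/factory_a", site="factory_a", version="v1.2")
Path("manifests").mkdir(exist_ok=True)
Path("manifests/factory_a_v1.2.json").write_text(json.dumps(manifest, indent=2))
```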
Another important consideration is decentralization. Given the large volume of image and video data in computer vision applications, it is often not feasible to transmit all raw data to the cloud. A robust edge strategy is therefore necessary - one that enables efficient data processing and management at the source. This includes handling data rollovers, curating relevant subsets, and selectively transmitting data to centralized cloud systems. These practices form a crucial component of the MLOps pipeline in computer vision, ensuring both performance and scalability across distributed environments. It is also important to keep a record of experiment performance: historical data on experiments is crucial for refining models and for ensuring the final solution aligns with the required KPIs.
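A minimal sketch of that edge-side curation step, assuming hypothetical confidence and anomaly scores produced by the on-device model; the thresholds and sampling rate are placeholders, not prescribed values.

```python
import random
from dataclasses import dataclass

@dataclass
class FrameResult:
    frame_id: str
    confidence: float     # detector confidence for the primary object (assumed field)
    anomaly_score: float  # score from a lightweight on-device drift check (assumed field)

def select_for_upload(results, conf_threshold=0.6, anomaly_threshold=0.8, sample_rate=0.01):
    """Keep only frames worth sending to the cloud: low-confidence detections,
    anomalous-looking frames, plus a small random sample for future retraining."""
    selected = []
    for r in results:
        if r.confidence < conf_threshold or r.anomaly_score > anomaly_threshold:
            selected.append(r.frame_id)
        elif random.random() < sample_rate:
            selected.append(r.frame_id)
    return selected
```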
A multistage pipeline is a requirement for computer vision applications, given the complexity and interdependence of the components, models, and data. The AI models sit at the core of these systems and must be handled with both accuracy and flexibility. Computer vision systems are also more dynamic and distributed than traditional software systems such as cloud or desktop applications: programs may change from day to day, and models may be retrained at different points in the lifecycle. This requires a robust continuous integration and continuous deployment (CI/CD) pipeline that accommodates AI and computer vision workloads. Often, developers build on one platform but deploy to a lower-end edge device, such as a TPU-based board, an NVIDIA Jetson GPU, or a system requiring an OpenVINO package. This introduces cross-platform challenges that must be addressed carefully. Additionally, testing in a large and distributed setup has to be done cautiously before moving to production.
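One way to handle that cross-platform packaging step inside a CI/CD job is a simple target-to-exporter dispatch. The sketch below is illustrative only: the exporter functions are placeholders standing in for whatever conversion toolchain (for example ONNX for OpenVINO, TensorRT for Jetson, TFLite for TPU boards) a given deployment actually uses.

```python
def export_onnx(model_path: str, out_dir: str) -> str:
    # Placeholder: convert the trained model for OpenVINO-based targets.
    return f"{out_dir}/model.onnx"

def export_tensorrt(model_path: str, out_dir: str) -> str:
    # Placeholder: build an engine for NVIDIA Jetson devices.
    return f"{out_dir}/model.engine"

def export_tflite(model_path: str, out_dir: str) -> str:
    # Placeholder: quantize and convert for TPU-based edge boards.
    return f"{out_dir}/model.tflite"

EXPORTERS = {
    "openvino": export_onnx,
    "jetson": export_tensorrt,
    "edge_tpu": export_tflite,
}

def package_for_target(model_path: str, target: str, out_dir: str = "dist") -> str:
    """CI step: produce the artifact for the edge platform a given site runs."""
    try:
        exporter = EXPORTERS[target]
    except KeyError:
        raise ValueError(f"Unknown deployment target: {target}")
    return exporter(model_path, out_dir)
```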
When developing a large language model, the process usually involves tuning a base model to a usable state and then deploying it with minimal variation. Computer vision works differently: managing the dataset behind each model version is an important part of the pipeline. Datasets need to be tracked properly and associated with each model version, so that the correct version is deployed to a specific factory or site. At times, multiple model versions are running in various places, and proper tagging and version control helps avoid mismatches and supports correct deployment. It is critical that all the experiments associated with a specific model version are traceable. When a model fails to detect a specific object, being able to quickly trace back enables rapid identification of the issue and supports rollback and audit processes within the overall system.
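That traceability can be as simple as an append-only registry linking each model version to its dataset version, training experiment, and deployment sites. The record fields and names below are assumptions for illustration, not the actual registry schema in use.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ModelRecord:
    model_version: str     # e.g. "defect-detector-2.3.1" (hypothetical name)
    dataset_version: str   # dataset manifest the model was trained on
    experiment_id: str     # training run that produced the model
    deployed_sites: list = field(default_factory=list)
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def register(record: ModelRecord, registry_path: str = "registry.jsonl") -> None:
    """Append an immutable record so any deployed model can be traced back to its
    dataset and experiment, and rolled back or audited when needed."""
    with open(registry_path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

register(ModelRecord("defect-detector-2.3.1", "factory_a_v1.2", "exp-0047",
                     deployed_sites=["factory_a"]))
```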
When a model is deployed, the neural network generates a prediction. This prediction may be passed directly to the next step in the pipeline, or it may be refined later along with other elements to generate the final output. Models can also fail and take a different path altogether. In such cases, the business metrics associated with the failure become crucial: they can point out why the model performed below expectations and what impact that has on business performance.
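A hedged sketch of such routing, with the model, fallback handler, and metrics sink all stood in by hypothetical placeholders rather than any real components of the system described here:

```python
class MetricsLog:
    """Minimal stand-in for whatever telemetry sink tracks business metrics."""
    def log(self, path, **fields):
        print(path, fields)

def run_inspection(frame, model, fallback, metrics, threshold=0.5):
    """Route a frame through the primary model; when confidence is too low,
    take the fallback path and record it so underperformance shows up in KPIs."""
    label, confidence = model(frame)   # hypothetical callable returning (label, confidence)
    if confidence >= threshold:
        metrics.log("primary_path", label=label, confidence=confidence)
        return label
    metrics.log("fallback_path", confidence=confidence)
    return fallback(frame)             # e.g. a rules-based check or a manual review queue
```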
For instance, if a retail or cosmetic product changes its packaging design - such as a label update or a different color variation - the model may stop giving the desired results. To address this, it is important to use statistical methods to monitor drift. This includes plotting histograms to observe shifts in the data and using automated methods to detect changes, such as variations in pixel values or image patterns. These tools help track the extent of drift and ensure corrective measures are taken promptly.
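One common way to quantify such drift is a population stability index (PSI) computed over pixel-intensity histograms. The sketch below uses synthetic data, and the ~0.2 alert level mentioned in the comment is a general rule of thumb, not a threshold specific to this deployment.

```python
import numpy as np

def population_stability_index(baseline, current, bins=20):
    """Compare the pixel-intensity distribution of recent frames against the
    training-time baseline; a rising PSI signals drift (e.g. a packaging redesign)."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_hist, _ = np.histogram(baseline, bins=edges)
    curr_hist, _ = np.histogram(current, bins=edges)
    base_pct = np.clip(base_hist / base_hist.sum(), 1e-6, None)
    curr_pct = np.clip(curr_hist / curr_hist.sum(), 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Synthetic example: a PSI above roughly 0.2 is commonly treated as a sign that
# the input distribution has shifted and retraining may be needed.
rng = np.random.default_rng(0)
baseline = rng.normal(120, 20, 10_000)   # pixel intensities at training time
current = rng.normal(135, 25, 10_000)    # intensities after a label redesign
print(population_stability_index(baseline, current))
```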
Human-in-the-loop systems therefore become a critical aspect, and they are labor-intensive. Although many automated labeling and annotation tools are available today, human judgment is still necessary - for instance, when dealing with false detections, where humans need to verify and correct outputs. Even with automated labeling and the increased use of synthetic data, performance in real-world scenarios is often inadequate. This makes it necessary to have humans manually label certain data and perform periodic checks. In certain instances, it becomes important to go back to the data, review it alongside newly collected images, and add updated labels and annotations to the existing dataset. Although human involvement in these processes is reduced, it is not entirely removed, especially for labeling and flagging tasks.
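A rough sketch of that loop: low-confidence or flagged detections go to a review queue, and human-verified corrections are merged back into the labeled dataset for the next model version. File paths and field names are illustrative assumptions.

```python
import json
from pathlib import Path

def queue_for_review(detections, queue_path="review_queue.jsonl", conf_threshold=0.5):
    """Send low-confidence or flagged detections to a human review queue."""
    with open(queue_path, "a") as f:
        for det in detections:
            if det["confidence"] < conf_threshold or det.get("flagged"):
                f.write(json.dumps(det) + "\n")

def merge_corrections(corrections_path, labels_path="labels.jsonl"):
    """Fold human-verified labels back into the training set for the next version."""
    corrections = Path(corrections_path).read_text().splitlines()
    with open(labels_path, "a") as f:
        for line in corrections:
            f.write(line + "\n")
```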