August 30, 2024
A guest post from Fabrício Ceolin, DevOps Engineer at Comet. Inspired by the growing demand…
Ensuring Long-Term Performance and Adaptability of Deployed Models
When working on any machine learning problem, data scientists and machine learning engineers usually spend a lot of time on data gathering, efficient data preprocessing, and modeling to build the best model for the use case. Once the best model is identified, it is usually deployed in production to make accurate predictions on real-world data (similar to the data on which the model was trained initially). Ideally, the ML engineering team's responsibilities would end once the model is deployed. But this is rarely the case.
In the real world, once the model is deployed, you cannot expect it to keep performing with the same accuracy, because the data distribution can change. Factors like changes in user behavior, shifting trends, or an unforeseen crisis like COVID-19 can affect the data distribution. This is why you can't treat model deployment as a one-time process and move on to another project once the deployment is done.
After the model deployment, you must monitor the model’s performance over a certain period to identify potential issues causing model performance to degrade. Model Drift and Data Drift are two of the main reasons why the ML model’s performance degrades over time. To solve these issues, you must continuously train your model on the new data distribution to keep it up-to-date and accurate. Repeatedly training the model on the new data distribution is called Model Retraining.
In this article, you will learn about the common causes of ML model performance degradation, model monitoring, and model retraining as possible solutions.
Next, let’s check out the most common reasons (Model Drift and Data Drift) for model performance degradation in detail.
Model drift, sometimes called concept drift, refers to the phenomenon where the statistical properties of the target variable, or the relationship between the input variables and the target variable, change over time. One of the most common settings where this issue occurs is fraud detection: a model is trained on historical data from one year, but fraud patterns change the following year as fraudsters adopt new methods. The model may continue to make predictions based on the previous year's patterns, leading to a decrease in its performance.
It can happen for various reasons, such as changes in user behavior, the environment, or the underlying data-generation process, and it can take different forms in machine learning systems.
A model trained on a static dataset assumes that the relationship between the input and target variables will remain constant in production. However, when the target distribution changes, the model does not keep up with the data and fails to generate correct predictions, because its assumptions are no longer valid and it cannot adapt to the changing data distribution.
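To make this concrete, here is a minimal sketch (not from the original workflow) of how a team might surface concept drift in practice: compare the model's accuracy on a recent, labeled slice of production data against the accuracy it achieved at validation time. The baseline value, tolerance, and data names are illustrative assumptions.

```python
# A minimal sketch: flag possible concept drift when accuracy on recent,
# labeled production data drops well below the original validation accuracy.
# `model`, `X_recent`, `y_recent`, and both constants are illustrative.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.92   # accuracy measured on the original validation set
DRIFT_TOLERANCE = 0.05     # acceptable drop before we suspect concept drift

def check_concept_drift(model, X_recent, y_recent):
    """Return the recent accuracy and whether it suggests concept drift."""
    recent_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    drifted = (BASELINE_ACCURACY - recent_accuracy) > DRIFT_TOLERANCE
    return recent_accuracy, drifted
```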
Data drift occurs when the distribution of input data changes over time. One good example of data drift is behavior analysis in online retail applications. An ML model is trained on historical data to predict purchasing behavior, such as the likelihood of a customer making a purchase or the products they are likely interested in. Over time, there can be changes in customer behavior data due to evolving trends, shifts in demographics, or external factors such as economic conditions or cultural influences. This results in a shift in data distribution and leads to data drift.
Data drift typically stems from the kinds of shifts described above: evolving trends, changing demographics, and external factors such as economic conditions or cultural influences. Like concept drift, data drift causes the model's performance to degrade over time, as the relationships and patterns learned from historical data may no longer hold, reducing prediction accuracy and reliability.
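As an illustration (not part of the original article), a simple statistical test can flag data drift in a numeric feature by comparing its training-time distribution with recent production values; the sample data and significance level below are assumptions.

```python
# A minimal sketch: detect data drift in one numeric feature with a
# two-sample Kolmogorov-Smirnov test. The sample data and alpha are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(reference, current, alpha=0.05):
    """True when the two samples are unlikely to come from the same distribution."""
    _statistic, p_value = ks_2samp(reference, current)
    return p_value < alpha

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)   # distribution at training time
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)    # shifted production distribution
print(feature_has_drifted(train_feature, live_feature))      # True for this shifted sample
```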
Now you know the major causes of a model's performance degradation in production over time. But as a data scientist or ML engineer, you focus on solutions rather than problems, right? This problem also has a solution: you must retrain your models after a certain period (e.g., weekly, monthly, or quarterly) to keep them updated on new trends and shifts. Training these models on the new data distribution is called model retraining, and it helps the model learn the new patterns that appear as the data distribution changes.
Usually, model retraining can be performed in two ways: manually or automatically. In manual model retraining, a support team monitors the model's performance and predictions; if performance degrades beyond a predefined threshold, they notify the teams responsible for retraining the model on the newly collected data (with the different distribution). Automated model retraining is more advanced and makes identifying performance degradation and triggering retraining much easier. This approach integrates various MLOps tools and services into the production environment to monitor the model's performance; if it falls below a predefined threshold, these tools automatically start retraining the model on the new data distribution. Some popular tools are Comet ML, Neptune, Weights & Biases, etc.
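To illustrate the automated path, here is a minimal sketch of the kind of threshold check such a pipeline could run on a recent batch of labeled production data; the threshold, model, and data names are assumptions rather than any specific tool's API.

```python
# A minimal sketch of a threshold-based retraining trigger on recent labeled data.
# `model`, `X_recent`, `y_recent`, and the threshold are illustrative.
from sklearn.metrics import accuracy_score

PERFORMANCE_THRESHOLD = 0.85  # floor agreed on by the team for this use case

def needs_retraining(model, X_recent, y_recent) -> bool:
    """True when accuracy on the recent labeled batch falls below the threshold."""
    recent_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    return recent_accuracy < PERFORMANCE_THRESHOLD
```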
Model retraining can cause confusion because it can be interpreted in two ways: training the same model on the new data distribution, or building a new solution with a different set of features or a different ML algorithm for the same features. Usually, by the time an ML solution is deployed to production, feature engineering, model selection, and error analysis have been done rigorously, yielding the best model for the use case. This is why, when retraining the model, you don't need to repeat all those stages; instead, take the new data distribution and train the existing model on it. Changing the model or the features results in an entirely new solution, which is out of the scope of retraining.
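As a sketch of this idea (assuming a scikit-learn-style pipeline; the steps and data names are illustrative), retraining keeps the same preprocessing, features, and algorithm and simply refits on the newly collected data:

```python
# A minimal sketch: refit an identical copy of the deployed pipeline on the new
# data distribution, without changing the features or the algorithm.
from sklearn.base import clone
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in for the pipeline already running in production.
deployed_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def retrain(existing_pipeline, X_new, y_new):
    """Return a refreshed copy of the pipeline trained on the new data."""
    refreshed = clone(existing_pipeline)  # same steps and hyperparameters, unfitted
    refreshed.fit(X_new, y_new)           # learn the new patterns in the data
    return refreshed

# Usage (with the new data distribution at hand):
# refreshed_model = retrain(deployed_pipeline, X_new, y_new)
```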
Model retraining is widely adopted across organizations precisely because it keeps deployed models accurate and trustworthy as data changes. Now that you know how important it is to retrain models after a specific period, let's discuss some best practices you should follow to make your ML solution more reliable and trusted.
You cannot treat model training and deployment as a one-time process; after deployment, your work is not done. You must monitor the ML system continuously to check for defects, issues, or errors, so that any drop in the model's performance is identified quickly. You also need to monitor the data to check that it is consistent and free of errors. Watching these two things makes it easy to tell when the model needs retraining. You must also decide which performance metric (or metrics) best suits your use case.
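To show what monitoring the data side can look like, here is a minimal sketch of a batch-level quality check; the expected columns and missing-value threshold are illustrative assumptions.

```python
# A minimal sketch of a data-quality check for incoming batches: verify that
# expected columns are present and not suddenly dominated by missing values.
import pandas as pd

EXPECTED_COLUMNS = ["age", "country", "last_purchase_amount"]  # illustrative schema
MAX_MISSING_RATIO = 0.05  # flag a column if more than 5% of its values are missing

def check_batch_quality(batch: pd.DataFrame) -> list:
    """Return a list of human-readable issues found in the incoming batch."""
    issues = []
    for column in EXPECTED_COLUMNS:
        if column not in batch.columns:
            issues.append(f"missing expected column: {column}")
        elif batch[column].isna().mean() > MAX_MISSING_RATIO:
            issues.append(f"too many missing values in: {column}")
    return issues
```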
The first question that comes to a developer's mind is: when should I retrain the ML model, and is there a specific trigger? There are different approaches to selecting the proper retraining schedule: retraining at a fixed interval (e.g., weekly, monthly, or quarterly), retraining when monitoring shows performance dropping below a predefined threshold, or retraining when drift is detected in the incoming data. A simple combination of these triggers is sketched below.
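This is a minimal sketch of combining a scheduled trigger with a performance trigger; the retraining interval and metric threshold are illustrative assumptions.

```python
# A minimal sketch: retrain when the model is overdue on its schedule or when
# the monitored metric has degraded below the agreed threshold.
from datetime import datetime, timedelta

RETRAIN_EVERY = timedelta(days=30)   # scheduled trigger: roughly monthly
METRIC_THRESHOLD = 0.85              # performance trigger

def should_retrain(last_trained_at: datetime, current_metric: float) -> bool:
    """True when the model is overdue for retraining or its metric has degraded."""
    overdue = datetime.now() - last_trained_at >= RETRAIN_EVERY
    degraded = current_metric < METRIC_THRESHOLD
    return overdue or degraded
```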
The relevance of the data is of utmost importance: it should reflect the problem domain and cover a diverse range of examples. Ensuring data quality is equally essential to avoid noise, errors, or biases that can hurt the model's performance. When retraining the model, make sure you have enough data, with a reasonably balanced number of samples per class. You also need to know how often your data changes. Most importantly, use data from the same population on which the model was trained initially. A quick class-balance check is sketched below.
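For example, a class-balance check before retraining might look like this sketch; the labels and the acceptable imbalance ratio are assumptions.

```python
# A minimal sketch: verify that the new training labels are reasonably balanced
# before retraining. The acceptable imbalance ratio is illustrative.
from collections import Counter

MAX_IMBALANCE_RATIO = 3.0  # largest class may be at most 3x the smallest

def is_reasonably_balanced(y_new) -> bool:
    counts = Counter(y_new)
    return max(counts.values()) / min(counts.values()) <= MAX_IMBALANCE_RATIO

print(is_reasonably_balanced(["fraud", "ok", "ok", "ok", "fraud", "ok"]))  # True (ratio 2.0)
```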
Automated retraining techniques automate the retraining of models, reducing manual effort and streamlining the workflow. This approach uses tools like Comet, Neptune, MLflow, etc., to monitor the entire ML system and retrain the models. It enables the seamless integration of new data and automatically triggers retraining based on predefined schedules or conditions, ensuring that models are regularly updated with fresh data and can adapt to changing patterns and trends in the data. Using such tools makes the process more efficient, saves time, and reduces the chance of human error.
Model monitoring is one of the most important components of the entire ML lifecycle, as it tells you how your model performs in production and when it needs retraining. Comet is one of the most popular model monitoring tools, with ML experimentation, version control, and collaboration capabilities. Comet provides real-time monitoring by capturing and logging predictions, performance metrics, and other relevant information during the model's runtime, enabling continuous tracking of the model's behavior and performance on live data.
One of the critical advantages of Comet is its ability to set up alerts and anomaly detection mechanisms. By defining thresholds or using anomaly detection algorithms, you can receive notifications when the model’s predictions deviate from the expected range. Comet’s centralized dashboard provides comprehensive visualizations of performance metrics, allowing you to monitor critical indicators such as accuracy, precision, recall, or custom-defined metrics specific to your use case. It can also track data drift, allowing you to take proactive measures, such as triggering retraining or fine-tuning, to ensure the model remains accurate and effective.
With Comet, you can also set up different alerts that notify you when something goes wrong. For example, when working on a classification use case, you can define a condition based on accuracy or F1 score, and you will receive a notification when the model's metrics fall below the specified values.
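As a small illustration, the snippet below logs the accuracy and F1 score that such a condition could watch, using the comet_ml SDK (assuming it is installed and an API key is configured via environment or config file); the project name, model, and data names are illustrative.

```python
# A minimal sketch: log batch-level metrics that an alert condition could watch.
# Assumes `comet_ml` is installed and the Comet API key is configured
# (e.g., via the COMET_API_KEY environment variable).
from comet_ml import Experiment
from sklearn.metrics import accuracy_score, f1_score

experiment = Experiment(project_name="production-monitoring")  # illustrative project name

def log_live_metrics(model, X_batch, y_batch, step):
    """Compute and log metrics for one batch of live, labeled predictions."""
    predictions = model.predict(X_batch)
    experiment.log_metric("accuracy", accuracy_score(y_batch, predictions), step=step)
    experiment.log_metric("f1", f1_score(y_batch, predictions, average="weighted"), step=step)
```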
Comet also promotes collaboration and knowledge sharing among the team, facilitating a collective effort to monitor and maintain the deployed models. Most prominently, Comet can integrate seamlessly with existing infrastructure and monitoring systems, ensuring compatibility and ease of integration within your production environment.
Comet's features make it easy to monitor models in production, which in turn makes model retraining straightforward. You can read more about Comet for model monitoring here.
After reading this article, you know that regularly retraining models is crucial for maintaining accuracy, adapting to changing data patterns, and addressing the risk of model decay over time. By retraining models, organizations can harness the power of new data and improve model performance, leading to better predictions and decision-making in real-world scenarios. Also, you are now aware that model monitoring in production is the key to retraining the models.
While multiple MLOps solutions are available in the market, Comet is one of the best, with features like tracking and monitoring model performance, comparing different experiments, detecting anomalies, and alerting so teams can respond quickly to issues. By embracing the importance of model retraining in production, combined with the power of Comet, organizations can stay at the forefront of machine learning advancements and deliver robust and dependable models for real-world applications.
If you have any questions, you can connect with me on LinkedIn or Twitter; thanks.