In order for a ship to stay on course, a captain must always stand by the helm.
The same rule applies to machine learning (ML) models. Inevitably, they will degrade over time, but with proper inspection and monitoring, that drift can be minimized.
For data scientists and ML engineers across New York City, how models are monitored and inspected is unique to each company. At Kensho, Sireesh Gururaja, an ML team lead, uses a “human-in-the-loop” (HITL) approach to correct model drift: humans evaluate Kensho’s model predictions and pass anecdotal evidence about their accuracy to Gururaja’s team members, who retrain the models if necessary.
To save time and improve accuracy, Gururaja and Dr. Adrianna Loback, an ML engineer at Cherre, use tools such as SQL, Fluentd, pipeline triggers and Prometheus to automate the correction of model drift. Below, they share more tools, processes and advice for keeping predictive models on course.
Sireesh Gururaja, a team lead for ML operations and internal tools, said there’s no substitute for inspecting your models. At Kensho, a company that leverages data to develop AI, humans evaluate the models’ output to assess its accuracy. If the output is off, the model is inspected and retrained.
What steps do you take early on in the ML modeling process to prevent model drift?
At Kensho, we like to think about model drift in two ways: the changes in the data the model sees over time, and the adaptation of models to new use cases with slightly different domains. Both kinds of drift are important, since model predictions in a finance context are frequently viewed as time series, and accuracy that degrades over time destroys the utility of those time series.
Models trained on financial news can generalize poorly to SEC filings, or vice versa, because of differences in how those documents use language. Early in the process, we find that there’s no substitute for inspecting your models.
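One lightweight way to surface that kind of domain gap is to score the same model on held-out sets from each document domain and compare. The sketch below illustrates the idea rather than Kensho’s actual process; the model, the two validation sets and the 0.10 gap threshold are all assumptions.

```python
# Illustrative domain-drift check: evaluate one trained classifier on held-out
# sets from two document domains and flag a large gap for manual inspection.
# The model, data sets and threshold below are hypothetical.
from sklearn.metrics import f1_score

def domain_scores(model, domains):
    """domains: dict mapping a domain name to an (X, y) validation pair."""
    return {
        name: f1_score(y, model.predict(X), average="macro")
        for name, (X, y) in domains.items()
    }

# Example usage (hypothetical objects):
# scores = domain_scores(clf, {"news": (X_news, y_news), "filings": (X_sec, y_sec)})
# if max(scores.values()) - min(scores.values()) > 0.10:
#     print("Large domain gap; inspect predictions before reusing this model.")
```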
What is your process for monitoring, detecting and correcting for model drift once a model has been deployed?
Many of our machine learning models are deployed as part of human-in-the-loop (HITL) solutions: The model’s predictions serve as a baseline that is then corrected by humans. In those cases, we’re able to leverage the feedback of the people using our models as anecdotal evidence of model performance, in addition to the numerical evaluation on the new data that HITL processes frequently provide.
We’re working on automating the collection and correction of model predictions using our existing application monitoring stack, such as Prometheus for metrics. We also use a combination of Fluentd and SQL to store richer information about model predictions before we correct those annotations in our in-house annotation tool, DataQueue.
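As a rough sketch of what that kind of instrumentation can look like (not Kensho’s actual stack or DataQueue schema), the snippet below uses the Python prometheus_client library to expose aggregate prediction metrics for Prometheus to scrape, and writes each prediction as a JSON log line that a collector such as Fluentd could forward to SQL storage. The metric names, labels and port are assumptions.

```python
# Illustrative instrumentation for model predictions; names and schema are hypothetical.
import json
import sys
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter(
    "model_predictions_total", "Predictions served", ["model", "label"]
)
CONFIDENCE = Histogram(
    "model_prediction_confidence", "Predicted confidence scores", ["model"]
)

def record_prediction(model_name, doc_id, label, confidence):
    # Aggregate signals for dashboards and alerting.
    PREDICTIONS.labels(model=model_name, label=label).inc()
    CONFIDENCE.labels(model=model_name).observe(confidence)
    # Richer per-prediction record as a JSON line, for a log collector to ship.
    print(
        json.dumps(
            {
                "ts": time.time(),
                "model": model_name,
                "doc_id": doc_id,
                "label": label,
                "confidence": confidence,
            }
        ),
        file=sys.stdout,
    )

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    record_prediction("news-classifier-v2", "doc-123", "earnings", 0.87)
```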
What advice do you have for other data scientists who are looking to better manage model drift?
Drift can be complicated to diagnose, and numbers frequently fail to capture the user experience your model will provide. Understanding which failures your users care about, and whether your model makes those mistakes, can help you set a threshold for when to retrain your models. Periodically reevaluate your models and make sure you have a handle on their behavior on live data.
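A minimal sketch of that threshold idea, assuming a single failure type users care about (the label name and the 5 percent miss rate below are made up):

```python
# Hedged sketch: flag retraining when the model misses too many of the
# cases users actually care about. Label name and threshold are assumptions.
def needs_retraining(live_labels, live_preds, critical_label="earnings", max_miss_rate=0.05):
    """Return True if the miss rate on the critical label exceeds the threshold."""
    critical = [(y, p) for y, p in zip(live_labels, live_preds) if y == critical_label]
    if not critical:
        return False  # nothing to judge yet
    misses = sum(1 for y, p in critical if p != y)
    return misses / len(critical) > max_miss_rate
```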
Model drift can only be caught when the model is being monitored, according to Adrianna Loback, an ML engineer with a Ph.D. in computational neuroscience. At Cherre, a real estate data platform, Loback and her team build automated pipelines into their ML workflows that monitor models and apply corrective methods when drift is detected.
What steps do you take early on in the ML modeling process to prevent model drift?
Data, and the relationships between independent and target variables, can change over time. To ensure robust predictive performance, we incorporate monitoring techniques that correct for model drift as part of the ML lifecycle. In the early design phase of Cherre’s ML modeling process, we design architectures that include automated pipelines to detect and correct for model drift.
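Loback doesn’t detail the internal implementation, but one common check such a pipeline might automate is comparing live feature distributions against the training distribution, for example with a two-sample Kolmogorov–Smirnov test. A minimal sketch using pandas-style DataFrames and SciPy, with the significance level as an assumption:

```python
# Illustrative data-drift check: flag numeric features whose live distribution
# differs significantly from the training distribution (two-sample KS test).
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(train_df, live_df, alpha=0.01):
    """Return (column, p_value) pairs for numeric columns that appear to have drifted."""
    flagged = []
    for col in train_df.select_dtypes(include=[np.number]).columns:
        _stat, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        if p_value < alpha:
            flagged.append((col, p_value))
    return flagged
```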
What is your process for monitoring, detecting and correcting for model drift once a model has been deployed?
Some common metrics we monitor on validation sets are the macro F1 score and the scaled mean absolute percentage error (MAPE). If performance drops, the model is retrained to correct for model drift. We may also implement an iterative process involving further intervention, using the previously trained model’s performance as a baseline and comparing interventions iteratively until the drift is corrected. We are working toward fully automated ML pipelines that would trigger notifications whenever performance anomalies are detected.
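To make the shape of that check concrete, here is a hedged sketch along those lines (not Cherre’s pipeline): it compares current validation performance against the previous model’s baseline and calls a placeholder notify() hook when performance degrades. It uses scikit-learn’s macro F1 and plain MAPE (mean_absolute_percentage_error, available in scikit-learn 0.24+) as stand-ins, and the tolerance is an assumption.

```python
# Hedged sketch of a drift check against a prior baseline; thresholds,
# metric choices and the notify() hook are assumptions.
from sklearn.metrics import f1_score, mean_absolute_percentage_error

def check_for_drift(y_true, y_pred, baseline_score, task="classification", tolerance=0.02):
    """Compare current validation performance with the previous model's baseline."""
    if task == "classification":
        score = f1_score(y_true, y_pred, average="macro")        # higher is better
        degraded = score < baseline_score - tolerance
    else:
        score = mean_absolute_percentage_error(y_true, y_pred)   # lower is better
        degraded = score > baseline_score + tolerance
    if degraded:
        notify(f"Possible model drift: {task} score {score:.3f} vs. baseline {baseline_score:.3f}")
    return degraded

def notify(message):
    # Placeholder alerting hook (email, Slack, pager, etc.).
    print(message)
```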
What advice do you have for other data scientists who are looking to better manage model drift?
The best approach for counteracting drift will depend on the nature of the data. I would emphasize the importance of the exploratory data analysis phase. My advice is to build continuous systems around ML models for monitoring and drift correction as part of the standard deployment process.