Data Shift/Drift

machine-learning

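Data shift (also called data drift or dataset shift) is a change in the data a deployed ML model sees compared with the data it was trained on, including a change in the underlying relationship between its inputs and outputs. When the distribution of the data shifts, model performance can degrade, and the model usually needs to be retrained.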

A feedback loop is one way to handle data shift. It monitors the deployed model for changes in performance and retrains it on newly collected data.

A downside of retraining the model is that the new data can introduce bias.
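The feedback loop described above can be sketched roughly as follows. This is only a toy illustration, not the method from the cited article: the threshold "model", the `FeedbackLoop` class, and the `window` and `min_accuracy` parameters are all hypothetical names chosen for the example.

```python
from collections import deque
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label in {0, 1})


def train_threshold(samples: List[Sample]) -> float:
    """Toy 'training': put the decision threshold midway between the class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold: float, x: float) -> int:
    """Predict class 1 when the feature is at or above the threshold."""
    return int(x >= threshold)


class FeedbackLoop:
    """Hypothetical monitor: tracks accuracy on recent labelled samples and retrains."""

    def __init__(self, initial: List[Sample], window: int = 10, min_accuracy: float = 0.8):
        self.threshold = train_threshold(initial)
        self.recent: deque = deque(maxlen=window)
        self.min_accuracy = min_accuracy
        self.retrain_count = 0

    def observe(self, x: float, y_true: int) -> None:
        """Record one labelled production sample and retrain if accuracy has dropped."""
        self.recent.append((x, y_true))
        if len(self.recent) < self.recent.maxlen:
            return  # not enough feedback collected yet
        correct = sum(predict(self.threshold, xi) == yi for xi, yi in self.recent)
        accuracy = correct / len(self.recent)
        labels = {yi for _, yi in self.recent}
        # Retrain only when performance dropped and both classes are present.
        if accuracy < self.min_accuracy and labels == {0, 1}:
            self.threshold = train_threshold(list(self.recent))
            self.retrain_count += 1
            self.recent.clear()


# Train on the original data, then feed in samples from a shifted distribution.
loop = FeedbackLoop([(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)])
for _ in range(5):
    loop.observe(20.0, 0)
    loop.observe(30.0, 1)
```

Note that the retraining step uses only the most recent window, which is exactly where the bias mentioned above can creep in: if the recent samples are unrepresentative, the retrained model inherits that skew.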

Source: Correcting Dataset Shift in Machine Learning | Engineering Education (EngEd) Program | Section

Causes of Data drift

Sample selection bias
Change of environments (difference in training/test environments)

Types of Data drift

Covariate shift: The distribution of the input variables shifts, while the relationship between inputs and target remains unchanged
Prior probability shift: The distribution of the target variable shifts, while the input variables for a given target remain unchanged
Concept drift: The relationship between the input and output variables changes. It is tied to neither the input distribution nor the class distribution.
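As a rough sketch of how the first two types might be checked in practice, the snippet below compares a reference sample with a current sample. The function names `ks_statistic` and `label_shift` are made up for this note. The first computes the two-sample Kolmogorov-Smirnov statistic on one input feature (a possible signal of covariate shift), and the second compares class proportions (a possible signal of prior probability shift). Concept drift generally cannot be seen from inputs or labels alone and needs labelled performance monitoring instead.

```python
from bisect import bisect_right
from collections import Counter
from typing import Hashable, List, Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    # The largest gap between two step functions occurs at one of the data points.
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(cur, v) / len(cur))
        for v in ref + cur
    )


def label_shift(reference: List[Hashable], current: List[Hashable]) -> float:
    """Largest absolute change in the proportion of any single class."""
    ref, cur = Counter(reference), Counter(current)
    return max(
        abs(ref[c] / len(reference) - cur[c] / len(current))
        for c in set(ref) | set(cur)
    )


# Inputs: identical samples give 0.0, fully separated samples give 1.0.
same_inputs = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
moved_inputs = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])

# Labels: class 1 goes from 50% to 75% of the samples.
moved_labels = label_shift([0, 0, 1, 1], [0, 1, 1, 1])
```

What counts as "too much" shift is a judgment call; any alert threshold on these scores would be an assumption to tune per dataset.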

Correcting Data drift

Dropping biased features
Using Adversarial Search with two competing Agents to “win the game”
Reweighting features
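The first option, dropping biased features, could look something like the sketch below. It is one possible reading rather than the procedure from the source: a feature is treated as "biased" when its distribution differs strongly between training data and live data, measured here with a two-sample KS statistic. The helper name `keep_stable_features` and the `threshold` of 0.3 are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, v) / len(sa) - bisect_right(sb, v) / len(sb))
        for v in sa + sb
    )


def keep_stable_features(
    train: Dict[str, List[float]],
    live: Dict[str, List[float]],
    threshold: float = 0.3,  # assumed cut-off, not taken from the source
) -> List[str]:
    """Return the names of features whose train/live KS statistic is at most `threshold`."""
    return [
        name for name in train
        if ks_statistic(train[name], live[name]) <= threshold
    ]


train = {"age": [20, 30, 40, 50], "income": [1, 2, 3, 4]}
live = {"age": [21, 31, 41, 51], "income": [10, 20, 30, 40]}

# "income" has moved to a completely different range, so only "age" is kept.
stable = keep_stable_features(train, live)
```

Dropping a feature trades robustness for information: a drifting feature may still be predictive, so it is worth checking model quality before and after removing it.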

Example: Loan Application Model

Source: YouTube, ML Drift: Identifying Issues Before You Have a Problem

Fabian Untermoser
