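Baseline Comparison Monitoring for Machine Learning Model Performance
In production, an effective model monitoring system needs a clear performance baseline and continuous tracking of key metrics. The following is a reproducible monitoring setup.
Establishing the Baseline
First, collect the model's performance data from a stable period to serve as the baseline: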
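    import pandas as pd
    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Baseline metrics, assumed to have been measured over the stable period
    baseline_metrics = {
        'accuracy': 0.92,
        'precision': 0.89,
        'recall': 0.91,
        'f1_score': 0.90
    }

    # Standard deviation of each baseline metric over the stable period
    # (hard-coded here; in practice, compute these from the evaluation history)
    baseline_std = {
        'accuracy_std': 0.02,
        'precision_std': 0.03,
        'recall_std': 0.025
    }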
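The baseline values above are hard-coded. A minimal sketch of how they might be derived from evaluation history follows, assuming each metric was evaluated once per window (for example, daily) during the stable period. The function name `summarize_baseline`, the input format, and the sample numbers are all hypothetical and only illustrate the calculation.

```python
import statistics

def summarize_baseline(history):
    """Return (means, stds) per metric from per-window evaluation values.

    `history` maps a metric name to a list of values, one per evaluation
    window in the stable period (hypothetical input format).
    """
    means = {}
    stds = {}
    for name, values in history.items():
        if len(values) < 2:
            raise ValueError(f"need at least two windows for {name!r}")
        means[name] = statistics.mean(values)
        # Sample standard deviation (n - 1 in the denominator)
        stds[f"{name}_std"] = statistics.stdev(values)
    return means, stds

# Illustrative values only, not real measurements
stable_history = {
    "accuracy": [0.90, 0.92, 0.94],
    "precision": [0.86, 0.89, 0.92],
}
baseline_means, baseline_stds = summarize_baseline(stable_history)
```

With more evaluation windows the standard deviation estimate becomes more reliable; with only a handful of windows, the resulting thresholds should be treated as rough.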
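Real-Time Monitoring Configuration
Set alert thresholds for the monitored metrics:
- Accuracy deviates from the baseline by more than 3σ (i.e. ±0.06)
- Precision deviates from the baseline by more than 4σ (i.e. ±0.12)
- Recall deviates from the baseline by more than 3σ (i.e. ±0.075)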
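The thresholds in the list above are each a multiple of the metric's baseline standard deviation. A small sketch of that arithmetic and the corresponding check, using the baseline values and multipliers from this post; the helper names `deviation_threshold` and `is_out_of_band` are hypothetical.

```python
def deviation_threshold(std, k):
    """Allowed absolute deviation from the baseline: k standard deviations."""
    return k * std

def is_out_of_band(current, baseline, std, k=3.0):
    """True when `current` lies more than k * std away from `baseline`."""
    return abs(current - baseline) > deviation_threshold(std, k)

# (baseline, standard deviation, multiplier k) per metric, from the text above
checks = {
    "accuracy": (0.92, 0.02, 3),
    "precision": (0.89, 0.03, 4),
    "recall": (0.91, 0.025, 3),
}
thresholds = {
    name: deviation_threshold(std, k) for name, (_, std, k) in checks.items()
}
```

The check is two-sided: an unexpected jump above the baseline is flagged as well as a drop, which can be useful because a sudden improvement sometimes points to label leakage or a data pipeline fault.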
Alert Rule Configuration
Configure the alert rules in Prometheus:
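    groups:
      - name: model_performance
        rules:
          - alert: ModelAccuracyDrop
            expr: abs(model_accuracy - 0.92) > 0.06
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Model accuracy deviates from the baseline"
              description: "Absolute deviation from the baseline accuracy is {{ $value }}, exceeding the threshold of 0.06"
          - alert: ModelPrecisionDrop
            expr: abs(model_precision - 0.89) > 0.12
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: "Model precision deviates abnormally from the baseline"
              description: "Absolute deviation from the baseline precision is {{ $value }}, exceeding the threshold of 0.12"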
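The rule expressions above embed the baseline and threshold as literals, so they have to be edited by hand whenever the baseline is re-measured. One possible way to keep them in sync is to generate the expression strings from the baseline numbers. This is only a sketch: `build_alert_expr` is a hypothetical helper, and writing the result into a rules file is left out.

```python
def build_alert_expr(metric_name, baseline, std, k):
    """Build a PromQL expression that is true when the metric deviates
    from `baseline` by more than k standard deviations."""
    # Round to avoid floating-point noise such as 0.07500000000000001
    threshold = round(k * std, 6)
    return f"abs({metric_name} - {baseline}) > {threshold}"

# Metric names, baselines, and multipliers taken from the rules above
accuracy_expr = build_alert_expr("model_accuracy", 0.92, 0.02, 3)
precision_expr = build_alert_expr("model_precision", 0.89, 0.03, 4)
```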
Data Collection and Reporting
Collect real-time prediction data through a Flask endpoint and report the metrics:
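    from flask import Flask, request
    import prometheus_client
    from sklearn.metrics import accuracy_score, precision_score

    app = Flask(__name__)
    accuracy_metric = prometheus_client.Gauge('model_accuracy', 'Model accuracy')
    precision_metric = prometheus_client.Gauge('model_precision', 'Model precision')

    @app.route('/predict', methods=['POST'])
    def predict():
        # Prediction handling goes here. Assumption (not in the original):
        # the request body is a labeled batch for a binary classifier,
        # {"y_true": [...], "y_pred": [...]}
        payload = request.get_json()
        y_true, y_pred = payload['y_true'], payload['y_pred']
        accuracy = accuracy_score(y_true, y_pred)
        precision = precision_score(y_true, y_pred, zero_division=0)
        # Report the metrics
        accuracy_metric.set(accuracy)
        precision_metric.set(precision)
        return {'result': 'success'}

    @app.route('/metrics')
    def metrics():
        # Expose the gauges in the Prometheus text format so they can be scraped
        return prometheus_client.generate_latest(), 200, {'Content-Type': prometheus_client.CONTENT_TYPE_LATEST}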
This setup helps detect drops in model performance and raise alerts promptly, which supports keeping the model stable in production.
