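Multi-Dimensional Visual Monitoring System for Model Performance Metrics
Core Monitoring Metric Configuration
1. Accuracy monitoring
- Metrics: accuracy, precision, recall, F1 score
- Configuration example: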
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

class ModelMonitor:
    def __init__(self):
        # Alert threshold for each classification metric
        self.alert_thresholds = {
            'accuracy': 0.95,
            'precision': 0.90,
            'recall': 0.85,
            'f1': 0.88
        }

    def calculate_metrics(self, y_true, y_pred):
        # precision/recall/F1 use scikit-learn's default average='binary',
        # so this assumes binary labels
        return {
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred),
            'recall': recall_score(y_true, y_pred),
            'f1': f1_score(y_true, y_pred)
        }
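The thresholds above are only stored, not checked. A minimal sketch of how computed metrics might be compared against them follows. It assumes each threshold is a lower bound (an alert is raised when the metric drops below it), which the configuration implies but does not state; `find_breaches` is a hypothetical helper, not part of the original class.

```python
from typing import Dict, List

# Thresholds copied from ModelMonitor.alert_thresholds.
ALERT_THRESHOLDS: Dict[str, float] = {
    "accuracy": 0.95,
    "precision": 0.90,
    "recall": 0.85,
    "f1": 0.88,
}


def find_breaches(metrics: Dict[str, float],
                  thresholds: Dict[str, float] = ALERT_THRESHOLDS) -> List[str]:
    """Return the names of metrics that fall below their threshold.

    Hypothetical helper; assumes thresholds are lower bounds.
    Metrics missing from `metrics` are skipped rather than reported.
    """
    return [
        name
        for name, floor in thresholds.items()
        if name in metrics and metrics[name] < floor
    ]


if __name__ == "__main__":
    sample = {"accuracy": 0.96, "precision": 0.87, "recall": 0.85, "f1": 0.86}
    print(find_breaches(sample))  # ['precision', 'f1']
```

A metric exactly at its threshold (recall at 0.85 here) is not reported; whether the boundary should alert is a design choice left open by the configuration.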
2. Performance metric monitoring
- Response time (P95/P99)
- Requests processed per second (QPS)
- Memory usage
- CPU usage
- Alert configuration:
alerts:
  - name: "High Latency Alert"
    metric: "response_time_p95"
    threshold: 2000
    operator: ">"
    duration: "5m"
    severity: "warning"
  - name: "Memory Usage Alert"
    metric: "memory_usage_percent"
    threshold: 85
    operator: ">"
    duration: "10m"
    severity: "critical"
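To make the semantics of `operator` and `duration` in the rules above concrete, here is a rough sketch of one way such a rule could be evaluated. It assumes `duration` means "the condition held for every sample across the whole trailing window", and that samples arrive as `(timestamp_seconds, value)` pairs in ascending time order; `AlertRule`, `parse_duration`, and `is_firing` are illustrative names, not part of any particular alerting tool.

```python
import operator as op
from dataclasses import dataclass
from typing import List, Tuple

COMPARATORS = {">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le}
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a string such as '5m' or '30s' to seconds."""
    return int(text[:-1]) * SECONDS_PER_UNIT[text[-1]]


@dataclass
class AlertRule:
    """Mirrors the fields of one entry in the `alerts` list."""
    name: str
    metric: str
    threshold: float
    operator: str
    duration: str
    severity: str


def is_firing(rule: AlertRule, samples: List[Tuple[float, float]]) -> bool:
    """Return True if every sample in the trailing window violates the rule.

    Assumes `samples` is sorted by timestamp ascending. If the history
    does not yet cover the full window, the rule is treated as not firing.
    """
    if not samples:
        return False
    compare = COMPARATORS[rule.operator]
    window_start = samples[-1][0] - parse_duration(rule.duration)
    if samples[0][0] > window_start:
        return False  # not enough history to cover the window
    recent = [value for ts, value in samples if ts >= window_start]
    return all(compare(value, rule.threshold) for value in recent)


if __name__ == "__main__":
    rule = AlertRule("High Latency Alert", "response_time_p95",
                     2000, ">", "5m", "warning")
    # One sample per minute over five minutes, all above the threshold.
    history = [(t, 2500.0) for t in range(0, 301, 60)]
    print(is_firing(rule, history))
```

Requiring the condition to hold for the whole window keeps a single latency spike from triggering a notification, which is presumably why the configuration carries a `duration` field at all.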
3. Data quality monitoring
- Detection of changes in the input data distribution
- Feature drift detection
- Sample distribution stability
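The post does not say which drift statistic to use. One common choice is the Population Stability Index (PSI), sketched below using only the standard library. The bin count, the equal-width binning over the baseline range, and the `eps` floor for empty bins are all assumptions of this sketch, and the function names are illustrative.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> List[float]:
    """Share of values in each bin; out-of-range values go to the end bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or a division by zero.
    return [max(c / total, eps) for c in counts]


def population_stability_index(baseline: Sequence[float],
                               current: Sequence[float],
                               n_bins: int = 10) -> float:
    """PSI between a baseline sample and a current sample of one feature."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]       # roughly uniform on [0, 1)
    shifted = [0.5 + i / 200 for i in range(100)]  # squeezed into [0.5, 1)
    print(population_stability_index(baseline, baseline))
    print(population_stability_index(baseline, shifted))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but those cut-offs are conventions and should be tuned per feature.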
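Visualization: integrate Grafana with Prometheus, configuring the Prometheus collector and the Grafana dashboards.
Deployment steps:
- Install Prometheus and Grafana
- Configure metric reporting in model_monitor.py
- Import the predefined dashboard JSON template into Grafana
- Set up alert rules and bind them to DingTalk / WeCom (企业微信) notifications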
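The post does not show what the metric reporting in model_monitor.py looks like. As a dependency-free sketch, the snippet below renders metrics in the Prometheus text exposition format and serves them at `/metrics` for Prometheus to scrape. In practice the `prometheus_client` library is the more usual route; the `model_` prefix, port 8000, and all function names here are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Latest metric values; the monitoring loop would update this dict.
CURRENT_METRICS: Dict[str, float] = {}


def render_exposition(metrics: Dict[str, float], prefix: str = "model_") -> str:
    """Render metrics as Prometheus text-format gauges, sorted by name."""
    lines = []
    for name, value in sorted(metrics.items()):
        full_name = f"{prefix}{name}"
        lines.append(f"# TYPE {full_name} gauge")
        lines.append(f"{full_name} {float(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metrics at /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(CURRENT_METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type",
                         "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    CURRENT_METRICS.update({"accuracy": 0.96, "f1": 0.9})
    print(render_exposition(CURRENT_METRICS), end="")
    # Call serve() here to expose the endpoint for Prometheus to scrape.
```

With the endpoint running, a Prometheus scrape job pointed at the host and port would pick up `model_accuracy`, `model_f1`, and so on, which Grafana panels can then query by name.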

Discussion