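Monitoring Feature Missing Rates in Machine Learning Model Inputs
In production, the completeness of a model's input data directly affects prediction quality. This article builds a feature missing-rate monitoring system on top of Prometheus and Grafana.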
Core Monitoring Metrics
# Feature missing-rate metric definition
feature_missing_rate{feature_name="age", model_version="v1.2"} 0.025
feature_missing_rate{feature_name="income", model_version="v1.2"} 0.008
# Formula (pseudocode: missing values divided by total observed values)
missing_count = sum(feature_is_null{feature_name="age"})
total_count = count(feature_value{feature_name="age"})
missing_rate = missing_count / total_count
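The formula above can be sketched in plain Python for a batch of requests. This is a minimal illustration, not the article's implementation: the function name `missing_rates`, the sample records, and the choice to return 0.0 for an empty batch are all assumptions made for the example.

```python
from typing import Dict, Iterable, Mapping, Sequence


def missing_rates(
    records: Iterable[Mapping[str, object]], features: Sequence[str]
) -> Dict[str, float]:
    """Return, per feature, the fraction of records where it is absent or None."""
    total = 0
    missing = {name: 0 for name in features}
    for record in records:
        total += 1
        for name in features:
            # A key that is absent and a key set to None both count as missing.
            if record.get(name) is None:
                missing[name] += 1
    if total == 0:
        # Assumption: report 0.0 for an empty batch instead of dividing by zero.
        return {name: 0.0 for name in features}
    return {name: missing[name] / total for name in features}


# Hypothetical sample batch: "age" is missing twice, "income" once.
batch = [
    {"age": 31, "income": 52000},
    {"age": None, "income": 48000},
    {"income": 61000},
    {"age": 45, "income": None},
]
rates = missing_rates(batch, ["age", "income"])
print(rates)  # {'age': 0.5, 'income': 0.25}
```

Counting an absent key the same way as an explicit null mirrors the `data.get(feature) is None` check used at the inference entry point.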
Implementation Steps
- Data collection: add feature validation logic at the model inference entry point
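from collections import defaultdict
from flask import Flask, jsonify, request
from prometheus_client import Counter, Gauge

app = Flask(__name__)
MODEL_VERSION = 'v1.2'
# Initialize metrics
FEATURE_MISSING_RATE = Gauge('feature_missing_rate', 'Missing rate of features', ['feature_name', 'model_version'])
MISSING_COUNT = Counter('feature_missing_count', 'Count of missing values', ['feature_name', 'model_version'])
# Per-process tallies used to compute the rate (they reset when the process restarts)
seen = defaultdict(int)
missing = defaultdict(int)

# Feature validation at the inference entry point
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True) or {}
    for feature in ['age', 'income', 'gender']:
        seen[feature] += 1
        if data.get(feature) is None:
            missing[feature] += 1
            MISSING_COUNT.labels(feature_name=feature, model_version=MODEL_VERSION).inc()
        # Compute the cumulative missing rate and report it
        FEATURE_MISSING_RATE.labels(feature_name=feature, model_version=MODEL_VERSION).set(missing[feature] / seen[feature])
    return jsonify({'status': 'ok'})  # placeholder response; the model call is not shown in the source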
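The step that computes and reports the missing rate leaves open which requests the rate is computed over. If it covers every request since startup, a sudden spike is diluted by a long history. One possible alternative, sketched here with the standard library only, is a rate over the most recent N requests. The class name `WindowedMissingRate` and the window size are hypothetical choices for illustration; the returned value is what would be passed to the gauge's `set()`.

```python
from collections import deque
from typing import Deque


class WindowedMissingRate:
    """Track the missing rate of one feature over the last `window` requests."""

    def __init__(self, window: int = 1000) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._flags: Deque[bool] = deque(maxlen=window)
        self._missing = 0

    def observe(self, value: object) -> float:
        """Record one value and return the updated missing rate."""
        if len(self._flags) == self._flags.maxlen:
            # The oldest flag is about to be evicted; remove its contribution.
            self._missing -= self._flags[0]
        is_missing = value is None
        self._flags.append(is_missing)
        self._missing += is_missing
        return self._missing / len(self._flags)


# Hypothetical stream of "age" values with a window of four requests.
tracker = WindowedMissingRate(window=4)
for value in [30, None, 41, None, 25, 37]:
    rate = tracker.observe(value)
print(rate)  # 0.25: only one of the last four values is missing
```

This sketch is not thread-safe; under a multi-threaded server it would need a lock around `observe`.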
- Alerting: add an alert rule to the Prometheus rules file
# prometheus.rules.yml
groups:
  - name: feature_missing_alerts
    rules:
      - alert: HighFeatureMissingRate
        expr: feature_missing_rate > 0.05
        for: 5m
        labels:
          severity: critical
          category: data_quality
        annotations:
          summary: "High missing rate detected for {{ $labels.feature_name }}"
          description: "Feature {{ $labels.feature_name }} missing rate is {{ $value }} which exceeds threshold of 5%"
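The `for: 5m` clause means the expression must stay true across consecutive rule evaluations for five minutes before the alert fires, so a brief spike only puts it in a pending state. The following is a simplified Python simulation of that behavior, not Prometheus code: the function name `firing_times`, the 60-second evaluation spacing, and the sample values are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def firing_times(
    samples: Sequence[Tuple[float, float]],
    threshold: float = 0.05,
    hold_seconds: float = 300.0,
) -> List[float]:
    """Return the evaluation timestamps at which the alert would be firing.

    `samples` is a time-ordered sequence of (timestamp_seconds, missing_rate)
    pairs, one per rule evaluation.
    """
    pending_since: Optional[float] = None
    firing: List[float] = []
    for ts, rate in samples:
        if rate > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true: pending
            if ts - pending_since >= hold_seconds:
                firing.append(ts)  # true for the whole hold period: firing
        else:
            pending_since = None  # any false evaluation resets the timer
    return firing


# Hypothetical evaluations every 60 s: a short spike, then a sustained breach.
samples = [
    (0, 0.02), (60, 0.07), (120, 0.08), (180, 0.03),
    (240, 0.06), (300, 0.06), (360, 0.09), (420, 0.07),
    (480, 0.06), (540, 0.08), (600, 0.02),
]
print(firing_times(samples))  # [540]
```

The spike at 60 to 120 seconds never fires because the rate drops below the threshold at 180 seconds; the breach starting at 240 seconds fires once it has lasted 300 seconds.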
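- Visualization: create a Grafana dashboard with a real-time missing-rate trend chart and an alert status panel
Suggested Alert Thresholds
- Low risk: missing rate < 1%
- Medium risk: 1% ≤ missing rate < 5%
- High risk: missing rate ≥ 5%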
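The three tiers above translate directly into a small classification helper. This is a minimal sketch; the function name `risk_level` and the returned labels are illustrative choices, not part of the system described here.

```python
def risk_level(missing_rate: float) -> str:
    """Map a missing rate (a fraction between 0 and 1) to a risk tier."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be a fraction between 0 and 1")
    if missing_rate < 0.01:
        return "low"
    if missing_rate < 0.05:
        return "medium"
    return "high"


# The two sample metric values from the metric definition, plus the 5% boundary.
for value in (0.008, 0.025, 0.05):
    print(value, risk_level(value))
```

Note that the boundary value of exactly 5% falls into the high tier here, whereas an alert expression written as `> 0.05` would not fire at exactly 5%.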
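With these metrics, alerts, and dashboards in place, the system surfaces input data quality problems early and gives model maintainers concrete data to act on.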
