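Large-Model Service Monitoring in a Microservice Environment

Monitoring Large-Model Services in a Microservice Environment: Notes on the Pitfalls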

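Recently, while working on a project to break a large-model service into microservices, I ran into a classic monitoring problem: once the service was split into several microservices, its call chain became far more complex.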

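Background

We used Spring Cloud Gateway for service governance, and what had been a simple model inference request turned into a multi-level call: API gateway → model service A → model service B → database. Whenever inference slowed down, it was hard to tell quickly which hop was at fault.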
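To make "which hop is at fault" concrete: a trace records one span per hop, and because every upstream span also contains the time of its callees, the hop to blame is the one with the largest self time (its own duration minus the time spent in its child calls), not the one with the largest total. The JDK-only sketch below illustrates this under simplifying assumptions: the class name is hypothetical, span names double as IDs, children run sequentially, and the timings are made up.

```java
import java.util.List;

// Find the slow hop in a call chain by each span's self time:
// its own duration minus the time spent in its direct child calls.
final class HopSelfTime {

    // One hop; parent is null for the entry point. Names double as span IDs
    // here for brevity (real traces link spans by span ID).
    record Span(String name, String parent, long startMs, long endMs) {
        long durationMs() {
            return endMs - startMs;
        }
    }

    // Assumes children run one after another inside the parent (no parallel fan-out).
    static long selfTimeMs(Span span, List<Span> all) {
        long childTotal = 0;
        for (Span s : all) {
            if (span.name().equals(s.parent())) {
                childTotal += s.durationMs();
            }
        }
        return span.durationMs() - childTotal;
    }

    // Name of the span with the largest self time
    static String slowestHop(List<Span> spans) {
        String worst = null;
        long worstSelf = Long.MIN_VALUE;
        for (Span s : spans) {
            long self = selfTimeMs(s, spans);
            if (self > worstSelf) {
                worstSelf = self;
                worst = s.name();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical timings for: gateway -> model service A -> model service B -> database
        List<Span> spans = List.of(
                new Span("gateway", null, 0, 1200),
                new Span("model-service-a", "gateway", 10, 1190),
                new Span("model-service-b", "model-service-a", 50, 1150),
                new Span("database", "model-service-b", 60, 160));
        for (Span s : spans) {
            System.out.println(s.name() + ": total " + s.durationMs() + " ms, self " + selfTimeMs(s, spans) + " ms");
        }
        System.out.println("slowest hop: " + slowestHop(spans));
    }
}
```

With these made-up numbers the gateway has the largest total duration (1200 ms), but model service B owns 1000 ms of self time, so that is where to look.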

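Where I Tripped Up

I first tried Prometheus + Grafana, but ran into these data collection problems:

Per-microservice call latency could not be measured accurately
Key stages of model inference had no metrics at all
Traces across services were incomplete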
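One common reason traces come out incomplete is that some hop does not forward the trace context, so the next service starts a brand-new trace. OpenTelemetry propagates context with the W3C Trace Context `traceparent` header by default, formatted as `version-traceid-parentid-flags`. The JDK-only sketch below shows the propagation rule (keep the trace ID, mint a new span ID per hop) and what happens when one hop drops the header; the class and method names are hypothetical, and only header version `00` is handled.

```java
import java.security.SecureRandom;

// Sketch of W3C Trace Context propagation via the traceparent header.
final class TraceParentSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // traceId: 32 lowercase hex chars shared by every hop; spanId: 16 hex chars for one hop
    record Context(String traceId, String spanId, boolean sampled) {
        // Serialize as "00-<trace-id>-<parent-id>-<flags>"
        String toHeader() {
            return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
        }
    }

    // Random lowercase hex built from `longs` 64-bit chunks (16 hex chars each)
    private static String randomHex(int longs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < longs; i++) {
            sb.append(String.format("%016x", RANDOM.nextLong()));
        }
        return sb.toString();
    }

    // Entry point with no usable incoming header: start a new trace
    static Context newRoot() {
        return new Context(randomHex(2), randomHex(1), true);
    }

    // Parse an incoming traceparent; returns null if absent or malformed
    static Context parse(String header) {
        if (header == null) return null;
        String[] parts = header.split("-", -1);
        if (parts.length != 4 || !parts[0].equals("00")) return null; // only version 00 handled
        if (!parts[1].matches("[0-9a-f]{32}") || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) return null;
        if (parts[1].matches("0+") || parts[2].matches("0+")) return null; // all-zero IDs are invalid
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0;
        return new Context(parts[1], parts[2], sampled);
    }

    // Continue the caller's trace when the header is valid, otherwise start a new one
    static Context extractOrStart(String header) {
        Context c = parse(header);
        return c != null ? c : newRoot();
    }

    // Before calling the next hop: same trace ID, fresh span ID for this outgoing call
    static Context childOf(Context parent) {
        return new Context(parent.traceId(), randomHex(1), parent.sampled());
    }

    public static void main(String[] args) {
        Context gateway = newRoot();
        // The gateway forwards the header to model service A, so A joins the same trace
        Context serviceA = extractOrStart(childOf(gateway).toHeader());
        // Model service A does not forward it to model service B, so B starts a new trace
        Context serviceB = extractOrStart(null);
        System.out.println("gateway   trace " + gateway.traceId());
        System.out.println("service A trace " + serviceA.traceId());
        System.out.println("service B trace " + serviceB.traceId() + "  (split from the chain)");
    }
}
```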

Solution

After several rounds of debugging, we settled on the following setup:

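```yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        # /actuator/prometheus requires micrometer-registry-prometheus on the classpath
        include: prometheus,health,info
  metrics:
    web:
      server:
        request:
          autotime:
            # Spring Boot 2.x property: times every incoming HTTP request
            enabled: true
```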
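With `autotime` enabled, Spring Boot records each incoming request in the `http.server.requests` timer, tagged with values such as `method`, `uri` and `status`, and the Prometheus endpoint exposes it as `http_server_requests_seconds_count`, `_sum` and `_max` series. This timer covers the whole server-side request per endpoint, including time spent waiting on downstream services, so on its own it cannot isolate a single hop. The JDK-only sketch below is a simplified model of that aggregation, not Micrometer's implementation: the class name, the `/predict` URI and the timings are hypothetical, and the real `_max` decays over a time window rather than being a lifetime maximum.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a request timer: one count/sum/max per tag combination.
final class RequestTimerSketch {

    static final class Stats {
        long count;
        double sumSeconds;
        double maxSeconds; // lifetime max here; Micrometer's max decays over a time window
    }

    // Label string -> stats; TreeMap keeps scrape output in a stable order.
    private final Map<String, Stats> byLabels = new TreeMap<>();

    // Labels written in alphabetical key order for stable output
    private static String labels(String method, String uri, int status) {
        return "method=\"" + method + "\",status=\"" + status + "\",uri=\"" + uri + "\"";
    }

    void record(String method, String uri, int status, double seconds) {
        Stats s = byLabels.computeIfAbsent(labels(method, uri, status), k -> new Stats());
        s.count++;
        s.sumSeconds += seconds;
        s.maxSeconds = Math.max(s.maxSeconds, seconds);
    }

    Stats stats(String method, String uri, int status) {
        return byLabels.get(labels(method, uri, status));
    }

    // Mean latency = sum / count, the ratio PromQL derives from rate(..._sum) / rate(..._count)
    double meanSeconds(String method, String uri, int status) {
        Stats s = stats(method, uri, status);
        return s == null || s.count == 0 ? Double.NaN : s.sumSeconds / s.count;
    }

    // Render lines shaped like the Prometheus text exposition format
    String scrape() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Stats> e : byLabels.entrySet()) {
            String l = "{" + e.getKey() + "} ";
            Stats s = e.getValue();
            out.append("http_server_requests_seconds_count").append(l).append(s.count).append('\n');
            out.append("http_server_requests_seconds_sum").append(l).append(s.sumSeconds).append('\n');
            out.append("http_server_requests_seconds_max").append(l).append(s.maxSeconds).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        RequestTimerSketch timer = new RequestTimerSketch();
        // Hypothetical requests against a hypothetical /predict endpoint
        timer.record("POST", "/predict", 200, 0.5);
        timer.record("POST", "/predict", 200, 1.5);
        timer.record("GET", "/actuator/health", 200, 0.01);
        System.out.print(timer.scrape());
        System.out.println("mean /predict latency: " + timer.meanSeconds("POST", "/predict", 200) + " s");
    }
}
```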

Prometheus only collects metrics, so cross-service tracing was handled with OpenTelemetry, and a custom metric records model inference time:

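```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// meterRegistry is the MeterRegistry bean provided by Spring Boot Actuator.
// Micrometer's declarative option is @Timed (it needs a TimedAspect bean); combining it
// with a manual timer would record every call twice, so only the explicit timer is used.
public ModelResponse predict(ModelRequest request) {
    // Timer.Sample uses the registry's monotonic clock instead of wall-clock time
    Timer.Sample sample = Timer.start(meterRegistry);
    try {
        return modelService.predict(request);
    } finally {
        // Record the inference duration exactly once, even if predict() throws
        sample.stop(Timer.builder("model.inference.duration")
                .description("Model inference latency")
                .register(meterRegistry));
    }
}
```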
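Averages hide slow inference calls, so tail latency is usually read from histogram buckets. Assuming percentile histograms are enabled for this timer (for example via `publishPercentileHistogram()` on the `Timer.builder`, or the `management.metrics.distribution.percentiles-histogram.model.inference.duration=true` property), Prometheus receives `model_inference_duration_seconds_bucket` series, and a query such as `histogram_quantile(0.99, sum by (le) (rate(model_inference_duration_seconds_bucket[5m])))` estimates p99. The JDK-only sketch below reproduces the interpolation `histogram_quantile` performs, to show why the result is an estimate bounded by bucket edges; the class name and bucket counts are hypothetical.

```java
// Estimate a quantile from cumulative histogram buckets, following the
// linear interpolation used by PromQL's histogram_quantile.
final class HistogramQuantile {

    /**
     * @param q                quantile in [0, 1], e.g. 0.99
     * @param upperBounds      bucket upper bounds ("le"), ascending; the last must be +Infinity
     * @param cumulativeCounts cumulative observation counts per bucket
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        if (q < 0 || q > 1) throw new IllegalArgumentException("q must be in [0, 1]");
        int n = upperBounds.length;
        if (n < 2 || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            throw new IllegalArgumentException("need at least one finite bucket plus +Inf");
        }
        double total = cumulativeCounts[n - 1];
        if (total == 0) return Double.NaN; // no observations
        double rank = q * total;
        // First bucket whose cumulative count reaches the target rank
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) b++;
        // Rank falls into the +Inf bucket: the best answer is the highest finite bound
        if (b == n - 1) return upperBounds[n - 2];
        if (b == 0 && upperBounds[0] <= 0) return upperBounds[0];
        double bucketStart = 0; // the first bucket is assumed to start at 0
        double bucketEnd = upperBounds[b];
        double inBucket = cumulativeCounts[b];
        if (b > 0) {
            bucketStart = upperBounds[b - 1];
            inBucket -= cumulativeCounts[b - 1];
            rank -= cumulativeCounts[b - 1];
        }
        // Assumes observations are spread evenly inside the bucket
        return bucketStart + (bucketEnd - bucketStart) * (rank / inBucket);
    }

    public static void main(String[] args) {
        // Hypothetical inference-latency buckets in seconds: le=0.1, 0.25, 0.5, 1.0, +Inf
        double[] le = {0.1, 0.25, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {10, 40, 90, 99, 100};
        System.out.println("p50 ~ " + quantile(0.50, le, counts) + " s");
        System.out.println("p99 ~ " + quantile(0.99, le, counts) + " s");
    }
}
```

For these buckets, p50 lands in the 0.25–0.5 s bucket and interpolates to about 0.3 s, while p99 reaches the 1.0 s bound; bucket boundaries therefore should be chosen around the latencies you actually care about.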