TensorFlow Serving Autoscaling Configuration Guide

I recently hit quite a few pitfalls deploying TensorFlow Serving as a microservice in production, so I'm writing down the autoscaling setup here.
Environment Setup

We deploy with Docker, using tensorflow/serving:latest as the base image (in production, pinning a specific version tag is safer than latest). First, create the Docker Compose file:
```yaml
version: '3'
services:
  tensorflow-serving:
    image: tensorflow/serving:latest
    # Note: container_name conflicts with replicas > 1 and is ignored in swarm mode
    container_name: tf-serving
    ports:
      - "8501:8501"   # REST API
      - "8500:8500"   # gRPC
    volumes:
      - ./models:/models
    environment:
      - MODEL_NAME=mnist_model
      - MODEL_BASE_PATH=/models
    deploy:           # deploy settings take effect under docker swarm
      replicas: 2
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G
```
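Once the stack is up, it helps to verify the REST endpoint before wiring anything else to it. TF Serving exposes model status at /v1/models/{name} and predictions at /v1/models/{name}:predict on port 8501. A minimal sketch using only the standard library (host, port, and model name are assumed to match the compose file above):

```python
import json
import urllib.request

def model_status_url(host, port, model_name):
    # A GET on this URL returns the version states of the loaded model
    return f"http://{host}:{port}/v1/models/{model_name}"

def predict_url(host, port, model_name):
    # POST a JSON body {"instances": [...]} to this URL for inference
    return model_status_url(host, port, model_name) + ":predict"

def check_model_ready(host="localhost", port=8501, model_name="mnist_model"):
    # Requires a running tensorflow/serving container from the compose file
    with urllib.request.urlopen(model_status_url(host, port, model_name), timeout=5) as resp:
        body = json.load(resp)
    # Expected shape: {"model_version_status": [{"state": "AVAILABLE", ...}]}
    states = [v.get("state") for v in body.get("model_version_status", [])]
    return "AVAILABLE" in states
```

Calling check_model_ready() right after startup may return False while the model is still loading, so polling with a short backoff is a reasonable pattern.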
Kubernetes Autoscaling Configuration

In Kubernetes, we use a Horizontal Pod Autoscaler (HPA) for automatic scaling:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tf-serving-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tf-serving-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
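One detail worth spelling out: the HPA computes averageUtilization against each container's CPU request, so the target Deployment must declare resource requests or the metric stays unknown and no scaling happens. A sketch of the relevant Deployment fragment (the cpu and memory values here are illustrative placeholders, not tuned numbers):

```yaml
# excerpt of tf-serving-deployment's pod template
spec:
  template:
    spec:
      containers:
        - name: tf-serving
          image: tensorflow/serving:latest
          resources:
            requests:
              cpu: "1"        # the HPA's 70% target is measured against this value
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
```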
Load Balancing Configuration

Nginx serves as the load balancer:
```nginx
upstream tf_serving {
    server 127.0.0.1:8501 weight=1;
    server 127.0.0.1:8502 weight=1;
}

server {
    listen 80;

    location / {
        proxy_pass http://tf_serving;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
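To send a prediction through the Nginx front end, POST a JSON body of the form {"instances": [...]} to /v1/models/{name}:predict. A minimal client sketch (the all-zero 28×28 image stands in for a real MNIST sample; port 80 matches the server block above):

```python
import json
import urllib.request

def build_predict_request(instances):
    # TF Serving's REST predict API expects {"instances": [...]} (row format)
    return json.dumps({"instances": instances}).encode("utf-8")

def predict(instances, host="localhost", port=80, model_name="mnist_model"):
    # Assumes Nginx is proxying to a live TF Serving backend
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    req = urllib.request.Request(
        url,
        data=build_predict_request(instances),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["predictions"]

# Example payload: one all-zero 28x28 grayscale image
sample = [[[0.0] * 28 for _ in range(28)]]
```

The same payload works against port 8501 directly, which is a quick way to tell Nginx problems apart from serving problems.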
Key Pitfalls

- Memory limit set too low: the initial `memory: 2G` limit caused frequent service restarts
- Uneven load balancing: the Nginx weights did not account for differences in model inference time
- Unreasonable scaling threshold: a 70% CPU target proved too conservative; around 85% is recommended
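For the weight-imbalance pitfall above, one option is to drop the static weights and let Nginx route each request to the backend with the fewest in-flight connections, which adapts to per-request inference-time differences automatically. A sketch of the adjusted upstream block (max_fails/fail_timeout values are illustrative):

```nginx
upstream tf_serving {
    least_conn;   # pick the backend with the fewest active connections
    server 127.0.0.1:8501 max_fails=3 fail_timeout=10s;
    server 127.0.0.1:8502 max_fails=3 fail_timeout=10s;
}
```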
When deploying for real, benchmark single-node performance first, then increase the replica count gradually.
