When deploying TensorFlow Serving across multiple clusters, the load distribution strategy directly affects the performance and availability of model serving. This article compares two mainstream approaches: static load balancing with Nginx, and dynamic load balancing with a Kubernetes Ingress.
Approach 1: Static load balancing with Nginx. This suits clusters of fixed size. First, build the serving image with this Dockerfile:
FROM tensorflow/serving:latest
# The SavedModel must sit under a numeric version subdirectory, e.g. model/1/
COPY model /models/model
ENV MODEL_NAME=model
# TensorFlow Serving convention: 8500 = gRPC, 8501 = REST
EXPOSE 8500 8501
# Override the base image's entrypoint so these flags take effect;
# note the gRPC port flag is --port (there is no --grpc_port flag)
ENTRYPOINT ["tensorflow_model_server", "--model_name=model", "--model_base_path=/models/model", "--port=8500", "--rest_api_port=8501"]
With a single instance verified, configure the Nginx load balancer. Since proxy_pass speaks HTTP, the upstream must target the REST port (8501), not the gRPC port:
upstream tensorflow_servers {
    # REST API port (8501); proxying gRPC on 8500 would require grpc_pass
    server tf-serving-1:8501;
    server tf-serving-2:8501;
    server tf-serving-3:8501;
}
server {
    listen 80;
    location / {
        proxy_pass http://tensorflow_servers;
    }
}
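Requests arriving on port 80 are now spread across the three backends round-robin, Nginx's default method. A minimal client sketch under two assumptions: the stack runs locally, and the model accepts a single 4-element float vector (a placeholder; substitute your model's real input signature):

import requests

# Predict endpoint, reached through the Nginx front end on port 80.
PREDICT_URL = "http://localhost/v1/models/model:predict"

# Placeholder input; must match the model's serving signature.
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}

# Six requests cycle twice over the three upstream servers.
for i in range(6):
    resp = requests.post(PREDICT_URL, json=payload, timeout=5)
    resp.raise_for_status()
    print(i, resp.json()["predictions"])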
Approach 2: Dynamic load balancing with Kubernetes Ingress. Requests are distributed automatically via K8s service discovery. Create the Service and Ingress manifests:
apiVersion: v1
kind: Service
metadata:
  name: tf-serving-service
spec:
  selector:
    app: tensorflow-serving
  ports:
  - port: 8501          # REST API; the Ingress below routes HTTP traffic
    targetPort: 8501
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tf-serving-ingress
spec:
  rules:
  - host: tf-serving.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tf-serving-service
            port:
              number: 8501
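The Service above selects Pods labeled app: tensorflow-serving, but no workload that creates such Pods is shown. A minimal Deployment sketch, assuming the image from Approach 1 was pushed to registry.example.com/tf-serving:latest (a hypothetical registry path):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tensorflow-serving    # must match the Service selector above
  template:
    metadata:
      labels:
        app: tensorflow-serving
    spec:
      containers:
      - name: tf-serving
        image: registry.example.com/tf-serving:latest  # hypothetical path
        ports:
        - containerPort: 8501    # REST, matches the Service targetPort
        - containerPort: 8500    # gRPC

With this in place, kubectl apply -f on the three manifests brings the stack up, and the Ingress controller routes traffic for tf-serving.example.com to whatever Pods currently exist.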
Performance comparison: the Nginx approach keeps latency low (a single lightweight proxy hop), but its upstream list is static, so adding or removing a backend means editing the config and reloading Nginx. The K8s approach tracks Pod churn automatically through service discovery, but the Ingress controller and kube-proxy add an extra hop and control-plane overhead. Choose based on how often your backend set changes.
Deployment steps:
- Build the TensorFlow Serving image and push it to a private registry
- Deploy multiple Serving instances across different nodes
- Configure the load balancer (Nginx upstream or Ingress)
- Verify request distribution and latency (see the test sketch after this list)
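For the last step, a hedged test harness: it assumes the Nginx entry point from Approach 1 at http://localhost (for the Ingress route, swap in http://tf-serving.example.com) and the same placeholder input as before, and reports latency percentiles over 100 requests:

import statistics
import time

import requests

# Entry point under test: the Nginx front end from Approach 1.
# For Approach 2, use http://tf-serving.example.com/v1/models/model:predict
URL = "http://localhost/v1/models/model:predict"
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # placeholder input

latencies = []
for _ in range(100):
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=5)
    resp.raise_for_status()
    latencies.append((time.perf_counter() - start) * 1000)  # ms

latencies.sort()
print(f"p50 = {statistics.median(latencies):.1f} ms")
print(f"p95 = {latencies[int(len(latencies) * 0.95)]:.1f} ms")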
