Optimizing Node Resource Allocation in a K8s Cluster

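When operating a Kubernetes cluster, tuning how node resources are allocated is an important part of improving cluster efficiency and stability. This article shares the resource allocation problems we ran into in day-to-day operations and the measures we took to address them.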

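Background

We observed frequent Pod evictions on some nodes in the cluster. Running kubectl describe node showed that memory usage on those nodes had stayed above 90% for an extended period. Further investigation traced the problem to unreasonable default resource request and limit settings.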

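Optimization Measures

1. Adjusting resource requests and limits

First, set reasonable requests and limits for each workload. The scheduler places Pods based on their requests, while limits cap what a container may consume at runtime. Taking a Deployment as an example: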

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
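
To make the arithmetic behind these settings concrete, the following is a minimal sketch that adds up what the Deployment above asks of the cluster. The helper names (parse_quantity, totals) are ours, chosen for illustration, and the parser handles only the suffixes that appear in this article, not the full Kubernetes quantity syntax.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes used in this article (a subset of
# what Kubernetes accepts): binary suffixes for memory, "m" for millicores.
SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "m": Decimal("0.001"),
}


def parse_quantity(text):
    """Convert a quantity such as "64Mi" or "250m" to bytes or cores."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)  # no suffix: already in base units


def totals(per_pod, replicas):
    """Multiply each per-Pod quantity by the replica count."""
    return {name: parse_quantity(q) * replicas for name, q in per_pod.items()}


# Values copied from the nginx-deployment manifest above.
REPLICAS = 3
total_requests = totals({"memory": "64Mi", "cpu": "250m"}, REPLICAS)
total_limits = totals({"memory": "128Mi", "cpu": "500m"}, REPLICAS)

MI = 2 ** 20
print("requested:", total_requests["cpu"], "cores,", total_requests["memory"] / MI, "Mi")
print("limited to:", total_limits["cpu"], "cores,", total_limits["memory"] / MI, "Mi")
```

For three replicas this comes to 750m of CPU and 192Mi of memory requested, with limits of 1.5 cores and 384Mi.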

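2. Node taints and tolerations

For specific applications, we use node taints together with Pod tolerations to isolate resources. A taint only keeps Pods without a matching toleration off the node; it does not by itself steer the tolerating Pods onto that node.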

# Add a taint to the node
kubectl taint nodes node01 dedicated=prod:NoSchedule

# Add a matching toleration to the Pod spec
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "prod"
    effect: "NoSchedule"
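
The sketch below models how a toleration is compared with a taint, to show why the pair above lets the Pod onto node01. It is a simplified model of the matching rule as we understand it, not the scheduler's actual code; the Taint and Toleration classes and the tolerates and can_schedule functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def tolerates(toleration, taint):
    """Return True if the toleration matches the taint (simplified model)."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True  # Exists ignores the taint's value
    if toleration.operator == "Equal":
        return toleration.value == taint.value
    return False  # unknown operator


def can_schedule(taints, tolerations):
    """A Pod may land on a node only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect == "NoSchedule"
    )


# The taint and toleration from the example above.
node_taints = [Taint("dedicated", "prod", "NoSchedule")]
prod_pod = [Toleration("dedicated", "Equal", "prod", "NoSchedule")]

print(can_schedule(node_taints, prod_pod))  # Pod with the toleration
print(can_schedule(node_taints, []))        # Pod without any toleration
```

In this model the Pod carrying the toleration is admitted to the tainted node, while a Pod with no tolerations is not.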

3. Resource quota management

We use a ResourceQuota to cap the resources a namespace can consume:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
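
As a rough worked example, the sketch below estimates how many Pods sized like the nginx example (250m/64Mi requested, 500m/128Mi limits) would fit under prod-quota. It assumes the namespace holds nothing else, and it expresses CPU in millicores and memory in Mi to keep the arithmetic in integers; the function name max_pods is ours.

```python
# prod-quota, converted to millicores (CPU) and Mi (memory).
quota = {
    "requests.cpu": 10 * 1000,
    "requests.memory": 20 * 1024,
    "limits.cpu": 20 * 1000,
    "limits.memory": 40 * 1024,
}

# Per-Pod figures from the nginx container in section 1, in the same units.
per_pod = {
    "requests.cpu": 250,
    "requests.memory": 64,
    "limits.cpu": 500,
    "limits.memory": 128,
}


def max_pods(quota, per_pod):
    """Return (pod_count, binding_resource) for identically sized Pods."""
    fits = {name: quota[name] // per_pod[name] for name in quota}
    binding = min(fits, key=fits.get)  # the resource that runs out first
    return fits[binding], binding


count, binding = max_pods(quota, per_pod)
print(f"at most {count} Pods; first constraint hit: {binding}")
```

With these numbers CPU is the binding constraint at 40 Pods, while memory alone would allow 320, which suggests the CPU and memory figures in a quota are worth sizing against the actual workload mix.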

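Results

After applying the measures above, node resource utilization across the cluster dropped from 85% to 70%, the Pod eviction rate fell by 90%, and overall stability improved significantly.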

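We recommend regularly monitoring resource usage with kubectl top nodes and kubectl top pods (both rely on a metrics source such as metrics-server being installed), and adjusting resource settings as business needs change.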
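
One way to turn this check into a routine is to scan the output of kubectl top nodes for nodes near the memory level that preceded our evictions. The sketch below is a minimal illustration: the sample text is made-up data in the column layout we expect from kubectl top nodes, and hot_nodes and its 90% default threshold are our own choices.

```python
# Made-up sample in the layout of `kubectl top nodes`; not real measurements.
SAMPLE = """\
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node01   850m         42%    6912Mi          91%
node02   400m         20%    4100Mi          54%
"""


def hot_nodes(top_output, memory_threshold=90):
    """Return names of nodes whose MEMORY% is at or above the threshold."""
    hot = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        name, memory_pct = fields[0], fields[4]
        if not memory_pct.endswith("%"):
            continue  # metrics unavailable for this node
        if int(memory_pct.rstrip("%")) >= memory_threshold:
            hot.append(name)
    return hot


print(hot_nodes(SAMPLE))
```

In practice the text passed to hot_nodes would come from running kubectl top nodes, for example through subprocess.run with capture_output=True and text=True.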