K8s Part 10: The Scheduler

Kubernetes Scheduler:

The Kubernetes Scheduler places Pods onto worker nodes according to a scheduling algorithm; committing the chosen assignment is called binding (bind). The Scheduler takes as input the Pod to be scheduled and the set of schedulable Nodes, and outputs the Node selected by the algorithm.

The scheduling process has two phases:
  • Predicates (filtering): the configured Predicates Policies (by default the default predicates policies defined in the DefaultProvider) filter out the Nodes that fail any policy; the remaining Nodes become the input to the next phase.
  • Priorities (scoring): the configured Priorities Policies (by default the default priorities policies) score and rank the filtered Nodes; the highest-scoring Node is the best fit, and the Pod is bound to it.
Special scheduling mechanisms:
  • nodeName: pin the Pod to a specific node by node name (a minimal sketch follows this list).
  • nodeSelector: place the Pod on a node chosen by its labels.
  • Affinity and anti-affinity.
  • Taints and tolerations.
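Of these, nodeName is the most direct: it bypasses the scheduler entirely and binds the Pod to the named node. A minimal sketch, reusing the k8snode1 node and myapp image from the examples below (this manifest is not part of the original session):
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename-demo
spec:
  nodeName: k8snode1            #bind directly to this node, skipping the scheduler
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1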
Common predicates:
  • CheckNodeCondition: checks the node's conditions, so that Pods are not scheduled onto a node whose disk is unavailable or that is not ready.
  • GeneralPredicates
    • HostName: if the Pod specifies a host name, checks whether the node's name matches it.
    • PodFitsHostPorts: the hostPort values requested by the Pod must be available on the node.
    • MatchNodeSelector: the Pod's nodeSelector must match the node's labels.
    • PodFitsResources: checks whether the node has enough resources to run the Pod.
  • NoDiskConflict: checks whether the node can satisfy the Pod's volume requirements (not enabled by default).
  • PodToleratesNodeTaints: checks whether the Pod's tolerations match the node's taints.
  • PodToleratesNodeNoExecuteTaints: if the node gains a taint the Pod cannot tolerate after the Pod has been bound, the Pod is evicted (not enabled by default).
  • CheckNodeLabelPresence: checks for the presence of specific labels on the node (not enabled by default).
  • CheckServiceAffinity: binds the Pod according to the nodes already used by other Pods of the same Service, so that a Service's Pods run on the same set of nodes (not enabled by default).
  • CheckVolumeBinding: checks whether the volumes on the node are already bound by other Pods.
  • CheckNodeMemoryPressure: checks whether the node is under memory pressure.
  • CheckNodePIDPressure: checks whether the node is under PID (process) pressure.
  • CheckNodeDiskPressure: checks whether the node is under disk pressure.
  • MatchInterPodAffinity: checks whether the node satisfies the Pod's affinity and anti-affinity requirements.
Common priorities:
  • LeastRequested: scores nodes by the ratio of free resources to total capacity; the node with the larger free ratio wins (see the worked example after this list).
    • CPU: (capacity - sum(requested)) * 10 / capacity.
    • Memory: scored with the same formula; the node's final score is the average of the two: (cpu_score + memory_score) / 2.
  • BalancedResourceAllocation: balanced resource allocation; favors nodes whose CPU and memory utilization rates are closest to each other.
  • NodePreferAvoidPods: scores nodes based on their annotations (nodes annotated to be avoided score lower).
  • TaintToleration: scores nodes by how well the Pod's tolerations match the node's taints.
  • SelectorSpreading: nodes already running more Pods matched by the same label selector as the current Pod score lower, spreading related Pods across nodes.
  • InterPodAffinity: iterates over the Pod's affinity terms; the more terms a node matches, the higher it scores.
  • MostRequested: the opposite of LeastRequested; the node with the smaller free ratio (higher utilization) wins, packing Pods onto fewer nodes.
  • NodeLabel: scores nodes by whether they carry the configured labels; a node without the label gets no points.
  • ImageLocality: scores nodes by whether they already hold the images the Pod's spec refers to, weighted by the size of those images.
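To make the LeastRequested formula concrete, here is a worked example with invented numbers: a node with 4 CPU cores and 8Gi of memory, of which 1 core and 2Gi are already requested:
  cpu_score    = (4 - 1) * 10 / 4 = 7.5
  memory_score = (8 - 2) * 10 / 8 = 7.5
  score        = (7.5 + 7.5) / 2  = 7.5
A busier node with 3 of its 4 cores requested would get a cpu_score of only (4 - 3) * 10 / 4 = 2.5, so the emptier node wins under LeastRequested (and would lose under MostRequested).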
nodeSelector (node labels):
#If no node matches the Pod's nodeSelector, the Pod stays in the Pending state.
#Only once a node with the matching labels appears in the cluster can the Pod be scheduled and started.
[root@k8smaster ~]# cd /data/
[root@k8smaster data]# mkdir scheduler
[root@k8smaster data]# cd scheduler/
[root@k8smaster scheduler]# vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd
[root@k8smaster scheduler]# kubectl apply -f pod-demo.yaml 
pod/pod-demo created
[root@k8smaster scheduler]# kubectl get pods 
NAME          READY   STATUS    RESTARTS   AGE
pod-demo      0/1     Pending   0          99s
pod-sa-demo   1/1     Running   7          12d
pod1          1/1     Running   3          6d
[root@k8smaster scheduler]# kubectl get nodes --show-labels
NAME        STATUS   ROLES    AGE   VERSION   LABELS
k8smaster   Ready    master   76d   v1.16.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8smaster,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8snode1    Ready    <none>   76d   v1.16.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8snode1,kubernetes.io/os=linux
k8snode2    Ready    <none>   76d   v1.16.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8snode2,kubernetes.io/os=linux
[root@k8smaster scheduler]# kubectl label node k8snode1 disktype=ssd
node/k8snode1 labeled
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
pod-demo      1/1     Running   0          5m48s   10.244.1.20   k8snode1   <none>           <none>
pod-sa-demo   1/1     Running   7          12d     10.244.1.16   k8snode1   <none>           <none>
pod1          1/1     Running   3          6d      10.244.1.17   k8snode1   <none>           <none>
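To reset the experiment, the disktype label could be removed again with a trailing minus (this command was not run in the session above):
kubectl label node k8snode1 disktype-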
nodeAffinity (node affinity):
  • preferredDuringSchedulingIgnoredDuringExecution: soft affinity; satisfied on a best-effort basis.
  • requiredDuringSchedulingIgnoredDuringExecution: hard affinity; must be satisfied.

[root@k8smaster scheduler]# vim pod-affinity-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar 
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo.yaml 
pod/pod-node-affinity-demo created
#No node in the cluster satisfies the requiredDuringSchedulingIgnoredDuringExecution hard affinity rule, so the Pod stays Pending.
[root@k8smaster scheduler]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
pod-demo                 1/1     Running   1          24h
pod-node-affinity-demo   0/1     Pending   0          75s
pod-sa-demo              1/1     Running   8          13d
pod1                     1/1     Running   4          7d1h
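Giving any node one of the required zone values would immediately unblock the Pod; for example (not run in the original session):
kubectl label node k8snode1 zone=foo
Once the label exists, the scheduler re-evaluates the Pending Pod and binds it to k8snode1.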

[root@k8smaster scheduler]# vim pod-affinity-demo-2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar 
        weight: 60
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-2.yaml 
pod/pod-node-affinity-demo-2 created
#No node satisfies the preferredDuringSchedulingIgnoredDuringExecution soft affinity rule either, but because it is only a preference the Pod still reaches the Running state.
[root@k8smaster scheduler]# kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
pod-demo                   1/1     Running   1          24h
pod-node-affinity-demo     0/1     Pending   0          8m51s
pod-node-affinity-demo-2   1/1     Running   0          6s
pod-sa-demo                1/1     Running   8          13d
pod1                       1/1     Running   4          7d1h
[root@k8smaster scheduler]# kubectl delete -f .
pod "pod-node-affinity-demo-2" deleted
pod "pod-node-affinity-demo" deleted
pod "pod-demo" deleted
podAffinity (Pod affinity):
#Schedule pod-second onto the node where pod-first runs (Pod affinity), so the two Pods always end up on the same node.
[root@k8smaster scheduler]# vim pod-affinity-demo-3.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-3.yaml 
pod/pod-first created
pod/pod-second created
#Both pod-first and pod-second are running on node2.
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
pod-first     1/1     Running   0          102s   10.244.2.16   k8snode2   <none>           <none>
pod-sa-demo   1/1     Running   8          13d    10.244.1.26   k8snode1   <none>           <none>
pod-second    1/1     Running   0          102s   10.244.2.17   k8snode2   <none>           <none>
pod1          1/1     Running   4          7d2h   10.244.1.24   k8snode1   <none>           <none>
[root@k8smaster scheduler]# kubectl delete -f pod-affinity-demo-3.yaml 
pod "pod-first" deleted
pod "pod-second" deleted
podAntiAffinity (Pod anti-affinity):
#Keep pod-second away from any node where pod-first runs (Pod anti-affinity), so the two Pods never share a node.
[root@k8smaster scheduler]# vim pod-affinity-demo-4.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-4.yaml 
pod/pod-first created
pod/pod-second created
#pod-first runs on node2, while pod-second was pushed to node1.
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
pod-first     1/1     Running   0          7s     10.244.2.18   k8snode2   <none>           <none>
pod-sa-demo   1/1     Running   8          13d    10.244.1.26   k8snode1   <none>           <none>
pod-second    0/1     Running   0          7s     10.244.1.25   k8snode1   <none>           <none>
pod1          1/1     Running   4          7d2h   10.244.1.24   k8snode1   <none>           <none>
The taint's effect defines how strongly the node repels Pods:
  • NoSchedule: affects only scheduling; Pods already running on the node are untouched.
  • NoExecute: affects both scheduling and Pods already on the node; Pods that do not tolerate the taint are evicted (a toleration sketch follows this list).
  • PreferNoSchedule: a soft NoSchedule; the scheduler tries to avoid the node, but a Pod with nowhere else to go may still be placed here.
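For NoExecute taints, a toleration may additionally carry tolerationSeconds, which bounds how long an already-running Pod stays on the node before being evicted. A sketch reusing the node-type taint from the demo below:
  tolerations:
  - key: "node-type"
    operator: "Equal"
    value: "production"
    effect: "NoExecute"
    tolerationSeconds: 3600     #tolerate the taint for one hour, then get evicted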
Add taints to the chosen nodes:
[root@k8smaster scheduler]# kubectl get nodes
NAME        STATUS   ROLES    AGE   VERSION
k8smaster   Ready    master   77d   v1.16.2
k8snode1    Ready    <none>   77d   v1.16.2
k8snode2    Ready    <none>   77d   v1.16.2
[root@k8smaster scheduler]# kubectl taint node k8snode1 node-type=production:NoSchedule
node/k8snode1 tainted
[root@k8smaster scheduler]# kubectl taint node k8snode2 node-type=production:NoSchedule
node/k8snode2 tainted
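The taints can be verified with kubectl describe (not captured in the original session):
kubectl describe node k8snode1 | grep Taints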
[root@k8smaster data]# cd scheduler/
[root@k8smaster scheduler]# vim myapp-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
[root@k8smaster scheduler]# kubectl apply -f myapp-dep.yaml 
deployment.apps/myapp-deploy created
#Because node1 and node2 both carry the taint, the Pods cannot be scheduled.
[root@k8smaster scheduler]# kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-7d77647d96-5c282   0/1     Pending   0          4s
myapp-deploy-7d77647d96-7cdbv   0/1     Pending   0          4s
myapp-deploy-7d77647d96-nkj7k   0/1     Pending   0          4s
pod-sa-demo                     1/1     Running   8          13d
pod1                            1/1     Running   4          7d3h
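Running kubectl describe against one of the Pending Pods shows the scheduler's reasoning in the Events section at the bottom of the output (command not captured in the original session):
kubectl describe pod myapp-deploy-7d77647d96-5c282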
Define the Pod's tolerations:
[root@k8smaster scheduler]# vim myapp-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"        #the taint key this Pod tolerates.
        operator: "Equal"       #Equal requires both key and value to match; Exists requires only the key to exist (value is then left empty).
        value: "production"     #the taint value.
        effect: "NoSchedule"    #the taint effect being tolerated.
[root@k8smaster scheduler]# kubectl apply -f myapp-dep.yaml
deployment.apps/myapp-deploy configured
#With the tolerations in place, the Pods are scheduled onto the tainted nodes.
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
myapp-deploy-76b466f676-5q85w   1/1     Running   0          92s    10.244.1.27   k8snode1   <none>           <none>
myapp-deploy-76b466f676-9jx6w   1/1     Running   0          97s    10.244.2.21   k8snode2   <none>           <none>
myapp-deploy-76b466f676-j47ps   1/1     Running   0          91s    10.244.2.22   k8snode2   <none>           <none>
pod-sa-demo                     1/1     Running   8          13d    10.244.1.26   k8snode1   <none>           <none>
pod1                            1/1     Running   4          7d3h   10.244.1.24   k8snode1   <none>           <none>
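If the exact taint value does not matter, the operator can be switched to Exists, which tolerates every taint with the given key. A sketch of the alternative tolerations block (value must then be left empty):
      tolerations:
      - key: "node-type"
        operator: "Exists"      #tolerate any value of this key
        effect: "NoSchedule"    #still limited to the NoSchedule effect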
Remove the taints:
[root@k8smaster data]# kubectl taint node k8snode1 node-type=production:NoSchedule-
node/k8snode1 untainted
[root@k8smaster data]# kubectl taint node k8snode2 node-type=production:NoSchedule-
node/k8snode2 untainted
