Kubernetes Scheduler:
The Kubernetes Scheduler picks a worker node for each Pod according to its scheduling algorithm; assigning a Pod to a node is called binding (bind). The scheduler's input is the Pod to be scheduled plus the set of candidate nodes (Node), and its output is the Node chosen by the algorithm.
Scheduling is carried out in two phases:
- Predicates (pre-selection): the configured Predicates Policies (by default the default predicates policies defined in the Default Provider) filter out every node that fails any of them; the remaining nodes become the input of the priority phase.
- Priorities (scoring): the configured Priorities Policies (by default the default priorities policies) score and rank the nodes that passed filtering; the highest-scoring node is considered the best fit and the Pod is bound to it (a sketch of a policy file configuring both phases follows this list).
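For illustration only, a minimal sketch of a legacy scheduler Policy file of the kind referred to above. The predicate and priority names are built-ins from the lists below, but whether such a file is loaded at all depends on how kube-scheduler is started (e.g. via a --policy-config-file flag), so treat this as an assumption rather than a drop-in config:
kind: Policy
apiVersion: v1
predicates:
- name: PodFitsHostPorts
- name: PodFitsResources
- name: MatchNodeSelector
- name: PodToleratesNodeTaints
priorities:
- name: LeastRequestedPriority
  weight: 1
- name: BalancedResourceAllocation
  weight: 1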
Special ways to influence scheduling:
- nodeName: pin the Pod to a node by naming the node directly (a minimal example follows this list).
- nodeSelector: place the Pod on a node whose labels match.
- Affinity/anti-affinity (node and pod level).
- Taints and tolerations.
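A minimal nodeName example, as referenced in the first item above; the Pod name is made up, and k8snode1 is the node used later in this article. Setting spec.nodeName skips the scheduler entirely and hands the Pod straight to that node's kubelet:
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename-demo
spec:
  nodeName: k8snode1
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1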
Common predicates (filters):
- CheckNodeCondition: checks node conditions such as disk unavailable or not ready, and prevents Pods from being scheduled onto a node in that state.
- GeneralPredicates:
  - HostName: if the Pod has a node name set (spec.nodeName), checks that it matches the node's name.
  - PodFitsHostPorts: checks that the hostPort values requested by the Pod are still free on the node.
  - MatchNodeSelector: checks that the node's labels match the Pod's nodeSelector.
  - PodFitsResources: checks that the node has enough free resources to run the Pod.
- NoDiskConflict: checks that the Pod's volume requirements do not conflict with volumes already in use on the node (not enabled by default).
- PodToleratesNodeTaints: checks that the tolerations defined on the Pod tolerate the node's taints.
- PodToleratesNodeNoExecuteTaints: handles NoExecute taints; if such a taint appears on the node after the Pod is bound and the Pod does not tolerate it, the Pod is evicted (not enabled by default).
- CheckNodeLabelPresence: checks for the presence (or absence) of specific labels on the node (not enabled by default).
- CheckServiceAffinity: binds the Pod according to the nodes already used by other Pods of the same Service, so that Pods of one Service end up on the same set of nodes (not enabled by default).
- CheckVolumeBinding: checks whether the volumes (PVCs) requested by the Pod can be bound, or are already bound, on the node.
- CheckNodeMemoryPressure: checks whether the node is under memory pressure.
- CheckNodePIDPressure: checks whether the node is under process-ID (PID) pressure.
- CheckNodeDiskPressure: checks whether the node is under disk pressure.
- MatchInterPodAffinity: checks whether the node satisfies the Pod's inter-pod affinity and anti-affinity rules.
Common priorities (scoring functions):
- LeastRequested: scores a node by how much of its capacity is still free; the smaller the share of capacity already requested, the higher the score (a worked example follows this list).
  - CPU: (capacity - sum(requested)) * 10 / capacity.
  - Memory: (capacity - sum(requested)) * 10 / capacity.
  - The final score is the average of the two: (cpu_score + memory_score) / 2.
- BalancedResourceAllocation: favors nodes on which CPU and memory utilization end up close to each other after placing the Pod.
- NodePreferAvoidPods: scores nodes based on a node annotation that marks them as to be avoided.
- TaintToleration: scores nodes by how well the Pod's tolerations match the node's taints.
- SelectorSpreading: nodes already running other Pods matched by the same label selectors as the current Pod score lower, spreading Pods of a Service or controller across nodes.
- InterPodAffinity: iterates over the Pod's affinity terms; the more terms a node matches, the higher its score.
- MostRequested: the opposite of LeastRequested; the larger the share of capacity already requested, the higher the score (packs Pods onto fewer nodes).
- NodeLabel: scores nodes by whether they carry the configured labels; nodes without them get no points.
- ImageLocality: scores nodes by whether the images the Pod references are already present, weighted by the size of those images.
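A worked example of the LeastRequested score referred to in the list above; the numbers are invented for illustration. Assume a node with 4000m of CPU capacity of which 1000m is already requested, and 8Gi of memory of which 2Gi is already requested:
cpu_score    = (4000 - 1000) * 10 / 4000 = 7.5
memory_score = (8 - 2) * 10 / 8          = 7.5
LeastRequested score = (7.5 + 7.5) / 2   = 7.5
A node with less requested (more free) capacity scores higher and is preferred; MostRequested simply inverts this preference.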
nodeSelector (node labels):
#If no node matches the Pod's nodeSelector, the Pod stays in the Pending state.
#Only once a node matching the nodeSelector exists in the cluster can the Pod actually be scheduled and started.
[root@k8smaster ~]# cd /data/
[root@k8smaster data]# mkdir scheduler
[root@k8smaster data]# cd scheduler/
[root@k8smaster scheduler]# vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd
[root@k8smaster scheduler]# kubectl apply -f pod-demo.yaml
pod/pod-demo created
[root@k8smaster scheduler]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-demo 0/1 Pending 0 99s
pod-sa-demo 1/1 Running 7 12d
pod1 1/1 Running 3 6d
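When a Pod stays Pending like this, the scheduler records the reason in the Pod's events (for example, that no node matched the node selector). It can be checked with the command below; the output is omitted here:
kubectl describe pod pod-demo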
[root@k8smaster scheduler]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8smaster Ready master 76d v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8smaster,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8snode1 Ready <none> 76d v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8snode1,kubernetes.io/os=linux
k8snode2 Ready <none> 76d v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8snode2,kubernetes.io/os=linux
[root@k8smaster scheduler]# kubectl label node k8snode1 disktype=ssd
node/k8snode1 labeled
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-demo 1/1 Running 0 5m48s 10.244.1.20 k8snode1 <none> <none>
pod-sa-demo 1/1 Running 7 12d 10.244.1.16 k8snode1 <none> <none>
pod1 1/1 Running 3 6d 10.244.1.17 k8snode1 <none> <none>
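A side note not from the original run: the disktype=ssd label added above can later be removed with a trailing dash. New Pods with this nodeSelector would then be Pending again, while already-running Pods are not affected:
kubectl label node k8snode1 disktype-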
nodeAffinity (node affinity):
- preferredDuringSchedulingIgnoredDuringExecution: soft affinity, satisfied on a best-effort basis.
- requiredDuringSchedulingIgnoredDuringExecution: hard affinity, must be satisfied.
[root@k8smaster scheduler]# vim pod-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo.yaml
pod/pod-node-affinity-demo created
#No node in the cluster satisfies the requiredDuringSchedulingIgnoredDuringExecution (hard) affinity rule, so the Pod stays Pending.
[root@k8smaster scheduler]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 1 24h
pod-node-affinity-demo 0/1 Pending 0 75s
pod-sa-demo 1/1 Running 8 13d
pod1 1/1 Running 4 7d1h
[root@k8smaster scheduler]# vim pod-affinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-2.yaml
pod/pod-node-affinity-demo-2 created
#No node satisfies the preferredDuringSchedulingIgnoredDuringExecution (soft) affinity rule either, but since it is only a preference the Pod can still be scheduled and reaches the Running state.
[root@k8smaster scheduler]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-demo 1/1 Running 1 24h
pod-node-affinity-demo 0/1 Pending 0 8m51s
pod-node-affinity-demo-2 1/1 Running 0 6s
pod-sa-demo 1/1 Running 8 13d
pod1 1/1 Running 4 7d1h
[root@k8smaster scheduler]# kubectl delete -f .
pod "pod-node-affinity-demo-2" deleted
pod "pod-node-affinity-demo" deleted
pod "pod-demo" deleted
podAffinity (pod affinity):
#Schedule pod-second onto a node that satisfies its affinity for pod-first, so that the two Pods always run on the same node.
[root@k8smaster scheduler]# vim pod-affinity-demo-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-3.yaml
pod/pod-first created
pod/pod-second created
#Both pod-first and pod-second end up running on node2 (k8snode2).
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-first 1/1 Running 0 102s 10.244.2.16 k8snode2 <none> <none>
pod-sa-demo 1/1 Running 8 13d 10.244.1.26 k8snode1 <none> <none>
pod-second 1/1 Running 0 102s 10.244.2.17 k8snode2 <none> <none>
pod1 1/1 Running 4 7d2h 10.244.1.24 k8snode1 <none> <none>
[root@k8smaster scheduler]# kubectl delete -f pod-affinity-demo-3.yaml
pod "pod-first" deleted
pod "pod-second" deleted
podAntiAffinity (pod anti-affinity):
#Keep pod-second off any node that pod-first runs on, so the two Pods never share a node.
[root@k8smaster scheduler]# vim pod-affinity-demo-4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
[root@k8smaster scheduler]# kubectl apply -f pod-affinity-demo-4.yaml
pod/pod-first created
pod/pod-second created
#pod-first runs on node2 while pod-second is scheduled to node1.
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-first 1/1 Running 0 7s 10.244.2.18 k8snode2 <none> <none>
pod-sa-demo 1/1 Running 8 13d 10.244.1.26 k8snode1 <none> <none>
pod-second 0/1 Running 0 7s 10.244.1.25 k8snode1 <none> <none>
pod1 1/1 Running 4 7d2h 10.244.1.24 k8snode1 <none> <none>
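With requiredDuringSchedulingIgnoredDuringExecution, pod-second would stay Pending if every eligible node already ran a matching Pod. A softer, illustrative variant (not used in this article) expresses the same spreading intent as a preference instead of a hard rule:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["myapp"]}
          topologyKey: kubernetes.io/hostname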
The effect of a taint defines how strongly it repels Pods:
- NoSchedule: affects scheduling only; Pods already running on the node are left alone.
- NoExecute: affects both scheduling and running Pods; Pods that do not tolerate the taint are evicted from the node.
- PreferNoSchedule: a soft version of NoSchedule; the scheduler tries to avoid the node but will still use it if a Pod has nowhere else to go, and running Pods are not evicted.
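For the NoExecute effect, a toleration can additionally set tolerationSeconds, so that a Pod is not evicted immediately but only after the given grace period. An illustrative snippet (the key and value mirror the taint used below):
tolerations:
- key: "node-type"
  operator: "Equal"
  value: "production"
  effect: "NoExecute"
  tolerationSeconds: 3600   # stay for up to one hour after the taint appears, then be evicted.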
Taint the chosen nodes:
[root@k8smaster scheduler]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster Ready master 77d v1.16.2
k8snode1 Ready <none> 77d v1.16.2
k8snode2 Ready <none> 77d v1.16.2
[root@k8smaster scheduler]# kubectl taint node k8snode1 node-type=production:NoSchedule
node/k8snode1 tainted
[root@k8smaster scheduler]# kubectl taint node k8snode2 node-type=production:NoSchedule
node/k8snode2 tainted
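The taints just added can be confirmed per node (output omitted here):
kubectl describe node k8snode1 | grep -i taints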
[root@k8smaster data]# cd scheduler/
[root@k8smaster scheduler]# vim myapp-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
[root@k8smaster scheduler]# kubectl apply -f myapp-dep.yaml
deployment.apps/myapp-deploy created
#Because node1 and node2 both carry the taint and the Pods have no matching toleration, the Pods cannot be scheduled.
[root@k8smaster scheduler]# kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-deploy-7d77647d96-5c282 0/1 Pending 0 4s
myapp-deploy-7d77647d96-7cdbv 0/1 Pending 0 4s
myapp-deploy-7d77647d96-nkj7k 0/1 Pending 0 4s
pod-sa-demo 1/1 Running 8 13d
pod1 1/1 Running 4 7d3h
Define a taint toleration on the Pod:
[root@k8smaster scheduler]# vim myapp-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"       # the taint key this Pod tolerates.
        operator: "Equal"      # Equal requires both key and value to match; Exists only requires the key to be present, the value is ignored.
        value: "production"    # the taint value.
        effect: "NoSchedule"   # the taint effect to tolerate.
[root@k8smaster scheduler]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-deploy-76b466f676-5q85w 1/1 Running 0 92s 10.244.1.27 k8snode1 <none> <none>
myapp-deploy-76b466f676-9jx6w 1/1 Running 0 97s 10.244.2.21 k8snode2 <none> <none>
myapp-deploy-76b466f676-j47ps 1/1 Running 0 91s 10.244.2.22 k8snode2 <none> <none>
pod-sa-demo 1/1 Running 8 13d 10.244.1.26 k8snode1 <none> <none>
pod1 1/1 Running 4 7d3h 10.244.1.24 k8snode1 <none> <none>
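If the toleration should match the taint key regardless of its value, the operator can be switched from Equal to Exists; a minimal sketch of the same tolerations block:
      tolerations:
      - key: "node-type"
        operator: "Exists"     # matches any taint with this key, whatever its value.
        effect: "NoSchedule"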
Remove the taints (the trailing "-" deletes a taint):
[root@k8smaster data]# kubectl taint node k8snode1 node-type=production:NoSchedule-
node/k8snode1 untainted
[root@k8smaster data]# kubectl taint node k8snode2 node-type=production:NoSchedule-
node/k8snode2 untainted