Setting Up a Kubernetes Cluster with kubeadm on CentOS 7 (Pitfalls Edition)

Preface

Setting up a k8s cluster was one detour after another for me, so I am writing this down to share with fellow sufferers on the same bumpy road.

This article describes the problems I hit during installation and how I solved each one. If you just want to stand up a k8s cluster quickly, read the companion article instead: Setting Up a Kubernetes Cluster with kubeadm on CentOS 7 (Express Edition).

My knowledge, writing, technical skill, and experience are all limited. If you spot a mistake, please let me know and I will correct it: kennyallen0520@gmail.com.

Prerequisites
Before learning Kubernetes (K8s for short), you should have a Linux foundation and a basic command of Docker. From inside the Great Firewall you will additionally need a working proxy ("scientific internet access").

Overview
Kubernetes (commonly called K8s) is an open-source system for automating the deployment, scaling, and management of containerized applications. (Lifted from Wikipedia.)

Environment
OS and kernel
CentOS Linux release 7.4.1708 (Core), minimal install
Kernel: 3.10.0-693.el7.x86_64
Architecture: x86_64

Docker-CE
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:20:16 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm

Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:23:58 2018
OS/Arch: linux/amd64
Experimental: false

kubeadm
kubeadm version: &version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

kubelet
Version: v1.10.2

kubectl
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Documentation
Kubernetes

Docker


Installation
System preparation
1. Update system packages
yum -y upgrade or yum -y update
Difference: both update installed packages from the yum repos, but upgrade additionally processes obsoleted packages (replacing or removing packages that newer ones have made obsolete).
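A quick aside you can verify yourself (a hedged note: on CentOS 7 obsoletes processing is normally enabled by default, so in practice the two commands usually end up doing the same thing):

grep obsoletes /etc/yum.conf   # obsoletes=1 means update already processes obsoletes
yum --obsoletes -y update      # the explicit spelling of what upgrade does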

2. Install common utilities
Since this is a minimal install, add common tools as you need them (much like vim's plugin philosophy, I suppose).

yum -y install vim

Install Docker-CE
1. Install prerequisite packages
yum install -y yum-utils device-mapper-persistent-data lvm2

2. Add the yum repo
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

3. Disable the edge channel
yum-config-manager --disable docker-ce-edge

4. Install the latest stable release
yum install -y docker-ce

5. Start docker and enable it at boot
systemctl enable docker && systemctl start docker
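Before moving on, it is worth confirming the daemon is actually up:

docker version              # both the Client and Server sections should print
systemctl is-active docker  # should print "active"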

Install kubeadm, kubelet, kubectl
kubeadm — the command-line tool that bootstraps the k8s cluster
kubelet — the agent that runs on every machine in the cluster and starts pods and containers
kubectl — the command-line tool for talking to the cluster

1. Add the Kubernetes yum repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

2. Install kubeadm, kubelet, kubectl
If you looked closely while adding the yum repo, you will have noticed the domain sits under google.com, which cannot be reached from inside the Great Firewall. Time for the proxy to make its entrance; this article does not cover how to set one up, since I trust that will not stop you.

Route the session through the proxy:

export HTTP_PROXY=http://192.168.1.100:1080
export HTTPS_PROXY=$HTTP_PROXY

yum install -y kubelet kubeadm kubectl

Start kubelet and enable it at boot

systemctl enable kubelet && systemctl start kubelet
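A hedged note: at this point kubelet will restart every few seconds in a crash loop. That is expected, because it is waiting for kubeadm init to hand it a configuration. You can watch this with:

systemctl status kubelet   # expect "activating (auto-restart)" until kubeadm init runs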

3. Initialize the cluster
kubeadm init

A few minutes later, as if on cue, this error appears:
Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- Either there is no internet connection, or imagePullPolicy is set to "Never",
so the kubelet cannot pull or find the following control plane images:
- k8s.gcr.io/kube-apiserver-amd64:v1.10.2
- k8s.gcr.io/kube-controller-manager-amd64:v1.10.2
- k8s.gcr.io/kube-scheduler-amd64:v1.10.2
- k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)

The message tells us plainly that kubelet failed to start, because
1. k8s.gcr.io/kube-apiserver-amd64:v1.10.2
2. k8s.gcr.io/kube-controller-manager-amd64:v1.10.2
3. k8s.gcr.io/kube-scheduler-amd64:v1.10.2
4. k8s.gcr.io/etcd-amd64:3.1.12
these images live on gcr.io (Google's container registry), which is unreachable from mainland China.
You may wonder: we already exported HTTP_PROXY and HTTPS_PROXY, so how can this still happen?
The reason is simple: the docker daemon does not read the shell's environment variables; it needs its own proxy configuration.

mkdir -p /etc/systemd/system/docker.service.d

touch /etc/systemd/system/docker.service.d/http-proxy.conf

echo -e '[Service]\nEnvironment="HTTP_PROXY=http://192.168.1.100:1080"' > /etc/systemd/system/docker.service.d/http-proxy.conf

touch /etc/systemd/system/docker.service.d/https-proxy.conf

echo -e '[Service]\nEnvironment="HTTPS_PROXY=http://192.168.1.100:1080"' > /etc/systemd/system/docker.service.d/https-proxy.conf

Restart docker so the configuration takes effect

systemctl daemon-reload && systemctl restart docker

docker info | grep -i proxy
HTTP Proxy: http://192.168.1.100:1080
HTTPS Proxy: http://192.168.1.100:1080

The proxy configuration has taken effect.

Run kubeadm init again:

[preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists

This time preflight aborts with the ERRORs above; they can be ignored:

kubeadm init --ignore-preflight-errors=all

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- Either there is no internet connection, or imagePullPolicy is set to "Never",
so the kubelet cannot pull or find the following control plane images:
- k8s.gcr.io/kube-apiserver-amd64:v1.10.2
- k8s.gcr.io/kube-controller-manager-amd64:v1.10.2
- k8s.gcr.io/kube-scheduler-amd64:v1.10.2
- k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)

The same error as before the proxy was configured. Don't despair just yet, and bear with me.

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Look closely: the error output also carries this hint.

journalctl -xeu kubelet

failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

Just when every road seemed blocked, a way through appeared. The cause: kubelet starts with cgroup driver "systemd", which differs from docker's "cgroupfs".
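You can see the mismatch from both sides; the kubelet side lives in the systemd drop-in that the sed below edits:

docker info | grep -i cgroup
# Cgroup Driver: cgroupfs
grep cgroup-driver /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"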

Following the official documentation:

sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Restart kubelet:

systemctl daemon-reload && systemctl restart kubelet

Run kubeadm init --ignore-preflight-errors=all again.

This time the startup hangs, with the log stuck at:

[init] This might take a minute or longer if the control plane images have to be pulled.

Dig deeper into the logs:

journalctl -xeu kubelet

Apr 30 14:02:55 bogon kubelet[53241]: I0430 14:02:55.287868 53241 kubelet_node_status.go:82] Attempting to register node bogon
Apr 30 14:02:55 bogon kubelet[53241]: E0430 14:02:55.294979 53241 kubelet_node_status.go:106] Unable to register node "bogon" with API server: Post https://192.168.19.150:6443/api/v1/...

Node registration failed because the API call did not go through.
A plausible cause: the port is unreachable, i.e. the firewall does not expose the API Server port.

The kubeadm init log had also printed this warning several times:

[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly

Press Ctrl+C to abort the kubeadm init run.

Open the two ports:
firewall-cmd --zone=public --add-port=6443/tcp --permanent && firewall-cmd --zone=public --add-port=10250/tcp --permanent && firewall-cmd --reload
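Verify that both ports are now open in the public zone:

firewall-cmd --zone=public --list-ports
# 6443/tcp 10250/tcp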

Now reset everything kubeadm did previously and initialize again:
kubeadm reset && systemctl start kubelet && kubeadm init --ignore-preflight-errors=all

And yet it still hangs at:
[init] This might take a minute or longer if the control plane images have to be pulled.

Thinking it over afterwards: if the ports were the problem, the log would have said more than WARNING; besides, the log had at one point printed Successfully registered node bogon, so the API was reachable even with the ports closed.

The official docs say nothing about this situation, so the cause is probably a step I took that is not in the official instructions. From there it is a short hop to the obvious suspect: the proxy added to get around the Great Firewall. To rule out proxy interference, simply add a NO_PROXY environment variable so that requests to the local IP bypass the proxy.
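For example (a sketch only: the file name no-proxy.conf is my own choice, and 192.168.19.150 is this host's IP as seen in the logs above, so substitute yours):

# let kubeadm/kubectl in this shell bypass the proxy for local addresses
export NO_PROXY=localhost,127.0.0.1,192.168.19.150

# docker needs it in its own systemd drop-in, like the proxy settings earlier
echo -e '[Service]\nEnvironment="NO_PROXY=localhost,127.0.0.1,192.168.19.150"' > /etc/systemd/system/docker.service.d/no-proxy.conf

systemctl daemon-reload && systemctl restart docker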

The light of victory at last: the k8s master initialized successfully!

Note: if you lose the token hash, you can run kubeadm init once more without a reset (while the k8s containers have not been deleted) to recover it; the token in the log will be the same.
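A less hacky alternative (hedged: the openssl pipeline is the one given in the kubeadm reference docs for recomputing the discovery hash):

kubeadm token list   # shows existing tokens and their expiry
# recompute the value for --discovery-token-ca-cert-hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'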

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

kubeadm join 192.168.19.150:6443 --token i2yq5b.tpmy284orbzssb5a --discovery-token-ca-cert-hash sha256:f598777ca9d1f5bb7eee7e30e13cb41934473be0ec8bce9c917795e07156ae04

As the log says, to start using the cluster, run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You also need to deploy a pod network to the cluster so that services in different pods can talk to each other.


I went with Calico; per its official docs the cluster must be initialized with the 192.168.0.0/16 pod CIDR:

kubeadm reset && systemctl start kubelet && kubeadm init --ignore-preflight-errors=all --pod-network-cidr=192.168.0.0/16

Switch to the k8s user (which must have sudo privileges):
su -l k8s

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
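If the copy worked, kubectl can now reach the cluster (a hedged expectation: kube-dns will sit in Pending until a pod network is installed in the next step):

kubectl cluster-info
kubectl get pods --all-namespaces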

Install the Calico pod network
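The apply command itself appears to have been lost from the original post (it was likely a screenshot). For the Calico docs of that era it had this shape; the manifest URL below is a hypothetical reconstruction, so take the real one from the Calico installation docs for your version:

kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml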


kubectl get nodes
NAME STATUS ROLES AGE VERSION
bogon NotReady master 8m v1.10.2

Right after the apply, the node may still report NotReady; it flips to Ready once the Calico pods come up, as the output further below shows.
This is a single-machine test cluster, so untaint the master to let pods be scheduled on it as well:

kubectl taint nodes --all node-role.kubernetes.io/master-
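To confirm the taint was removed from the master (bogon is this host's name):

kubectl describe node bogon | grep -i taints
# Taints: <none>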

Join the cluster with kubeadm join

kubeadm join 192.168.19.150:6443 --token v82z7e.jqya4dn63u0jhkqm --discovery-token-ca-cert-hash sha256:649e61ed10d3452106aea07b2b91e6ba1bd731fe9a4c50f273a0ab11f7b823c7

Run the command above as root to join the node to the k8s cluster.

If preflight fails with errors like these:
[preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

As before, ignore them with the extra flag:

kubeadm join 192.168.19.150:6443 --token v82z7e.jqya4dn63u0jhkqm --discovery-token-ca-cert-hash sha256:649e61ed10d3452106aea07b2b91e6ba1bd731fe9a4c50f273a0ab11f7b823c7 --ignore-preflight-errors=all

Output like the following means the node joined successfully:

kubeadm join 192.168.19.150:6443 --token v82z7e.jqya4dn63u0jhkqm --discovery-token-ca-cert-hash sha256:649e61ed10d3452106aea07b2b91e6ba1bd731fe9a4c50f273a0ab11f7b823c7 --ignore-preflight-errors=all
[discovery] Successfully established connection with API Server "192.168.19.150:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

Run kubectl get nodes:

NAME STATUS ROLES AGE VERSION
bogon Ready master 28m v1.10.2

Removing a node

First drain the bogon node to release its resources:

kubectl drain bogon --delete-local-data --force --ignore-daemonsets

Delete the bogon node:

kubectl delete node bogon

Check the nodes:
kubectl get nodes
No resources found.
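A hedged note: kubectl delete node only removes the node object from the API server. If the machine should ever rejoin the cluster, also wipe its local kubeadm state by running the following on that node as root:

kubeadm reset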

And with that, the k8s cluster is up and running!
