A simple k8s setup with kubeadm (updated for v1.13.0)

1. Base system environment

1.1 Kernel

Check the current kernel version (mine is 5.0.5-1.el7.elrepo.x86_64):

uname -a

The version must be 3.10 or later; otherwise upgrade the kernel:

# ELRepo repository (first check whether /etc/yum.repos.d/ already contains a yum repo for it)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

# List the available kernels
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available

# Install the latest kernel
yum --enablerepo=elrepo-kernel install kernel-ml

# List the boot entries; the kernel installed above is usually entry 0
awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg

# Make the newest kernel the default
grub2-set-default 0

# Regenerate the grub configuration
grub2-mkconfig -o /boot/grub2/grub.cfg

# Reboot
reboot

# Verify
uname -a
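
A minimal sketch for checking the 3.10 minimum in a script instead of by eye. The function name kernel_at_least is hypothetical, and the parsing assumes a release string of the form major.minor.patch-..., as printed by uname -r:

```shell
# kernel_at_least MIN_MAJOR MIN_MINOR RELEASE
# Succeeds when RELEASE (e.g. the output of `uname -r`) is at least
# MIN_MAJOR.MIN_MINOR. Hypothetical helper, not part of any tool.
kernel_at_least() {
    kal_major=${3%%.*}                 # text before the first dot
    kal_minor=${3#*.}                  # text after the first dot
    kal_minor=${kal_minor%%[!0-9]*}    # keep only the leading digits
    if [ "$kal_major" -gt "$1" ]; then
        return 0
    fi
    [ "$kal_major" -eq "$1" ] && [ "$kal_minor" -ge "$2" ]
}

# Example: report on the running kernel.
if kernel_at_least 3 10 "$(uname -r)"; then
    echo "kernel is new enough"
else
    echo "kernel is older than 3.10, upgrade it first"
fi
```

Only the major and minor numbers are compared, which is enough for the 3.10 threshold used in this guide.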

1.2 Disable SELinux and the firewall

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# To turn SELinux off entirely, edit the file and set SELINUX=disabled
vim /etc/selinux/config

systemctl stop firewalld
systemctl disable firewalld
# Verify
systemctl status firewalld

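1.3 Make sure the hostname, MAC address and product UUID are unique on every node

# Check the MAC address (replace ens33 with the name of your network interface)
cat /sys/class/net/ens33/address

# Check the product UUID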
cat /sys/class/dmi/id/product_uuid
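
Comparing the values by hand gets tedious with cloned VMs. The sketch below prints any value that appears more than once; find_duplicates is a hypothetical helper name, and the usage example assumes root SSH access to the nodes:

```shell
# find_duplicates: read one value per line on stdin and print each value
# that occurs more than once. No output means every value is unique.
find_duplicates() {
    sort | uniq -d
}

# Example usage (assumes root SSH access; adjust the host list to your
# own nodes):
#   for h in k8smaster k8snode1 k8snode2; do
#       ssh "root@$h" cat /sys/class/dmi/id/product_uuid
#   done | find_duplicates
```

The same pipeline works for MAC addresses by swapping the file that is read on each host.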

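1.4 Disable swap

Run swapoff -a, then comment out the swap entry in /etc/fstab so that swap stays off after a reboot:

/dev/mapper/centos-swap swap                    swap    defaults        0 0

Confirm that swap is off (the Swap row should show 0):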

free -m
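
Commenting out the fstab entry can also be scripted. A minimal sketch, assuming swap entries have the word swap as a whitespace-delimited field, as in the line above; comment_out_swap is a hypothetical helper name, and it only prints the result instead of editing the file in place:

```shell
# comment_out_swap FILE
# Print FILE with every active swap entry commented out. Nothing is
# modified in place; redirect the output and review it before replacing
# /etc/fstab.
comment_out_swap() {
    sed -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Example:
#   comment_out_swap /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Lines that are already commented are left alone, so running it twice does not stack up # characters.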

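1.5 Pass bridged traffic to iptables to avoid errors

Load the br_netfilter module:

# Check whether the module is already loaded
lsmod | grep br_netfilter

# Load it if it is not
modprobe br_netfilter

Set the kernel parameters on every node so that traffic crossing the bridge also goes through the iptables/netfilter framework. The settings can go in /etc/sysctl.conf, or in a file under any of /usr/lib/sysctl.d/, /run/sysctl.d/ or /etc/sysctl.d/, for example: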

cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

Load the configuration:

# Load only /etc/sysctl.d/k8s.conf
sysctl -p /etc/sysctl.d/k8s.conf
# Or load every system configuration file
sysctl --system
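
Before loading, it can help to confirm that the file really contains the expected settings. A small sketch, assuming the usual "key = value" layout of sysctl files; sysctl_conf_value is a hypothetical helper name:

```shell
# sysctl_conf_value KEY FILE
# Print the value assigned to KEY in a sysctl-style FILE ("key = value"
# lines; lines starting with # or ; are comments). Prints nothing if the
# key is absent.
sysctl_conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }
        NF >= 2 {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) val = v
        }
        END { if (val != "") print val }
    ' "$2"
}

# Example:
#   sysctl_conf_value net.bridge.bridge-nf-call-iptables /etc/sysctl.d/k8s.conf
```

Once the settings are loaded, the live value can be read back with sysctl -n net.bridge.bridge-nf-call-iptables.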

1.6 Node hosts file

The nodes are laid out as follows (adjust the IPs to your own VMs). Add these entries to /etc/hosts on every machine:

10.4.37.24 k8smaster
10.4.37.69 k8snode1
10.4.37.72 k8snode2

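Alternatively, after cloning the machines, change the hostname on each one directly (the other nodes are handled the same way):

# Change the hostname (this lasts until the next reboot; hostnamectl set-hostname k8smaster makes it persistent)
hostname k8smaster
# Edit hosts and append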
vim /etc/hosts
127.0.0.1 k8smaster
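
Since the same entries go onto every machine, appending them from a script avoids typos. A minimal sketch that skips names which are already present; add_host_entry is a hypothetical helper name, and the optional third argument exists only so the function can be tried on a scratch file first:

```shell
# add_host_entry IP NAME [FILE]
# Append "IP NAME" to FILE (default /etc/hosts) unless an uncommented
# line already lists NAME.
add_host_entry() {
    ahe_file=${3:-/etc/hosts}
    if awk -v n="$2" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$ahe_file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$1" "$2" >> "$ahe_file"
}

# Example (run as root on each node):
#   add_host_entry 10.4.37.24 k8smaster
#   add_host_entry 10.4.37.69 k8snode1
#   add_host_entry 10.4.37.72 k8snode2
```

The check is by hostname only, so an existing entry with a different IP is left untouched and has to be corrected by hand.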

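2. Install docker

For compatible version combinations, see External Dependencies (in the Kubernetes release notes).

Add the docker yum repository (yum repo definitions live in /etc/yum.repos.d/):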

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

[Note] If you have no proxy, switch the yum repo above to a domestic mirror; Aliyun is used here:

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

List the available docker-ce versions:

yum list docker-ce --showduplicates | sort -r

Install a specific docker version (Kubernetes 1.14.0, the latest at the time of writing, works with docker 1.13.1, 17.03, 17.06, 17.09, 18.06 and 18.09):

yum install docker-ce-17.09.1.ce-1.el7.centos

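Create the docker configuration directory and the docker daemon configuration file:

mkdir -p /etc/docker
vim /etc/docker/daemon.json

Write the following (you can switch the driver back to the default cgroupfs; a registry accelerator URL is available from Aliyun):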

{
    "registry-mirrors": ["https://xxxx.mirror.aliyuncs.com"],
    "exec-opts": ["native.cgroupdriver=systemd"]
}

Set the forwarding policy:

iptables -P FORWARD ACCEPT

Enable docker at boot and restart it:

systemctl daemon-reload && systemctl enable docker && systemctl restart docker

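[Appendix]

If the installed version does not fit, upgrade docker, or remove it and repeat the installation above (note that rm -rf /var/lib/docker deletes all local images, containers and volumes):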

yum -y remove docker*
rm -rf /var/lib/docker

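3. Install kubelet, kubectl and kubeadm (v1.14.0)

  • kubeadm: the command-line tool that bootstraps the cluster;
  • kubelet: the core component that starts Pods and containers, so it must be deployed on every node in the cluster;
  • kubectl: the command-line tool for talking to the cluster;

3.1 Configure the yum repository

If you can reach Google's servers (for example through a Shadowsocks proxy), use the Google repository directly: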

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF

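Add the Kubernetes repository signing key (this command applies only to Debian/Ubuntu hosts, which use apt; on CentOS, yum imports the keys listed under gpgkey the first time the repository is used):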

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

[Note] If you have no proxy, use a domestic mirror instead of the repo above; Aliyun is used here:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install the latest k8s tools and enable the kubelet at boot:

# Install the latest kubelet, kubectl and kubeadm
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# Enable and start the kubelet
systemctl daemon-reload && systemctl enable kubelet && systemctl start kubelet

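When docker is used, kubeadm detects the cgroup driver for the kubelet automatically and writes it to /var/lib/kubelet/kubeadm-flags.env at runtime. With a different CRI (container runtime), you have to set the cgroup-driver value yourself in /etc/default/kubelet (/etc/sysconfig/kubelet on CentOS/RHEL; create the file if it does not exist):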

Environment=
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd

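Because the kubelet already defaults to cgroupfs, the value only needs to be set when the CRI's cgroup driver is something other than cgroupfs (k8s recommends configuring docker's cgroup driver as systemd).

Pull the core component images (over an unstable proxy this may take several attempts):

kubeadm config images pull

Note:

If the images cannot be pulled, run kubeadm config images list --kubernetes-version=v1.13.0 to see exactly which images are needed: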

k8s.gcr.io/kube-apiserver:v1.13.0
k8s.gcr.io/kube-controller-manager:v1.13.0
k8s.gcr.io/kube-scheduler:v1.13.0
k8s.gcr.io/kube-proxy:v1.13.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.6

Then pull them from the mirrors of the Google images on Docker Hub, and retag them afterwards:

# Pull the images listed above from docker.io
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.0
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.0
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.0
docker pull mirrorgooglecontainers/kube-proxy:v1.13.0
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6

# Retag with the k8s.gcr.io name
docker tag mirrorgooglecontainers/kube-apiserver:v1.13.0 k8s.gcr.io/kube-apiserver:v1.13.0
# Remove the mirror tag
docker rmi mirrorgooglecontainers/kube-apiserver:v1.13.0

# Retag
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.0 k8s.gcr.io/kube-controller-manager:v1.13.0
# Remove the mirror tag
docker rmi mirrorgooglecontainers/kube-controller-manager:v1.13.0

# Retag
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.0 k8s.gcr.io/kube-scheduler:v1.13.0
# Remove the mirror tag
docker rmi mirrorgooglecontainers/kube-scheduler:v1.13.0

# Retag
docker tag mirrorgooglecontainers/kube-proxy:v1.13.0 k8s.gcr.io/kube-proxy:v1.13.0
# Remove the mirror tag
docker rmi mirrorgooglecontainers/kube-proxy:v1.13.0

# Retag
docker tag mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
# Remove the mirror tag
docker rmi mirrorgooglecontainers/pause:3.1

# Retag
docker tag mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
# Remove the mirror tag
docker rmi mirrorgooglecontainers/etcd:3.2.24

# Retag
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
# Remove the mirror tag
docker rmi coredns/coredns:1.2.6
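
The pull, tag and rmi steps above repeat the same pattern for each image, so they can be generated with a loop. The sketch below only prints the commands (a dry run). The function names are hypothetical, and the pause, etcd and coredns tags are hard-coded to the values kubeadm listed for v1.13.0, so they need adjusting for other releases:

```shell
# retag_cmds SRC DST
# Print the three docker commands that fetch SRC, tag it as DST and drop
# the SRC tag again.
retag_cmds() {
    printf 'docker pull %s\n' "$1"
    printf 'docker tag %s %s\n' "$1" "$2"
    printf 'docker rmi %s\n' "$1"
}

# all_retag_cmds VERSION
# Print the commands for the seven images listed above.
all_retag_cmds() {
    for arc_name in kube-apiserver kube-controller-manager kube-scheduler kube-proxy; do
        retag_cmds "mirrorgooglecontainers/$arc_name:$1" "k8s.gcr.io/$arc_name:$1"
    done
    retag_cmds mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
    retag_cmds mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
    retag_cmds coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
}

# Dry run: this only prints the commands. To execute them, pipe the
# output into a shell:
#   all_retag_cmds v1.13.0 | sh
all_retag_cmds v1.13.0
```

Printing first makes it easy to check the image names against the kubeadm config images list output before anything is pulled.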

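3.2 Clone the machines

With the basic steps above done, clone this machine twice to form the test cluster, and update the node IPs in the hosts file to those of the two clones.

4. Master node

4.1 Cluster initialization

At least 2 CPU cores and 4 GB of RAM.

Add flannel (the image and the files below can be prepared in advance, but kubectl apply only works once the API server is running, so run that last command after kubeadm init):

# Pull the flannel image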
docker pull quay.io/coreos/flannel:v0.10.0-amd64

mkdir -p /etc/cni/net.d/

cat <<EOF> /etc/cni/net.d/10-flannel.conf
{"name":"cbr0","type":"flannel","delegate": {"isDefaultGateway": true}}
EOF

mkdir /usr/share/oci-umount/oci-umount.d -p
mkdir /run/flannel/

cat <<EOF> /run/flannel/subnet.env
FLANNEL_NETWORK=172.100.0.0/16
FLANNEL_SUBNET=172.100.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

Use kubeadm init to set up the master node in one step:

kubeadm init --pod-network-cidr 10.244.0.0/16 --kubernetes-version stable

You can pin the Kubernetes version (e.g. --kubernetes-version=v1.9.1); here the latest stable release (1.14.0) is used.

Then wait. If something goes wrong, check /var/log/messages and the kubelet log:

journalctl -f -u kubelet

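The kubelet drop-in configuration installed by kubeadm: /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf

After changing the configuration, reload and restart:

# Reload the unit files and restart the kubelet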
systemctl daemon-reload
systemctl restart kubelet
systemctl enable kubelet && systemctl start kubelet

If an error occurs along the way, reset and initialize again:

kubeadm reset
kubeadm init --pod-network-cidr 10.244.0.0/16 --kubernetes-version stable

The output looks like this:

[init] Using Kubernetes version: v1.14.0
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8smaster localhost] and IPs [10.4.37.24 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8smaster localhost] and IPs [10.4.37.24 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8smaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.4.37.24]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 30.503797 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node k8smaster as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8smaster as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: pl1vir.d7e5xy3xy3uuymou
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.4.37.24:6443 --token pl1vir.d7e5xy3xy3uuymou \
    --discovery-token-ca-cert-hash sha256:b6ecd6ad73e072f2290a14213e32b681cd41c9010a9403bb32e1e213f7c167d2

Save the kubeadm join command printed by kubeadm init. It is needed later to join each node to the cluster, so copy it somewhere safe first. It really is important:

kubeadm join 10.4.37.24:6443 --token pl1vir.d7e5xy3xy3uuymou \
    --discovery-token-ca-cert-hash sha256:b6ecd6ad73e072f2290a14213e32b681cd41c9010a9403bb32e1e213f7c167d2

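The token above (valid for 24 hours) is used for mutual authentication between the master and the joining nodes. Anyone holding it can add authenticated nodes to the cluster, so keep it private. To list, create or delete tokens, use the kubeadm token command; see the kubeadm token documentation for details.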
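
One way to avoid losing the join command is to save the kubeadm init output to a file and pull the command out of it later. A minimal sketch, assuming the output was captured with something like kubeadm init ... | tee kubeadm-init.log (the log file name and the function name are hypothetical) and that the command is printed in the two-line form shown above:

```shell
# extract_join_cmd FILE
# Print the first "kubeadm join ..." command found in FILE as a single
# line, joining any continuation lines that end with a backslash.
extract_join_cmd() {
    awk '
        /^kubeadm join / { grab = 1 }
        grab {
            line = $0
            cont = sub(/[ \t]*\\$/, "", line)   # strip a trailing backslash
            sub(/^[ \t]+/, "", line)            # strip leading indentation
            out = (out == "" ? line : out " " line)
            if (!cont) { print out; exit }
        }
    ' "$1"
}

# Example:
#   extract_join_cmd kubeadm-init.log
```

The saved log contains the bootstrap token, so it deserves the same care as the token itself.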

Following its hint, to let a non-root user run kubectl, do the following:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

If you installed as root (my local VMs run everything as root), you can instead run:

export KUBECONFIG=/etc/kubernetes/admin.conf

Check the cluster status:

# Check the cluster status
kubectl get cs

# Output
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok                  
scheduler            Healthy   ok                  
etcd-0               Healthy   {"health":"true"}

[Note]

While going through the kubelet log afterwards, I still found a few problems:

4月 01 13:41:52 k8smaster kubelet[21177]: W0401 13:41:52.869421   21177 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
4月 01 13:41:56 k8smaster kubelet[21177]: E0401 13:41:56.656635   21177 kubelet.go:2170] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

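4.2 Pod network add-on (important)

The Pod network add-on matters: it lets Pods communicate with one another. It must be deployed before any application, and CoreDNS will not start properly until it is installed. kubeadm only supports CNI (Container Network Interface) based networks and does not support kubenet. Several projects provide Kubernetes Pod networks using CNI, and some of them also support Network Policy.

The Pod network must not overlap with any host network, or problems may follow. If the add-on's preferred Pod network collides with one of your host networks, pick a suitable replacement CIDR, pass it to kubeadm init with --pod-network-cidr, and substitute it in the add-on's YAML file. Following the hint printed by the initialization above, install the Pod network add-on:

kubectl apply -f [podnetwork].yaml

Only one Pod network add-on can be installed per cluster. Options include Calico, Canal, Cilium, Flannel, Kube-router and others; Kube-router is chosen here. Bridged IPv4 traffic must be passed to the iptables chains (important) for CNI to work properly, which only requires setting /proc/sys/net/bridge/bridge-nf-call-iptables to 1 (already done in the preparation steps):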

sysctl net.bridge.bridge-nf-call-iptables=1

Kube-router relies on kube-controller-manager to allocate a CIDR (Classless Inter-Domain Routing) block to each node, so kubeadm init has to be run with the --pod-network-cidr flag. The initialization above already included it. Verify:

kubectl get pods --all-namespaces

# Result
NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
kube-system   coredns-fb8b8dccf-gzs2k             1/1     Running   0          27h
kube-system   coredns-fb8b8dccf-hs56b             1/1     Running   0          27h
kube-system   etcd-k8smaster                      1/1     Running   1          27h
kube-system   kube-apiserver-k8smaster            1/1     Running   1          27h
kube-system   kube-controller-manager-k8smaster   1/1     Running   1          27h
kube-system   kube-flannel-ds-z7r6t               1/1     Running   0          5h31m
kube-system   kube-proxy-75w9l                    1/1     Running   1          27h
kube-system   kube-scheduler-k8smaster            1/1     Running   1          27h

In the kubectl get pods --all-namespaces output, check that the CoreDNS pods are Running; that confirms the network add-on was installed correctly.
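
Scanning the STATUS column by eye gets harder as the pod list grows. A small sketch that filters the listing down to the pods that are not Running; pods_not_running is a hypothetical helper name, and it assumes the default column layout of kubectl get pods --all-namespaces shown above:

```shell
# pods_not_running: read `kubectl get pods --all-namespaces` output on
# stdin and print "namespace/name status" for every pod whose STATUS
# column is not Running. The header line is skipped.
pods_not_running() {
    awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Example:
#   kubectl get pods --all-namespaces | pods_not_running
```

No output means every pod is Running. Pods of finished jobs (status Completed) would also be listed, which is harmless here because a fresh cluster has none.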

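4.3 Control-plane node isolation

By default, for security reasons, the cluster does not schedule Pods on the master. If you do want Pods scheduled on the master, for example in a single-node Kubernetes cluster, run: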

kubectl taint nodes --all node-role.kubernetes.io/master-

This removes the node-role.kubernetes.io/master taint from every node that carries it (the master included), so the scheduler can then place Pods on any node. My cluster is not single-node, so I skip this step.

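5. Worker nodes

5.1 Add worker nodes to the cluster

At least 4 CPU cores and 16 GB of RAM.

To let other nodes use kubectl as well, copy the /etc/kubernetes/admin.conf file generated on the master to the corresponding location on those nodes.

With the master done, move on to the ordinary nodes (worker nodes), which do the actual work. Before starting, set up passwordless SSH login between the nodes, then join each one using the command copied from the initialization output above: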

kubeadm join 10.4.37.24:6443 --token pl1vir.d7e5xy3xy3uuymou --discovery-token-ca-cert-hash sha256:b6ecd6ad73e072f2290a14213e32b681cd41c9010a9403bb32e1e213f7c167d2

If a long time has passed since the cluster was initialized, check whether the token has expired:

# List the tokens
kubeadm token list
# Result
TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
pl1vir.d7e5xy3xy3uuymou   17h       2019-04-02T11:01:09+08:00   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

The token generated at initialization expires at 2019-04-02T11:01:09 (24 hours after creation). If it has expired, create a new one by hand:

kubeadm token create

This produces a new token (similar to pl1vir.d7e5xy3xy3uuymou). You also need the discovery-token-ca-cert-hash value, which can be obtained with the following command:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
   openssl dgst -sha256 -hex | sed 's/^.* //'

My join command with the second token:

kubeadm join 10.4.37.24:6443 --token 4lqozr.xyawinzvspo4zha7 --discovery-token-ca-cert-hash sha256:b6ecd6ad73e072f2290a14213e32b681cd41c9010a9403bb32e1e213f7c167d2
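
When the join command is assembled by hand, a malformed token or hash is easy to miss. The sketch below checks both pieces before printing the command. The function names are hypothetical; the token pattern (6 characters, a dot, 16 characters, all lowercase letters or digits) is the kubeadm bootstrap token format, and the hash is expected as sha256: followed by 64 hex digits. Note that the openssl pipeline above prints only the hex digits, so the sha256: prefix has to be added:

```shell
# valid_bootstrap_token TOKEN: succeeds for tokens like abcdef.0123456789abcdef
valid_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# valid_ca_cert_hash HASH: succeeds for "sha256:" plus 64 lowercase hex digits
valid_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# build_join_cmd ENDPOINT TOKEN HASH
# Print the kubeadm join command, or fail if TOKEN or HASH is malformed.
build_join_cmd() {
    valid_bootstrap_token "$2" || { echo "malformed token: $2" >&2; return 1; }
    valid_ca_cert_hash "$3" || { echo "malformed hash: $3" >&2; return 1; }
    printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash %s\n' "$1" "$2" "$3"
}

# Example (run on the master, where the openssl pipeline above has been
# stored in a shell variable named hash):
#   build_join_cmd 10.4.37.24:6443 "$(kubeadm token create)" "sha256:$hash"
```

This only validates the format; whether the token is still valid is something only the cluster can tell.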

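Alternatively, kubeadm token create --print-join-command prints the complete join command, which can then be run on the node to join it to the cluster. In practice, joining a minion kept hanging for me for an unknown reason; journalctl -xeu kubelet showed the following error: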

4月 02 09:15:25 k8snode1 kubelet[75413]: E0402 09:15:25.661220   75413 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)

I was stuck on this for a long time and did not find a fix.

Reset the environment and join again:

# Reset kubeadm
kubeadm reset
# Flush iptables
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

Run kubectl get nodes on the master to see whether the nodes have joined. The node status I got was:

[root@bogon yum.repos.d]# kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   15h   v1.13.0
k8s-node-2   NotReady   <none>   15h   v1.13.0
localhost    NotReady   <none>   15h   v1.13.0

Check the kubelet log on the two nodes with journalctl -xeu kubelet:

4月 04 09:14:01 k8s-node-1 kubelet[23444]: W0404 09:14:01.200847   23444 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
4月 04 09:14:01 k8s-node-1 kubelet[23444]: E0404 09:14:01.203078   23444 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

So do the worker nodes need flannel as well? I tried installing it on both nodes, and the problem went away.

[Note]

Later, kubectl get nodes suddenly failed with The connection to the server localhost:8080 was refused - did you specify the right host or port?. The fix:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

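5.2 Control the cluster from a worker node (broadly, any non-master node) (optional)

To use kubectl on another machine to talk to the cluster, copy the config file from the master to that machine. On the node, copy it over directly:

# Copy the master's config file to this machine
mkdir -p $HOME/.kube
scp root@<master ip>:/etc/kubernetes/admin.conf $HOME/.kube/config
# List the cluster nodes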
kubectl --kubeconfig $HOME/.kube/config get nodes

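In practice, once the file is copied, a plain kubectl get nodes already returns the cluster nodes, because $HOME/.kube/config is kubectl's default config path.

[Note]

Later on, the master suddenly reported: The connection to the server localhost:8080 was refused - did you specify the right host or port?. Copying the config file into place fixes it: cp /etc/kubernetes/admin.conf $HOME/.kube/config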

5.3 Proxy the API server to localhost (optional)

To connect to the API server from outside the cluster, use kubectl proxy:

scp root@<master ip>:/etc/kubernetes/admin.conf $HOME/.kube/admin.conf
kubectl --kubeconfig $HOME/.kube/admin.conf proxy

The API is then reachable locally at http://localhost:8001/api/v1

5.4 Teardown

This mainly undoes what kubeadm did. Before removing a node, make sure it has been drained, using credentials that can talk to the master:

# Drain the node
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets

# Remove the node
kubectl delete node <node name>

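Then, on the node being removed, reset the state installed by kubeadm:

kubeadm reset

The reset does not clean up iptables rules or IPVS tables. To flush iptables, do it by hand:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

To reset the IPVS tables, run:

ipvsadm -C

After that, kubeadm init or kubeadm join can be run again right away.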

[Note]

Later on, after a system upgrade, kubectl stopped working:

[root@k8s-master ~]# kubectl get pods
The connection to the server 10.4.37.17:6443 was refused - did you specify the right host or port?
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since 日 2019-05-05 14:27:27 CST; 10s ago
     Docs: https://kubernetes.io/docs/
  Process: 22295 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 22295 (code=exited, status=255)

5月 05 14:27:38 k8s-master kubelet[22335]: I0505 14:27:38.036646   22335 client.go:75] Connecting to docker on unix:///var/run/docker.sock
5月 05 14:27:38 k8s-master kubelet[22335]: I0505 14:27:38.036659   22335 client.go:104] Start docker client with request timeout=2m0s
5月 05 14:27:38 k8s-master kubelet[22335]: W0505 14:27:38.037961   22335 docker_service.go:561] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
5月 05 14:27:38 k8s-master kubelet[22335]: I0505 14:27:38.037992   22335 docker_service.go:238] Hairpin mode set to "hairpin-veth"
5月 05 14:27:38 k8s-master kubelet[22335]: I0505 14:27:38.040956   22335 docker_service.go:253] Docker cri networking managed by cni
5月 05 14:27:38 k8s-master kubelet[22335]: I0505 14:27:38.058198   22335 docker_service.go:258] Docker Info: &{ID:CCPW:VM3Q:D47E:JZ5T:HAU2:4627:YCPL:NRPI:WSX2:NH6Y:VDSJ:5XUD Containers:30 ContainersRunning:0 ContainersPaused:0 ContainersStopped:3...ports d_type true]] 
5月 05 14:27:38 k8s-master kubelet[22335]: F0505 14:27:38.058260   22335 server.go:265] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
5月 05 14:27:38 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
5月 05 14:27:38 k8s-master systemd[1]: Unit kubelet.service entered failed state.
5月 05 14:27:38 k8s-master systemd[1]: kubelet.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

The kubelet service is down. Restarting it with systemctl restart kubelet did not help, so check the log:

journalctl -xefu kubelet

# Result
5月 05 14:39:01 k8s-master systemd[1]: kubelet.service holdoff time over, scheduling restart.
5月 05 14:39:01 k8s-master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit kubelet.service has finished shutting down.
5月 05 14:39:01 k8s-master systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit kubelet.service has finished starting up.
-- 
-- The start-up result is done.
5月 05 14:39:02 k8s-master kubelet[24995]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
5月 05 14:39:02 k8s-master kubelet[24995]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.028334   24995 server.go:417] Version: v1.14.1
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.028511   24995 plugins.go:103] No cloud provider specified.
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.028531   24995 server.go:754] Client rotation is on, will bootstrap in background
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.030730   24995 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.096889   24995 server.go:625] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097133   24995 container_manager_linux.go:261] container manager verified user specified cgroup-root exists: []
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097145   24995 container_manager_linux.go:266] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097216   24995 container_manager_linux.go:286] Creating device plugin manager: true
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097263   24995 state_mem.go:36] [cpumanager] initializing new in-memory state store
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097333   24995 state_mem.go:84] [cpumanager] updated default cpuset: ""
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097340   24995 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097388   24995 kubelet.go:279] Adding pod path: /etc/kubernetes/manifests
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.097406   24995 kubelet.go:304] Watching apiserver
5月 05 14:39:02 k8s-master kubelet[24995]: E0505 14:39:02.098718   24995 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://10.4.37.17:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.4.37.17:6443: connect: connection refused
5月 05 14:39:02 k8s-master kubelet[24995]: E0505 14:39:02.098881   24995 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.4.37.17:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: dial tcp 10.4.37.17:6443: connect: connection refused
5月 05 14:39:02 k8s-master kubelet[24995]: E0505 14:39:02.098919   24995 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://10.4.37.17:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: dial tcp 10.4.37.17:6443: connect: connection refused
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.099995   24995 client.go:75] Connecting to docker on unix:///var/run/docker.sock
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.100008   24995 client.go:104] Start docker client with request timeout=2m0s
5月 05 14:39:02 k8s-master kubelet[24995]: W0505 14:39:02.101221   24995 docker_service.go:561] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.101251   24995 docker_service.go:238] Hairpin mode set to "hairpin-veth"
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.105065   24995 docker_service.go:253] Docker cri networking managed by cni
5月 05 14:39:02 k8s-master kubelet[24995]: I0505 14:39:02.127753   24995 docker_service.go:258] Docker Info: &{ID:CCPW:VM3Q:D47E:JZ5T:HAU2:4627:YCPL:NRPI:WSX2:NH6Y:VDSJ:5XUD Containers:30 ContainersRunning:0 ContainersPaused:0 ContainersStopped:30 Images:41 Driver:overlay DriverStatus:[[Backing Filesystem xfs] [Supports d_type true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:19 OomKillDisable:true NGoroutines:28 SystemTime:2019-05-05T14:39:02.114019459+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:3.10.0-957.10.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc0005c2070 NCPU:4 MemTotal:3954184192 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:k8s-master Labels:[] ExperimentalBuild:false ServerVersion:17.09.1-ce ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil>} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:06b9cb35161009dcb7123345749fef02f7cea8e0 Expected:06b9cb35161009dcb7123345749fef02f7cea8e0} RuncCommit:{ID:3f2f8b84a77f73d38244dd690525642a72156c64 Expected:3f2f8b84a77f73d38244dd690525642a72156c64} InitCommit:{ID:949e6fa Expected:949e6fa} SecurityOptions:[name=seccomp,profile=default]}
5月 05 14:39:02 k8s-master kubelet[24995]: F0505 14:39:02.127894   24995 server.go:265] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
5月 05 14:39:02 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
5月 05 14:39:02 k8s-master systemd[1]: Unit kubelet.service entered failed state.
5月 05 14:39:02 k8s-master systemd[1]: kubelet.service failed.

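The cgroup drivers differ again, frustratingly (I remember having set both to cgroupfs earlier). Changing the driver fixes it:

vim /var/lib/kubelet/kubeadm-flags.env

# Set the driver to cgroupfs and leave everything else alone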
Environment=
KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs
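
To catch this mismatch before the kubelet crashes, the two drivers can be compared up front. A minimal sketch with hypothetical helper names; it assumes the kubelet's driver is set through a --cgroup-driver=... flag in an environment file such as /var/lib/kubelet/kubeadm-flags.env, and it reads docker's driver with docker info:

```shell
# flag_cgroup_driver FILE
# Print the value of the last --cgroup-driver=... flag found in FILE;
# prints nothing if the flag is absent.
flag_cgroup_driver() {
    sed -n 's/.*--cgroup-driver=\([a-z]*\).*/\1/p' "$1" | tail -n 1
}

# drivers_match KUBELET_DRIVER DOCKER_DRIVER
# Succeeds only when both values are non-empty and equal.
drivers_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (run on the affected node; needs a running docker daemon):
#   kubelet_driver=$(flag_cgroup_driver /var/lib/kubelet/kubeadm-flags.env)
#   docker_driver=$(docker info --format '{{.CgroupDriver}}')
#   drivers_match "$kubelet_driver" "$docker_driver" || echo "cgroup driver mismatch"
```

If the kubelet driver is configured somewhere else (for example in /var/lib/kubelet/config.yaml), the first helper finds nothing and the comparison fails, which is a prompt to look there.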