1. Current cluster state

RKE deployment tool version

[root@localhost rke2.2.2]# ./rke -version
rke version v0.2.6

Version of the Kubernetes cluster deployed by RKE (used to host the high-availability Rancher installation)

[root@localhost rke2.2.2]# kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Rancher version and workload cluster version
Rancher 2.2.2
Kubernetes 1.13.5

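2. About the target version

Official 2.3.x release notes: 2.3.0
Rancher 2.3.2 is the first stable release in the Rancher 2.3.x line, which means it is the version Rancher officially recommends to all users for production environments.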

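3. Upgrade steps

Official upgrade guide: High-availability upgrade guide (Helm 2)

3.1 Back up the Kubernetes cluster running Rancher Server

Create an RKE cluster snapshot.
If the upgrade fails, this snapshot is used to restore the previous Rancher version.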

[root@localhost rke2.2.2]# ./rke etcd snapshot-save --name etcd-2022021602.db --config cluster.yml
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host [192.168.10.20]
INFO[0000] [etcd] Saving snapshot [etcd-2022021602.db] on host [192.168.10.20]
INFO[0001] [etcd] Successfully started [etcd-snapshot-once] container on host [192.168.10.20]
INFO[0001] Waiting for [etcd-snapshot-once] container to exit on host [192.168.10.20]
INFO[0001] Container [etcd-snapshot-once] is still running on host [192.168.10.20]
INFO[0002] Waiting for [etcd-snapshot-once] container to exit on host [192.168.10.20]
INFO[0002] Container [etcd-snapshot-once] is still running on host [192.168.10.20]
INFO[0003] Waiting for [etcd-snapshot-once] container to exit on host [192.168.10.20]
INFO[0003] Finished saving snapshot [etcd-2022021602.db] on all etcd hosts
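
Before continuing, it can be worth confirming on the etcd host that the snapshot file really exists and is not empty. The following is a minimal sketch, not part of the original procedure. It assumes the RKE default local snapshot directory /opt/rke/etcd-snapshots; the function name check_snapshot is hypothetical, and depending on the RKE version the file may be stored under a slightly different name (for example with an added suffix), so adjust the name if needed.

```shell
# check_snapshot NAME [DIR]: succeed only if DIR/NAME exists and is non-empty.
# DIR defaults to /opt/rke/etcd-snapshots (assumed RKE default on etcd hosts).
check_snapshot() {
    name="$1"
    dir="${2:-/opt/rke/etcd-snapshots}"
    if [ -s "$dir/$name" ]; then
        echo "ok: $dir/$name"
    else
        echo "missing or empty: $dir/$name" >&2
        return 1
    fi
}

# Example, run on the etcd host (192.168.10.20 in this walkthrough):
#   check_snapshot etcd-2022021602.db
```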

Back up the workload cluster

3.2 Update the Helm chart repository

helm repo update

Make sure the version you want to upgrade to is available.
Stable versions listed in the official Rancher repository:

[root@localhost rke2.2.2]# helm search rancher -l
rancher-stable/rancher 2.3.4 v2.3.4 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.3.3 v2.3.3 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.3.2 v2.3.2 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.13 v2.2.13 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.11 v2.2.11 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.10 v2.2.10 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.9 v2.2.9 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.8 v2.2.8 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.7 v2.2.7 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.6 v2.2.6 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.5 v2.2.5 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.4 v2.2.4 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.3 v2.2.3 Install Rancher Server to manage Kubernetes clusters acro...
rancher-stable/rancher 2.2.2 v2.2.2 Install Rancher Server to manage Kubernetes clusters acro...
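
The check can also be scripted instead of read off the list by eye. This is a sketch under the assumption that the output has the same column layout as above (chart name first, chart version second); has_chart_version is a hypothetical helper name.

```shell
# has_chart_version VERSION: read `helm search rancher -l` output on stdin and
# succeed if rancher-stable/rancher is listed with exactly that chart version.
has_chart_version() {
    awk -v want="$1" '
        $1 == "rancher-stable/rancher" && $2 == want { found = 1 }
        END { exit !found }
    '
}

# Example:
#   helm search rancher -l | has_chart_version 2.3.2 && echo "2.3.2 is available"
```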

3.3 Upgrade Rancher

Retrieve the values that were passed with --set from the currently installed Rancher Helm chart

[root@localhost rke2.2.2]# helm get values rancher
hostname: cloud.jfjbapp.cn
ingress:
  tls:
    source: secret
privateCA: true

Upgrade to the specified version

helm upgrade rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname=cloud.jfjbapp.cn \
--set ingress.tls.source=secret \
--set privateCA=true \
--version 2.3.2

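During the upgrade, the downstream clusters have to upgrade rancher-agent from 2.2.2 to 2.3.2, so the workload cluster will be in an Unavailable state for 1 to 2 minutes.

Wait for Rancher and the clusters to return to a normal state.

Check that the pods in the RKE cluster are healthy: every pod should be Running (or Completed for jobs) and none should be restarting.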

[root@localhost rke2.2.2]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-74bcd8b6df-zxdrb 1/1 Running 0 3m12s
cattle-system cattle-node-agent-vqm4t 1/1 Running 0 3m8s
cattle-system rancher-848d7f74fd-5bstv 1/1 Running 0 4m30s
cattle-system rancher-848d7f74fd-5hhjg 1/1 Running 0 4m56s
cattle-system rancher-848d7f74fd-fhm7k 1/1 Running 0 5m10s
ingress-nginx default-http-backend-5954bd5d8c-9gjz7 1/1 Running 3 46h
ingress-nginx nginx-ingress-controller-zftwb 1/1 Running 0 7h11m
kube-system canal-ljlxc 2/2 Running 0 7h11m
kube-system coredns-86bc4b7c96-z875n 1/1 Running 0 7h11m
kube-system coredns-autoscaler-5d5d49b8ff-5jwvr 1/1 Running 0 7h11m
kube-system metrics-server-7f6bd4c888-4xxm8 1/1 Running 0 7h11m
kube-system rke-coredns-addon-deploy-job-zj72v 0/1 Completed 0 46h
kube-system rke-ingress-controller-deploy-job-sbprl 0/1 Completed 0 46h
kube-system rke-metrics-addon-deploy-job-xtf8m 0/1 Completed 0 46h
kube-system rke-network-plugin-deploy-job-pd9vg 0/1 Completed 0 46h
kube-system tiller-deploy-7cb87ddf7d-vzr4j 1/1 Running 3 46h
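
On a larger cluster, scanning this list by eye is error-prone. The filter below is a rough sketch that prints only the pods that look unhealthy; it assumes the default `kubectl get pod -A` column order shown above, and unhealthy_pods is a hypothetical helper name. It does not look at the RESTARTS column, since a non-zero count may predate the upgrade.

```shell
# unhealthy_pods: read `kubectl get pod -A` output on stdin and print every pod
# that is neither Completed nor Running with all containers ready.
# Exit status is 1 if at least one such pod is found, 0 otherwise.
unhealthy_pods() {
    awk '
        $1 == "NAMESPACE" { next }            # skip the header line
        {
            split($3, r, "/")                 # READY column, e.g. 1/1
            ok = ($4 == "Completed") || ($4 == "Running" && r[1] == r[2])
            if (!ok) { print $1 "/" $2 " " $4 " " $3; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Example:
#   kubectl get pod -A | unhealthy_pods && echo "all pods healthy"
```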

Check the workload cluster in the same way

root@node02:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-7d74b74f89-xrnx8 1/1 Running 0 5m7s
cattle-system cattle-node-agent-4gf92 1/1 Running 0 5m
cattle-system kube-api-auth-nprjl 1/1 Running 0 4m20s
default nginx-6d6d68488c-vq9qd 1/1 Running 1 9h
ingress-nginx default-http-backend-78fccfc5d9-7krhb 1/1 Running 1 9h
ingress-nginx nginx-ingress-controller-jjw6l 1/1 Running 0 7h8m
kube-system canal-wgzpc 2/2 Running 0 7h8m
kube-system kube-dns-58bd5b8dd7-5h4l7 3/3 Running 0 7h8m
kube-system kube-dns-autoscaler-77bc5fd84-8rvss 1/1 Running 0 7h8m
kube-system metrics-server-58bd5dd8d7-s5pcf 1/1 Running 0 7h8m
kube-system rke-ingress-controller-deploy-job-rkdgf 0/1 Completed 0 9h
kube-system rke-kube-dns-addon-deploy-job-8l4xq 0/1 Completed 0 9h
kube-system rke-metrics-addon-deploy-job-62fmt 0/1 Completed 0 9h
kube-system rke-network-plugin-deploy-job-98tdb 0/1 Completed 0 9h

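This completes the upgrade.

4. Version rollback

Official rollback guide: High-availability restore

4.1 Restore the RKE cluster from a local snapshot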

[root@localhost rke2.2.2]# ./rke etcd snapshot-restore --name etcd-2022021602.db --config cluster.yml
INFO[0000] Restoring etcd snapshot etcd-2022021602.db
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] [dialer] Setup tunnel for host [192.168.10.20]
INFO[0007] [etcd] starting backup server on host [192.168.10.20]
INFO[0007] [etcd] Successfully started [etcd-Serve-backup] container on host [192.168.10.20]
INFO[0012] [remove/etcd-Serve-backup] Successfully removed container on host [192.168.10.20]
INFO[0012] [etcd] Checking if all snapshots are identical
...
INFO[0076] [ingress] ingress controller nginx deployed successfully
INFO[0076] [addons] Setting up user addons
INFO[0076] [addons] no user addons defined
INFO[0076] Finished building Kubernetes cluster successfully
INFO[0076] Restarting network, ingress, and metrics pods
INFO[0078] Finished restoring snapshot [etcd-2022021602.db] on all etcd hosts

Wait for the two agent containers to return to the Running state

[root@localhost rke2.2.2]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-588959c4bf-hpgfr 1/1 Running 2 40s
cattle-system cattle-node-agent-4h6kx 1/1 Running 3 26h
cattle-system rancher-6ff949dc75-6cfbj 1/1 Running 0 42h
cattle-system rancher-6ff949dc75-jnn87 1/1 Running 0 42h
cattle-system rancher-6ff949dc75-pcz7c 1/1 Running 0 42h
ingress-nginx default-http-backend-5954bd5d8c-9gjz7 1/1 Running 3 46h
ingress-nginx nginx-ingress-controller-7td9m 1/1 Running 0 26s
kube-system canal-xv2sd 2/2 Running 0 40s
kube-system coredns-86bc4b7c96-kcnqk 1/1 Running 0 40s
kube-system coredns-autoscaler-5d5d49b8ff-g6h7n 1/1 Running 0 40s
kube-system metrics-server-7f6bd4c888-zv8fc 1/1 Running 0 40s
kube-system rke-coredns-addon-deploy-job-zj72v 0/1 Completed 0 46h
kube-system rke-ingress-controller-deploy-job-sbprl 0/1 Completed 0 46h
kube-system rke-metrics-addon-deploy-job-xtf8m 0/1 Completed 0 46h
kube-system rke-network-plugin-deploy-job-pd9vg 0/1 Completed 0 46h
kube-system tiller-deploy-7cb87ddf7d-vzr4j 1/1 Running 3 46h

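4.2 Restore the workload cluster

Select the workload cluster, choose the restore option, find the backup by its timestamp, and click Restore.

Wait for the cluster to update.

Once the cluster has recovered, check the rancher-agent version.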
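
One way to check the agent version from the command line is to read the image of the cattle-cluster-agent deployment and look at its tag. This is only a sketch: it assumes the agent image carries the Rancher version as its tag (for example rancher/rancher-agent:v2.2.2), and image_tag is a hypothetical helper name.

```shell
# image_tag IMAGE: print the tag of an image reference,
# e.g. rancher/rancher-agent:v2.2.2 -> v2.2.2. Fails if there is no tag.
image_tag() {
    last="${1##*/}"                  # drop registry host and repository path
    case "$last" in
        *:*) printf '%s\n' "${last##*:}" ;;
        *)   return 1 ;;
    esac
}

# Example, run against the workload cluster:
#   image=$(kubectl -n cattle-system get deployment cattle-cluster-agent \
#       -o jsonpath='{.spec.template.spec.containers[0].image}')
#   image_tag "$image"
```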

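This completes the cluster restore.

5. Testing the workload cluster during upgrade and rollback

Create 2 workload containers

Use an infinite loop to simulate continuous requests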

[root@localhost ~]# while true; do curl -I http://192.168.10.21:31274/ ;done
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Thu, 17 Feb 2022 01:35:59 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 28 Dec 2021 18:48:00 GMT
Connection: keep-alive
ETag: "61cb5be0-267"
Accept-Ranges: bytes
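
The loop above prints full response headers, which makes a short outage easy to miss. A variant that logs one timestamped status code per request makes interruptions easier to spot and count afterwards. This is a sketch rather than the test that was actually run; probe and count_failures are hypothetical helper names, and the one-second interval and two-second timeout are arbitrary choices.

```shell
# probe URL [COUNT]: print one "HH:MM:SS http_code" line per request.
# COUNT 0 (the default) means loop forever. curl prints 000 as the status
# code when the request fails (connection refused, timeout, and so on).
probe() {
    url="$1"; count="${2:-0}"; i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        code=$(curl -s -o /dev/null -I --max-time 2 -w '%{http_code}' "$url")
        echo "$(date '+%H:%M:%S') $code"
        i=$((i + 1))
        sleep 1
    done
}

# count_failures: read probe output on stdin and print how many requests
# did not return HTTP 200.
count_failures() {
    awk '$2 != "200" { n++ } END { print n + 0 }'
}

# Example:
#   probe http://192.168.10.21:31274/ | tee probe.log
#   count_failures < probe.log
```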

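6. Summary

When upgrading to v2.3.0, the first time you modify an RKE cluster that was deployed by a Rancher version earlier than v2.3.0, all of that cluster's system components restart automatically, because Tolerations have to be added to them. Back up your data before upgrading.

The paragraph above is from the official release notes. In testing, requests were not interrupted while Rancher was being upgraded. During rollback, however, all pods in the workload cluster were recreated, and the simulated requests were interrupted.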