Rancher Troubleshooting Notes
How to tell that Rancher has not started correctly
rkeuser@iiidevops4:~$ kubectl get pod -n cattle-system
NAME                                    READY   STATUS             RESTARTS   AGE
cattle-cluster-agent-6bf6f8fcc4-sznpp   1/1     Running            0          18m
cattle-node-agent-79nrh                 1/1     Running            23         67d
cattle-node-agent-ch6pn                 1/1     Running            23         67d
cattle-node-agent-jr5bq                 1/1     Running            7          7d20h
cattle-node-agent-k2fcs                 1/1     Running            26         67d
rancher-98d8d5cf5-hbjjv                 1/1     Running            1          25m
rancher-98d8d5cf5-nhlwz                 0/1     CrashLoopBackOff   8          25m
rancher-98d8d5cf5-zjbzs                 0/1     Running            0          105s
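When the pod list is long, the unhealthy rows can be filtered out instead of read by eye. The following is a minimal sketch, not part of the original notes: the function name not_ready_pods is made up for illustration, and it assumes the default column layout of kubectl get pod (NAME, READY, STATUS in the first three columns). Pods that finished normally (STATUS Completed) would also be listed, so treat the result as a hint rather than a verdict.

```shell
# Print name, READY and STATUS for every pod that is not fully ready
# or whose STATUS is not "Running". Reads `kubectl get pod` output on stdin.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column looks like "0/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Usage against a live cluster (needs kubectl access):
#   kubectl get pod -n cattle-system | not_ready_pods
```

Run against the listing above, this would single out rancher-98d8d5cf5-nhlwz and rancher-98d8d5cf5-zjbzs.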
- Find out which rancher pod is the leader
$ kubectl describe configMap cattle-controllers -n kube-system
Name:         cattle-controllers
Namespace:    kube-system
Labels:       <none>
Annotations:  control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"rancher-98d8d5cf5-hbjjv","leaseDurationSeconds":45,"acquireTime":"2021-09-08T06:40:25Z","renewTime":"2021-09-08T07:02:5...

Data
====
Events:  <none>
- The holderIdentity field shows that the current leader is rancher-98d8d5cf5-hbjjv, so check that pod's log
$ kubectl logs rancher-98d8d5cf5-hbjjv -n cattle-system
2021/09/08 06:38:27 [INFO] Rancher version v2.4.15 (cdb64d640) is starting
2021/09/08 06:38:27 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:auto Embedded:false HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features:}
2021/09/08 06:38:27 [INFO] Listening on /tmp/log.sock
I0908 06:38:27.719747       6 http.go:122] HTTP2 has been explicitly disabled
:
2021/09/08 06:56:18 [ERROR] AppController p-gn54t/test-20210831-master-sq [helm-controller] failed with : Get "https://10.43.0.1:443/apis/project.cattle.io/v3/namespaces/p-gn54t/apprevisions?labelSelector=io.cattle.field%!F(MISSING)appId%!D(MISSING)test-20210831-master-sq&timeout=30s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/09/08 06:57:04 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found
2021/09/08 07:01:20 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found
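The two lookups above (leader, then its log) can be chained. This is only a sketch under stated assumptions: the helper names leader_from_annotation and error_counts are hypothetical, the annotation is assumed to be the single-line JSON shown in the describe output, and the fourth whitespace-separated field of an [ERROR] line is assumed to be the controller name, as it is in the excerpt above.

```shell
# Extract holderIdentity from the leader-election annotation JSON on stdin.
leader_from_annotation() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Count [ERROR] lines per controller (4th field), most frequent first.
# Reads Rancher log lines on stdin.
error_counts() {
  awk '$3 == "[ERROR]" { count[$4]++ }
       END { for (c in count) print count[c], c }' | sort -rn
}

# Usage against a live cluster (needs kubectl access; the dots inside the
# annotation key must be escaped for jsonpath):
#   leader=$(kubectl get configmap cattle-controllers -n kube-system \
#     -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' \
#     | leader_from_annotation)
#   kubectl logs "$leader" -n cattle-system | error_counts
```

Grouping by controller makes repeated failures, such as the PipelineExecutionController entries above, stand out from one-off timeouts.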
Accidentally deleted the pipeline's Jenkins pod
- Suppose the jenkins pod shown below has disappeared. The pipeline can then no longer start or run.
~$ kubectl get namespace | grep pipeline
cattle-pipeline    Active   66d
p-gn54t-pipeline   Active   66d
~$ kubectl get pod -n p-gn54t-pipeline
NAME                               READY   STATUS    RESTARTS   AGE
docker-registry-57fbddc6cc-drt29   1/1     Running   4          66d
jenkins-75cf8d9966-m2vc8           1/1     Running   0          168m
minio-7b7866c65f-7hpl5             1/1     Running   0          167m
- Simply delete the pipeline namespace (for example p-gn54t-pipeline); it is recreated automatically, jenkins pod included
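Before deleting anything it is worth confirming that the jenkins pod really is missing. The sketch below is an illustration only: the function name has_running_jenkins is hypothetical, and it assumes the pod name starts with "jenkins-" as in the listing above. Keep in mind that deleting a namespace removes everything in it, so docker-registry and minio in that namespace are removed and recreated as well.

```shell
# Exit 0 if the `kubectl get pod` output on stdin contains a pod named
# jenkins-* with STATUS Running, otherwise exit 1.
has_running_jenkins() {
  awk 'NR > 1 && $1 ~ /^jenkins-/ && $3 == "Running" { found = 1 }
       END { exit !found }'
}

# Usage against a live cluster (needs kubectl access):
#   ns=p-gn54t-pipeline
#   kubectl get pod -n "$ns" | has_running_jenkins \
#     || kubectl delete namespace "$ns"   # destructive: wipes the whole namespace
```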
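Reinstalling Rancher when it fails to start
- Environment: Rancher installed with rke / helm
- Running helm uninstall and then helm install again still did not bring Rancher up normally
- Based on an article about cleanly removing Rancher and another explaining the CRDs in Rancher, the following steps resolved the problem
- Delete the CRD dynamicschemas.management.cattle.io
- Delete the cert-manager and cattle-system namespaces
- Reinstall Rancher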
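The cleanup steps above could be scripted roughly as follows. This is a sketch, not the exact commands used at the time: the helper names run and clean_rancher are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set, because every step is destructive. The final helm install is left out on purpose, since its chart and values depend on the original installation; if cert-manager was installed into the deleted namespace, it needs to be reinstalled before Rancher as well.

```shell
# Echo the command unless DRY_RUN=0 is set, in which case execute it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

clean_rancher() {
  # Step 1: remove the leftover CRD named in the notes.
  run kubectl delete crd dynamicschemas.management.cattle.io
  # Step 2: remove both namespaces and everything inside them.
  run kubectl delete namespace cert-manager
  run kubectl delete namespace cattle-system
  # Step 3 (not scripted): repeat the original helm install for Rancher.
}

# Usage:
#   clean_rancher              # dry run, prints the three commands
#   DRY_RUN=0 clean_rancher    # actually deletes (needs kubectl access)
```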
How to change the Rancher server URL
- Open the Rancher advanced settings page
https://<old_rancher_hostname>/g/settings/advanced
- Find the server-url setting and edit it
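For convenience, the settings page address can be built from the hostname, and the edited value can be read back from the cluster afterwards. This is a small sketch under assumptions: the function name settings_url is hypothetical, and the read-back command assumes kubectl access to the cluster that runs Rancher, where the setting is stored as a settings.management.cattle.io object.

```shell
# Print the advanced-settings URL for the given Rancher hostname.
settings_url() {
  printf 'https://%s/g/settings/advanced\n' "$1"
}

# Usage:
#   settings_url <old_rancher_hostname>
#
# Optional read-back after editing (needs kubectl access):
#   kubectl get settings.management.cattle.io server-url -o jsonpath='{.value}'
```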