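====== Rancher Troubleshooting Notes ======
===== How to Tell That Rancher Failed to Start Properly =====
* List the pods in the cattle-system namespace; a rancher pod stuck at 0/1 READY or in CrashLoopBackOff indicates a failed start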
rkeuser@iiidevops4:~$ kubectl get pod -n cattle-system
NAME READY STATUS RESTARTS AGE
cattle-cluster-agent-6bf6f8fcc4-sznpp 1/1 Running 0 18m
cattle-node-agent-79nrh 1/1 Running 23 67d
cattle-node-agent-ch6pn 1/1 Running 23 67d
cattle-node-agent-jr5bq 1/1 Running 7 7d20h
cattle-node-agent-k2fcs 1/1 Running 26 67d
rancher-98d8d5cf5-hbjjv 1/1 Running 1 25m
rancher-98d8d5cf5-nhlwz 0/1 CrashLoopBackOff 8 25m
rancher-98d8d5cf5-zjbzs 0/1 Running 0 105s
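The listing above can be narrowed to the pods that need attention. This is a minimal sketch, assuming the default `kubectl get pod` column order (NAME, READY, STATUS, ...) with a header line; the function name `not_ready_pods` is made up for illustration.

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and prints
# NAME, READY and STATUS for pods that are not fully ready or not Running.
not_ready_pods() {
  awk 'NR > 1 {                       # skip the header line
    split($2, r, "/")                 # READY column, e.g. "0/1"
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Usage (needs cluster access):
#   kubectl get pod -n cattle-system | not_ready_pods
```

With the output shown above, this would leave only the two rancher pods that report 0/1.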
- Find out which rancher pod is the leader
$ kubectl describe configMap cattle-controllers -n kube-system
Name: cattle-controllers
Namespace: kube-system
Labels:
Annotations: control-plane.alpha.kubernetes.io/leader:
{"holderIdentity":"rancher-98d8d5cf5-hbjjv","leaseDurationSeconds":45,"acquireTime":"2021-09-08T06:40:25Z","renewTime":"2021-09-08T07:02:5...
Data
====
Events:
- The current leader is rancher-98d8d5cf5-hbjjv, so inspect that pod's logs
$ kubectl logs rancher-98d8d5cf5-hbjjv -n cattle-system
2021/09/08 06:38:27 [INFO] Rancher version v2.4.15 (cdb64d640) is starting
2021/09/08 06:38:27 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:auto Embedded:false HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features:}
2021/09/08 06:38:27 [INFO] Listening on /tmp/log.sock
I0908 06:38:27.719747 6 http.go:122] HTTP2 has been explicitly disabled
:
2021/09/08 06:56:18 [ERROR] AppController p-gn54t/test-20210831-master-sq [helm-controller] failed with : Get "https://10.43.0.1:443/apis/project.cattle.io/v3/namespaces/p-gn54t/apprevisions?labelSelector=io.cattle.field%!F(MISSING)appId%!D(MISSING)test-20210831-master-sq&timeout=30s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/09/08 06:57:04 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found
2021/09/08 07:01:20 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found
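The two steps above (find the leader, then read its log) can be chained. This is a rough sketch, assuming the leader annotation keeps the JSON shape shown above and that error lines carry the literal `[ERROR]` marker; the helper names `leader_from_annotation` and `error_lines` are hypothetical.

```shell
# Hypothetical helper: pull the holderIdentity value out of the leader
# annotation (the JSON text, or the full `kubectl describe` output) on stdin.
leader_from_annotation() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: keep only the [ERROR] lines of a Rancher log on stdin.
error_lines() {
  grep -F '[ERROR]'
}

# Usage (needs cluster access):
#   leader=$(kubectl describe configMap cattle-controllers -n kube-system | leader_from_annotation)
#   kubectl logs "$leader" -n cattle-system | error_lines
```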
===== Accidentally Deleted the Pipeline's Jenkins Pod =====
* Suppose the jenkins pod listed below has gone missing; the pipeline can then no longer start
~$ kubectl get namespace | grep pipeline
cattle-pipeline Active 66d
p-gn54t-pipeline Active 66d
~$ kubectl get pod -n p-gn54t-pipeline
NAME READY STATUS RESTARTS AGE
docker-registry-57fbddc6cc-drt29 1/1 Running 4 66d
jenkins-75cf8d9966-m2vc8 1/1 Running 0 168m
minio-7b7866c65f-7hpl5 1/1 Running 0 167m
* Simply delete the project's pipeline namespace (e.g. p-gn54t-pipeline); it is recreated automatically
* Reference: https://github.com/rancher/rancher/issues/18779
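The fix above could be scripted roughly as follows. This is a sketch only: the function name `delete_pipeline_ns`, the `KUBECTL` override variable, and the name guard are additions made here for illustration. The guard assumes project pipeline namespaces follow the `p-...-pipeline` pattern seen above, so that `cattle-pipeline` is never touched by mistake.

```shell
KUBECTL=${KUBECTL:-kubectl}   # set KUBECTL=echo for a dry run

# Hypothetical helper: delete one project's pipeline namespace.
delete_pipeline_ns() {
  ns=$1
  case "$ns" in
    p-*-pipeline) ;;          # e.g. p-gn54t-pipeline
    *) echo "refusing to delete '$ns': not a p-*-pipeline namespace" >&2
       return 1 ;;
  esac
  "$KUBECTL" delete namespace "$ns"
}

# Usage (needs cluster access):
#   delete_pipeline_ns p-gn54t-pipeline
#   kubectl get pod -n p-gn54t-pipeline -w   # watch the pods come back
```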
===== Reinstalling Rancher When It Fails to Start =====
* Environment: Rancher installed with rke / helm
* After running helm uninstall followed by helm install, Rancher still does not start properly
* Based on **[[https://www.cnblogs.com/37yan/p/14275214.html|Cleanly removing Rancher]]** and **[[https://rancher.com/blog/2018/2018-07-09-rancher-management-plane-architecture/|CRDs in Rancher]]**, the following steps resolve the problem
- Delete the dynamicschemas.management.cattle.io CRD
- Delete the cert-manager and cattle-system namespaces
- Reinstall Rancher
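The first two cleanup steps can be expressed as plain kubectl calls. This is a minimal sketch; the function name `clean_rancher_leftovers` and the `KUBECTL` override variable are invented here, and the reinstall itself is left out because it depends on the helm command originally used.

```shell
KUBECTL=${KUBECTL:-kubectl}   # set KUBECTL=echo for a dry run

# Hypothetical helper: remove the leftovers named in the steps above.
clean_rancher_leftovers() {
  # Step 1: the CRD named in the first step
  "$KUBECTL" delete crd dynamicschemas.management.cattle.io || return 1
  # Step 2: both namespaces (kubectl accepts several names at once)
  "$KUBECTL" delete namespace cert-manager cattle-system
}

# Usage (needs cluster access; this is destructive):
#   clean_rancher_leftovers
#   ...then repeat the helm install used for the original installation
```

Running it once with `KUBECTL=echo` prints the commands instead of executing them, which is a cheap way to review what would be deleted.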
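===== How to Change the Rancher Server URL =====
* Reference: https://gist.github.com/janeczku/d3b9eed3b1dee7863b66fba3367a1bd4
* Open Rancher's advanced settings page, https:///g/settings/advanced (with your Rancher server's host name after https://)
* Find server-url and edit it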
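After editing, the stored value can be read back from the command line. This sketch rests on an assumption not stated on this page: that the setting is kept in a cluster-scoped `settings.management.cattle.io` object named `server-url` with a top-level `value` field. The function name `show_server_url` and the `KUBECTL` override variable are made up for illustration.

```shell
KUBECTL=${KUBECTL:-kubectl}   # set KUBECTL=echo for a dry run

# Hypothetical helper: print the server-url currently stored by Rancher.
# Assumes a Setting object named server-url with a top-level "value" field.
show_server_url() {
  "$KUBECTL" get settings.management.cattle.io server-url -o jsonpath='{.value}'
}

# Usage (needs cluster access, against the cluster Rancher itself runs on):
#   show_server_url
```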
{{tag>rancher}}