
Installing K3s + Rancher WebUI

Installation Procedure

Prerequisites (all nodes)

  1. Update system packages

    sudo apt update && sudo apt upgrade -y

  2. Set the hostname (run the matching command on each VM; the hosts file is edited in step 3)

    # VM1 (Master)
    sudo hostnamectl set-hostname k3s-master-171
    
    # VM2 (Worker)
    sudo hostnamectl set-hostname k3s-worker-172
    
    # VM3 (Worker)
    sudo hostnamectl set-hostname k3s-worker-173

  3. Edit /etc/hosts (all nodes)

    sudo vi /etc/hosts

    Example for 192.168.1.171 through 192.168.1.173:

    192.168.1.171  k3s-master-171
    192.168.1.172  k3s-worker-172
    192.168.1.173  k3s-worker-173

  4. Disable swap (all nodes)

    sudo swapoff -a
    sudo sed -i '/swap/!b; /^#/b; s/^/#/' /etc/fstab

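  5. Configure firewall rules (only if UFW is enabled)

    # Master node
    sudo ufw allow 6443/tcp  # Kubernetes API
    sudo ufw allow 2379:2380/tcp  # etcd (only needed for multi-server HA with embedded etcd)
    sudo ufw allow 10250/tcp  # Kubelet
    sudo ufw allow 8472/udp  # Flannel VXLAN (K3s default network backend)
    sudo ufw allow 80/tcp  # Rancher HTTP
    sudo ufw allow 443/tcp  # Rancher HTTPS
    
    # Worker nodes
    sudo ufw allow 10250/tcp  # Kubelet
    sudo ufw allow 8472/udp  # Flannel VXLAN (K3s default network backend)
    sudo ufw allow 30000:32767/tcp  # NodePort Services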
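
The sed expression in step 4 is compact, so a small sketch on a scratch copy may help show what it does before it touches the real /etc/fstab. The sample fstab contents and the temporary file are illustrative assumptions, not taken from any of the VMs.

```shell
# Build a throwaway fstab with one active swap entry and one already-commented entry.
SAMPLE_FSTAB="$(mktemp)"
cat > "$SAMPLE_FSTAB" <<'EOF'
UUID=1111-aaaa  /          ext4  defaults  0 1
/swap.img       none       swap  sw        0 0
#/dev/sdb1      none       swap  sw        0 0
EOF

# Same expression as step 4:
#   /swap/!b  -> lines without "swap" are left untouched
#   /^#/b     -> lines that are already comments are left untouched
#   s/^/#/    -> every remaining (active swap) line gets a leading "#"
sed -i '/swap/!b; /^#/b; s/^/#/' "$SAMPLE_FSTAB"

cat "$SAMPLE_FSTAB"
```

Only the /swap.img line should gain a leading "#". After running the real commands on a node, `swapon --show` printing nothing indicates that swap is off.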

Installing K3s

Master node (VM1)

  1. Install the K3s server

    curl -sfL https://get.k3s.io | sh -s - server \
      --write-kubeconfig-mode 644 \
      --disable traefik

    • Disables the bundled Traefik; the NGINX Ingress Controller installed later in this guide serves Rancher instead.

  2. Verify the installation

    sudo systemctl status k3s
    kubectl get nodes

  3. Retrieve the node token (workers need it to join)

    sudo cat /var/lib/rancher/k3s/server/node-token

    Record this token; the worker nodes use it in the next section.

Worker nodes (VM2 & VM3)

  1. Install the K3s agent. Example: master node IP 192.168.1.171; token obtained from the master: xxxxxxxxxx

    curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.171:6443 \
      K3S_TOKEN=xxxxxxxxxx sh -

  2. Verify that the workers have joined (run on the master node)

    kubectl get nodes

    All three nodes should report a Ready status.
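
To script this check instead of reading the table by eye, the `kubectl get nodes` output can be filtered. A minimal sketch, assuming the default column layout (NAME, STATUS, ROLES, AGE, VERSION); the function name and the sample output are hypothetical.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# A cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Illustrative sample in the default column layout (not real cluster output).
SAMPLE_NODES='k3s-master-171   Ready      control-plane,master   10m   v1.33.6+k3s1
k3s-worker-172   Ready      <none>                 5m    v1.33.6+k3s1
k3s-worker-173   NotReady   <none>                 1m    v1.33.6+k3s1'

printf '%s\n' "$SAMPLE_NODES" | not_ready_nodes
```

On the master, `kubectl get nodes --no-headers | not_ready_nodes` printing nothing means every node is Ready. `kubectl wait --for=condition=Ready nodes --all --timeout=120s` is a built-in alternative that blocks until the nodes are ready.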

Installing Rancher WebUI

Run on the master node (VM1)

  1. Install Helm

    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
    helm version

  2. Configure kubeconfig access for K3s

    # 1. Set the environment variable permanently
    echo 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml' >> ~/.bashrc
    # 2. Reload the shell configuration
    source ~/.bashrc
    # 3. Verify
    kubectl version
    kubectl get nodes
    helm version
    helm list -A

  3. Add the Rancher Helm repository

    helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
    helm repo update

  4. Create the Rancher namespace

    kubectl create namespace cattle-system

  5. Install cert-manager (for SSL certificate management)

    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --set crds.enabled=true

  6. Verify the cert-manager installation

    kubectl get pods --namespace cert-manager
    kubectl get crd | grep cert-manager

    Wait until all Pods are in the Running state.

  7. Install the NGINX Ingress Controller

    # Install NGINX Ingress
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    helm install ingress-nginx ingress-nginx/ingress-nginx \
      --namespace ingress-nginx \
      --create-namespace \
      --set controller.hostNetwork=true \
      --set controller.kind=DaemonSet \
      --set controller.service.type=ClusterIP
    
    # Wait for the deployment to finish
    kubectl wait --namespace ingress-nginx \
      --for=condition=ready pod \
      --selector=app.kubernetes.io/component=controller \
      --timeout=120s
    
    # Check the NGINX Ingress Pods
    kubectl get pods -n ingress-nginx

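  8. Install Rancher. Example: hostname rancher.ichiayi.com, bootstrap admin password admin@123. With ingress.tls.source=secret, Rancher reads its certificate from the tls-rancher-ingress secret created in FAQ 2 below.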

    helm install rancher rancher-stable/rancher \
      --namespace cattle-system \
      --set hostname=rancher.ichiayi.com \
      --set replicas=1 \
      --set ingress.tls.source=secret \
      --set bootstrapPassword="admin@123" \
      --set ingress.ingressClassName=nginx

  9. Verify the Rancher deployment

    kubectl -n cattle-system rollout status deploy/rancher
    kubectl -n cattle-system get pods
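
The password admin@123 in step 8 is only an example value. A sketch of generating a random bootstrap password instead, assuming /dev/urandom and the standard od and tr utilities are available; the function and variable names are hypothetical.

```shell
# Produce a 24-character hexadecimal password from 12 random bytes.
gen_bootstrap_password() {
  head -c 12 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

BOOTSTRAP_PASSWORD="$(gen_bootstrap_password)"
echo "Generated a ${#BOOTSTRAP_PASSWORD}-character bootstrap password"

# Hypothetical use in step 8, replacing the literal value:
#   --set bootstrapPassword="$BOOTSTRAP_PASSWORD"
```

Keep the generated value somewhere safe until the first login, since Rancher asks for it before letting you set a permanent password.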

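Accessing Rancher WebUI

  1. Open the Rancher URL in a browser. Example: https://rancher.ichiayi.com
  2. Log in to Rancher
    • Use the bootstrap password: admin@123
    • The first login prompts you to set a new password
  3. Check the cluster status
    • After logging in you will see the local cluster (the current K3s cluster); open it to manage all nodes, workloads, and services.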

Common Management Commands

Storage Configuration

NFS Subdir External Provisioner (dynamic provisioning)
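
The steps below rely on a StorageClass named nfs-client, which this page does not show being created. A sketch of one way to install the provisioner with Helm, assuming the NFS export 192.168.1.159:/swarmdata seen in the later examples, the default namespace, and a path pattern that matches the namespace/PVC-name directories shown there. The values file name and the DRY_RUN switch are illustrative, and the chart options should be checked against the chart's own README.

```shell
# Print commands by default; set DRY_RUN=0 to actually run them against the cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Chart values (assumed): NFS export plus a directory layout of <namespace>/<pvc-name>.
VALUES_FILE="${VALUES_FILE:-nfs-provisioner-values.yaml}"
cat > "$VALUES_FILE" <<'EOF'
nfs:
  server: 192.168.1.159
  path: /swarmdata
storageClass:
  name: nfs-client
  pathPattern: "${.PVC.namespace}/${.PVC.name}"
EOF

run helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
run helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --namespace default \
  -f "$VALUES_FILE"
```

Each node also needs an NFS client to mount the export (on Ubuntu, the nfs-common package). `kubectl get storageclass` should list nfs-client once the chart is installed.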

Providing app1 with persistent storage

  1. Create a dedicated PVC for app1. Example: app1-pvc.yaml
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app1-data-pvc
      namespace: default  # change to your namespace
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi  # adjust to your needs (advisory only; the NFS provisioner does not enforce it)
    kubectl apply -f app1-pvc.yaml
  2. Verify the PV and PVC status

    kubectl get pv
    kubectl get pvc -n default

    Example output:

    jonathan@k3s-master-171:~/app1$ kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                         STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
    pv-nfs-subdir-external-provisioner         10Mi       RWO            Retain           Bound    default/pvc-nfs-subdir-external-provisioner                  <unset>                          4m10s
    pvc-ea1739ec-04dd-4549-952b-490bf07ec186   10Gi       RWX            Delete           Bound    default/app1-data-pvc                         nfs-client     <unset>                          56s
    jonathan@k3s-master-171:~/app1$ kubectl get pvc -n default
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
    app1-data-pvc                         Bound    pvc-ea1739ec-04dd-4549-952b-490bf07ec186   10Gi       RWX            nfs-client     <unset>                 63s
    pvc-nfs-subdir-external-provisioner   Bound    pv-nfs-subdir-external-provisioner         10Mi       RWO                           <unset>                 4m17s

    The directory created on the NFS server is /sharenfsdir/${.PVC.namespace}/${.PVC.name}. Example:

    swarm-nfs-159:/swarmdata# tree | more
    .
    ├── default
    │   └── app1-data-pvc
    :

  3. Deploy the application. Example: app1-deployment.yaml
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app1
      namespace: default
      labels:
        app: app1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: app1
      template:
        metadata:
          labels:
            app: app1
        spec:
          containers:
          - name: app1
            image: busybox:latest
            command: ["/bin/sh"]
            args:
            - "-c"
            - |
              # Create test files
              echo "Container started at $(date)" > /data/startup.log
              echo "DATA_DIR: $DATA_DIR" >> /data/startup.log
     
              # Write a heartbeat every 60 seconds
              while true; do
                echo "Heartbeat: $(date)" >> /data/heartbeat.log
                ls -la /data/ > /data/file-list.txt
                sleep 60
              done
            volumeMounts:
            - name: app1-data
              mountPath: /data
            env:
            - name: DATA_DIR
              value: /data
            resources:
              requests:
                memory: "64Mi"
                cpu: "100m"
              limits:
                memory: "128Mi"
                cpu: "200m"
          volumes:
          - name: app1-data
            persistentVolumeClaim:
              claimName: app1-data-pvc
    kubectl apply -f app1-deployment.yaml
  4. Check the Pod status

    kubectl get pods -n default
    kubectl describe pod <pod-name> -n default

    Check that the files were created in the NFS directory

    swarm-nfs-159:/swarmdata/default/app1-data-pvc# ls -lt
    total 12
    -rw-r--r--    1 root     root           339 Nov 26 11:40 file-list.txt
    -rw-r--r--    1 root     root            80 Nov 26 11:40 heartbeat.log
    -rw-r--r--    1 root     root            66 Nov 26 11:39 startup.log
    swarm-nfs-159:/swarmdata/default/app1-data-pvc# cat startup.log
    Container started at Wed Nov 26 03:39:31 UTC 2025
    DATA_DIR: /data
    swarm-nfs-159:/swarmdata/default/app1-data-pvc# cat heartbeat.log
    Heartbeat: Wed Nov 26 03:39:31 UTC 2025
    Heartbeat: Wed Nov 26 03:40:31 UTC 2025
    swarm-nfs-159:/swarmdata/default/app1-data-pvc# cat file-list.txt
    total 16
    drwxrwxrwx    2 root     root          4096 Nov 26 03:39 .
    drwxr-xr-x    1 root     root          4096 Nov 26 03:39 ..
    -rw-r--r--    1 root     root             0 Nov 26 03:40 file-list.txt
    -rw-r--r--    1 root     root            80 Nov 26 03:40 heartbeat.log
    -rw-r--r--    1 root     root            66 Nov 26 03:39 startup.log

  5. Verify and debug

    # Check that the PVC is bound
    kubectl get pvc app1-data-pvc -n default
    
    # Find the Pod name
    kubectl get pods -n default | grep app1
    
    # Inspect the mounts inside the Pod. Example: app1-584b58d766-qwrqk
    kubectl exec -it app1-584b58d766-qwrqk -n default -- df -h
    
    # Test a write
    kubectl exec -it app1-584b58d766-qwrqk -n default -- sh -c "echo 'test' > /data/test.txt"
    
    # Confirm on the NFS server
    # The file should appear at 192.168.1.159:/swarmdata/default/app1-data-pvc/test.txt

Removing the persistent storage assigned to app1

  1. Stop the applications that use the PVC

    # First delete the Deployment/Pod that is using the PVC
    kubectl delete deployment app1 -n default
    
    # Confirm the Pod has fully terminated
    kubectl get pods -n default | grep app1
    
    # If a StatefulSet or other resource also uses the PVC, delete that too
    kubectl get all -n default | grep app1

  2. Delete the PVC

    # Delete the PVC
    kubectl delete pvc app1-data-pvc -n default
    
    # Check the PVC status
    kubectl get pvc -n default

    If the PVC is stuck in the Terminating state:

    # Check whether a finalizer is blocking deletion
    kubectl get pvc app1-data-pvc -n default -o yaml | grep -A 2 finalizers
    
    # Force deletion if necessary (use with caution)
    kubectl patch pvc app1-data-pvc -n default -p '{"metadata":{"finalizers":null}}'

  3. Clean up the files on the NFS server
    • Log in to the NFS server and remove the directory that was created for app1
    • The files are normally under /sharenfsdir/<namespace>/<pvc-name>/. Example: /swarmdata/default/app1-data-pvc/
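
Because this cleanup is a recursive delete on the NFS server, a guarded helper can reduce the risk of removing the wrong directory. A minimal sketch, assuming the <share root>/<namespace>/<pvc-name> layout described above; the function name is hypothetical and the demonstration runs against a scratch directory rather than /swarmdata.

```shell
# Remove <share_root>/<namespace>/<pvc_name>, refusing empty arguments and missing directories.
remove_pvc_dir() {
  share_root="$1"; namespace="$2"; pvc_name="$3"
  if [ -z "$share_root" ] || [ -z "$namespace" ] || [ -z "$pvc_name" ]; then
    echo "usage: remove_pvc_dir <share_root> <namespace> <pvc_name>" >&2
    return 1
  fi
  target="$share_root/$namespace/$pvc_name"
  if [ ! -d "$target" ]; then
    echo "not a directory: $target" >&2
    return 1
  fi
  rm -rf -- "$target"
  echo "removed $target"
}

# Demonstration against a scratch directory that stands in for /swarmdata.
DEMO_ROOT="$(mktemp -d)"
mkdir -p "$DEMO_ROOT/default/app1-data-pvc"
touch "$DEMO_ROOT/default/app1-data-pvc/heartbeat.log"
remove_pvc_dir "$DEMO_ROOT" default app1-data-pvc
```

On the NFS server the equivalent call would be `remove_pvc_dir /swarmdata default app1-data-pvc`. The empty-argument check matters because a blank namespace or PVC name would otherwise widen the path that gets deleted.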

FAQ

1. How to cleanly remove Rancher Web UI

  1. Remove with helm uninstall

    helm uninstall rancher -n cattle-system
    # May report: Error: uninstallation completed with 1 error(s): 1 error occurred: * job rancher-post-delete failed: BackoffLimitExceeded

  2. Remove and check with kubectl

    # Step 1: delete the stuck post-delete job
    kubectl -n cattle-system delete job rancher-post-delete
    # Step 2: manually delete all Rancher-related resources
    # Delete the deployment / pods
    kubectl -n cattle-system delete deployment rancher
    kubectl -n cattle-system delete pod -l app=rancher
    # Delete the webhook
    kubectl -n cattle-system delete deployment rancher-webhook
    # Delete all secrets (⚠️ this does not delete the cluster; other workloads are unaffected)
    kubectl -n cattle-system delete secret --all
    # Delete the configmaps
    kubectl -n cattle-system delete configmap --all
    # Step 3: make sure the namespace is clean
    kubectl get all -n cattle-system
    # Only the single service created by K3s should remain; delete any other Jobs or Pods

2. How to update the Rancher Web UI SSL certificate

  1. Obtain a Cloudflare API token with permission to edit DNS for the zone (example zone: ichiayi.com; example token: xxxxxxcfapitkoenxxxxxx)
  2. Create the Cloudflare API token Secret

    kubectl create secret generic cloudflare-api-token-secret \
      --from-literal=api-token=xxxxxxcfapitkoenxxxxxx \
      -n cert-manager

  3. Create the ClusterIssuer. Example: letsencrypt-cloudflare-issuer.yaml
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        # Let's Encrypt production server
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.com  # placeholder address; change to your email
        privateKeySecretRef:
          name: letsencrypt-prod
        solvers:
        - dns01:
            cloudflare:
              apiTokenSecretRef:
                name: cloudflare-api-token-secret
                key: api-token

    Apply the configuration

    kubectl apply -f letsencrypt-cloudflare-issuer.yaml
  4. Create a Certificate for Rancher. Example: rancher-certificate.yaml
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: rancher-tls
      namespace: cattle-system
    spec:
      secretName: tls-rancher-ingress  # the secret name Rancher uses
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      commonName: rancher.ichiayi.com  # change to your domain
      dnsNames:
      - rancher.ichiayi.com            # change to your domain

    Apply the configuration

    kubectl apply -f rancher-certificate.yaml
  5. Verify the certificate status

    # Check the Certificate status
    kubectl get certificate -n cattle-system
    
    # Show details
    kubectl describe certificate rancher-tls -n cattle-system
    
    # Follow the cert-manager logs
    kubectl logs -n cert-manager -l app=cert-manager -f

  6. Watch the certificate renewal status

    kubectl get certificate -n cattle-system -w

2-1 How to create a shared SSL certificate for other services

  1. Create a wildcard DNS record that points to a K3s node IP. Example: *.k3s.ichiayi.com → 192.168.1.171
  2. Reuse the Cloudflare API token Secret and ClusterIssuer from above
  3. Create the wildcard certificate. Example: *.k3s.ichiayi.com → k3s-certificate.yaml
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: wildcard-k3s-ichiayi-com
      namespace: default  # or the namespace you want to use
    spec:
      secretName: wildcard-k3s-ichiayi-com-tls
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      commonName: "*.k3s.ichiayi.com"
      dnsNames:
      - "*.k3s.ichiayi.com"
    kubectl apply -f k3s-certificate.yaml

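3. How to back up Rancher Web UI

  1. Install Rancher Backups from the App Chart catalog in the Web UI
  2. In the new Rancher Backups menu entry, select Backups → Create → choose the backup target (example: StorageClasses) → Edit YAML to schedule a backup every 8 hours

4. How to upgrade Rancher Web UI

  1. Update the Helm repository

    helm repo update

  2. List the available versions

    helm search repo rancher-stable/rancher --versions

  3. Back up the current configuration

    kubectl get all -n cattle-system -o yaml > rancher-backup.yaml
    # Also keep the Helm values used for the current release
    helm get values rancher -n cattle-system -o yaml > rancher-values.yaml

  4. Run the upgrade

    helm upgrade rancher rancher-stable/rancher \
      --namespace cattle-system \
      --reuse-values

  5. Verify the upgrade status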

    kubectl -n cattle-system rollout status deploy/rancher
    kubectl -n cattle-system get pods

5. How to enable and disable K3s automatic upgrades

Enabling K3s automatic upgrades

  1. Install the System Upgrade Controller

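    # Recent controller releases publish the Plan CRD as a separate manifest; apply it first
    kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml
    kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml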

  2. Create the automatic upgrade plans (track the K3s stable channel and upgrade automatically)

    cat <<EOF | kubectl apply -f -
    ---
    # Server upgrade plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: server-plan
      namespace: system-upgrade
    spec:
      concurrency: 1  # upgrade one node at a time
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: In
          values:
          - "true"
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      channel: https://update.k3s.io/v1-release/channels/stable
      drain:
        force: false
        ignoreDaemonSets: true
        deleteLocalData: true
        timeout: 300s  # 5-minute timeout
    ---
    # Agent upgrade plan
    apiVersion: upgrade.cattle.io/v1
    kind: Plan
    metadata:
      name: agent-plan
      namespace: system-upgrade
    spec:
      concurrency: 1  # upgrade only one agent at a time
      cordon: true
      nodeSelector:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      prepare:
        args:
        - prepare
        - server-plan
        image: rancher/k3s-upgrade
      serviceAccountName: system-upgrade
      upgrade:
        image: rancher/k3s-upgrade
      channel: https://update.k3s.io/v1-release/channels/stable
      drain:
        force: false
        ignoreDaemonSets: true
        deleteLocalData: true
        timeout: 300s
    EOF

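  3. Check the upgrade progress

    # List the upgrade plans
    kubectl get plans -n system-upgrade
    
    # List the upgrade jobs
    kubectl get jobs -n system-upgrade
    
    # Watch the node status
    watch kubectl get nodes

    • Review the output of the commands above

Disabling K3s automatic upgrades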

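  The three options below are alternatives; pick the one that fits.

  1. Delete the Plans (stops all automatic upgrades)

    kubectl delete plan server-plan agent-plan -n system-upgrade

  2. Pin a fixed version instead, so new versions are no longer tracked. Example: v1.33.6+k3s1

    kubectl patch plan server-plan -n system-upgrade --type=merge -p '{"spec":{"version":"v1.33.6+k3s1","channel":null}}'
    kubectl patch plan agent-plan -n system-upgrade --type=merge -p '{"spec":{"version":"v1.33.6+k3s1","channel":null}}'

  3. Delete the whole controller (disables upgrades completely)

    kubectl delete ns system-upgrade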

6. How to send K3s automatic upgrade results to Discord

  1. Edit the configuration and deploy

    # Download k3s-discord-notifier.yaml
    curl -o k3s-discord-notifier.yaml https://raw.githubusercontent.com/tryweb/k3s/refs/heads/main/systools/k3s-discord-notifier.yaml
    
    # Substitute your Discord webhook URL. Example: https://discord.com/api/webhooks/144xxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxV5ffPyEp
    sed -i 's|https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN|https://discord.com/api/webhooks/144xxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxV5ffPyEp|' k3s-discord-notifier.yaml
    
    # Change the cluster name (optional). Example: ichiayi K3s
    sed -i 's|我的 K3s 叢集|ichiayi K3s|' k3s-discord-notifier.yaml
    
    # Deploy the Discord notifier
    kubectl apply -f k3s-discord-notifier.yaml

  2. Verify the deployment

    # Check that the notifier is running
    kubectl get deployment -n system-upgrade k3s-upgrade-notifier
    
    # Follow the logs
    kubectl logs -n system-upgrade -l app=k3s-upgrade-notifier -f
    
    # Test the Discord upgrade-success notification
    cat <<EOF | kubectl apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test-notify-success
      namespace: system-upgrade
      labels:
        upgrade.cattle.io/plan: "test-plan"
        upgrade.cattle.io/node: "test-node"
    spec:
      template:
        metadata:
          labels:
            upgrade.cattle.io/plan: "test-plan"
        spec:
          containers:
          - name: test
            image: busybox
            command: ["sh", "-c", "echo 'Upgrade successful'; sleep 5"]
          restartPolicy: Never
      backoffLimit: 0
    EOF
    
    # Test the Discord upgrade-failure notification
    cat <<EOF | kubectl apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test-notify-fail
      namespace: system-upgrade
      labels:
        upgrade.cattle.io/plan: "test-plan"
        upgrade.cattle.io/node: "test-node"
    spec:
      template:
        metadata:
          labels:
            upgrade.cattle.io/plan: "test-plan"
        spec:
          containers:
          - name: test
            image: busybox
            command: ["sh", "-c", "echo 'Error: Upgrade failed!'; exit 1"]
          restartPolicy: Never
      backoffLimit: 0
    EOF
    
    
    # Clean up the test Jobs
    kubectl delete job test-notify-success test-notify-fail -n system-upgrade

7. How to check the latest K3s stable version
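
One way to check, sketched under the assumption that the stable channel URL used in the upgrade Plans above redirects to the matching GitHub release page (the K3s install script resolves a channel to a version in the same way). The function names are hypothetical, and the URL in the offline illustration is a made-up redirect target built from the example version used earlier.

```shell
# Extract the version tag (the last path segment) from a release URL.
version_from_url() {
  printf '%s\n' "$1" | sed -e 's|.*/||'
}

# Follow the channel redirect and print the version it resolves to (needs network access).
latest_k3s_stable() {
  url="$(curl -w '%{url_effective}' -L -s -S -o /dev/null \
    https://update.k3s.io/v1-release/channels/stable)" || return 1
  version_from_url "$url"
}

# Offline illustration with a hypothetical redirect target:
version_from_url "https://github.com/k3s-io/k3s/releases/tag/v1.33.6+k3s1"
```

Running `latest_k3s_stable` on a host with internet access prints the current stable tag, which can be compared with the VERSION column of `kubectl get nodes`.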

8. How to reboot the K3s cluster hosts