Continued from: Manually Building a Highly Available Kubernetes Cluster (Part 3)
10. Deploying the Dashboard Add-on
The official manifests live in kubernetes/cluster/addons/dashboard. The files used are:
```
$ ls *.yaml
dashboard-controller.yaml  dashboard-rbac.yaml  dashboard-service.yaml
```
The new file dashboard-rbac.yaml defines the RoleBinding used by the dashboard.
Because kube-apiserver has RBAC authorization enabled, and the dashboard-controller.yaml in the official source tree does not define an authorized ServiceAccount, subsequent calls to the kube-apiserver API are rejected and the web UI reports a 403 error.
The fix is to define a ServiceAccount named dashboard and bind it to a ClusterRole (cluster-admin in the example below):
```
$ cat > dashboard-rbac.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: dashboard
subjects:
- kind: ServiceAccount
  name: dashboard
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
EOF
```
Configure dashboard-controller
```
20a21
>       serviceAccountName: dashboard
```
This makes the dashboard pod run under the custom ServiceAccount named dashboard.
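For orientation, here is roughly where that line lands in dashboard-controller.yaml. This is a sketch only; apart from serviceAccountName and the image tag (v1.7.1, as used later in this post), the surrounding fields follow the stock add-on manifest and may differ slightly in your copy:
```yaml
spec:
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
    spec:
      serviceAccountName: dashboard   # added line: run the pod with the custom ServiceAccount
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.7.1
```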
Configure dashboard-service
```
$ diff dashboard-service.yaml.orig dashboard-service.yaml
10a11
>   type: NodePort
```
Setting the Service type to NodePort lets the outside world reach the dashboard at nodeIP:nodePort.
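The resulting Service then looks roughly like this (a sketch based on the stock add-on manifest; the container port 9090 matches the dashboard snippet shown in the HA note further down, and the nodePort is allocated automatically unless you pin it):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort        # added: expose the dashboard on every node
  selector:
    k8s-app: kubernetes-dashboard
  ports:
  - port: 80
    targetPort: 9090    # dashboard container port
```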
Apply all the manifests
```
$ pwd
/home/ych/k8s-repo/dashboard
$ ls *.yaml
dashboard-controller.yaml  dashboard-rbac.yaml  dashboard-service.yaml
$ kubectl create -f .
```
Check the results
Look up the allocated NodePort:
```
$ kubectl get services kubernetes-dashboard -n kube-system
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes-dashboard   NodePort   10.254.104.90   <none>        80:31202/TCP   1m
```
NodePort 31202 maps to port 80 of the dashboard Service.
Check the controller:
```
$ kubectl get deployment kubernetes-dashboard -n kube-system
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-dashboard   1         1         1            1           3m
$ kubectl get pods -n kube-system | grep dashboard
kubernetes-dashboard-6667f9b4c-4xbpz   1/1       Running   0          3m
```
Access the dashboard
There are three ways in (example commands below):
- The kubernetes-dashboard Service is exposed via NodePort, so the UI can be reached directly at http://NodeIP:nodePort
- Through the kube-apiserver proxy
- Through kubectl proxy
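As a rough illustration of the first and third options (the proxy port here is just an example, and the exact UI path behind kubectl proxy varies between dashboard/Kubernetes versions):
```bash
# Option 1: NodePort, using the values from the output above
# open http://<NodeIP>:31202 in a browser

# Option 3: kubectl proxy, listening on all interfaces so a workstation can reach it;
# tighten --accept-hosts if the machine is exposed
kubectl proxy --address='0.0.0.0' --port=8086 --accept-hosts='^.*$'
# then browse to http://<proxy-host>:8086/ui
```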
[Screenshot: dashboard UI]
Because the Heapster add-on is not installed yet, the dashboard cannot show CPU, memory, or other metric graphs for Pods and Nodes.
Note: if your backend apiserver runs as a highly available cluster, it is best to set the dashboard's apiserver-host explicitly; otherwise the dashboard may become unreachable when one apiserver node goes down. For example:
```yaml
image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.7.1
ports:
- containerPort: 9090
  protocol: TCP
args:
- --apiserver-host=http://<api_server_ha_addr>:8080
```
11. Deploying the Heapster Add-on
Download the latest heapster release from the heapster releases page:
```
$ wget https://github.com/kubernetes/heapster/archive/v1.4.3.tar.gz
$ tar -xzvf v1.4.3.tar.gz
```
The deployment manifests live under /home/ych/k8s-repo/heapster-1.4.3/deploy/kube-config:
```
$ ls influxdb/ && ls rbac/
grafana.yaml  heapster.yaml  influxdb.yaml
heapster-rbac.yaml
```
To make testing easier, set the Service type in grafana.yaml to type=NodePort, as in the sketch below.
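A minimal sketch of the change in the Service section of grafana.yaml; apart from the added type: NodePort line, the field values reflect my reading of the stock heapster 1.4.3 manifest, so verify them against your copy:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  type: NodePort       # added so grafana is reachable at NodeIP:nodePort
  ports:
  - port: 80
    targetPort: 3000   # grafana container port in the stock manifest
  selector:
    k8s-app: grafana
```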
Apply all the files
```
$ kubectl create -f rbac/heapster-rbac.yaml
clusterrolebinding "heapster" created
$ kubectl create -f influxdb
deployment "monitoring-grafana" created
service "monitoring-grafana" created
serviceaccount "heapster" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created
```
Check the results
Check the Deployments:
```
$ kubectl get deployments -n kube-system | grep -E 'heapster|monitoring'
heapster              1   1   1   1   2m
monitoring-grafana    1   1   1   0   2m
monitoring-influxdb   1   1   1   1   2m
```
Check the Pods:
```
$ kubectl get pods -n kube-system | grep -E 'heapster|monitoring'
heapster-7cf895f48f-p98tk              1/1   Running            0   2m
monitoring-grafana-c9d5cd98d-gb9xn     0/1   CrashLoopBackOff   4   2m
monitoring-influxdb-67f8d587dd-zqj6p   1/1   Running            0   2m
```
The monitoring-grafana pod is not coming up. Its logs show the following error:
```
Failed to parse /etc/grafana/grafana.ini, open /etc/grafana/grafana.ini: no such file or directory
```
To fix this (see the heapster issues reference below), change the grafana image to gcr.io/google_containers/heapster-grafana-amd64:v4.0.2 and re-apply; the pod then starts normally.
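One way to apply the image change without editing the manifest by hand, sketched under the assumption that the Deployment and its container are both named as in the stock manifest (monitoring-grafana / grafana); check with kubectl get deployment monitoring-grafana -n kube-system -o yaml if unsure:
```bash
# point the grafana container at the v4.0.2 image and let the Deployment roll
kubectl set image deployment/monitoring-grafana \
  grafana=gcr.io/google_containers/heapster-grafana-amd64:v4.0.2 \
  -n kube-system

# watch the replacement pod come up
kubectl get pods -n kube-system | grep grafana
```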
Access grafana
We already changed the grafana Service to the NodePort type above:
```
$ kubectl get svc -n kube-system
NAME                 TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
monitoring-grafana   NodePort   10.254.34.89   <none>        80:30191/TCP   28m
```
We can now reach grafana through any node on the port 30191 shown above.
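A quick sanity check from any machine that can reach the nodes (replace the placeholder IP with one of your nodes):
```bash
# expect an HTTP 200 or a redirect to the grafana login page
curl -I http://<NodeIP>:30191/
```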
[Screenshot: grafana UI]
With heapster installed correctly, we can go back to the dashboard and confirm that the metric graphs now appear:
[Screenshot: dashboard with metric graphs]
12. Installing Ingress
An Ingress is essentially an entry point for traffic coming from outside the Kubernetes cluster: it forwards external requests to different Services inside the cluster, much like a reverse proxy / load balancer such as nginx or apache, plus a set of routing rules. Keeping those routing rules up to date is the job of an Ingress controller.
The Ingress controller can be thought of as a watcher: it continuously talks to kube-apiserver to learn about changes to backend Services, Pods, and so on, and when it sees a change it combines that information with the Ingress configuration to update the reverse-proxy load balancer, thereby providing service discovery. This is very similar to consul-template in the Consul service-discovery toolchain.
Deploying traefik
Traefik is an open-source reverse proxy and load balancer. Its biggest strength is that it integrates directly with common microservice systems and configures itself automatically and dynamically. It currently supports backends such as Docker, Swarm, Mesos/Marathon, Kubernetes, Consul, Etcd, Zookeeper, BoltDB, and a REST API.
[Image: traefik]
Create the RBAC objects
Create the file ingress-rbac.yaml, used for ServiceAccount authorization:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingress
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: ingress
subjects:
- kind: ServiceAccount
  name: ingress
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
Deploy traefik as a DaemonSet
Create the file traefik-daemonset.yaml. To make sure traefik is always available, we run one traefik instance on every node, hence the DaemonSet:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: traefik-conf
  namespace: kube-system
data:
  traefik-config: |-
    defaultEntryPoints = ["http","https"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
        [entryPoints.http.redirect]
        entryPoint = "https"
      [entryPoints.https]
      address = ":443"
        [entryPoints.https.tls]
          [[entryPoints.https.tls.certificates]]
          CertFile = "/ssl/ssl.crt"
          KeyFile = "/ssl/ssl.key"
---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress
  namespace: kube-system
  labels:
    k8s-app: traefik-ingress
spec:
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress
        name: traefik-ingress
    spec:
      terminationGracePeriodSeconds: 60
      restartPolicy: Always
      serviceAccountName: ingress
      containers:
      - image: traefik:latest
        name: traefik-ingress
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        - name: admin
          containerPort: 8080
        args:
        - --configFile=/etc/traefik/traefik.toml
        - -d
        - --web
        - --kubernetes
        - --logLevel=DEBUG
        volumeMounts:
        - name: traefik-config-volume
          mountPath: /etc/traefik
        - name: traefik-ssl-volume
          mountPath: /ssl
      volumes:
      - name: traefik-config-volume
        configMap:
          name: traefik-conf
          items:
          - key: traefik-config
            path: traefik.toml
      - name: traefik-ssl-volume
        secret:
          secretName: traefik-ssl
```
Note that the YAML above adds a ConfigMap named traefik-conf. That configuration forces HTTP requests to redirect to HTTPS and points traefik at the TLS certificate files it needs; the certificate files themselves are supplied through a secret:
```
$ ls
ssl.crt  ssl.key
$ kubectl create secret generic traefik-ssl --from-file=ssl.crt --from-file=ssl.key --namespace=kube-system
secret "traefik-ssl" created
```
Create the Ingress
Create the file traefik-ingress.yaml. We can now define request-routing rules in an Ingress object; adjust serviceName and servicePort to match a Service in your own cluster:
```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-ingress
spec:
  rules:
  - host: traefik.nginx.io
    http:
      paths:
      - path: /
        backend:
          serviceName: my-nginx
          servicePort: 80
```
Run the create commands:
```
$ kubectl create -f ingress-rbac.yaml
serviceaccount "ingress" created
clusterrolebinding "ingress" created
$ kubectl create -f traefik-daemonset.yaml
configmap "traefik-conf" created
daemonset "traefik-ingress" created
$ kubectl create -f traefik-ingress.yaml
ingress "traefik-ingress" created
```
Traefik UI
Create the file traefik-ui.yaml:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik-ui
  namespace: kube-system
spec:
  selector:
    k8s-app: traefik-ingress
  ports:
  - name: web
    port: 80
    targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-ui
  namespace: kube-system
spec:
  rules:
  - host: traefik-ui.local
    http:
      paths:
      - path: /
        backend:
          serviceName: traefik-ui
          servicePort: web
```
Testing
After deploying, add an entry to your local /etc/hosts:
```
xx.xx.xx.xx master03 traefik.nginx.io traefik-ui.local
```
With that in place, browsing to traefik-ui.local locally brings up traefik's dashboard page:
[Screenshot: traefik dashboard]
Similarly, visiting traefik.nginx.io returns the expected page:
[Screenshot: nginx welcome page]
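If you prefer to test from the command line instead of editing /etc/hosts, the same checks can be done with curl by pinning the hostname to a node IP (a sketch; replace <node_ip> with any node running traefik, and note that -k skips verification of the self-signed certificate since HTTP is redirected to HTTPS by the ConfigMap above):
```bash
# hit the my-nginx backend through traefik
curl -k --resolve traefik.nginx.io:443:<node_ip> https://traefik.nginx.io/

# hit the traefik web UI exposed through the traefik-ui Ingress
curl -k --resolve traefik-ui.local:443:<node_ip> https://traefik-ui.local/
```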
Once the above is working, we can add all of the nodes to an SLB (cloud load balancer) and point the relevant DNS records at the SLB.
13. Troubleshooting
1. The dashboard shows no metric graphs
With the dashboard, heapster, and influxdb all deployed, the dashboard still showed no metric graphs. Digging in, the heapster log contained timeout errors:
```
$ kubectl logs -f pods/heapster-2882613285-58d9r -n kube-system
E0630 17:23:47.339987    1 reflector.go:203] k8s.io/heapster/metrics/sources/kubelet/kubelet.go:342: Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: i/o timeout
E0630 17:23:47.340274    1 reflector.go:203] k8s.io/heapster/metrics/heapster.go:319: Failed to list *api.Pod: Get http://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: i/o timeout
E0630 17:23:47.340498    1 reflector.go:203] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:84: Failed to list *api.Namespace: Get http://kubernetes.default/api/v1/namespaces?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout
E0630 17:23:47.340563    1 reflector.go:203] k8s.io/heapster/metrics/heapster.go:327: Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout
E0630 17:23:47.340623    1 reflector.go:203] k8s.io/heapster/metrics/processors/node_autoscaling_enricher.go:100: Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout
E0630 17:23:55.014414    1 influxdb.go:150] Failed to create infuxdb: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: read udp 172.30.45.4:48955->10.254.0.2:53: i/o timeout
```
In my case, I had forgotten to append the trailing $DOCKER_NETWORK_OPTIONS in docker's systemd unit file:
```
ExecStart=/root/local/bin/dockerd --log-level=error $DOCKER_NETWORK_OPTIONS
```
which left docker0 on a different subnet from flannel.1.
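A quick way to confirm that docker0 and flannel.1 agree, assuming the flannel + docker setup from the earlier parts of this series (interface names may differ in your environment):
```bash
# docker0's address should fall inside the flannel.1 subnet
ip addr show flannel.1
ip addr show docker0
```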
2. kube-proxy reports: kube-proxy[2241]: E0502 15:55:13.889842 2241 conntrack.go:42] conntrack returned error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Symptom: kubedns starts and runs normally, but Services cannot resolve each other; DNS resolution inside Kubernetes is broken.
Fix: on CentOS, install the conntrack-tools package and restart the Kubernetes cluster components.
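For example (on CentOS; the exact unit names depend on how the components were installed in the earlier parts of this series):
```bash
# install the missing conntrack binary
yum install -y conntrack-tools

# then restart the affected component on every node
systemctl restart kube-proxy
```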
3. Unable to access kubernetes services: no route to host
Symptom: accessing a cluster Service from inside a Pod fails with "no route to host":
```
$ curl my-nginx.nx.svc.cluster.local
curl: (7) Failed connect to my-nginx.nx.svc.cluster.local:80; No route to host
```
Fix: flush all firewall rules, then restart the docker service:
```
$ iptables --flush && iptables -t nat --flush
$ systemctl restart docker
```
4. A NodePort Service is only reachable on the node where the Pod runs
Symptom: a NodePort Service can only be accessed on the node hosting the Pod; other nodes cannot reach it through the NodePort.
Fix: kube-proxy's default proxy mode is iptables, and normally every node can forward NodePort traffic. In my case the culprit was the Alibaba Cloud security group rules; removing all the restrictions (and re-adding only the rules actually needed) fixed it.
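Before blaming the cloud network, it is worth confirming that kube-proxy really is in iptables mode and programming the NodePort rules on every node. A rough check, assuming kube-proxy runs as a systemd unit named kube-proxy as in the earlier parts of this series:
```bash
# kube-proxy logs a "Using iptables Proxier" style message at startup
journalctl -u kube-proxy | grep -i proxier

# the NodePort rules live in the KUBE-NODEPORTS chain of the nat table
iptables -t nat -L KUBE-NODEPORTS -n | head
```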
References
- 和我一步步部署 kubernetes 集群
- keepalived configuration
- kubernetes issue
- kubernetes heapster issue
End of the series.