Goal
Deploy a Docker-based cluster and manage it with k8s (Kubernetes).
Environment
Three machines form the cluster.
192.168.254.129  master  VM (4 GB RAM)
192.168.254.130  node-1  VM (2 GB RAM)
120.77.205.xxx   node-3  Aliyun ECS (2 GB RAM)
Environment preparation
1. On all nodes: disable the firewall rules, disable selinux, turn off swap, open the network between all servers, and verify connectivity with ping (a helper loop is sketched after the hosts file section below).
Disable the firewall
systemctl stop firewalld
# stop the firewall
systemctl disable firewalld
# keep the firewall from starting at boot
Disable the SELinux enforcement mechanism
setenforce 0
# set SELinux to permissive immediately
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# make the change permanent across reboots
Reset the iptables rules, leaving the default ACCEPT policy
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
Turn off swap
swapoff -a
# the swap partition must be disabled, otherwise kubelet refuses to start
sed -ri 's/.*swap.*/#&/' /etc/fstab
# permanently disable the swap partition; in sed, & stands for the text matched by the pattern
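A quick way to confirm swap is really off (an extra check, not in the original steps):
free -h
# the Swap line should report 0B total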
On all nodes, set each machine's hostname and record every host in all servers' hosts files
hostnamectl set-hostname master
# set the hostname (run the matching command on each machine)
hostnamectl set-hostname node-1
hostnamectl set-hostname node-2
hostnamectl set-hostname node-3
bash
# after renaming, start a fresh shell on each node so the new hostname takes effect
Append the IP-to-hostname mappings to the hosts file
cat >> /etc/hosts << EOF
192.168.254.129 master
192.168.254.130 node-1
192.168.0.230 node-2
120.77.205.xxx node-3
EOF
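A small loop for the ping test mentioned in step 1 (a sketch; it assumes the /etc/hosts entries above are in place on the machine it runs from):
for h in master node-1 node-2 node-3; do
  ping -c 1 -W 1 $h > /dev/null && echo "$h reachable" || echo "$h unreachable"
done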
On all nodes, set system parameters: tune the kernel so bridged IPv4 traffic is passed to the iptables chains
# tune kernel parameters
cat > /etc/sysctl.d/kubernetes.conf << EOF
# bridge mode: pass the bridge's traffic to the iptables chains
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
# disable the IPv6 protocol
net.ipv6.conf.all.disable_ipv6=1
net.ipv4.ip_forward=1
EOF
sysctl --system # load the parameters
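Note: the two bridge-nf-call settings only exist while the br_netfilter kernel module is loaded; if sysctl --system reports "No such file or directory" for them, load the module first (an extra step, not in the original notes):
modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
# the second line makes the module load on every boot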
2. Install Docker on all nodes
Install the dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2 # install prerequisites
Configure the package mirror
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# yum-config-manager manages yum repositories; --add-repo adds (and enables) a repository from the given file or URL, here the Aliyun Docker mirror
Install Docker
yum install -y docker-ce docker-ce-cli containerd.io
# install docker
Configure the Docker daemon for systemd management
mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  }
}
EOF
# use the systemd cgroup driver for resource control; compared with cgroupfs, systemd's limits on CPU, memory, etc. are simpler and more mature
# store container logs in json-file format, capped at 100 MB per file; Docker keeps them under /var/lib/docker/containers (Kubernetes symlinks them into /var/log/containers), which lets ELK-style log systems collect them easily
Reload the service configuration files and start Docker
systemctl daemon-reload
# reload systemd unit files; required whenever a newly installed service (such as docker here) should be picked up by systemctl
systemctl restart docker.service
# restart docker
systemctl enable docker.service
# start docker at boot
docker info | grep "Cgroup Driver"
# the correct output is "Cgroup Driver: systemd"
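Optionally, a quick smoke test that Docker can pull and run a container (not part of the original steps):
docker run --rm hello-world
# prints a greeting and exits if the runtime works end to end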
On all nodes, configure the k8s package mirror
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# define the kubernetes yum repo
Install kubeadm, kubelet, and kubectl on all nodes
yum install -y kubelet-1.21.3 kubeadm-1.21.3 kubectl-1.21.3
systemctl enable kubelet.service # enable kubelet at boot
systemctl start kubelet.service # start it now (it will restart in a loop until kubeadm init/join runs; that is expected)
# kubeadm runs the k8s control-plane components as pods, i.e. as containers, so kubelet must be set to start at boot
Master node steps (run on the master only)
Official reference: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm/
kubeadm init \
--apiserver-advertise-address=192.168.254.129 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.21.3 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
[Save the token information printed during initialization; the nodes use it to join the k8s cluster]
[To wipe the cluster state and start over: run kubeadm reset, then repeat the steps above]
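After a successful init, kubeadm itself prints the commands for giving kubectl access to the new cluster; they are reproduced here because skipping them leads to the "localhost:8080 was refused" error seen in the open issues below:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# or, as root, simply: export KUBECONFIG=/etc/kubernetes/admin.conf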
Parameter notes (the IP and version below are illustrative; use the values from your own cluster)
kubeadm init \
--apiserver-advertise-address=10.0.0.116 \
# the address the API server advertises; change it to your own master's IP
--image-repository registry.aliyuncs.com/google_containers \
# pull control-plane images from the Aliyun mirror; a domestic mirror is much faster in China
--kubernetes-version v1.18.0 \
# the k8s version; it must match the installed kubeadm/kubelet packages (v1.21.3 in this deployment)
--service-cidr=10.96.0.0/12 \
# the cluster-internal service network
--pod-network-cidr=10.244.0.0/16
# the pod network
# keep these service-cidr and pod-network-cidr values; otherwise the kube-flannel.yml file used later must be edited to match
If the token is lost, generate a new one
kubeadm token create --print-join-command
# regenerates the token (and prints the full join command); node machines need it to join the cluster
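Existing tokens can also be listed, and the CA certificate hash recomputed by hand (commands taken from the official kubeadm documentation):
kubeadm token list
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'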
Node steps (run on all node machines)
kubeadm join adds a node to the k8s cluster
kubeadm join 192.168.254.129:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxx
# uses the token created earlier; run this on every node (parts of the values are masked with xxx here)
Install the network plugin (on the master)
A network add-on must be installed, otherwise every node stays NotReady; the flannel plugin is used here.
Put the manifest below into a kube-flannel.yml file, then publish it with kubectl apply -f.
Download the manifest
wget https://gitcode.net/mirrors/flannel-io/flannel/-/blob/v0.20.2/Documentation/kube-flannel.yml
# download the file (a /blob/ URL may return an HTML page; if so, fetch the raw file instead)
Edit it with vi kube-flannel.yml and adjust as below. The key field is "Network": it must match the --pod-network-cidr used when initializing the master.
---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - "networking.k8s.io"
  resources:
  - clustercidrs
  verbs:
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  # "Network" below must match the --pod-network-cidr given to kubeadm init
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        #image: flannelcni/flannel-cni-plugin:v1.1.2 #for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.2
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        #image: flannelcni/flannel:v0.20.2 #for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.2
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        #image: flannelcni/flannel:v0.20.2 #for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.2
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
Deploy the plugin
kubectl apply -f kube-flannel.yml
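To watch the flannel pods come up (a quick check; with this manifest they run in the kube-flannel namespace):
kubectl get pods -n kube-flannel -o wide
# one kube-flannel-ds pod per node should reach Running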
Once deployed, verify the result with the command below. Here node-3 still shows NotReady, possibly a problem with the Aliyun server; investigated further below:
[root@node-3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node-3 NotReady control-plane,master 23h v1.21.3
Inspect the deployed pods
Confirm their status
kubectl get pods -n kube-system
# list all pods in the kube-system namespace
Result:
[root@node-3 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-59d64cd4d4-9c5bj 0/1 Pending 0 23h
coredns-59d64cd4d4-jpt44 0/1 Pending 0 23h
kube-proxy-g9g6r 1/1 Running 1 23h
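The coredns pods stay Pending until a node becomes Ready, i.e. until the network plugin is working. To see the exact reason a pod is Pending (coredns pods carry the k8s-app=kube-dns label):
kubectl -n kube-system describe pod -l k8s-app=kube-dns
# the Events section at the end explains the scheduling failure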
Running means the pod is healthy. If a flannel pod is stuck at Init:0/1, the image pull is likely failing; one workaround is to switch the manifest to a mirrored image:
sed -i -r "s#quay.io/coreos/flannel:.*-amd64#lizhenliang/flannel:v0.12.0-amd64#g" kube-flannel.yml
# this targets the old quay.io/coreos image name and a fairly old flannel version; on newer k8s releases prefer a newer flannel image
Check that the cluster components are healthy
kubectl get cs
# query the master component status
# if a component reports Unhealthy:
vi /etc/kubernetes/manifests/kube-scheduler.yaml
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# comment out the line '- --port=0' in both files
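The same edit as a one-liner, if preferred (a sketch; kubelet re-reads these static pod manifests automatically, so no restart is needed):
sed -i 's/^\(\s*\)- --port=0/\1#- --port=0/' /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml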
kubectl get pods -A
# check the status of all pods
kubectl get nodes
# check whether the nodes are Ready
Open issues:
1. Is the Aliyun server node abnormal? To be determined.
2. Some pods still show a 0/1 status.
Confirmed fix: after commenting out '- --port=0' in /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml, the master status is healthy again:
[root@node-3 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
The node status is also back to normal:
[root@node-3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node-3 Ready control-plane,master 11m v1.21.3
3. kubectl fails on a node:
[root@node-3 ~]# kubectl get pods -A
W0319 11:01:29.365987 3624 loader.go:221] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Workaround reference: https://blog.csdn.net/wuxingpu5/article/details/126605168
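The usual cause is that kubectl on the node has no kubeconfig. A minimal sketch of the common fix, assuming the master's admin.conf is copied to the node (e.g. over SSH):
scp master:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes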
4. When joining a node to the master, the VM was a clone and carried over another node's Kubernetes data, causing a machine-name conflict; it must be renamed before joining the master again:
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase kubelet-start: a Node with name "node-3" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node
To see the stack trace of this error execute with --v=5 or higher
Solution
# rename the node, then re-run the kubeadm join command
hostnamectl set-hostname node-1
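If the join still fails because of leftover state from the cloned VM, resetting the node's kubeadm state first usually clears it (a sketch; note that kubeadm reset wipes the node's local cluster state):
kubeadm reset -f
# then re-run the kubeadm join command from above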
The node joins successfully:
[root@node-3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node-1 Ready <none> 10m v1.21.3
node-3 Ready control-plane,master 21m v1.21.3