This article describes a rare cause of the docker registry error `dial tcp 127.0.0.1:5000: connect: connection refused`, in the hope that it saves someone else the same debugging session.
Background
System environment: k8s + docker + cri-dockerd
I don't want to publish my images on Docker Hub, and I will eventually deploy to a production environment without internet access, so I run my own docker registry and have k8s pull images from it when deploying workloads.
The relevant commands look like:
```shell
docker run -d -p 5000:5000 --restart=always --name registry registry:2
docker push localhost:5000/user/user-image
```
Problem
Without any change to the environment or configuration, after some k8s and docker operations, pushing an image suddenly started failing:
```
Get "http://localhost:5000/v2/": dial tcp 127.0.0.1:5000: connect: connection refused
```
Solution
At first I went through the usual troubleshooting steps: checking the registry container logs and the docker service logs, restarting the docker service, redeploying the registry container, and so on. None of it fixed the problem, and I was thoroughly puzzled.
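One check I skipped, and which would have pointed at the real cause sooner, is whether anything is accepting connections on the port at all. A minimal probe using bash's built-in /dev/tcp (a sketch; no extra tools assumed):

```shell
# Probe 127.0.0.1:5000: the redirection succeeds only if some process
# is accepting TCP connections on that port.
port=5000
if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
  echo "port ${port}: something is listening"
else
  echo "port ${port}: connection refused"
fi
```

If this reports a listener while `docker ps` shows the registry container is gone or unhealthy, some other process has taken the port.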
Later, while doing more k8s work, a deployment surfaced the underlying issue: a container deployed in k8s can claim a port on its Node (a hostPort), and a given port on a given Node can only be allocated once, so with a single Node a second pod requesting the same port cannot be scheduled. It showed up like this:
```
root@vmi1640551:~# kubectl -n test-cinema-2 get po
NAME                            READY   STATUS    RESTARTS   AGE
a-bookings-1-756694bb6b-sdqbg   0/1     Pending   0          8m4s
a-movies-1-66785d95ff-6jp27     0/1     Pending   0          8m4s
a-showtimes-1-fcb9d8bc6-9txh5   0/1     Pending   0          8m4s
a-users-1-59bb6845cf-zb7xw      0/1     Pending   0          8m4s
proxy                           1/1     Running   0          8m14s
root@vmi1640551:~# kubectl -n test-cinema-2 describe po a-bookings-1-756694bb6b-sdqbg
Name:             a-bookings-1-756694bb6b-sdqbg
Namespace:        test-cinema-2
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=a-bookings-1
                  pod-template-hash=756694bb6b
Annotations:      kompose.cmd: kompose --file docker-compose.yml convert
                  kompose.version: 1.32.0 (HEAD)
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/a-bookings-1-756694bb6b
Containers:
  bookings:
    Image:      localhost:5050/cinema-2/bookings
    Port:       5003/TCP
    Host Port:  5003/TCP
    Limits:
      cpu:  100m
    Requests:
      cpu:        100m
    Readiness:    http-get http://:5003/health-check delay=0s timeout=1s period=3s #success=1 #failure=2
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pmx4b (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-pmx4b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  3m9s (x2 over 8m20s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
root@vmi1640551:~#
```
My guess was that some container in k8s was occupying port 5000. After changing the host port the registry binds to (e.g. to 5050), the push succeeded.
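The exact commands aren't shown above; a sketch of moving the registry to host port 5050, reusing the container and image names from earlier (the second `5000` in `-p 5050:5000` is the port inside the registry container and stays as is):

```shell
# Recreate the registry bound to host port 5050 instead of 5000.
docker rm -f registry
docker run -d -p 5050:5000 --restart=always --name registry registry:2

# Images must be re-tagged with the new registry address before pushing,
# because the host:port prefix is part of the image name.
docker tag localhost:5000/user/user-image localhost:5050/user/user-image
docker push localhost:5050/user/user-image
```

Any image references in k8s manifests (`localhost:5000/...`) have to be updated to the new port as well.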
A check confirmed it: one workload had the following configuration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose --file docker-compose.yml convert
    kompose.version: 1.32.0 (HEAD)
  labels:
    io.kompose.service: users
  name: users
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: users
  template:
    metadata:
      annotations:
        kompose.cmd: kompose --file docker-compose.yml convert
        kompose.version: 1.32.0 (HEAD)
      labels:
        io.kompose.network/cinema-2-default: "true"
        io.kompose.service: users
    spec:
      containers:
        - image: localhost:5000/cinema-2/users
          name: users
          ports:
            - containerPort: 5000
              hostPort: 5000  # ! note this !
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /health-check
              port: 5000
            periodSeconds: 3     # default 10
            failureThreshold: 2  # default 3
            successThreshold: 1
            timeoutSeconds: 1
      restartPolicy: Always
```
After removing the hostPort, which was not actually used by anything, everything went back to normal.
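For reference, the relevant part of the container spec after the fix might look like this (a sketch based on the manifest above; only the hostPort line is dropped):

```yaml
ports:
  - containerPort: 5000  # pod-internal port only
    # hostPort: 5000     # removed: it reserved port 5000 on the Node
    protocol: TCP
```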
Open questions
Some questions I haven't had time to resolve:
- How does networking actually work in k8s and docker?
- In particular, how are requests routed when k8s pulls an image, when `docker push` runs on the host, and when `curl localhost:5000` runs on the host? Is there any difference between them?
- Only one of the registry and the k8s workload should have been able to listen on port 5000 of the single Node. Why did both appear to deploy successfully, with no visible error?
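I haven't answered these yet, but a starting point for the last question is to see which processes actually hold a LISTEN socket on the port. A Linux-only sketch that parses /proc/net/tcp directly (hostPort is often implemented by the CNI portmap plugin as an iptables DNAT rule rather than a listening socket, which might explain why no bind conflict was ever reported; that mechanism is an assumption here, not something verified in this setup):

```shell
# List local TCP sockets in LISTEN state (st == 0A) on port 5000
# by parsing /proc/net/tcp and /proc/net/tcp6 (addresses are in hex).
hexport=$(printf '%04X' 5000)   # 5000 -> 1388
awk -v p=":$hexport" '$4 == "0A" && $2 ~ p"$" {print FILENAME, $2}' \
  /proc/net/tcp /proc/net/tcp6 2>/dev/null || true
```

An empty result while connections to the port still behave oddly would point at a NAT-style redirect (inspect with `iptables-save | grep 5000`) rather than a competing listener.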