Pod IP not removed from Service EndPoint when ReadinessProbe failed #3725

Open
bob2204 opened this issue Aug 30, 2024 · 16 comments

bob2204 commented Aug 30, 2024

Hello

With kind 0.24 and node image 1.31.0, the Pod IP is not removed from the Service endpoints when the ReadinessProbe fails, even though the Endpoints object lists it under NotReadyAddresses!

This worked fine with kind 0.23 and node 1.30.2.

Is this expected?

Best Regards

bob2204 changed the title from "Node IP not removed from Service EndPoint when ReadunessProbe failed" to "Node IP not removed from Service EndPoint when ReadinessProbe failed" on Aug 30, 2024
Contributor

aojea commented Aug 30, 2024

You need to add more details and a reproducer; it is not easy to understand from the comments what could be failing there.

Author

bob2204 commented Aug 30, 2024

I apologize. What I meant is that the Pod IP was not removed from the Service endpoints.

I use an Nginx Deployment with a ReadinessProbe on this container:

containers:
      - image: nginx:1.26
        name: nginx
        readinessProbe:
          httpGet:
            path: /livez
            port: 80
          periodSeconds: 3
          failureThreshold: 2

and a Service like this:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

When the ReadinessProbe fails, the Pod IP is listed under NotReadyAddresses in the Endpoints object:

kubectl describe endpoints nginx 
Name:         lemp
Namespace:    default
Labels:       app=nginx
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2024-08-30T15:41:43Z
Subsets:
  Addresses:          <none>
  NotReadyAddresses:  10.32.204.60
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>

BUT the Pod IP 10.32.204.60 is not removed from the Service endpoints:

kubectl describe svc nginx 
Name:                     nginx
Namespace:                default
Labels:                   app=nginx
Annotations:              <none>
Selector:                 app=nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.16.42.218
IPs:                      172.16.42.218
LoadBalancer Ingress:     172.18.0.9 (Proxy)
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31693/TCP
Endpoints:                10.32.204.60:80
Session Affinity:         None
External Traffic Policy:  Cluster
Internal Traffic Policy:  Cluster
Events:                   <none>
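One way to check what the data plane is actually given, independently of how kubectl describe svc renders its Endpoints line, is to read the Service's EndpointSlices (which kube-proxy consumes) and split their addresses by the conditions.ready field. A minimal Python sketch, assuming kubectl is on the PATH and the slices carry the standard kubernetes.io/service-name label; the helper names split_by_readiness and fetch_slices are hypothetical:

```python
import json
import subprocess
from typing import Dict, List, Tuple


def split_by_readiness(slice_list: Dict) -> Tuple[List[str], List[str]]:
    """Split the addresses in an EndpointSliceList into (ready, not_ready).

    In discovery.k8s.io/v1 a missing or null "ready" condition means the
    state is unknown, and consumers should generally treat it as ready.
    """
    ready: List[str] = []
    not_ready: List[str] = []
    for eps in slice_list.get("items", []):
        for ep in eps.get("endpoints") or []:
            conditions = ep.get("conditions") or {}
            # Only an explicit False counts as not ready; None/missing is "unknown".
            bucket = not_ready if conditions.get("ready") is False else ready
            bucket.extend(ep.get("addresses", []))
    return ready, not_ready


def fetch_slices(service: str, namespace: str = "default") -> Dict:
    """Fetch the EndpointSlices that belong to a Service (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "endpointslices", "-n", namespace,
         "-l", f"kubernetes.io/service-name={service}", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Against a live cluster, split_by_readiness(fetch_slices("nginx")) should put the Pod IP in the second list while the probe is failing; if it does, that would suggest the readiness data is propagated correctly and the discrepancy lies in how it is displayed.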

With kind 0.23 and kindest/node:1.30.2, everything is OK: the Pod IP is removed from the Service endpoints when the ReadinessProbe fails.
AND on a Kubernetes cluster of 3 VMs running 1.31.0, everything is OK too!

Is my English clear?

bob2204 changed the title from "Node IP not removed from Service EndPoint when ReadinessProbe failed" to "Pod IP not removed from Service EndPoint when ReadinessProbe failed" on Aug 30, 2024
Contributor

aojea commented Aug 31, 2024

Just to understand: this works with Kubernetes versions 1.30 and 1.31, and only fails with node 1.31.0?

Author

bob2204 commented Aug 31, 2024

After further investigation, I found that, whatever the Kubernetes version, the problem seems to be the VirtualBox environment.
I have two identical kind installations (kind 0.24.0, kindest/node:1.31.0, calico-3.28.0), one on a physical machine and one in a VirtualBox VM:

Any explanation?

Contributor

aojea commented Aug 31, 2024

Is kubectl the same version?

What difference does it make for kind to run on top of VirtualBox or a VM? It just uses Docker containers.

Are you doing something out of the ordinary, such as adding custom nodes or using a different kind configuration?

Author

bob2204 commented Aug 31, 2024

kubectl is the same version.
The two installs are identical.
Both have the same Calico CNI version, 3.28.
Both installs use Docker.
The only difference is physical machine vs. virtual machine.

@BenTheElder
Member

Do you observe this without Calico? We don't really provide support for third-party CNIs (installing one is supported, but we're not tracking down bugs with all of them).

Author

bob2204 commented Sep 3, 2024

With Calico, Cilium, and kindnet I see the same behavior.
With VirtualBox, VMware, and KVM, the same.
On Killercoda everything is fine! For me, it serves as a control.

I tried this simple command:

kind create cluster --config=config.yml

with one control-plane node and three worker nodes.
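The config.yml itself is not shown in the thread. Assuming it only declares node roles (one control-plane, three workers), an equivalent file could be generated with the Python sketch below; the helper names kind_config and write_config are hypothetical. Because JSON is valid YAML, the written file can be passed to kind create cluster --config=config.yml as-is.

```python
import json
from typing import Dict


def kind_config(workers: int = 3) -> Dict:
    """Build a kind Cluster config with one control-plane node and N workers."""
    nodes = [{"role": "control-plane"}] + [{"role": "worker"} for _ in range(workers)]
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "nodes": nodes,
    }


def write_config(path: str = "config.yml", workers: int = 3) -> None:
    # JSON is a subset of YAML, so kind can read this file directly.
    with open(path, "w") as f:
        json.dump(kind_config(workers), f, indent=2)
```

Keeping the topology in one function makes it easy to compare a reporter's setup against a minimal baseline (for example, workers=0 for a single-node cluster).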

Contributor

aojea commented Sep 3, 2024

Can you upload a tarball of the logs from the affected cluster, collected with kind export logs, and indicate the name of the Service and the approximate time when the problem happens?

Author

bob2204 commented Sep 3, 2024

full-logs.tar.gz
Service name: nginx
UTC Time: 2024-09-03T18:43:56Z

Author

bob2204 commented Sep 3, 2024

Manifest used:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
          - name: http
            containerPort: 80
        readinessProbe:
          httpGet:
            path: /healthz
            port: http
          periodSeconds: 2
          failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: nginx

I alternately create and delete /usr/share/nginx/html/healthz to toggle the ReadinessProbe result.
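Toggling the probe this way works because the httpGet probe on /healthz succeeds when nginx serves the file (HTTP 200) and fails when it is missing (HTTP 404). A Python sketch of that toggle as a reproducer step, assuming a single-replica Deployment named nginx so that kubectl exec deploy/nginx reaches the only Pod; toggle_cmd and set_ready are hypothetical helper names:

```python
import subprocess
from typing import List

# File behind the probe's /healthz path in the nginx default document root.
HEALTHZ = "/usr/share/nginx/html/healthz"


def toggle_cmd(ready: bool, target: str = "deploy/nginx") -> List[str]:
    """Build the kubectl argv that makes the readiness probe pass or fail.

    Creating the file lets GET /healthz return 200 (probe passes);
    deleting it makes nginx return 404 (probe fails).
    """
    action = ["touch", HEALTHZ] if ready else ["rm", "-f", HEALTHZ]
    return ["kubectl", "exec", target, "--", *action]


def set_ready(ready: bool, target: str = "deploy/nginx") -> None:
    # Requires a running cluster; kubectl exec on deploy/<name> picks one of its Pods.
    subprocess.run(toggle_cmd(ready, target), check=True)
```

Building the argv separately from running it keeps the command inspectable before anything touches the cluster.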

Contributor

aojea commented Sep 5, 2024

> full-logs.tar.gz, Service name: nginx, UTC Time: 2024-09-03T18:43:56Z

That does not add up: the nginx container starts at 18:44.

Sep 03 18:44:01 stage-worker2 containerd[185]: time="2024-09-03T18:44:01.642418279Z" level=info msg="StartContainer for "0f8fa2821ddca5ce36b9ee686d36e60cf6ffa18b665585c663fe9f4baef699d0" returns successfully"

and there are no more logs after that. You have period 2 and threshold 2, so it should start failing at 18:44:05, but there are no logs there.
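The arithmetic behind the 18:44:05 estimate: if the probe fails from the start, the Pod should be marked NotReady after roughly periodSeconds × failureThreshold seconds, plus any initialDelaySeconds (the manifest above does not set one). A rough Python sketch of that estimate; the function name expected_not_ready is hypothetical, and the model deliberately ignores kubelet probe jitter, timeoutSeconds, and controller propagation delay.

```python
from datetime import datetime, timedelta


def expected_not_ready(start: datetime, period_s: int, failure_threshold: int,
                       initial_delay_s: int = 0) -> datetime:
    """Rough time by which a readiness probe that fails from the start
    should have marked the Pod NotReady.

    Approximation only: ignores kubelet probe jitter, timeoutSeconds and
    the delay before the endpoints controller reacts.
    """
    return start + timedelta(seconds=initial_delay_s + period_s * failure_threshold)


# Values from this thread: container started 18:44:01, periodSeconds=2, failureThreshold=2.
print(expected_not_ready(datetime(2024, 9, 3, 18, 44, 1), period_s=2, failure_threshold=2))
# prints 2024-09-03 18:44:05
```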

I noticed that your environment has only 2 GB of RAM in the VM. It would not be surprising if the problem is that your VMs are resource-constrained and everything is slower in that environment.

Author

bob2204 commented Sep 5, 2024

I'm sorry to take up your time, but the problem remains the same with 8 GB!
Here is the new dump:
full-log-2.tar.gz

The time was around 11:40-11:50 UTC.

k describe ep,svc nginx 
Name:         nginx
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2024-09-05T11:48:23Z
Subsets:
  Addresses:          <none>
  NotReadyAddresses:  10.244.2.3       <<<< This shows that the IP is not Ready 
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>


Name:                     nginx
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx
Type:                     ClusterIP
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.96.88.94
IPs:                      10.96.88.94
Port:                     <unset>  80/TCP
TargetPort:               http/TCP
Endpoints:                10.244.2.3:80         <<<< Should NOT be here because the IP is not Ready
Session Affinity:         None
Internal Traffic Policy:  Cluster
Events:                   <none>

Contributor

aojea commented Sep 5, 2024

@bob2204 it looks like the kubelet is continuously restarting... If you still have the cluster running, can you verify that?
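Comparing the "since" timestamp in systemctl status kubelet over a few minutes is one check; a more direct one is systemd's NRestarts counter, which counts the automatic restarts systemd has performed for the unit. Since kind nodes are Docker containers, the query runs inside the node container. A Python sketch, assuming the node image's systemd is recent enough to expose NRestarts; parse_nrestarts and kubelet_restarts are hypothetical helper names:

```python
import subprocess


def parse_nrestarts(show_output: str) -> int:
    """Parse the output of `systemctl show kubelet -p NRestarts`, e.g. "NRestarts=0"."""
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "NRestarts":
            return int(value.strip())
    raise ValueError("NRestarts property not found")


def kubelet_restarts(node_container: str) -> int:
    # kind nodes are Docker containers, so run systemctl inside the node container.
    result = subprocess.run(
        ["docker", "exec", node_container,
         "systemctl", "show", "kubelet", "-p", "NRestarts"],
        check=True, capture_output=True, text=True,
    )
    return parse_nrestarts(result.stdout)
```

For example, kubelet_restarts("stage-worker2") run twice a minute apart would show whether the counter is climbing on that node.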

Author

bob2204 commented Sep 6, 2024

None of the three kubelets is continuously restarting.
This is the output of systemctl status kubelet on one node; the others are the same:

root@stage-worker2:/# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf, 11-kind.conf
     Active: active (running) since Thu 2024-09-05 11:44:26 UTC; 14h ago
       Docs: http://kubernetes.io/docs/
    Process: 197 ExecStartPre=/bin/sh -euc if [ -f /sys/fs/cgroup/cgroup.controllers ]; then /kind/bin/create-kubelet-cgroup-v2.sh; fi (code=exited, status=0/SUCCESS)
    Process: 198 ExecStartPre=/bin/sh -euc if [ ! -f /sys/fs/cgroup/cgroup.controllers ] && [ ! -d /sys/fs/cgroup/systemd/kubelet ]; then mkdir -p /sys/fs/cgroup/systemd/kubelet; fi (code=exited, status=0/SUCCESS)
   Main PID: 199 (kubelet)
      Tasks: 12 (limit: 9425)
     Memory: 43.2M
        CPU: 7min 5.993s
     CGroup: /kubelet.slice/kubelet.service
             └─199 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.18.0.3 --node-labels= --pod-infra-container-image=registry.k8s.io/pause:3.10 --provider-id=kind://docker/stage/stage-worker2 --runtime-cgroups=/system.slice/containerd.service

@faisalkamilansari

@bob2204

I am also having the same problem. Was your problem solved?

Kubernetes version: v1.31.2
