Docker Swarm Setup Issue: memberlist: Was able to connect to 002.prod.if-0a35b2d92350 but other probes failed, network may be misconfigured #729

Closed
anantanandgupta opened this issue Jan 6, 2025 · 2 comments

anantanandgupta commented Jan 6, 2025

Hi, I am not able to get the agents to communicate with each other in a multi-node Swarm. I have spent almost 3 days troubleshooting and have gone through almost all the forums, blogs, and the issue tracker, but I cannot figure out the reason.

Here are the setup details:
Servers:
Operating System:

$ lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy

Kernel Version:

5.15.0-130-generic

Nodes:

001.prod.if - 192.168.3.111
002.prod.if - 192.168.3.112
003.prod.if - 192.168.3.113
004.prod.if - 192.168.3.114
005.prod.if - 192.168.3.115
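
Swarm nodes need a few ports open between them: TCP 2377 for cluster management (managers only), TCP and UDP 7946 for node-to-node communication, and UDP 4789 for VXLAN overlay traffic. Below is a minimal Python sketch for checking the TCP side from one node to the others, assuming the IPs listed above. `tcp_port_open` and `report` are hypothetical helper names. UDP reachability (7946/udp, 4789/udp) cannot be confirmed with a plain connect, so it is not covered here.

```python
import socket

# Node IPs taken from the list above.
NODES = {
    "001.prod.if": "192.168.3.111",
    "002.prod.if": "192.168.3.112",
    "003.prod.if": "192.168.3.113",
    "004.prod.if": "192.168.3.114",
    "005.prod.if": "192.168.3.115",
}

# TCP ports Swarm uses between hosts (2377 only listens on manager nodes).
TCP_PORTS = (2377, 7946)


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def report(nodes=NODES, ports=TCP_PORTS):
    """Return {(node_name, port): reachable} for every node/port pair."""
    return {
        (name, port): tcp_port_open(ip, port)
        for name, ip in nodes.items()
        for port in ports
    }
```

Calling `report()` on each node in turn gives a quick matrix of which hosts can reach which over TCP. A clean result here does not rule out a UDP problem.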

Docker Version:

Docker version 27.4.1, build b9d17ea

Other relevant Docker packages:

containerd.io:amd64 (1.7.24-1)
docker-compose-plugin:amd64 (2.32.1-1~ubuntu.22.04~jammy)
docker-ce-cli:amd64 (5:27.4.1-1~ubuntu.22.04~jammy)
docker-buildx-plugin:amd64 (0.19.3-1~ubuntu.22.04~jammy)
docker-ce:amd64 (5:27.4.1-1~ubuntu.22.04~jammy)
docker-ce-rootless-extras:amd64 (5:27.4.1-1~ubuntu.22.04~jammy)

Agent Version:

2.21.5

I created an overlay network as follows:

$ docker network create --driver overlay --attachable --scope swarm proxy-network

output of docker inspect proxy-network:

[
    {
        "Name": "proxy-network",
        "Id": "f32ckmy6wqgbo5iitd4yy69wv",
        "Created": "2025-01-06T10:43:47.998633153Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4101"
        },
        "Labels": null
    }
]
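
A small Python sketch for sanity-checking this output: it pulls the subnet, gateway and VXLAN ID out of the JSON and tests whether an address belongs to the overlay. `overlay_summary` and `in_overlay` are hypothetical helper names, and `SAMPLE` is a trimmed copy of the output above.

```python
import ipaddress
import json

# Trimmed copy of the inspect output shown above.
SAMPLE = """
[
    {
        "Name": "proxy-network",
        "Scope": "swarm",
        "Driver": "overlay",
        "IPAM": {
            "Config": [
                {"Subnet": "10.0.1.0/24", "Gateway": "10.0.1.1"}
            ]
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4101"
        }
    }
]
"""


def overlay_summary(inspect_json):
    """Extract name, subnet, gateway and VXLAN ID from network inspect JSON."""
    net = json.loads(inspect_json)[0]
    cfg = net["IPAM"]["Config"][0]
    options = net.get("Options") or {}
    return {
        "name": net["Name"],
        "subnet": ipaddress.ip_network(cfg["Subnet"]),
        "gateway": ipaddress.ip_address(cfg["Gateway"]),
        "vxlan_id": options.get("com.docker.network.driver.overlay.vxlanid_list"),
    }


def in_overlay(summary, ip):
    """Return True if ip lies inside the overlay subnet."""
    return ipaddress.ip_address(ip) in summary["subnet"]
```

For example, the addresses the agents advertise in the log (10.0.1.3 through 10.0.1.7) fall inside 10.0.1.0/24, while the host addresses (192.168.3.x) do not.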

compose file for the agent deployment:

networks:
  default:
    name: proxy-network
    external: true

services:
  agent:
    image: portainer/agent:latest
    environment:
      LOG_LEVEL: DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
      - /:/host
    deploy:
      mode: global
      placement:
        constraints: 
          - node.platform.os == linux
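
The debug log further down shows the agent discovering its peers by resolving `tasks.if-portainer-agent_agent`, the Swarm DNS name that returns one address per running task of the service. A minimal Python sketch of the same lookup, meant to be run from a container attached to proxy-network. `resolve_tasks` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a Swarm.

```python
import ipaddress
import socket


def resolve_tasks(service, port=9001, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind tasks.<service>."""
    infos = resolver(f"tasks.{service}", port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip, port)).
    addresses = {info[4][0] for info in infos}
    return sorted(addresses, key=ipaddress.ip_address)
```

With all five agents running, `resolve_tasks("if-portainer-agent_agent")` would be expected to return five addresses. In the log below the lookup at startup returned only three, and the other two agents were learned a few seconds later through serf join events.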

output of docker stack ps if-portainer-agent:

ID             NAME                                                 IMAGE                    NODE          DESIRED STATE   CURRENT STATE            ERROR     PORTS
2eykt9qo1mye   if-portainer-agent_agent.84fqhoo3uitfpq1brkxpo7cfy   portainer/agent:latest   005.prod.if   Running         Running 35 seconds ago             
tifutmofkehy   if-portainer-agent_agent.aa69h9coypvd708c6tbj04fj3   portainer/agent:latest   004.prod.if   Running         Running 30 seconds ago             
poro11iqiuce   if-portainer-agent_agent.i1h0miqdxnrxmfc717w2aopci   portainer/agent:latest   001.prod.if   Running         Running 35 seconds ago             
1j23cdhlkc2y   if-portainer-agent_agent.twlsj8ir51kvew8p70101ksik   portainer/agent:latest   003.prod.if   Running         Running 28 seconds ago             
1le1o5lozb9u   if-portainer-agent_agent.yuyiqfdsd8frwvud158ofxqvp   portainer/agent:latest   002.prod.if   Running         Running 36 seconds ago

output of docker ps on 001.prod.if:

CONTAINER ID   IMAGE                    COMMAND     CREATED         STATUS         PORTS     NAMES
d84eeec985ac   portainer/agent:latest   "./agent"   2 minutes ago   Up 2 minutes             if-portainer-agent_agent.i1h0miqdxnrxmfc717w2aopci.poro11iqiucexsffdbzuov3ew

output of docker logs d84eeec985ac:

2025/01/06 10:47AM INF ./main.go:87 > agent running on Docker platform |
2025/01/06 10:47AM DBG ./main.go:97 > member_tags="&{AgentPort:9001 EdgeKeySet:false NodeName:001.prod.if DockerConfiguration:{EngineStatus:2 Leader:true NodeRole:1} KubernetesConfiguration:{}}"
2025/01/06 10:47AM INF ./main.go:102 > agent running on a Swarm cluster node. Running in cluster mode |
2025/01/06 10:47AM DBG github.com/portainer/agent/docker/docker.go:105 > retrieving IP address from container network | ip_address=10.0.1.3 network_name=proxy-network
2025/01/06 10:47AM DBG github.com/portainer/agent/net/lookup.go:22 > host=tasks.if-portainer-agent_agent ip=10.0.1.7 result=1
2025/01/06 10:47AM DBG github.com/portainer/agent/net/lookup.go:22 > host=tasks.if-portainer-agent_agent ip=10.0.1.3 result=2
2025/01/06 10:47AM DBG github.com/portainer/agent/net/lookup.go:22 > host=tasks.if-portainer-agent_agent ip=10.0.1.5 result=3
2025/01/06 10:47AM DBG github.com/portainer/agent/serf/cluster.go:79 > advertise_address=10.0.1.3 join_address=["10.0.1.7","10.0.1.3","10.0.1.5"]
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 001.prod.if-d84eeec985ac 10.0.1.3
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 002.prod.if-3084b268ea07 10.0.1.5
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 005.prod.if-c110b0b5d74c 10.0.1.7
2025/01/06 10:47AM DBG github.com/portainer/agent/serf/cluster.go:91 > contacted_nodes=3
2025/01/06 10:47AM DBG ./main.go:156 > advertise_address=10.0.1.3 agent_port=9001 cluster_address=tasks.if-portainer-agent_agent probe_interval=1s probe_timeout=500ms
2025/01/06 10:47AM INF github.com/portainer/agent/edge/registry/server.go:101 > starting registry credential server |
2025/01/06 10:47AM INF github.com/portainer/agent/http/server.go:99 > starting Agent API server | api_version=2.21.5 server_addr=0.0.0.0 server_port=9001 use_tls=true
2025/01/06 10:47:10 [WARN] memberlist: Was able to connect to 005.prod.if-c110b0b5d74c but other probes failed, network may be misconfigured
2025/01/06 10:47:11 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
2025/01/06 10:47:12 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
2025/01/06 10:47:13 [INFO] serf: EventMemberJoin: 004.prod.if-10cd6b6a5f48 10.0.1.6
2025/01/06 10:47:13 [WARN] memberlist: Was able to connect to 005.prod.if-c110b0b5d74c but other probes failed, network may be misconfigured
2025/01/06 10:47:14 [WARN] memberlist: Was able to connect to 004.prod.if-10cd6b6a5f48 but other probes failed, network may be misconfigured
2025/01/06 10:47:15 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
2025/01/06 10:47:16 [INFO] serf: EventMemberJoin: 003.prod.if-a4a2608d704e 10.0.1.4
2025/01/06 10:47:16 [WARN] memberlist: Was able to connect to 005.prod.if-c110b0b5d74c but other probes failed, network may be misconfigured
2025/01/06 10:47:17 [WARN] memberlist: Was able to connect to 004.prod.if-10cd6b6a5f48 but other probes failed, network may be misconfigured
2025/01/06 10:47:18 [WARN] memberlist: Was able to connect to 005.prod.if-c110b0b5d74c but other probes failed, network may be misconfigured
2025/01/06 10:47:19 [WARN] memberlist: Was able to connect to 003.prod.if-a4a2608d704e but other probes failed, network may be misconfigured
2025/01/06 10:47:20 [WARN] memberlist: Was able to connect to 004.prod.if-10cd6b6a5f48 but other probes failed, network may be misconfigured
2025/01/06 10:47:21 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
2025/01/06 10:47:22 [WARN] memberlist: Was able to connect to 004.prod.if-10cd6b6a5f48 but other probes failed, network may be misconfigured
2025/01/06 10:47:23 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
...
...
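
To see whether the warnings single out one peer or hit all of them, the log can be summarized per node. A minimal Python sketch, assuming the log format shown above; `summarize` is a hypothetical helper and `SAMPLE` is a short excerpt of the log. As far as memberlist's behavior is generally understood, this warning appears when a TCP fallback ping to a peer succeeds while the UDP probes get no acknowledgement in time, so the counts point at which peers have a UDP path problem.

```python
import re
from collections import Counter

JOIN_RE = re.compile(r"serf: EventMemberJoin: (\S+) (\S+)")
WARN_RE = re.compile(
    r"memberlist: Was able to connect to (\S+) but other probes failed"
)

# Short excerpt of the agent log above.
SAMPLE = """\
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 001.prod.if-d84eeec985ac 10.0.1.3
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 002.prod.if-3084b268ea07 10.0.1.5
2025/01/06 10:47:08 [INFO] serf: EventMemberJoin: 005.prod.if-c110b0b5d74c 10.0.1.7
2025/01/06 10:47:10 [WARN] memberlist: Was able to connect to 005.prod.if-c110b0b5d74c but other probes failed, network may be misconfigured
2025/01/06 10:47:11 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
2025/01/06 10:47:12 [WARN] memberlist: Was able to connect to 002.prod.if-3084b268ea07 but other probes failed, network may be misconfigured
"""


def summarize(log_text):
    """Return (joined, warnings): member name -> IP, and member name -> warning count."""
    joined = {}
    warnings = Counter()
    for line in log_text.splitlines():
        match = JOIN_RE.search(line)
        if match:
            joined[match.group(1)] = match.group(2)
            continue
        match = WARN_RE.search(line)
        if match:
            warnings[match.group(1)] += 1
    return joined, warnings
```

Feeding it the full output of `docker logs` from each node shows whether every peer is affected, which is what the log above suggests, or only a subset.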

Please help me. This is my production environment and it has been down for 3 days because of Portainer. The only way forward, I believe, is to manage the stacks using the CLI and go without Portainer.

anantanandgupta (Author) commented

Here is the output of ip a from two of the nodes, for MTU-related diagnostics:

from 001.prod.if:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether b0:0c:d1:53:40:b9 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 192.168.3.111/24 metric 100 brd 192.168.3.255 scope global dynamic eno0
       valid_lft 58401sec preferred_lft 58401sec
    inet6 fe80::b20c:d1ff:fe53:40b9/64 scope link 
       valid_lft forever preferred_lft forever
3: wlp2s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a0:a4:c5:08:62:09 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:d3:1a:cf:0a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
210: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:4c:a0:ff:ee brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:4cff:fea0:ffee/64 scope link 
       valid_lft forever preferred_lft forever
1253: vethafdabf1@if1252: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 32:a5:5e:19:eb:40 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::30a5:5eff:fe19:eb40/64 scope link 
       valid_lft forever preferred_lft forever
1785: veth6dd392b@if1784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 46:d7:d7:f0:b2:16 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::44d7:d7ff:fef0:b216/64 scope link 
       valid_lft forever preferred_lft forever

from 002.prod.if:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether b0:0c:d1:53:3e:d7 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 192.168.3.112/24 metric 100 brd 192.168.3.255 scope global dynamic eno0
       valid_lft 62421sec preferred_lft 62421sec
    inet6 fe80::b20c:d1ff:fe53:3ed7/64 scope link 
       valid_lft forever preferred_lft forever
3: wlp2s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a0:a4:c5:09:47:82 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:f0:27:7e:41 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
1677: veth2f17fc2@if1676: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 2e:e8:0d:8d:c7:0e brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::2ce8:dff:fe8d:c70e/64 scope link 
       valid_lft forever preferred_lft forever
167: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:c0:2a:f0:e4 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:c0ff:fe2a:f0e4/64 scope link 
       valid_lft forever preferred_lft forever
1709: veth140e0d4@if1708: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 96:98:00:94:2a:98 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::9498:ff:fe94:2a98/64 scope link 
       valid_lft forever preferred_lft forever
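
For the MTU check, the relevant arithmetic is that VXLAN encapsulation over IPv4 costs 50 bytes per packet, so an overlay carried on a 1500-byte interface can pass at most 1450-byte packets without fragmentation. A minimal Python sketch that extracts the MTUs from `ip a` output and computes that ceiling; `interface_mtus` and `max_overlay_mtu` are hypothetical helpers and `SAMPLE` holds two lines copied from the 001.prod.if output above.

```python
import re

# Matches lines such as "2: eno0: <...> mtu 1500 ..." and
# "1253: vethafdabf1@if1252: <...> mtu 1500 ...".
IFACE_RE = re.compile(
    r"^\d+: ([^:@]+)(?:@[^:]+)?: <[^>]*> mtu (\d+)", re.MULTILINE
)

# Bytes added by VXLAN over IPv4:
# outer IPv4 (20) + UDP (8) + VXLAN header (8) + inner Ethernet (14).
VXLAN_OVERHEAD = 20 + 8 + 8 + 14

SAMPLE = """\
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
1253: vethafdabf1@if1252: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
"""


def interface_mtus(ip_a_output):
    """Return {interface_name: mtu} parsed from `ip a` output."""
    return {name: int(mtu) for name, mtu in IFACE_RE.findall(ip_a_output)}


def max_overlay_mtu(underlay_mtu):
    """Largest inner packet that fits in one underlay packet after VXLAN encapsulation."""
    return underlay_mtu - VXLAN_OVERHEAD
```

Both nodes report 1500 on eno0, which leaves room for a 1450-byte overlay MTU. If the physical network between the nodes had a smaller MTU than the interfaces report, the overlay MTU would need to be lowered accordingly.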

anantanandgupta (Author) commented

Moved to a discussion in this thread.
