Investigate ability to detect out-of-memory-related restart looping #1318
Comments
2nd incident - @snowthe18raw (5835) Logs -
|
@sitaram-kalluri @murali-shris can you add your comments on this ticket? |
@athandle : @sitaram-kalluri has made good progress in profiling memory usage on #1303 and has PR #1428, which reduces memory consumption during startup and in steady state. However, this ticket is about detecting out-of-memory restarts at the swarm level. See "Describe the solution you'd like" in this ticket's description above. |
Moving to the next sprint |
Moving to PR72. @kumarnarendra701 can you please make sure this one's on your list for this sprint. |
Nice find @kumarnarendra701. Swarmprom looks really nice. Let's get it set up on staging and see how we get on with it (and whether it can solve the problem we're looking at here). |
@cpswan - I set up Swarmprom in our staging environment and I'm currently exploring its UI. Moving this to the next sprint for more work. |
@cpswan - I didn't get a chance to work on this tool, so I'm moving it to the next sprint. |
@cpswan - Moving to the next sprint. I'm seeing the issue in the Swarmprom UI. I'll post further progress on the ticket. |
@cpswan |
@cpswan - The Portainer UI setup is complete, but I'm facing some agent connectivity issues and am working on them. |
@cconstab I know that you tried Portainer a while ago, so it would be good to get your feedback on it. |
@cpswan - I used Portainer in my staging Swarm cluster, but I noticed it's mainly for managing Docker Swarm itself and doesn't focus much on monitoring. Also, it gives little visibility into stacks that are created outside of Portainer. |
I found it worked OK in small setups like my home lab, but it did not scale well to our setup. My take in the end was to use the CLI and, if we needed tools, to look elsewhere. The Portainer team also got "k8s" pretty bad, and that started to pull the project away from Swarm mode. This was 2 years back, so things may well have changed. |
Bumping to PR78 so that @kumarnarendra701 can continue. I've suggested:
|
@cpswan - I tried to set up Swarmprom on a staging cluster. All services seem to be working fine, but the Swarm node data shows only one node instead of the whole cluster. I've tried to find a solution, but it's proving very difficult to debug because so little has been written about it online. Swarm UI: Setup information: Can you please quickly review this and let me know if you notice any issues with the setup? |
@kumarnarendra701 looks like the mon_dockerd-exporter containers are unable to send their data:
My fault finding process:
I'd call out that 172.18.x addresses aren't in the LAN range for that Swarm. |
@cpswan - Thanks for your input. I tried running "swarmprom" in the secondary Docker network, but it failed. Although I can ping the IP from the container, I cannot connect to port 9323. Errors -
The IP it is trying to connect to is on the Docker network
cc: @athandle |
Reduced SP and moved to next sprint |
@kumarnarendra701 can you please try to get back into this and see if you can resolve the network issues. |
@cpswan - I've tried using Swarmprom several times, but the repository was archived 4 years ago and there are very few blog posts about it. We might need to consider other monitoring tools, but most are designed for Kubernetes, with very few options for Docker Swarm monitoring. If you know of any tools that can monitor a Swarm cluster, please suggest them so that I can start implementing them. |
I've started running |
I did look at the output and all interesting events are being logged. There weren't any "too much memory being used" restarts when I last looked, after a couple of days; I will look again at the weekend.
|
I will create a script during this sprint and do some testing via my atServer to verify it |
Is your feature request related to a problem? Please describe.
We need a way to detect out-of-memory-related restart looping
Describe the solution you'd like
Have a tool which listens for events such as when the docker swarm manager has killed a container. Such a tool could then check whether the container was killed because its memory usage exceeded its cap, and whether it had previously been killed for the same reason within the last N (e.g. 10) minutes.
See https://docs.docker.com/engine/reference/commandline/events/
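As a rough illustration of the idea above, here is a minimal Python sketch. It is not the script mentioned in the comments; it assumes the events arrive as JSON lines, as produced by `docker events --filter type=container --filter event=oom --format '{{json .}}'`, and that repeated restarts of one service can be grouped by the `com.docker.swarm.service.name` label, since each restart gets a new container ID. The names `OomLoopDetector` and `scan`, and the threshold of two kills, are hypothetical choices made for this example. Container events are reported by the local Docker daemon, so this sketch assumes one listener per node.

```python
import json
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # N = 10 minutes, the example value from the description


class OomLoopDetector:
    """Flags a service once it has had `threshold` OOM events inside the window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=2):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._history = defaultdict(deque)  # key -> recent OOM timestamps

    def observe(self, event):
        """Return the service/container key if this event completes a loop, else None."""
        if event.get("Type") != "container" or event.get("Action") != "oom":
            return None
        actor = event.get("Actor", {})
        attrs = actor.get("Attributes", {})
        # Assumption: group by Swarm service name, falling back to the
        # container name and then the container ID for non-Swarm containers.
        key = (attrs.get("com.docker.swarm.service.name")
               or attrs.get("name")
               or actor.get("ID"))
        now = event["time"]  # Unix seconds, as emitted by `docker events`
        times = self._history[key]
        times.append(now)
        # Drop OOM kills that are older than the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return key if len(times) >= self.threshold else None


def scan(lines, detector=None):
    """Yield a key each time a JSON event line completes an OOM restart loop."""
    detector = detector or OomLoopDetector()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = detector.observe(json.loads(line))
        if key is not None:
            yield key
```

To try it against a live daemon, the output of the `docker events` command above could be piped into a small wrapper that iterates over `scan(sys.stdin)` and raises an alert for each key it yields. Whether an alert means logging, paging, or scaling the service down is left open here.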
Describe alternatives you've considered
No response
Additional context
Linked to #1303