Loads-broker stalls with message: No instances running, collection done #44
File "/Users/rpappalardo/git/loads-broker4/lib/python3.5/site-packages/docker_py-1.6.0-py3.5.egg/docker/client.py", line 110, in _get
This log message usually appears after 10 mins of waiting for the stack to spin up.
Since @pjenvey fixed ap-loadtester, I don't see this issue as much anymore. It now seems to happen mostly when too many attack nodes are attached to one broker. I assume all the E.T.-phone-home activity causes a timeout, so the broker thinks there are no instances running. It would be useful if the default timeout could be set higher, parameterized, and documented.
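To make the request concrete, here is a minimal sketch of what a parameterized wait could look like. It is not loads-broker's actual code: `wait_for_instances`, `count_running`, and the parameter names are hypothetical, and the 600-second default is only an assumption taken from the "10 mins" mentioned above.

```python
import time


def wait_for_instances(count_running, timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll count_running() until it reports at least one instance.

    Hypothetical helper, not part of loads-broker. Returns True as soon
    as an instance is seen, or False once `timeout` seconds have passed
    without seeing one.
    """
    deadline = clock() + timeout
    while True:
        if count_running() > 0:
            return True
        if clock() >= deadline:
            # Only now would a caller conclude "no instances running".
            return False
        sleep(interval)
```

With something like this, a larger attack cluster could be given a longer `timeout` instead of relying on a hard-coded default.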
happens approx. 1/3 of the time?
@pjenvey OK, I ran a long-running test last night and after a few hours I got the same error below (normally I only see this when trying to initiate a new attack cluster, not in the middle of a test). NOTE:
Here's a full log of the previous error happening after hours of running:
dumps a verbose docker ps like output to help debug intermittent/unexpected "No instances running" issue #44
@pjenvey running loads-broker w/ latest PRs (including updated debug logging).
TEST RUN 2: looks like the run generated some traffic for a bit, though I hopped on a few attack nodes and saw no docker activity.
TEST RUNS 3,4,5:
@tarekziade @pjenvey Is there a reason why the loads-broker log says "working"/"populate" in 3 other regions (us-west-1, us-west-2, eu-west-1) as well?
It's populating AMI information for all regions. It's actually always done this, it's just more verbose about it now. AFAIK it's pointless to load it for the other regions in this case, but #55 will improve how that's being done anyway
I'm wondering if the issue w/ test run #3 is due to the attempt to recover 10 instances?
it's possible. i have made a habit of killing all the instances manually on each run. perhaps the broker assumes those resources would never be terminated that way (or always left behind for re-use)?
@rpappalax says: "it's possible. i have made a habit of killing all the instances manually on each run"
By the way, you can do this with loads now, and have their status updated in the broker: $ loads terminate_all
@tarekziade $ loads terminate_all works like a charm
@pjenvey is the new debugging output helpful in debugging this? let me know if i can do some further runs here
here's what a working run looks like now with the new changes:
we made sure the docker image is now in place in dockerhub and even tried pulling down a fresh image and running it locally (works OK). however, when we run it with loads-broker, we get this returning collections error.
What I can tell from the extra debug output is that ailoads died immediately and returned an error code 127. If you can get into these instances, you can run docker logs on the container and hopefully see some output. Possibly the ailoads command isn't configured correctly? It attempted to run:
|
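For context on the error code: by POSIX shell convention, exit status 127 means the command could not be found. That is one plausible reading of why the container died immediately, not a confirmed diagnosis. A small sketch that reproduces the status locally (it assumes a POSIX `sh`; `run_and_explain` is a hypothetical helper, not part of loads-broker):

```python
import subprocess


def run_and_explain(command):
    """Run `command` through the shell and return (exit_code, hint).

    Hypothetical helper for illustration. Assumes a POSIX shell, where
    exit status 127 conventionally means "command not found".
    """
    result = subprocess.run(command, shell=True,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    if result.returncode == 127:
        hint = "command not found: check the image contents and PATH"
    elif result.returncode == 0:
        hint = "ok"
    else:
        hint = "command ran but failed"
    return result.returncode, hint
```

If the command inside the image is missing or misnamed, the container exits with this status before producing any load, which would match the instances showing no docker activity.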
@chartjes those vars need to match what's being passed in from loads.json. I believe we changed them to: $TEST_DURATION & $CONNECTIONS and perhaps also in the Kinto Dockerfile. It could be that the image didn't get properly updated?
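A quick way to spot this kind of mismatch is to compare the variables a command references against the ones actually supplied. The sketch below is illustrative only: `missing_variables` is a hypothetical helper, and the command string in the usage note is made up, since the real command is not shown in this thread.

```python
from string import Template


def missing_variables(command, env):
    """Return the $NAME / ${NAME} variables in `command` absent from `env`.

    Hypothetical helper for illustration; uses the placeholder pattern
    from the standard library's string.Template.
    """
    missing = []
    for match in Template.pattern.finditer(command):
        name = match.group("named") or match.group("braced")
        if name and name not in env and name not in missing:
            missing.append(name)
    return missing
```

For example, `missing_variables("run-test --duration $TEST_DURATION --connections $CONNECTIONS", {"TEST_DURATION": "60"})` reports `CONNECTIONS` as missing, which is the sort of gap that would appear if the image and loads.json disagree on variable names.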
@pjenvey Installed the latest PR and am able to repro. Log here:
@pjenvey: I was able to trigger the error when trying to run a large connection test:
When starting a new test run with loads-broker, the attack cluster often spins up, but then loads-broker returns the message "No instances running, collection done", as if it can no longer see the attack cluster.
This also occasionally happens in the middle of a run.