
k8s-dns e2e test suite failing with exit status 1 at HEAD #646

Closed
DamianSawicki opened this issue Oct 6, 2024 · 8 comments
@DamianSawicki
Collaborator

pull-kubernetes-dns-test fails at HEAD (verified with the no-op PR #645) as shown below:

...
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 sidecar started
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 running `dig`
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 Waiting for hits to be reported to be greater than 100
2024/10/06 16:17:58 test | 
2024/10/06 16:17:58 All tests passed
2024/10/06 16:17:58 docker [rmi -f k8s-dns-sidecar-e2e-test]
Running Suite: k8s-dns e2e test suite
=====================================
Random Seed: 1728231478
Will run 5 of 5 specs
2024/10/06 16:18:20 exit status 1
Ginkgo ran 1 suite in 21.764852525s
Test Suite Failed

This most probably blocks the vulnerability-fix PR #638, which has been open since July and for which the tests fail identically.

For the last merged PR #635, the test pull-kubernetes-dns-test passed, so the tests or the test infra must have changed in the meantime. For #638, the test failed identically on July 23rd, July 29th, and September 14th, so the issue seems to predate the August 2024 Prow migration.

@DamianSawicki
Collaborator Author

I think the failing test is defined in test/e2e/e2e_test.go in this repo. That file has not been modified since #635, so this looks more like an infra issue.

When I tried to run the test locally, I got the message "2024/10/06 21:08:39 e2e test requires `sudo` to be active. Run `sudo -v` before running the e2e test.", so perhaps it is a matter of permissions?
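
(A minimal local-repro sketch; the `go test` invocation below is an assumption, since the repo may instead wrap the suite in a Makefile target, so check the Makefile for the exact entry point.)

sudo -v                 # refresh sudo credentials first, as the error message requests
go test ./test/e2e/...  # assumed invocation; adjust if the repo uses a Makefile target or ginkgo directly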

Also, in the artifacts of the failed run, in the file podinfo.json, I found the following:

				{
					"name": "test",
					"state": {
						"terminated": {
							"exitCode": 1,
							"reason": "Error",
							"message": " test | \n2024/10/06 16:17:58 All tests passed\n2024/10/06 16:17:58 docker [rmi -f k8s-dns-sidecar-e2e-test]\nRunning Suite: k8s-dns e2e test suite\n=====================================\nRandom Seed: \u001b[1m1728231478\u001b[0m\nWill run \u001b[1m5\u001b[0m of \u001b[1m5\u001b[0m specs\n\n2024/10/06 16:18:20 exit status 1\n\nGinkgo ran 1 suite in 21.764852525s\nTest Suite Failed\n\n\u001b[38;5;228mGinkgo 2.0 is coming soon!\u001b[0m\n\u001b[38;5;228m==========================\u001b[0m\n\u001b[1m\u001b[38;5;10mGinkgo 2.0\u001b[0m is under active development and will introduce several new features, improvements, and a small handful of breaking changes.\nA release candidate for 2.0 is now available and 2.0 should GA in Fall 2021.  \u001b[1mPlease give the RC a try and send us feedback!\u001b[0m\n  - To learn more, view the migration guide at \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md\u001b[0m\n  - For instructions on using the Release Candidate visit \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#using-the-beta\u001b[0m\n  - To comment, chime in at \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/issues/711\u001b[0m\n\nTo \u001b[1m\u001b[38;5;204msilence this notice\u001b[0m, set the environment variable: \u001b[1mACK_GINKGO_RC=true\u001b[0m\nAlternatively you can: \u001b[1mtouch $HOME/.ack-ginkgo-rc\u001b[0m\n+ EXIT_VALUE=1\n+ set +o xtrace\nCleaning up after docker in docker.\n================================================================================\nWaiting 30 seconds for pods stopped with terminationGracePeriod:30\nCleaning up after docker\nWaiting for docker to stop for 30 seconds\nStopping Docker: dockerProgram process in pidfile '/var/run/docker-ssd.pid', 1 process(es), refused to die.\n================================================================================\nDone cleaning up after docker in docker.\n{\"component\":\"entrypoint\",\"error\":\"wrapped process failed: exit status 1\",\"file\":\"sigs.k8s.io/prow/pkg/entrypoint/run.go:84\",\"func\":\"sigs.k8s.io/prow/pkg/entrypoint.Options.internalRun\",\"level\":\"error\",\"msg\":\"Error executing test process\",\"severity\":\"error\",\"time\":\"2024-10-06T16:19:10Z\"}\n",
							"startedAt": "2024-10-06T15:55:53Z",
							"finishedAt": "2024-10-06T16:19:10Z",
							"containerID": "containerd://302c6068cdfb4c64dd8aafb8b56a4f61083e252a3c594e89249c2a568e443000"
						}
					},
					"lastState": {},
					"ready": false,
					"restartCount": 0,
					"image": "gcr.io/k8s-staging-test-infra/kubekins-e2e:v20240923-c8645c1a17-master",
					"imageID": "gcr.io/k8s-staging-test-infra/kubekins-e2e@sha256:c5cf57a29e78a568ecf90a3b5b4df6b2afd5245c97edda91759e3e07f2330ba7",
					"containerID": "containerd://302c6068cdfb4c64dd8aafb8b56a4f61083e252a3c594e89249c2a568e443000",
					"started": false
				}

This mentions the kubekins-e2e image, which seems to be deprecated.

@DamianSawicki
Collaborator Author

Hey @BenTheElder, I found you among the owners of kubekins-e2e mentioned above. Would you be able to look at the comments above and possibly share some advice?

@BenTheElder
Member

I don't work in this repo, but kubekins-e2e is an image we currently use to run some CI in the Kubernetes project. It has a grab bag of tools, like docker. Any other usage is best-effort.

podinfo.json describes the pod in which we executed the PR tests. For more, see https://docs.prow.k8s.io/docs/jobs/ and https://github.com/kubernetes/test-infra (config/).

@BenTheElder
Member

Unless this project opted into it, the pod most likely ran as root, but it's hard to know without tracing the job specifics. For example, you may have scheduled the test into the cluster under test (which is NOT the cluster we use to run CI; that cluster just executes the CI workloads, which then create disposable test clusters).

> seems to predate the August 2024 Prow migration.

That migration was for the control plane. Migrating the workloads was done prior to this, and it varies by workload.

You can find this job's definition in the test-infra repo and see the git history there.
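
(A rough sketch of those steps; the exact file path under config/jobs/ is an assumption, so use whatever grep reports.)

git clone https://github.com/kubernetes/test-infra
cd test-infra
# Locate the presubmit definition; the exact file is unknown up front.
grep -rln "pull-kubernetes-dns-test" config/jobs/
# Then review the history of the file(s) grep finds, e.g.:
git log -p -- config/jobs/<file-from-grep>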

We're currently approaching KEP Freeze, and I will be out for a few days after that, so time is tight this week 😅

@DamianSawicki
Collaborator Author

Ben, thank you very much for your responses!

@VikashLNU @zhangguanzhang You can have a look at the comments above to try to unblock the PR #638 you're interested in.

@zhangguanzhang

> Ben, thank you very much for your responses!
>
> @VikashLNU @zhangguanzhang You can have a look at the comments above to try to unblock the PR #638 you're interested in.

I don't see how to resolve the issue, but once someone fixes the CI build problem, I can rebase my code onto the master branch and push it.

@dereknola
Contributor

We should be good to close this issue now; #651 addressed it.

@DamianSawicki
Collaborator Author

Yeah, thank you very much again, @dereknola!
