Use informers for pod events instead of listing #2178
Hi @hakuna-matatah. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/easycla
/ok-to-test
@hakuna-matatah Your commit is not properly associated with your account. You probably need to amend it and push again.
/easycla
b07f0f7 to b723471
/assign @tosi3k
@@ -91,7 +94,7 @@ func (p *podStartupLatencyMeasurement) Execute(config *measurement.Config) ([]me
if err != nil {
	return nil, err
}

schedulerName, err := util.GetStringOrDefault(config.Params, "schedulerName", defaultSchedulerName)
Please copy the error checking from under case "gather" here as well.
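For illustration, the requested check amounts to returning as soon as parameter parsing fails. The sketch below is a stdlib-only stand-in, not the clusterloader2 code: `getStringOrDefault` and `startMeasurement` are hypothetical, it assumes `util.GetStringOrDefault` returns `(string, error)` as the reviewed line suggests, and it assumes `defaultSchedulerName` is `"default-scheduler"`.

```go
package main

import "fmt"

// getStringOrDefault is a hypothetical stand-in for util.GetStringOrDefault:
// it returns the default when the key is absent and an error when the value
// is present but is not a string.
func getStringOrDefault(params map[string]interface{}, key, def string) (string, error) {
	v, ok := params[key]
	if !ok {
		return def, nil
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("param %q has type %T, want string", key, v)
	}
	return s, nil
}

// startMeasurement mirrors the "start" branch after the fix: a bad parameter
// is reported immediately, the same way the "gather" branch handles it.
func startMeasurement(params map[string]interface{}) (string, error) {
	schedulerName, err := getStringOrDefault(params, "schedulerName", "default-scheduler")
	if err != nil {
		return "", err
	}
	return schedulerName, nil
}

func main() {
	name, err := startMeasurement(map[string]interface{}{})
	fmt.Println(name, err) // default-scheduler <nil>

	_, err = startMeasurement(map[string]interface{}{"schedulerName": 42})
	fmt.Println(err) // param "schedulerName" has type int, want string
}
```

Propagating the error right away keeps a misconfigured test from silently measuring against the wrong scheduler.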
p.stopSchedCh = make(chan struct{})

e := informer.NewInformer(
Could we use Controller, a slightly lower-level primitive than Informer, and pass a large (e.g. 10k) WatchListPageSize in the Config?
The reason I'm asking is that large clusters tend to have O(hundreds of thousands) of events, and listing them with the default page size (500) may cause the informer's initial list to time out.
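The effect of the page size is easy to quantify: each paginated LIST response carries at most `limit` items, so reading N objects takes ⌈N / pageSize⌉ round trips. The sketch below only does that arithmetic for a hypothetical 300,000 events; `listRequests` is an illustrative helper and does not talk to an API server.

```go
package main

import "fmt"

// listRequests returns how many paginated LIST calls are needed to read
// total objects when each response carries at most pageSize items.
func listRequests(total, pageSize int64) int64 {
	if pageSize <= 0 || total <= pageSize {
		// Unpaginated (limit=0), or everything fits into the first page.
		return 1
	}
	return (total + pageSize - 1) / pageSize // ceiling division
}

func main() {
	const events = 300000 // hypothetical event count for a large cluster
	for _, pageSize := range []int64{500, 10000} {
		fmt.Printf("pageSize=%d: %d LIST requests\n", pageSize, listRequests(events, pageSize))
	}
}
```

For these inputs the default page size needs 600 sequential requests, while a 10k page size needs 30.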
listing them using default page size (500) may result in informer's initial list getting timed out.
Oh! How would it time out, IIUC? I have made sure we are not relying on the client-side timeout defined here for this use case; instead I call the Run method directly here. Am I misinterpreting what you are trying to say?
Ah, I see... We deliberately set the timeout in our wrappers around informers to make sure that the initial list (the one responsible for the cache's sync) completes in a reasonable time. In clusters with O(xxx k) events, this will take ages (O(a few minutes)) with the default page size and may therefore run into "too old resource version" during the initial list. I'd strongly suggest using a larger page size, and hence the Controller primitive for the List+Watch pattern for events, as we do in https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/measurement/common/loadbalancer_nodesync_latency.go.
CC @mborsz for his thoughts as I'll be OOO for the rest of the week.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: hakuna-matatah
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
95959a1 to c3bd24d
277ebcc to da63d7a
6ed1e1a to 087976f
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
It's a quick fix that makes pod_startup_latencies calculation work reliably in large clusters, without running into API-server-side TTL issues or losing events that expire after 1h.
Which issue(s) this PR fixes:
Fixes #
It fixes these issues
Special notes for your reviewer: