nfd-topology-updater: Detect E/P cores and expose through attributes #1737

Open
wants to merge 1 commit into
base: master

Conversation

ozhuraki
Contributor

@ozhuraki ozhuraki commented Jun 7, 2024

Detect which CPUs belong to which core type (P-cores or E-cores) and expose their IDs through labels.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 7, 2024
@k8s-ci-robot k8s-ci-robot requested review from kad and zvonkok June 7, 2024 09:53
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 7, 2024
@k8s-ci-robot
Contributor

Hi @ozhuraki. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify bot commented Jun 7, 2024

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit 2946157
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/67599acefb726600082bb9a9
😎 Deploy Preview https://deploy-preview-1737--kubernetes-sigs-nfd.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 7, 2024
Contributor

@marquiz marquiz left a comment

Thanks for the patch @ozhuraki. I have a hard time seeing the practical usefulness of this information. I mean, how is it possible to meaningfully consume these properties? I think they definitely should not be exposed as built-in/default node labels.

A more useful strategy could be to add this information as attributes in the NodeResourceTopology object that the nfd-topology-updater handles. This way, the E/P core information could be consumed by a Kubernetes (topology-aware) scheduler extension.

ping @kad

@kad
Contributor

kad commented Jul 12, 2024

I have several concerns regarding this PR:

  1. The way of detecting cores: we should use /sys/devices/cpu_*/*cpus to get information about P/E cores, not something else.
  2. The cpu_* subdirs are present only on hybrid CPUs. If there is only the single subdir /sys/devices/cpu/, only one type of core exists in the package, so there is no need to expose those attributes at all. In other words, labels should be exposed only if the CPU is hybrid.
  3. Even on hybrid CPUs, those labels should not be populated by default, only when the configuration requests it. I agree with Markus that those labels are not directly consumable, so an extra step to enable them on hybrid CPUs is fine.
  4. Exposing it as attributes in the topology exporter is fine. There we can add them without a configuration option when running on hybrid CPUs.
  5. The names of the core types should not be hard-coded, but taken from sysfs, e.g.:
$ grep "" /sys/devices/cpu_*/*cpus
/sys/devices/cpu_atom/cpus:16-19
/sys/devices/cpu_core/cpus:0-15
$

So the labels should have "foo..atom" and "foo...core" names. @marquiz, please suggest a good prefix to use instead of "foo...".

@ozhuraki
Contributor Author

@marquiz @kad

Thanks for the helpful input. Updated, please take a look

@ozhuraki
Contributor Author

ozhuraki commented Nov 4, 2024

@marquiz

Thanks for the useful input! Updated, please take a look

@ozhuraki
Contributor Author

ozhuraki commented Nov 6, 2024

@marquiz @kad

Thanks for the help! Updated, please take a look

Contributor

@marquiz marquiz left a comment

Thanks for the update. A few more comments below.
/ok-to-test
/retitle nfd-topology-updater: Detect E/P cores and expose through attributes

pkg/nfd-topology-updater/nfd-topology-updater.go (three outdated, resolved review threads)
@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Nov 7, 2024
@k8s-ci-robot k8s-ci-robot changed the title source/cpu: Detect E/P cores and expose IDs through labels nfd-topology-updater: Detect E/P cores and expose through attributes Nov 7, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 7, 2024
@ozhuraki
Contributor Author

ozhuraki commented Nov 8, 2024

@marquiz

Thanks, updated, please take a look

@ozhuraki
Contributor Author

@marquiz

Thanks, updated, please take a look

Contributor

@marquiz marquiz left a comment

Thanks @ozhuraki for the update, I think we could merge this.
/assign @PiotrProkop

One general note about documentation: it would be nice to have, e.g., a table listing all the information that we expose as attributes in the NRT. Maybe we should create an issue about this(?)

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024
@marquiz
Contributor

marquiz commented Nov 28, 2024

/retest

@PiotrProkop
Contributor

One question, and sorry if I misunderstand how hybrid CPUs can be consumed in K8s. Wouldn't it be more useful to just modify ResourceAggregator to advertise new resources for each CPU type in each Zone (NUMA node), by mapping the generic k8s cpu resource to each cpu_type? But I may be wrong, as I don't know how those E/P cores are currently assigned and consumed in Kubernetes.

@ozhuraki
Contributor Author

@PiotrProkop

There's nothing special about it; it is just one way of exposing such information.

We can also modify ResourceAggregator to advertise new resources for each CPU type.

I will open an issue and try to add it in a separate PR.

@PiotrProkop
Contributor

I just don't know how this information, provided via attributes, can be useful to a custom scheduler without a mapping to the CPUs already in use. The NRT object only exposes info about allocated/free CPUs in a given Zone, without any info about which CPU ID is used and which is free.

Contributor

@uniemimu uniemimu left a comment

Found typo

@ozhuraki
Contributor Author

@PiotrProkop

Thanks, and sorry, I missed your comment.

Yes, it merely exposes information about a particular kind of core, for workloads that want just that kind of core on hybrid CPUs.

I opened issue #1964 to expose it through ResourceAggregator (so the custom scheduler could track allocated/free cores) and will try to add such a PR too.

@uniemimu

Thanks, updated.

@PiotrProkop
Contributor

I still feel that this information should then be exposed via labels that just say a given node has E or P cores, without the numbers. Then no special scheduler plugin is needed, but I may be wrong about what the use case is here. I'll ask @ffromani to chime in.

@ozhuraki
Contributor Author

@PiotrProkop

We initially exposed it through labels but later switched to annotations, since labels were too restrictive for exposing IDs.

I can additionally expose it via labels that just say a given node has E or P cores, as you suggest.

@ozhuraki
Contributor Author

@PiotrProkop

Added labels too, please take a look.

@k8s-ci-robot
Contributor

@ozhuraki: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-node-feature-discovery-build-image-cross-generic | 2946157 | link | true | /test pull-node-feature-discovery-build-image-cross-generic

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ffromani
Contributor

/cc

@ffromani
Contributor

got the ping, will review ASAP

Contributor

@uniemimu uniemimu left a comment

For me this is fine, but that covers just the looks of it, not the usefulness.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marquiz, ozhuraki, uniemimu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +380 to +382
attrList := discoverCpuCores()
updateAttributes(&nrt.Attributes, attrList)

Contributor

PTAL #1964 (comment).
Rather than constraining ourselves to the key/value attribute list, I'd explore the option of adding new zones describing the E/P split and relationship.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
7 participants