
KEP-5018: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5019

Open
wants to merge 1 commit into base: master
Conversation

@ritazh ritazh commented Jan 3, 2025

  • One-line PR description: DRAAdminAccess: allow creation of ResourceClaims and ResourceClaimTemplates in privileged mode to grant access to devices that are in use by other users, for admin tasks like monitoring the health or status of a device.
  • Other comments:

NOTE: The DRAAdminAccess feature was initially part of https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters. In 1.32, the DRAAdminAccess feature gate was added to keep the adminAccess field in alpha while promoting structured parameters to beta. It was discussed that this feature should be part of a separate KEP to push it forward. Most references to DRAAdminAccess have been removed from the original KEP and moved to this KEP.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 3, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ritazh
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 3, 2025
//
// +required
// +optional
Member Author
should we update any stale API definitions in a separate PR so there isn't a suggestion that KEP-5018 is responsible for the change from required to optional?

Signed-off-by: Rita Zhang <[email protected]>
@kubernetes kubernetes deleted a comment from k8s-ci-robot Jan 3, 2025
@ritazh
Member Author

ritazh commented Jan 3, 2025

/assign @pohly

@ritazh
Member Author

ritazh commented Jan 3, 2025

/sig auth

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jan 3, 2025
@ritazh
Member Author

ritazh commented Jan 6, 2025

/sig node


As the Dynamic Resource Allocation (DRA) feature evolves, cluster administrators require a privileged mode to grant access to devices already in use by other users. This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status while maintaining device security and integrity.

This KEP proposes a mechanism for cluster administrators to mark a request in a ResourceClaim or ResourceClaimTemplate with an admin access flag. This flag allows privileged access to devices, enabling administrative tasks without compromising security. Access to this mode is restricted to users authorized to create ResourceClaim or ResourceClaimTemplate objects in namespaces marked with the DRA admin label, ensuring that non-administrative users cannot misuse this feature.


Is "without compromising security" too strong here? Is it more correct to say something like "This flag allows conditional, privileged access to devices. Conditional access to this mode is restricted..."


## Summary

As the Dynamic Resource Allocation (DRA) feature evolves, cluster administrators require a privileged mode to grant access to devices already in use by other users. This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status while maintaining device security and integrity.


should we distinguish between "administrators" and "regular users" here to clarify the security story? Like "This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status across all devices while ensuring that regular users only have access to run containers on the devices their workloads are scheduled onto."?


* Potential conflicts or misuse of shared hardware.

As the adoption of DRA expands, the lack of privileged administrative access becomes a bottleneck for cluster operations, particularly in shared environments where devices are critical resources.


"As the adoption of DRA expands, the inability of administrators to perform privileged device introspection becomes a bottleneck for cluster operations,..."

Something like the above gets rid of the word "access", which may be confusing ("lack of privileged administrative access" is usually something we are trying to ensure! 😄)

resourceClassName: admin-resource-class
adminAccess: true
```
1. Namespace Label for DRA Admin Mode:


As specified (thus far, I haven't yet read the whole doc 😛) it seems that labeled-namespace creation must happen before the DRA resources can refer to `adminAccess: true`. Should we put this step first in this overview to reflect its serial place in the required order?


I see in the workflow doc below that you describe creating the namespace as the first step, so this comment is less important.


1. Grants privileged access to the requested device:

For requests with `adminAccess: true`, the DRA controller bypasses standard allocation checks and allows administrators to access devices already in use. This ensures privileged tasks like monitoring or diagnostics can be performed without disrupting existing allocations. The controller also logs and audits admin-access requests for security and traceability.


Is there an assumption that monitoring/diagnostics processes are already running on the underlying host OS, and so any CPU/memory allocation can be guaranteed? And/or do we assume that any new operational headroom required by these privileged tasks is already pre-accounted for, with no chance for container allocation to take up 100% of the available headroom of a node's host OS?


1. No impact on availability of claims:

The scheduler ignores claims with `adminAccess: true`; normal usage is not impacted, as claims in other namespaces can still be allocated using the same devices that are also accessed by workloads in the admin namespace.


Does this assume that privileged tasks are either (1) not using the specialized hardware (e.g., GPU) or (2) are using specialized hardware in a cooperative way that is non-invasive to assumptions that kubernetes workloads have? For example, if my workload declares expression: "device.attributes['gpu.nvidia.com'].profile == '1g.10gb'" then I expect to be able to use the entire 10GB of that single GPU. What happens if my k8s workload container needs all 10GB and the entire GPU processing while a privileged task simultaneously has access to that GPU+memory?


I also wonder if this assumption is something we can make generally across vendors and devices. I imagine some device eventually may come along that requires exclusive access (i.e. no other allocated claims) for some specific admin task. Could that work if an admin creates two requests in a claim: where one has adminAccess: true and another that allocates the entire device?

Overall I think the idea that a ResourceClaim isn't claiming any resources in this case is a little confusing. I suppose it is "claiming" some administrative domain though, so is that something that could be represented in a ResourceSlice? e.g. What if a GPU DRA driver also declared a "metrics" device alongside partitions like MIGs, where that "metrics" device would be marked as requiring admin access somehow in the ResourceSlice? That might remove the need to treat admin access specially in some of the resource accounting changes here too.


1. A cluster administrator labels a namespace with `kubernetes.io/dra-admin-access`.

1. Authorized users create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`.


I think we want to be clear that the ResourceClaim or ResourceClaimTemplate needs to be created in the admin namespace.


1. Authorized users create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`.

1. Only users with access to the admin namespace can use them in their pod spec.


Is this clearer in spite of being longer?

"Only users with access to the admin namespace can reference these ResourceClaims or ResourceClaimTemplates in new pod or deployment specs."
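Putting the workflow steps above together, the objects involved might look roughly like the following (names, API version, and field layout are illustrative of the alpha API, not normative):

```yaml
# Step 1: a cluster administrator labels (or creates) the admin namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: dra-admin
  labels:
    kubernetes.io/dra-admin-access: "true"
---
# Step 2: an authorized user creates a claim in that namespace
# with adminAccess: true on the device request.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-health-monitor
  namespace: dra-admin
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com
      adminAccess: true
```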

The `DRAAdminAccess` feature gate controls whether users can set the `adminAccess` field to
true when requesting devices. That is checked in the apiserver. In addition,
the scheduler will not allocate claims with admin access when the feature is
turned off, or if the field was set prior to the introduction of the feature gate (for example, set in 1.31 when it


I don't think I understand the "or if the field was set prior..." part.

}
}
if adminRequested {
logger.V(5).Info("ResourceClaim has admin access, bypassing standard allocation checks", "claim", klog.KRef(claim.Namespace, claim.Name))


this is probably not the place for a code review 😜, but I'd want to see this at v=2

```
### ResourceQuota
Requests asking for `adminAccess` contribute to the quota. In practice,


Can we be more assertive with our language here? Because we are describing something new, we don't know yet what folks are doing in practice. Can we say that we don't recommend enforcing resource quotas on DRA AdminAccess namespaces, given the unique way these workloads co-exist without competing for user resources?



### API Changes

Add `adminAccess` field to `DeviceRequest` which is part of `ResourceClaim` and `ResourceClaimTemplate`:

To clarify, isn't this field already a part of the API, albeit behind an alpha feature gate?

// omitted
```
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` functino to prevent creation of

nit:

Suggested change
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` functino to prevent creation of
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` function to prevent creation of
