KEP-5018: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5019
ritazh
commented
Jan 3, 2025
- One-line PR description: DRAAdminAccess: allow creation of ResourceClaims and ResourceClaimTemplates in privileged mode to grant access to devices that are in use by other users, for admin tasks such as monitoring the health or status of the device.
- Issue link: DRA: AdminAccess for ResourceClaims and ResourceClaimTemplates #5018
- Other comments:
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ritazh The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
// | ||
// +required | ||
// +optional |
Should we update any stale API definitions in a separate PR so there isn't a suggestion that KEP-5018 is responsible for the change from required to optional?
Signed-off-by: Rita Zhang <[email protected]>
910fe6f to 677a7ed
/assign @pohly |
/sig auth |
/sig node |
|
||
As the Dynamic Resource Allocation (DRA) feature evolves, cluster administrators require a privileged mode to grant access to devices already in use by other users. This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status while maintaining device security and integrity. | ||
|
||
This KEP proposes a mechanism for cluster administrators to mark a request in a ResourceClaim or ResourceClaimTemplate with an admin access flag. This flag allows privileged access to devices, enabling administrative tasks without compromising security. Access to this mode is restricted to users authorized to create ResourceClaim or ResourceClaimTemplate objects in namespaces marked with the DRA admin label, ensuring that non-administrative users cannot misuse this feature. |
Is "without compromising security" too strong here? Is it more correct to say something like "This flag allows conditional, privileged access to devices. Conditional access to this mode is restricted..."
|
||
## Summary | ||
|
||
As the Dynamic Resource Allocation (DRA) feature evolves, cluster administrators require a privileged mode to grant access to devices already in use by other users. This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status while maintaining device security and integrity. |
Should we distinguish between "administrators" and "regular users" here to clarify the security story? Like "This feature, referred to as DRAAdminAccess, allows administrators to perform tasks such as monitoring device health or status across all devices while ensuring that regular users only have access to run containers on the devices their workloads are scheduled onto."?
|
||
* Potential conflicts or misuse of shared hardware. | ||
|
||
As the adoption of DRA expands, the lack of privileged administrative access becomes a bottleneck for cluster operations, particularly in shared environments where devices are critical resources. |
"As the adoption of DRA expands, the inability of administrators to perform privileged device introspection becomes a bottleneck for cluster operations,..."
Something like the above gets rid of the word "access", which may be confusing ("lack of privileged administrative access" is usually something we are trying to ensure! 😄)
resourceClassName: admin-resource-class | ||
adminAccess: true | ||
``` | ||
1. Namespace Label for DRA Admin Mode: |
As specified (thus far, I haven't yet read the whole doc 😛) it seems that labeled namespace creation must happen before the DRA resources can refer to `adminAccess: true`. Should we put this step as the first step in this overview to reflect its serial place in the required order?
I see in the workflow doc below that you describe the create namespace as the first step, so this comment is less important.
|
||
1. Grants privileged access to the requested device: | ||
|
||
For requests with `adminAccess: true`, the DRA controller bypasses standard allocation checks and allows administrators to access devices already in use. This ensures privileged tasks like monitoring or diagnostics can be performed without disrupting existing allocations. The controller also logs and audits admin-access requests for security and traceability. |
Is there an assumption that monitoring/diagnostics processes are already running on the underlying host OS, and so any CPU/memory allocation can be guaranteed? And/or do we assume that any new operational headroom required by these privileged tasks is already pre-accounted for, so that container allocations cannot take up 100% of the available headroom of a node's host OS?
|
||
1. No impact on availability of claims: | ||
|
||
The scheduler ignores claims with `adminAccess: true`, normal usage is not impacted as claims in other namespaces can still be allocated using the same devices that are also accessed by workloads in the admin namespace. |
Does this assume that privileged tasks are either (1) not using the specialized hardware (e.g., GPU) or (2) are using specialized hardware in a cooperative way that is non-invasive to assumptions that Kubernetes workloads have? For example, if my workload declares `expression: "device.attributes['gpu.nvidia.com'].profile == '1g.10gb'"` then I expect to be able to use the entire 10GB of that single GPU. What happens if my k8s workload container needs all 10GB and the entire GPU processing while a privileged task simultaneously has access to that GPU+memory?
I also wonder if this assumption is something we can make generally across vendors and devices. I imagine some device eventually may come along that requires exclusive access (i.e. no other allocated claims) for some specific admin task. Could that work if an admin creates two requests in a claim: where one has `adminAccess: true` and another that allocates the entire device?
Overall I think the idea that a ResourceClaim isn't claiming any resources in this case is a little confusing. I suppose it is "claiming" some administrative domain though, so is that something that could be represented in a ResourceSlice? e.g. What if a GPU DRA driver also declared a "metrics" device alongside partitions like MIGs, where that "metrics" device would be marked as requiring admin access somehow in the ResourceSlice? That might remove the need to treat admin access specially in some of the resource accounting changes here too.
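To make the "no impact on availability" claim concrete for this thread, here is a minimal sketch of how allocation accounting could skip admin-access results. The `allocationResult` type and the `devicesInUse` helper are hypothetical simplifications for illustration, not the scheduler's actual code.

```go
package main

import "fmt"

// allocationResult is a simplified, hypothetical stand-in for one
// allocated device recorded in a claim's status.
type allocationResult struct {
	Device      string
	AdminAccess bool
}

// devicesInUse returns the set of devices that count as allocated.
// Results granted through admin access are skipped, so they do not
// block regular claims from being allocated the same device.
func devicesInUse(claims [][]allocationResult) map[string]bool {
	inUse := make(map[string]bool)
	for _, results := range claims {
		for _, r := range results {
			if r.AdminAccess {
				continue // admin access does not consume the device
			}
			inUse[r.Device] = true
		}
	}
	return inUse
}

func main() {
	claims := [][]allocationResult{
		// A regular workload claim holding gpu-0.
		{{Device: "gpu-0"}},
		// An admin claim observing gpu-0 and gpu-1.
		{{Device: "gpu-0", AdminAccess: true}, {Device: "gpu-1", AdminAccess: true}},
	}
	fmt.Println(devicesInUse(claims))
}
```

Under this sketch gpu-1 stays available to regular claims even though an admin claim references it. It says nothing about whether the admin workload and the regular workload can safely share the hardware at runtime, which is the question raised above.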
|
||
1. A cluster administrator labels a namespace with `kubernetes.io/dra-admin-access`. | ||
|
||
1. Authorized users create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`. |
I think we want to be clear that the `ResourceClaim` or `ResourceClaimTemplate` needs to be created in the admin namespace.
|
||
1. Authorized users create `ResourceClaim` or `ResourceClaimTemplate` objects with `adminAccess: true`. | ||
|
||
1. Only users with access to the admin namespace can use them in their pod spec. |
Is this clearer in spite of being longer?
"Only users with access to the admin namespace can reference these `ResourceClaims` or `ResourceClaimTemplates` in new pod or deployment specs."
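The workflow steps discussed in this thread imply a check at creation time: a request with `adminAccess: true` is only accepted in a namespace carrying the admin label. A minimal sketch of that check follows. The label key is taken from the KEP text; the `deviceRequest` type and `validateAdminAccess` helper are hypothetical, and checking only for the presence of the label key (not its value) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// draAdminLabel is the namespace label key named in the KEP text.
const draAdminLabel = "kubernetes.io/dra-admin-access"

// deviceRequest is a simplified, hypothetical stand-in for the API type.
type deviceRequest struct {
	Name        string
	AdminAccess *bool
}

// validateAdminAccess rejects admin-access requests made outside a
// labeled namespace. Assumption: only the presence of the label key
// is checked, not its value.
func validateAdminAccess(nsLabels map[string]string, requests []deviceRequest) error {
	_, isAdminNamespace := nsLabels[draAdminLabel]
	for _, r := range requests {
		if r.AdminAccess != nil && *r.AdminAccess && !isAdminNamespace {
			return errors.New("request " + r.Name + ": adminAccess requires a namespace labeled " + draAdminLabel)
		}
	}
	return nil
}

func main() {
	admin := true
	reqs := []deviceRequest{{Name: "monitor", AdminAccess: &admin}}

	// Unlabeled namespace: the request is rejected.
	fmt.Println(validateAdminAccess(map[string]string{}, reqs))

	// Labeled namespace (the value "true" is illustrative only): accepted.
	fmt.Println(validateAdminAccess(map[string]string{draAdminLabel: "true"}, reqs))
}
```

This also shows why the ordering matters: the namespace has to be labeled before the claim or template is created in it.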
The `DRAAdminAccess` feature gate controls whether users can set the `adminAccess` field to | ||
true when requesting devices. That is checked in the apiserver. In addition, | ||
the scheduler will not allocate claims with admin access when the feature is | ||
turned off, or if the field was set prior to the feature gate was introduced (for example, set in 1.31 when it |
I don't think I understand the "or if the field was set prior..." part.
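One possible reading of the scheduler-side behavior, sketched under assumptions: a claim may already carry `adminAccess: true` in storage even though the gate is off now, so the scheduler has to check again rather than rely on apiserver validation alone. The `deviceRequest` type, the `checkAdminAccess` helper, and the `gateEnabled` parameter are hypothetical and only illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceRequest is a simplified, hypothetical stand-in for the API type.
type deviceRequest struct {
	Name        string
	AdminAccess *bool
}

var errAdminAccessDisabled = errors.New("claim requests admin access, but the DRAAdminAccess feature is disabled")

// checkAdminAccess reports whether any request asks for admin access.
// It returns an error when such a request exists while the gate is off,
// which covers objects persisted with the field set before the gate
// was turned off.
func checkAdminAccess(requests []deviceRequest, gateEnabled bool) (bool, error) {
	for _, r := range requests {
		if r.AdminAccess != nil && *r.AdminAccess {
			if !gateEnabled {
				return true, errAdminAccessDisabled
			}
			return true, nil
		}
	}
	return false, nil
}

func main() {
	admin := true
	stored := []deviceRequest{{Name: "monitor", AdminAccess: &admin}}

	requested, err := checkAdminAccess(stored, false)
	fmt.Println(requested, err)

	requested, err = checkAdminAccess(stored, true)
	fmt.Println(requested, err)
}
```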
} | ||
} | ||
if adminRequested { | ||
logger.V(5).Info("ResourceClaim", klog.KRef(claim.Namespace, claim.Name), "has admin access, bypass standard allocation checks") |
this is probably not the place for a code review 😜, but I'd want to see this at v=2
``` | ||
### ResourceQuota | ||
Requests asking for `adminAccess` contribute to the quota. In practice, |
Can we be more assertive with our language here? Because we are describing something new, we don't know what folks are yet doing in practice. Can we say that we don't recommend enforcing resource quotas on DRA AdminAccess namespaces based on the unique nature of how these workloads co-exist without competing for user resources?
|
||
|
||
### API Changes | ||
|
||
Add `adminAccess` field to `DeviceRequest` which is part of `ResourceClaim` and `ResourceClaimTemplate`: |
To clarify, isn't this field already a part of the API, albeit behind an alpha feature gate?
// omitted | ||
``` | ||
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` functino to prevent creation of |
nit:
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` functino to prevent creation of | |
In pkg/controller/resourceclaim/controller.go, process requests in `handleClaim` function to prevent creation of |