Split Intel AI Tools CPU and GPU images. #521

sharvil10
Contributor

This PR splits the Intel AI Tools images into CPU and GPU images for Intel TensorFlow and Intel PyTorch.

Description

This PR splits the combined Intel AI Tools CPU and GPU images into separate images. The exact changes are described below.

  1. Split Intel PyTorch into Intel PyTorch CPU & XPU images.
  2. Split Intel TensorFlow into Intel TensorFlow CPU & XPU images.
  3. Change the base image of the Intel Jupyter images to the ubi9 base images instead of the runtime images.

How Has This Been Tested?

This was tested by running make commands to build the containers, deploy the Kubernetes resources, and test the images.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@openshift-ci openshift-ci bot requested review from atheo89 and harshad16 May 8, 2024 22:03
Contributor

openshift-ci bot commented May 8, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vaishnavihire for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented May 8, 2024

Hi @sharvil10. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sharvil10
Contributor Author

OpenShift Release PR: openshift/release#51822

@sharvil10 sharvil10 force-pushed the sharvils/intel_cpu_gpu_separate branch from d211b84 to 5c7c4d5 Compare May 9, 2024 02:46
@atheo89
Member

atheo89 commented May 9, 2024

/ok-to-test

@sharvil10
Contributor Author

It seems the test failed because the environment variables were wrong for the Jupyter images. I fixed this in OpenShift Release CI PR #51853.

@sharvil10
Contributor Author

/retest

@sharvil10
Contributor Author

Is it okay to request more resources (CPU and memory) in the StatefulSet of the Jupyter containers to test them? Also, the tests seem to fail intermittently when run locally: sometimes they pass, and sometimes they fail with the same error seen here.

@sharvil10
Contributor Author

/retest

Contributor

openshift-ci bot commented May 13, 2024

@sharvil10: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebooks-e2e-tests | 5c7c4d5 | link | true | /test notebooks-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sharvil10 sharvil10 closed this May 13, 2024
@harshad16
Member

> @sharvil10 Is it okay to request more resources(CPU and Memory) in the statefulset of jupyter containers to test them? Also, the tests seem to fail arbitrarily locally as well. It works sometimes and sometimes it doesn't with the same error as seen here.

In Open Data Hub, users have the option to pick different resource limits.
If the question is about the resource limits used in the tests, we would have to look that up.

Was there a reason to close this PR?
