Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Component rules in Prometheus #1503

Merged
merged 4 commits into from
Jan 18, 2025

Conversation

VaishnaviHire
Copy link
Member

@VaishnaviHire VaishnaviHire commented Jan 16, 2025

Description

The scope of this PR is reduced to only watch and update prometheus configmap for components.
jira : RHOAIENG-14787

How Has This Been Tested?

  1. Deploy Managed Instance:

    • Create dummy secrets - (Refer olm install script in gitlab)
    • Deploy Managed Catalogsource: quay.io/vhire/opendatahub-operator-catalog:v2.17.0-rc4
  2. Expect alerts only for components that are in Managed state

Screenshot 2025-01-17 at 1 10 11 AM Screenshot 2025-01-17 at 1 11 10 AM

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

# Copy monitoring config
COPY config/monitoring/ /opt/manifests/monitoring
# Copy partners config
COPY config/partners/ /opt/manifests/partners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can skip partners one, with #1504

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can skip partners one, with #1504

Why for building it cannot copy just everything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the question about why not make it as COPY config/ /opt/manifests/ in Dockefile?
if so, we have more under config folder which no need to get into final image

@VaishnaviHire VaishnaviHire force-pushed the add_monitoring_ctrl branch 2 times, most recently from 60d1f2c to ef677d0 Compare January 16, 2025 19:52
@VaishnaviHire VaishnaviHire marked this pull request as ready for review January 16, 2025 19:53
@openshift-ci openshift-ci bot requested review from grdryn and ugiordan January 16, 2025 19:53
@VaishnaviHire VaishnaviHire changed the title Implement Monitoring Controller Update Component rules in Prometheus Jan 16, 2025
@VaishnaviHire VaishnaviHire force-pushed the add_monitoring_ctrl branch 2 times, most recently from 9da2003 to b9d91bd Compare January 17, 2025 03:48
@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 0.79681% with 249 lines in your changes missing coverage. Please review.

Project coverage is 19.69%. Comparing base (e838877) to head (29a8c29).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
controllers/services/monitoring/monitoring.go 0.00% 88 Missing ⚠️
...rvices/monitoring/monitoring_controller_actions.go 0.00% 67 Missing ⚠️
.../dscinitialization/dscinitialization_controller.go 4.87% 39 Missing ⚠️
pkg/controller/predicates/resources/resources.go 0.00% 23 Missing ⚠️
pkg/controller/handlers/handlers.go 0.00% 15 Missing ⚠️
pkg/controller/predicates/component/component.go 0.00% 11 Missing ⚠️
...llers/services/monitoring/monitoring_controller.go 0.00% 3 Missing ⚠️
controllers/dscinitialization/monitoring.go 0.00% 2 Missing ⚠️
pkg/services/monitoring/prometheus.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1503      +/-   ##
==========================================
- Coverage   20.04%   19.69%   -0.35%     
==========================================
  Files         160      161       +1     
  Lines       10900    11102     +202     
==========================================
+ Hits         2185     2187       +2     
- Misses       8483     8683     +200     
  Partials      232      232              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

controllers/dscinitialization/monitoring.go Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
}),
)
err := cr.ForEach(func(ch cr.ComponentHandler) error {
enabled := cr.IsManaged(ch, dsc) && dsc.Status.Phase == status.ReadySuffix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure about this part "dsc.Status.Phase == status.ReadySuffix"
do we need a special case on odh-model-controller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Status is for Dsc, so we should not need any additional condition for odh-model-controller

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the actual component status, a better way would be:

ci := component.NewCRObject(instance)

// read the component instance to get tha actual status
err = cli.Get(ctx, client.ObjectKeyFromObject(ci), ci)
switch {
case k8serr.IsNotFound(err):
    enabled = false
case err != nil:
    enabled = false
    // TODO: error handling
default:     
    enabled = meta.IsStatusConditionTrue(ci.GetStatus().Conditions, status.ConditionTypeReady) 
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking a little bit more about this comment, we probably want to distinguish between enabled and ready. If I recall the old implementation, we were supposed to install the rule after the first time the component become ready, and we remove the rules of the component gets removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like also mention this issue red-hat-data-services#316 which was from my opinion hidden by changing of timings.

@VaishnaviHire VaishnaviHire force-pushed the add_monitoring_ctrl branch 6 times, most recently from 71a8618 to 9db60ba Compare January 17, 2025 13:50
@openshift-ci openshift-ci bot removed the lgtm label Jan 17, 2025
@openshift-ci openshift-ci bot added the lgtm label Jan 17, 2025
@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

return nil
}
dsc := &dscList.Items[0]
componentRules := map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this map can probably be moved to a global var

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}

m.Status.Phase = "Ready"
m.Status.ObservedGeneration = m.GetObjectMeta().GetGeneration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we should also have a Ready condition

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an NoReady case, only set to Ready when Monitoring Prom deployment is ready

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a ready Condition, phase is not that much in use

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, did not see this, let me get a new one next week for this comment

@openshift-ci openshift-ci bot removed the lgtm label Jan 17, 2025
@zdtsw zdtsw force-pushed the add_monitoring_ctrl branch from 96b545c to cab3a3a Compare January 17, 2025 17:21
switch {
case err != nil:
enabled = false
if !k8serr.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could have been a switch case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as here, will have the new commit to follow

@openshift-ci openshift-ci bot added the lgtm label Jan 18, 2025
Copy link

openshift-ci bot commented Jan 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zdtsw zdtsw merged commit 9b062f6 into opendatahub-io:main Jan 18, 2025
9 of 10 checks passed
MarianMacik pushed a commit to MarianMacik/opendatahub-operator that referenced this pull request Jan 22, 2025
…flux/component-updates/odh-trustyai-service-operator-v2-17

chore(deps): update odh-trustyai-service-operator-v2-17 to db6a9f5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants