DAOS-15937 test: Automate metadata duplicate rpc detection time consuming #14473
base: master
Skip-fnbullseye: false Skip-bullseye: false Skip-python-bandit: true Skip-nlt: true Skip-build-leap15-rpm: true Skip-scan-leap15-rpms: true Allow-unstable-test: true Doc-only: false Test-tag: sec_basic Required-githooks: true Signed-off-by: Ding Ho [email protected]
Test only, please do not merge. Skip-fnbullseye: false Skip-bullseye: false Skip-python-bandit: true Skip-build-EL9-rpm: true Test-tag: pr daily_regression full_regression Skip-func-hw-test-medium-md-on-ssd: false Skip-func-hw-test-medium-verbs-provider-md-on-ssd: false Skip-func-hw-test-medium-ucx-provider: false Skip-func-hw-test-large-md-on-ssd: false Allow-unstable-test: true Doc-only: false Required-githooks: true Signed-off-by: Ding Ho [email protected] Skip-fnbullseye: false Skip-bullseye: false Skip-python-bandit: true Skip-build-EL9-rpm: true Allow-unstable-test: true Doc-only: false Test-nvme: auto_md_on_ssd Test-tag: pr daily_regression full_regression Required-githooks: true Signed-off-by: Ding Ho [email protected]
…ming Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-large-md-on-ssd: false Test-tag: test_metadata_dup_rpc Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-large-md-on-ssd: false Test-tag: test_metadata_dup_rpc Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/2/testReport/
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
first pass review
:avocado: tags=server,metadata
:avocado: tags=DuplicateRpcDetection,test_metadata_dup_rpc
"""
self.dmg.server_set_logmasks("DEBUG", raise_exception=False)
This will raise the engine log_mask to DEBUG, which can be useful during the setup phases. But it should be restored to the original setting (log_mask: ERR, which the .yaml file specifies) before starting the timing loops.
I.e., this probably needs another call to server_set_logmasks to restore the original setting.
Updated. The server log mask is now defined by the test yaml.
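The restore step requested above could be sketched as follows. This is a minimal illustration, not the code from the PR: `FakeDmg` and `temporary_logmask` are hypothetical names, and the stub only records calls so the snippet runs without a DAOS cluster. Only the `server_set_logmasks(..., raise_exception=False)` call shape is taken from the diff.

```python
from contextlib import contextmanager


class FakeDmg:
    """Hypothetical stand-in for the test's dmg wrapper; it only records calls."""

    def __init__(self, initial_mask):
        self.mask = initial_mask
        self.history = [initial_mask]

    def server_set_logmasks(self, masks, raise_exception=False):
        # The real call reaches the engines; here we only track the value.
        self.mask = masks
        self.history.append(masks)


@contextmanager
def temporary_logmask(dmg, temp_mask, original_mask):
    """Raise the log mask for a setup phase, then restore the original."""
    dmg.server_set_logmasks(temp_mask, raise_exception=False)
    try:
        yield
    finally:
        # Runs even if setup fails, so timing never starts at DEBUG.
        dmg.server_set_logmasks(original_mask, raise_exception=False)


dmg = FakeDmg("ERR")                 # "ERR" mirrors the yaml setting
with temporary_logmask(dmg, "DEBUG", "ERR"):
    setup_mask = dmg.mask            # mask seen during setup
timing_mask = dmg.mask               # mask seen before the timing loops
```

Wrapping the change in a context manager is one way to make sure the restore cannot be skipped; two explicit calls around the setup code would work equally well.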
self.log_step("Create containers by ThreadManager.")
container_manager = ThreadManager(
self.metadata_workload_test, self.get_remaining_time() - 30)
container_manager.add(
If the .yaml file specifies multiple threads, wouldn't the code here need to perform that number of container_manager.add() calls, one per thread?
Also, passing cont_num here may not be so useful for the thread function metadata_workload_test(). Maybe it can be removed from the arguments to that method?
Thanks for the comment.
When multiple threads are specified (in testcase-2: 2, 4, or 8 threads), each thread will create its own container and perform its own metadata workload.
I have updated this. It should be number_of_container (instead of cont_num).
"If the yaml file specifies number_thread: 2 (anything larger than 1), then this code will need to change so that it has a loop of container_manager.add() calls so that all of the threads get created, is that right?"
==> When the yaml file specifies number_thread: 2 (or 4, or 8), the script will generate the corresponding number of container threads (2, 4, or 8) on the same pool (thread). The script does not need to change. This has been tested.
I agree with @kccain here. If num_of_cont is increased, this still only runs a single thread because the code only calls container_manager.add one time.
"script will generate different number of container-thread (2, 4 or 8) on the same pool (thread)"
Where? What line(s) of code does this?
==> Please see the new commit. I missed a loop for adding the container threads.
Thanks for catching that.
container_manager = ThreadManager(
self.metadata_workload_test, self.get_remaining_time() - 30)
container_manager.add(
pool=self.pool, num_of_cont=num_of_cont, workload_cycles=w_cycles, test_loops=t_loops)
That is one thread, no matter what num_of_cont is.
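The point of this thread, that one add() call yields one thread, can be illustrated with a small sketch. `MiniThreadManager` is a hypothetical stand-in built on `concurrent.futures`, not the ThreadManager used by the test, and `metadata_workload` is a placeholder; the sketch only assumes that each queued set of keyword arguments becomes exactly one thread.

```python
from concurrent.futures import ThreadPoolExecutor


class MiniThreadManager:
    """Hypothetical stand-in: each add() call queues exactly one thread."""

    def __init__(self, method):
        self.method = method
        self.job_kwargs = []

    def add(self, **kwargs):
        self.job_kwargs.append(kwargs)

    def run(self):
        if not self.job_kwargs:
            return []
        with ThreadPoolExecutor(max_workers=len(self.job_kwargs)) as pool:
            futures = [pool.submit(self.method, **kw) for kw in self.job_kwargs]
            # Collect results in the order the jobs were added.
            return [future.result() for future in futures]


def metadata_workload(thread_id, workload_cycles):
    # Placeholder: the real thread function would create a container and
    # time metadata operations; here it only echoes its arguments.
    return (thread_id, workload_cycles)


number_thread = 4  # assumed to come from the test yaml
manager = MiniThreadManager(metadata_workload)
for thread_id in range(number_thread):
    # One add() per requested thread; a single add() would run one thread.
    manager.add(thread_id=thread_id, workload_cycles=10)
results = manager.run()
```

With the loop removed and a single add() call, `results` would hold one entry regardless of `number_thread`, which is the behavior the reviewers flagged.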
self.log.info("pool1 results = %s", results[0].result)
self.log.info("baseline results = %s", base_results[0].result)
self.log.info("average baseline result= %s", average_time)
for result in results[0].result:
The test steps documentation for step 4 should not specify that it is calculating an average for those loops following the svc_ops_entry_age time.
This verification seems OK, since we expect the "early" loops before svc_ops_entry_age to be quicker than what the performance will "stabilize" to after the svc_ops_entry_age time. I.e., all timings, both before and after the svc_ops_entry_age time, should stay below the baseline multiplied by the threshold_factor. But we particularly care about the iterations after the svc_ops_entry_age time has passed.
Please see the new commit, which should have addressed all the comments.
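The verification discussed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `find_slow_loops` and `loop_end_offsets` (seconds since the workload started when each loop finished) are hypothetical names, and the numbers are made up purely to exercise the function.

```python
def find_slow_loops(loop_times, loop_end_offsets, baseline, threshold_factor,
                    svc_ops_entry_age):
    """Return (all_slow, slow_after_age): indexes of loops over the limit.

    all_slow covers every loop; slow_after_age keeps only loops that
    finished after svc_ops_entry_age seconds, the ones of most interest.
    """
    limit = baseline * threshold_factor
    all_slow = [i for i, elapsed in enumerate(loop_times) if elapsed > limit]
    slow_after_age = [
        i for i in all_slow if loop_end_offsets[i] > svc_ops_entry_age
    ]
    return all_slow, slow_after_age


# Made-up illustrative inputs, not measurements from this PR.
all_slow, slow_after_age = find_slow_loops(
    loop_times=[1.0, 2.5, 1.2, 2.6],
    loop_end_offsets=[20, 45, 70, 95],
    baseline=1.0,
    threshold_factor=2.0,
    svc_ops_entry_age=60,
)
```

Reporting the two lists separately keeps the overall check in place while making it obvious when a failure falls in the window after svc_ops_entry_age.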
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Skip-unit-tests: true Skip-fault-injection-test: true Skip-build-ubuntu20-rpm: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Test-repeat: 10 Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]> Signed-off-by: Ding Ho <[email protected]>
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/14/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/14/testReport/
The job.log has 200,000 lines of output, 100,000 pairs of items like this, that are output during the timing loops. We may want to look at making the output quiet. I think I had experimented with this at some point, but I am not sure whether I can find the prototype code I used, since it has been a while.
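One possible way to quiet the timing loops is sketched below. It assumes the per-iteration lines are emitted through a Python logger, which may not hold for the actual test (the lines could come from command output instead). `quiet`, `ListHandler`, and the logger name are hypothetical and exist only for this illustration.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet(logger, level=logging.WARNING):
    """Temporarily raise a logger's level, then restore the previous one."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger("timing_demo")
log.setLevel(logging.INFO)
log.propagate = False
handler = ListHandler()
log.addHandler(handler)

log.info("setup")
with quiet(log):
    for _ in range(3):
        # Per-iteration chatter is dropped while the loop is being timed.
        log.info("container open")
        log.info("container close")
log.info("done")
```

Suppressing the output only around the timed section keeps the setup and result lines in the log while removing the per-iteration pairs.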
Failure reason From the job.log, the first 10 test-loops with svc_ops_enabled:1
I have an experimental PR #14997 to try to establish a baseline timing expectation for pools with svc_ops_enabled:1 and svc_ops_entry_age:60, on a variety of HW cluster configurations, by running the existing C code daos_test -c --subtests="34" (co_op_dup_timing). That test is run daily, but is done on Functional Hardware Medium Verbs Provider, and Functional Hardware Medium Verbs Provider MD on SSD configurations (whereas this PR executes on Functional Hardware Medium, and Functional Hardware Medium MD on SSD). Experimental PR 14997 results:
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Test-repeat: 10 Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/15/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/15/testReport/
Skip-unit-tests: true Skip-fault-injection-test: true Skip-func-hw-test-medium-md-on-ssd: false Test-tag: test_metadata_dup_rpc Test-repeat: 2 Doc-only: false Required-githooks: true Signed-off-by: Ding Ho <[email protected]>
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/16/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14473/16/testReport/
I updated my comment from a few days ago, after updating the debug PR to get a proper 1000-loop container open/close timing in the daos_test -c co_op_dup_timing() test with svc_ops_enabled:1 and svc_ops_entry_age:60, preceding the 1000-loop timing with a 60 second warmup loop. The timings measured were 2.969 seconds (Functional HW Medium) and 1.617 seconds (Functional HW Medium MD on SSD).
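The warmup-then-measure pattern described above can be sketched in Python as follows. This is not the C code from daos_test: `time_after_warmup` and `FakeClock` are hypothetical, and the fake clock (one "second" per operation) is used only so the example is deterministic. The default clock is `time.monotonic`.

```python
import time


def time_after_warmup(operation, warmup_seconds, timed_loops,
                      clock=time.monotonic):
    """Run operation for warmup_seconds, then time timed_loops more calls.

    Returns (warmup_calls, elapsed), where elapsed covers the timed loops only.
    """
    warmup_end = clock() + warmup_seconds
    warmup_calls = 0
    while clock() < warmup_end:
        operation()
        warmup_calls += 1
    start = clock()
    for _ in range(timed_loops):
        operation()
    return warmup_calls, clock() - start


class FakeClock:
    """Deterministic clock for the demo: each operation costs one 'second'."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def tick(self):
        self.now += 1.0


fake = FakeClock()
warmup_calls, elapsed = time_after_warmup(
    fake.tick, warmup_seconds=60, timed_loops=1000, clock=fake)
```

In a real run, `operation` would be a container open/close pair and the default clock would be used, so that the warmup lasts at least as long as svc_ops_entry_age before the measured loops begin.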
Automate metadata duplicate rpc detection time consuming
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-large-md-on-ssd: false
Test-tag: test_metadata_dup_rpc
Doc-only: false
Required-githooks: true
Signed-off-by: Ding Ho [email protected]
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: