DAOS-16251 pool: DEBUG patch, IV pool map buf investigation #14929
base: release/2.6
Adds debug logging in IV code to examine pool map buffer corruption scenarios:
- Guard against a possibly uninitialized d_sg_list_t in crt_hdlr_iv_sync_aux() and call_pre_sync_cb(), which could theoretically affect the pool map buffer contents received through IV communication. Also adds some associated logging.
- crt_ivsync_issue_rpc() explicitly logs whether a bulk or an inline corpc will be used, to correspond to the crt_hdlr_iv_sync_aux() and call_pre_sync_cb() logging.

And, in case it becomes needed during investigation, this change also contains a cherry-pick of PR 14702:
DAOS-16164 pool: Update target status to UPIN for no_data_sync mode

Allow-unstable-test: true
faults-enabled: false

Co-authored-by: Alexander A Oganezov <[email protected]>
Signed-off-by: Kenneth Cain <[email protected]>
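To illustrate the first bullet, here is a minimal sketch of the defensive pattern: zero-initialize the scatter-gather list before any branch that may or may not fill it, so a callback never sees stack garbage. The `demo_*` types and functions are hypothetical stand-ins, not the real `d_sg_list_t` or the CaRT handlers named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real d_iov_t / d_sg_list_t definitions. */
struct demo_iov {
	void   *iov_buf;
	size_t  iov_len;
};

struct demo_sg_list {
	uint32_t         sg_nr;   /* number of entries in sg_iovs */
	struct demo_iov *sg_iovs; /* NULL when the list is empty */
};

/* Hypothetical pre-sync callback: returns the total payload length. */
static size_t
demo_pre_sync_cb(const struct demo_sg_list *sgl)
{
	size_t   total = 0;
	uint32_t i;

	assert(sgl != NULL);
	if (sgl->sg_iovs == NULL)
		return 0; /* empty list: nothing to read */
	for (i = 0; i < sgl->sg_nr; i++)
		total += sgl->sg_iovs[i].iov_len;
	return total;
}

/* Hypothetical handler: the list is zeroed first and filled only if a
 * payload exists, so the callback never reads uninitialized fields. */
static size_t
demo_handle_sync(void *payload, size_t len)
{
	struct demo_sg_list sgl = {0};
	struct demo_iov     iov = {0};

	if (payload != NULL && len > 0) {
		iov.iov_buf = payload;
		iov.iov_len = len;
		sgl.sg_nr   = 1;
		sgl.sg_iovs = &iov;
	}
	return demo_pre_sync_cb(&sgl);
}
```

Calling `demo_handle_sync(NULL, 0)` takes the no-payload path, and the callback still sees a well-defined empty list.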
Ticket title is 'DAOS 2.4.2-4: Errored DAOS engine 0 exited unexpectedly on daos_user' |
for more detailed information. Add RPC_INFO() macro and use in IV sync code path logging rather than D_INFO(). Signed-off-by: Kenneth Cain <[email protected]>
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14929/2/testReport/ |
Allow-unstable-test: true faults-enabled: false Signed-off-by: Kenneth Cain <[email protected]>
In crt_hg_unpack_header(), log when the RPC header is known to have been transferred via bulk.

Allow-unstable-test: true
faults-enabled: false

Signed-off-by: Kenneth Cain <[email protected]>
- ivc_on_get stores random entry_priv_val into priv_entry for many ivc_ent_get implementations. Although not used, this should be avoided.
- ds_iv_done stores pointer to stack variable rc in cb_info->future, which outlives the stack frame of ds_iv_done. Although not used, this pointer is confusing.
- ds_pool_iv_map_update associates the input map buffer with the map version from ds_pool, rather than the input map version. Although this may be fine, we should really not ask for unnecessary trouble/concern.

Signed-off-by: Li Wei <[email protected]>
Signed-off-by: Kenneth Cain <[email protected]>
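The last two points in that list can be sketched as follows. This is not the DAOS code: every `demo_*` name is hypothetical. It shows a completion record that stores the return code by value, so nothing points into a finished stack frame, and a map entry that is tagged with the version passed in alongside the buffer rather than one read from elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion record: rc is copied in, never referenced
 * through a pointer to a caller's local variable. */
struct demo_cb_info {
	int result;     /* return code, stored by value */
	int has_result; /* 1 once demo_done() has run */
};

static void
demo_done(struct demo_cb_info *info, int rc)
{
	assert(info != NULL);
	info->result     = rc; /* a copy stays valid after this frame is gone */
	info->has_result = 1;
}

/* Returns 1 and writes *rc_out if a result is available, else 0. */
static int
demo_wait(const struct demo_cb_info *info, int *rc_out)
{
	assert(info != NULL && rc_out != NULL);
	if (!info->has_result)
		return 0;
	*rc_out = info->result;
	return 1;
}

/* Hypothetical map entry: buffer and version always travel together. */
struct demo_map_entry {
	const void *buf;
	size_t      len;
	uint32_t    version;
};

static void
demo_map_entry_set(struct demo_map_entry *e, const void *buf, size_t len,
		   uint32_t input_version)
{
	assert(e != NULL);
	e->buf     = buf;
	e->len     = len;
	e->version = input_version; /* the version supplied with this buffer */
}
```

Storing the value rather than its address costs nothing here and removes the question of lifetime entirely.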
- Switch rpc headers to transfer deadline instead of a timeout
- Add checks at the start and end of bulk transfer to ensure deadline has not expired.
- Add deadline expiration checks in all places where rpc_priv timeout is initialized

Allow-unstable-test: true
faults-enabled: false

Signed-off-by: Alexander A Oganezov <[email protected]>
Signed-off-by: Kenneth Cain <[email protected]>
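A rough sketch of the deadline idea, under stated assumptions: the sender converts its relative timeout into an absolute deadline once, the header carries that deadline, and the receiver checks it both before and after a bulk transfer. The names, the seconds-based clock, the error code, and the "expired once now is past the deadline" rule are all hypothetical; clock readings are passed in explicitly to keep the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ERR_DEADLINE (-1) /* hypothetical error code */

/* Hypothetical header: carries an absolute deadline, not a timeout. */
struct demo_rpc_hdr {
	uint64_t deadline_sec;
};

/* Sender side: fix the deadline once, relative to the send time. */
static void
demo_hdr_set_deadline(struct demo_rpc_hdr *hdr, uint64_t now_sec,
		      uint32_t timeout_sec)
{
	assert(hdr != NULL);
	hdr->deadline_sec = now_sec + timeout_sec;
}

/* Assumption: the deadline counts as expired once now is past it. */
static bool
demo_deadline_expired(const struct demo_rpc_hdr *hdr, uint64_t now_sec)
{
	assert(hdr != NULL);
	return now_sec > hdr->deadline_sec;
}

/* Receiver side: check at the start and at the end of the transfer.
 * start_sec and end_sec stand in for clock readings taken around the
 * (omitted) data movement. */
static int
demo_bulk_transfer(const struct demo_rpc_hdr *hdr, uint64_t start_sec,
		   uint64_t end_sec)
{
	if (demo_deadline_expired(hdr, start_sec))
		return DEMO_ERR_DEADLINE; /* too late to begin */

	/* ... bulk data movement would happen here ... */

	if (demo_deadline_expired(hdr, end_sec))
		return DEMO_ERR_DEADLINE; /* finished too late: discard */
	return 0;
}
```

Unlike a relative timeout, which each hop would restart on receipt, an absolute deadline lets every party agree on when the request stops being useful, provided their clocks are comparable.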
Allow-unstable-test: true faults-enabled: false Skip-nlt: true Skip-fault-injection-test: true Signed-off-by: Kenneth Cain <[email protected]>
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14929/6/execution/node/1270/log |
This reverts commit d01278b.
…bug_and_reint_no_data_sync Signed-off-by: Kenneth Cain <[email protected]>
Allow-unstable-test: true faults-enabled: false Signed-off-by: Kenneth Cain <[email protected]>
…bug_and_reint_no_data_sync Signed-off-by: Kenneth Cain <[email protected]>
Allow-unstable-test: true faults-enabled: false Signed-off-by: Kenneth Cain <[email protected]>
Allow-unstable-test: true faults-enabled: false Signed-off-by: Kenneth Cain <[email protected]>
Adds debug logging in IV code, to examine pool map buffer corruption scenarios:
And, in case it becomes needed during investigation, this change also contains a cherry-pick of PR 14702:
DAOS-16164 pool: Update target status to UPIN for no_data_sync mode
Finally, includes a manual cherry pick of PR 14971, aaoganez/rpc-bulk-deadlines
Allow-unstable-test: true
faults-enabled: false
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: