
HDDS-11784. Add missing parent directories for abort MPU and abort expired MPU requests #7566

Open · wants to merge 4 commits into base: master

Conversation

@sokui sokui commented Dec 12, 2024

What changes were proposed in this pull request?

HDDS-11784 adding missing parent directories for MPU abort and expired abort request

We observed a large number of open keys (files) in our FSO-enabled Ozone cluster, all of which are incomplete MPU keys.

When I tried to abort the MPU using the s3 CLI as below, I got an exception complaining that the parent directory was not found.

aws s3api abort-multipart-upload --endpoint 'xxxx' --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.

Exceptions in the log

NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
... 9 more

This issue is similar to HDDS-10630. This PR brings in the same mechanism, which adds the missing parent directories to the table cache before we abort the MPU, applying it to both S3MultipartUploadAbortRequest and S3ExpiredMultipartUploadsAbortRequest. The code is also refactored so that multiple places reuse the same mechanism.
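To make the mechanism concrete, here is a small toy model (NOT Ozone code; all names below are invented for illustration) of why the abort fails in an FSO layout and how seeding the missing parents into the table cache first, as this PR does, lets the abort proceed:

```java
// Toy model: aborting an MPU in an FSO bucket first resolves the key's
// parent-directory chain. A missing parent fails the lookup; seeding the
// missing parents into the cache (never the persisted table) unblocks it.
import java.util.HashMap;
import java.util.Map;

public class AbortWithMissingParents {
  static Map<String, Long> dirTable = new HashMap<>();  // persisted entries
  static Map<String, Long> dirCache = new HashMap<>();  // cache, checked first

  static long getParentId(String keyPath) {
    long parent = 0L;  // bucket object id
    String[] parts = keyPath.split("/");
    for (int i = 0; i < parts.length - 1; i++) {
      String entryKey = parent + "/" + parts[i];
      Long id = dirCache.getOrDefault(entryKey, dirTable.get(entryKey));
      if (id == null) {
        throw new IllegalStateException("DIRECTORY_NOT_FOUND: " + parts[i]);
      }
      parent = id;
    }
    return parent;
  }

  public static void main(String[] args) {
    String key = "a/b/part.blob";
    try {
      getParentId(key);  // fails: parents a and b are missing
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    // Seed the missing parents into the cache only, mirroring what the
    // addMissingParentsToCache mechanism does, then retry the lookup.
    dirCache.put("0/a", 100L);
    dirCache.put("100/b", 101L);
    System.out.println("abort ok under parent " + getParentId(key));
  }
}
```

The cache entries are deliberately never written to the persisted table, which is the point the later review discussion turns on.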

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11784

How was this patch tested?

It was tested by CI. We are also validating it in our cluster (ongoing).

@adoroszlai (Contributor)

It tested by the CI

This is usually expected to mean "CI completed successfully with these changes". However, in this case, as you have noticed, some tests are failing:

https://github.com/sokui/ozone/actions/runs/12286612359/job/34288154427#step:6:759

Unfortunately CI does not provide logs, because individual test cases after the failing one keep timing out, eventually reaching the overall CI timeout. I'll make a note to improve that.

You can run the same test locally.

First, reduce the test cases to the minimum:

  1. Edit hadoop-ozone/dist/src/main/compose/ozone-ha/test.sh:
    • delete everything after start_docker_env 5
    • add execute_robot_test ${SCM} -v BUCKET_LAYOUT:FILE_SYSTEM_OPTIMIZED s3/MultipartUpload.robot
  2. Edit hadoop-ozone/dist/src/main/smoketest/s3/MultipartUpload.robot:
    • delete all test cases after Test abort Multipart upload with invalid uploadId

Then build and run the test:

mvn -DskipTests -DskipShade clean package
cd hadoop-ozone/dist/target/ozone-2.0.0-SNAPSHOT/compose/ozone-ha
./test.sh

I tried this locally; it seems OM terminates when aborting an unknown MPU. Can you please check?

@sokui (Contributor Author) commented Dec 13, 2024

I fixed the test error, but the new CI run has another test failure: org.apache.hadoop.fs.ozone.TestHSync. However, when I run this test locally, it passes. I am not sure if this test was already flaky. See the CI run on my repo here.

@adoroszlai (Contributor)

Thanks @sokui for fixing the test, will take a look. Don't worry about TestHSync, it's unrelated and flaky.

@sokui (Contributor Author) commented Dec 13, 2024

@sokui Please try to avoid force-push when updating the PR. Here are some great articles that explain why:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing
https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

Sure thing. I was trying to cherry-pick my change as one commit on top of 1.4.1 to use as our internal build for testing; that is why I force-pushed. I will not force-push for future commits. Thanks for the reminder.

@adoroszlai adoroszlai changed the title HDDS-11784 adding missing parent directories for MPU abort and expired abort request HDDS-11784. Add missing parent directories for abort MPU and abort expired MPUE requests Dec 14, 2024
@adoroszlai adoroszlai changed the title HDDS-11784. Add missing parent directories for abort MPU and abort expired MPUE requests HDDS-11784. Add missing parent directories for abort MPU and abort expired MPU requests Dec 14, 2024

@adoroszlai adoroszlai left a comment


Thanks @sokui for the patch.

Comment on lines 1380 to 1385
protected void addMissingParentsToCache(OmBucketInfo omBucketInfo,
List<OmDirectoryInfo> missingParentInfos,
OMMetadataManager omMetadataManager,
long volumeId,
long bucketId,
long transactionLogIndex) throws IOException {
Contributor

Please do not format the method signature like this. Whenever the visibility, return type, method name, or other modifiers change, we would have to reindent all parameters.

Contributor Author

Do you have any code style format file that I can import to my IntelliJ?

Contributor

https://github.com/apache/ozone/blob/master/hadoop-ozone/dev-support/intellij/ozone-style.xml

I'm not sure if it has anything about this particular preference.

Contributor Author

updated

Comment on lines -549 to -563
protected void addMissingParentsToCache(OmBucketInfo omBucketInfo,
List<OmDirectoryInfo> missingParentInfos,
OMMetadataManager omMetadataManager,
long volumeId, long bucketId, long transactionLogIndex
) throws IOException {
// FSO is disabled. Do nothing.
}

protected void addMultiPartToCache(
OMMetadataManager omMetadataManager, String multipartOpenKey,
OMFileRequest.OMPathInfoWithFSO pathInfoFSO, OmKeyInfo omKeyInfo,
String keyName, long transactionLogIndex
) throws IOException {
// FSO is disabled. Do nothing.
}
Contributor

This used to do nothing for non-FSO, and was overridden to add the missing parents only for FSO. I may be missing something, but it seems the patch does not distinguish between these two cases.

Contributor Author

Addressed. It seems all of the S3 requests pair with an FSO request, and my new change applies this logic only to FSO requests. The one request that does not pair with an FSO variant is S3ExpiredMultipartUploadsAbortRequest; I applied the logic to that request as well, since it is the one involved in my issue.

Comment on lines 1443 to 1447
String multipartOpenKey = omMetadataManager
.getMultipartKey(volumeId, bucketId,
pathInfoFSO.getLastKnownParentId(),
pathInfoFSO.getLeafNodeName(),
keyArgs.getMultipartUploadID());
Contributor

nit: Please indent continuations with 4 spaces, not 8. (Applies to the patch as a whole, this is only an example.)

Contributor Author

reformatted

Comment on lines 24 to 32
import java.io.IOException;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiFunction;
Contributor

Please do not reorder imports unnecessarily.

Contributor Author

It was auto-formatted by my IDE. Let me try to revert this order change.

Contributor Author

reverted

@adoroszlai adoroszlai added the s3 S3 Gateway label Dec 18, 2024
@sokui (Contributor Author) commented Dec 19, 2024

@adoroszlai please have another look. Thank you!

@adoroszlai adoroszlai requested a review from ivandika3 December 19, 2024 02:28
@ivandika3 (Contributor) commented Dec 29, 2024

@SaketaChalamchala Could you help to take a look as well since it's related to the logic in HDDS-10630?

@ivandika3 (Contributor) left a comment

@sokui Thanks for working on this. Just an initial review as I'm trying to understand the related logic.

Also, I noticed in #6496 that DIRECTORY_TABLE is not added to @CleanupTableInfo in the related OM response. Please help add DIRECTORY_TABLE to the OM responses that add the missing directories to the directory table.

Could you also help add some tests? I'm not entirely sure under what conditions missing parent directories might occur.

Comment on lines +202 to +204
if (bucketInfo == null) {
throw new IOException("bucketInfo is null");
}
Contributor

Do we need this? validateBucketAndVolume will check the bucket existence.

Contributor Author

I do hit this condition: String bucketKey is not null, but CacheValue bucketInfo is null. Please let me know if my validation here is reasonable.

@ivandika3 (Contributor) Jan 8, 2025

Hm, validateBucketAndVolume already checks the cache as well. Moreover, the OM request is processed under a BUCKET_LOCK, so the bucket should still exist during request processing. The bucket-not-found check should happen earlier, before updating any OM state, and it should be an OMException thrown with the BUCKET_NOT_FOUND result code.

Contributor Author

Hmm. I think I saw this condition happen, which is why I added this block later on. But let me delete it and re-test.

@@ -259,6 +289,8 @@ private void updateTableCache(OzoneManager ozoneManager,

String multipartOpenKey;
try {
KeyArgs keyArgs = buildKeyArgs(multipartUpload);
addOrGetMissingDirectories(ozoneManager, keyArgs, trxnLogIndex);
Contributor

This should have a corresponding OM response to add the missing directories to the DB batch similar to #6496?

Contributor Author

Let me know your thoughts on my reply related to #6496

Contributor Author

I added the DIRECTORY_TABLE to @CleanupTableInfo for S3ExpiredMultipartUploadsAbortResponse, S3MultipartUploadAbortResponseWithFSO, and S3MultipartUploadCompleteResponseWithFSO now

@sokui (Contributor Author) commented Dec 29, 2024

@sokui Thanks for working on this. Just an initial review as I'm trying to understand the related logic.

Also I noticed in #6496 that the DIRECTORY_TABLE is not added in the @CleanupTableInfo in the related OM response. Please help to add the DIRECTORY_TABLE for the OM responses which add the missing directories to the directory table.

Could you also help to add some tests? I'm not entirely sure in what conditions will missing parent directories might happen.

Hi @ivandika3 ,

Correct me if I'm wrong: here we only add the missing directories to the cache of the directory table (since we need to delete this key), not to the real directory table. In this case, do we need to add DIRECTORY_TABLE to @CleanupTableInfo?

@ivandika3 (Contributor) left a comment

Correct me if I'm wrong. Here, we only add the missing directories to the cache of the directory table (since we need to delete this key), not the real directory table. In this case, do we need to add DIRECTORY_TABLE to @CleanupTableInfo?

At least in #6496, since the OM response adds the missing directories to the real directory table, @CleanupTableInfo should be added to clean up the OM directory cache entries once they are persisted in the OM DB.

As for abort and expired abort MPU, I think it's OK to just add to the dir cache, since the directories exist only to avoid the exceptions. However, IMO we should also add @CleanupTableInfo to the corresponding OM response so that the cache entries do not stay in the OM dir cache forever.

That said, AFAIK this is the only special case where an OM request's update to the cache does not translate into persisting the entry in the OM DB, so I'm not entirely sure what the possible impact might be. For example, one case I can think of: if a subsequent key is created using the missing directories created by this patch, but the cache is later cleaned up or OM is restarted, the cached directories disappear and the newly created key becomes orphaned.
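The orphaning concern above can be sketched with a toy model (not Ozone code; the two-map cache/DB layering and all names here are invented for illustration):

```java
// Toy illustration: directory entries that exist only in the table cache
// (never flushed to the backing DB) vanish on restart or cache cleanup,
// orphaning any key that was persisted under them in the meantime.
import java.util.HashMap;
import java.util.Map;

public class CacheOnlyDirectories {
  public static void main(String[] args) {
    Map<String, Long> db = new HashMap<>();     // persisted directory table
    Map<String, Long> cache = new HashMap<>();  // in-memory cache on top

    // The abort path adds the missing parent to the cache only:
    cache.put("a", 100L);
    boolean visible = cache.containsKey("a") || db.containsKey("a");
    System.out.println("parent visible before restart: " + visible);

    // A later key create sees "a" and persists a child under it:
    db.put("a/newkey", 100L);

    // Restart: the cache is gone, and the DB never had "a",
    // so the persisted child is now orphaned.
    cache.clear();
    visible = cache.containsKey("a") || db.containsKey("a");
    System.out.println("parent visible after restart: " + visible);
    System.out.println("child still persisted: " + db.containsKey("a/newkey"));
  }
}
```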


Comment on lines +1388 to +1389
checkBucketQuotaInNamespace(omBucketInfo, missingParentInfos.size());
omBucketInfo.incrUsedNamespace(missingParentInfos.size());
Contributor

In this case, if we are only adding directories to the cache that will be cleaned up afterwards (for abort and expired abort), I don't think we should increase the usedNamespace of the parent bucket, since the corresponding decrUsedNamespace will never be called and usedNamespace will never return to 0.

Contributor Author

This logic is copied from https://github.com/apache/ozone/pull/6496/files#diff-28870b5ea5ae6d1f8207b99595dc84f585becd8f2c53ee1d852d6ae34b7229f4R86-R87. Please let me know if this logic is needed in the MPU complete request. My essential question is: should we increase the used namespace at initiate time, or at complete time?

Contributor

Please see #7566 (comment) for another approach where we don't need to create missing parent directories, and therefore don't need to update the bucket namespace.

Comment on lines +1437 to +1439
if (!addMissingDirectoriesToCacheEnabled()) {
return missingParentInfos;
}
Contributor

May I know the purpose of this? Are there any OM requests that call addOrGetMissingDirectories but do not actually add the missing directories to the cache?

If there are, please add a corresponding log message. If there aren't, I think we can just remove this.

Contributor Author

This is to differentiate the behavior between FSO and non-FSO requests. As @adoroszlai stated here: #7566 (comment), the PR should only apply to FSO requests.

You can see in this PR that addMissingDirectoriesToCacheEnabled() returns false by default, but FSO requests override it to return true.

Contributor

Ok, understood: unlike S3InitiateMultipartUploadRequestWithFSO, validateAndUpdateCache for S3MultipartUploadCompleteRequestWithFSO and S3MultipartUploadAbortRequestWithFSO is implemented by the parent class instead, requiring the distinction via the flag.

@sokui (Contributor Author) commented Jan 8, 2025

Could you also help to add some tests? I'm not entirely sure in what conditions will missing parent directories might happen.

Hi @ivandika3, to be honest, I am not sure what the root cause is. As stated in #6496, the files (S3InitiateMultipartUploadRequestWithFSO.java, S3InitiateMultipartUploadResponseWithFSO.java) contain this logic. I can understand that we may be missing the parent directories initially, but why are we missing these directories in the complete, abort, and expire phases? Is it possible that OM was interrupted between adding the open key to the table and adding the missing parent directories, leaving the open key in place but the parent directories absent?

@ivandika3 (Contributor) commented Jan 9, 2025

@sokui Ok, I was thinking a bit more about this.

why we missing these directories in the complete, abort and exprire phases?

I can think of one possible case: between the initiate and the complete / abort / expire, the parent directory is deleted and all child directories and files are garbage-collected by DirectoryDeletingService (which runs every 60s).

For example, there is a directory "/a" and a pending MPU key with path "/a/mpu_key" that was initiated but has not been completed / aborted yet. After the MPU key was initiated, the directory "/a" is deleted; since mpu_key has not been completed yet and does not exist in the fileTable, DIRECTORY_NOT_EMPTY will not be thrown in OMKeyDeleteRequestWithFSO#validateAndUpdateCache. Therefore mpu_key is orphaned, and when it is later completed / aborted, the request fails in OMFileRequest#getParentID since the parent directory has been deleted.

Now I can understand that initiate and complete MPU need to add the missing parent directories, since the directories will be accessed again. However, for abort (and expired abort) MPUs, I don't think #6496 applies, since we don't really want to add missing parent directories for a key that is going to be deleted anyway. Even if we add the parent directories, adding them to the cache only might result in unpredictable behavior, where the missing directories can suddenly disappear because they are not persisted in the OM DB. So I believe we should either persist the missing parent directories or not create them in the first place.

I think the original intention is to ensure the MPU abort succeeds regardless of whether the parent exists. I can think of one way: first get the OmMultipartKeyInfo from the multipartInfoTable, since that lookup does not require parentId information. Afterwards, we use OmMultipartKeyInfo#getParentId to derive the MPU open key, without needing to call OMFileRequest#getParentId. That way, we don't need to create the missing parent directories at all.
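The suggested alternative can be sketched as a toy model (not Ozone code; the key format and names here are invented for illustration): look up the MPU record directly by (volume, bucket, key, uploadId), which already carries the parentId captured at initiate time, and build the open key from it without any directory walk, so a deleted parent chain no longer matters.

```java
// Toy model of deriving the MPU open key from the multipart-info table's
// stored parentId instead of resolving the parent directory chain.
import java.util.HashMap;
import java.util.Map;

public class AbortViaMultipartInfoTable {
  static Map<String, Long> multipartInfoTable = new HashMap<>();

  static String multipartOpenKey(String volume, String bucket,
                                 String key, String uploadId) {
    String mpuDbKey = volume + "/" + bucket + "/" + key + "/" + uploadId;
    Long parentId = multipartInfoTable.get(mpuDbKey);
    if (parentId == null) {
      throw new IllegalStateException("NO_SUCH_MULTIPART_UPLOAD_ERROR");
    }
    String leaf = key.substring(key.lastIndexOf('/') + 1);
    // analogous to getMultipartKey(volumeId, bucketId, parentId, leaf, uploadId)
    return "/" + volume + "/" + bucket + "/" + parentId + "/" + leaf + "/" + uploadId;
  }

  public static void main(String[] args) {
    // parentId 100 was recorded when the upload was initiated
    multipartInfoTable.put("s3v/bkt/a/mpu_key/upload-1", 100L);
    System.out.println(multipartOpenKey("s3v", "bkt", "a/mpu_key", "upload-1"));
  }
}
```

The design point is that the parentId is read from state written at initiate time, so later deletion of the parent directories cannot break the abort path.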

In the future, as a preventive measure, we can take incomplete MPUs into account for directory deletions. For example:

  • OMFileRequest#hasChildren might need to return true if there is an incomplete MPU key underneath the directory.
  • DirectoryDeletingService could also add some logic to abort the incomplete MPUs.

However, these can be done in future tickets.

Note: I'm not too well-versed in FSO logic, so I could be wrong.

Labels: s3 (S3 Gateway)

3 participants