Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add class FragmentID. #4607

Merged
merged 4 commits into from
Jan 29, 2024
Merged

Add class FragmentID. #4607

merged 4 commits into from
Jan 29, 2024

Conversation

bekadavis9
Copy link
Contributor

@bekadavis9 bekadavis9 commented Jan 9, 2024

Add class FragmentID to validate, parse and handle components of a fragment uri.


TYPE: NO_HISTORY
DESC: Add class FragmentID.

Copy link

This pull request has been linked to Shortcut Story #38659: Implement class FragmentURI for succinct fragment parsing.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code change here because of the removal of get_timestamp_range. We don't want to enlarge the scope of the PR unnecessarily, but there's one deficiency that we need to correct in the PR: there are no tests with invalid URI inputs. Parsers can fail in more ways than they can succeed. On cursory examination, the parser is going to need work ensuring that it's tight. We don't need to do all that work in this PR, but we do need to make sure that we have test infrastructure now.

  • Fix the defect that is assert in the constructor.
  • Write a table-driven test that ensure that invalid inputs fail.
  • Populate the table with invalid inputs.
    • There's one class of errors that the assert will find.
    • The empty string should be an input, as well as one-character strings of all the punctuation characters.
    • It will be very ease to make fragment version 1 and 2 strings that sscanf chokes on.
  • Populate a second table with invalid inputs that the current parser does not correctly throw an exception for. Instead of fixing the parser in this PR, call this the "known failure" table and test accordingly.

I've been thinking for a while that using the term "URI" everywhere has been something of a conflation of meaning and representation. We're using URI as the means by which we're implementing identifiers, and that the result is an identifier. The upshot is some name recommendations.

  • class FragmentID in namespace tiledb::sm
  • fragment_identifier.h

We want to have FragmentID stand alone without need for a separate URI object.

  • Construction from URI would have to call the copy constructor of class URI to initialize the base class.
  • A more typical usage would construct directly from std::string_view.

We need both version accessors.

tiledb/storage_format/uri/parse_uri.cc Outdated Show resolved Hide resolved
tiledb/storage_format/uri/parse_uri.h Outdated Show resolved Hide resolved
tiledb/storage_format/uri/parse_uri.h Outdated Show resolved Hide resolved
tiledb/storage_format/uri/parse_uri.h Outdated Show resolved Hide resolved
test/src/unit-capi-array.cc Outdated Show resolved Hide resolved

if (fragment_version < 12) {
utils::parse::FragmentURI frag_uri{fragment_uri};
if (frag_uri.version() < 12) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of how the array format version is being used to stand in place of the fragment URI version. What we want in this PR is the function that we'll use to rewrite with. We don't need to rewrite this function at the time.

Also note that the function that this is used in, get_commit_uri, itself belongs somewhere in the storage_format directory. When we rewrite this function to use the more appropriate version number, we can move it to that directory at the same time.

Note that this function doesn't use any member variables of ArrayDirectory, which is one of those signs that the function would be better defined elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does use ArrayDirectory::uri_, but that could be passed in from its accessor function.

tiledb/storage_format/uri/parse_uri.cc Outdated Show resolved Hide resolved
(std::string("%") + std::string(PRId64)).c_str(),
(long long int*)&timestamp_range->first);
timestamp_range->second = timestamp_range->first;
(long long int*)&timestamp_range_.first);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sscanf does not allow any decent error handling. Probably need to rewrite this.

tiledb/storage_format/uri/test/unit_uri_format.cc Outdated Show resolved Hide resolved
tiledb/storage_format/uri/test/unit_uri_format.cc Outdated Show resolved Hide resolved
@bekadavis9 bekadavis9 changed the title Add class FragmentURI Add class FragmentID Jan 19, 2024
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new failure tests are exactly what's necessary. We need some success tests as well and then we'll be done.

tiledb/sm/fragment/fragment_identifier.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two small caveats.

  • The problem with the definition with stats needs to be addressed in the dependency. It should not be patched over at the point of use.
  • Issue with optional array-format version number in fragment-format version 3.

Both of these should be addressed with followup stories and future PR.

this_target_object_libraries(baseline vfs)
if(TILEDB_STATS)
this_target_compile_definitions(-DTILEDB_STATS)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition certainly does not belong here. Fixing it here is masking a defect elsewhere in the build.

This should be on the baseline directory. It will then propagate through the interface properties of the object library.

std::pair{1, 2},
3,
5},
}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a test vector for a fragment-format version 3 identifier with an optional array-format identifier at the end. At least it's documented as optional.
Not making this a blocker, but it's an inconsistency between the documentation and the tests.

@@ -16,4 +16,4 @@ _Note_: The presence of `[]` is indicative of an optional parameter.
| :-: | :-: | :-: |
| 1 - 2 | 1.4 - 1.5 | `__uuid_t1[_t2]` |
| 3 - 4 | 1.6 - 1.7 | `__t1_t2_uuid` |
| 5+ | 2.0+ | `__t1_t2_uuid_[v]` |
| 5+ | 2.0+ | `__t1_t2_uuid[_v]` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about inconsistent/missing unit test below.

@bekadavis9 bekadavis9 changed the title Add class FragmentID Add class FragmentID. Jan 26, 2024
@KiterLuc KiterLuc merged commit 74ebc44 into dev Jan 29, 2024
67 checks passed
@KiterLuc KiterLuc deleted the rd/fragment_uri branch January 29, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants