forked from TileDB-Inc/TileDB
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pull] dev from TileDB-Inc:dev #152
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR implements an alternate view for var length data. Instead of maintaining an offsets range, as in the first `var_length_view`, this class materializes all of the individual subranges over the var length data. The associated unit tests for this PR include tests for sorting. Advantages: * The `alt_var_length_view` has the advantage of not returning a proxy when accessed and, as a result, can be directly sorted. Also, not having to create the proxy on access can result in more efficiency. * The `alt_var_length_view` class can be constructed with either the "Arrow" format for offsets or the "TileDB" format. In the latter case, the constructor takes the offsets ranges, as well as the final value. Disadvanatages: * The `alt_var_length_view` requires more storage. An individual `std::ranges::subrange` consumes 16 bytes, whereas a `size_t` consumes 8 bytes. On the other hand, if sorting of the original `var_length_view` is of interest, the offsets range plus a permutation range will also result in 16 bytes storage per var length element. Notes: * The `alt_var_length_view` class ends up being a fairly thin wrapper around `std::vector<std::ranges::subrange>>`. However, it is considered not to be good practice to inherit from `std::vector`, so it was kept as member data. * Because this view does maintain internal data that is not O(1), the copy constructor and copy assignment operator are disabled. --- TYPE: IMPROVEMENT DESC: Implement alt var length view for external sort. <long description> --- TYPE: NO_HISTORY | FEATURE | BUG | IMPROVEMENT | DEPRECATION | C_API | CPP_API | BREAKING_BEHAVIOR | BREAKING_API | FORMAT DESC: <short description>
[sc-46464] --- TYPE: NO_HISTORY DESC: Fix nightlies after external sort merges.
This fixes the fallback cell order for the Hilbert order on writes. When the domain is too large and that cells cannot be differentiated by the Hilbert number because we don't have enough bits per dimensions, we should fallback to row major cell ordering. A missing condition made us fallback to column order ordering. [sc-44758] --- TYPE: BUG DESC: Fix wrong fallback cell order for Hilbert.
This PR introduces timing tests that run with mocked time. During the test all calls to acquire timestamp, either by using posix `time()` or modern C++ `std::chrono::clock::now()` are mocked and should return the same timestamp. This way we should be able to test sub-millisecond writes/reads. This PR adds: 1. libfaketime port 2. timing tests target 3. modified timing tests command with proper environment variables These tests fail (as expected) on TileDB version 2.20.0: https://github.com/dudoslav/TileDB/actions/runs/8535232007/job/23381209269#step:14:97 --- TYPE: NO_HISTORY DESC: Add faketime tests
Add function `tiledb::api::to_string_view()`. This function validates a candidate C-style string and converts it to a `std::string_view`. It's specific to the C API, with a template argument that allows an error message to identify an erroneous argument. Converted C API functions `tiledb_query_get_est_result_size*` to use this function. Changed relevant functions in `class Query` and `class Subarray` to use `string_view` and not a raw `char *`. This is a followup to #4891, where this work was out of scope. Had the conversion function already existed before that PR, it would have used there. --- TYPE: NO_HISTORY DESC: C API support - to_string_view --------- Co-authored-by: KiterLuc <[email protected]>
--- TYPE: NO_HISTORY DESC: Update dev history with content from 2.22, 2.18.5 and 2.21.2. --------- Co-authored-by: Theodore Tsirpanis <[email protected]>
--- TYPE: NO_HISTORY DESC: Bump dev version to 2.24.0.
This PR is stacked on #4925. It adds additional constructors for the two view classes for var length data. The previous implementations had constructors that just took input ranges, and constructed a var length view over the data specified by those ranges. However, in general, we may have a range that is not completely filled with data, and so may need to specify the data to be viewed in some other way. The complete set of constructors take * ranges for data and offsets (arrow format) - the size of the resulting view is one less than the size of the offset range * ranges plus sizes for data and offsets (arrow format) - the size of the resulting view is one less than the size given for the offset range * iterator pairs for data and offsets (arrow format) - the size of the resulting view is one less than the difference between end and begin of the offsets pair * iterator pairs for data and offsets, with sizes (arrow format) - the size of the resulting view is one less than the size given for the offset range * ranges for data and offsets (tilledb format) - the size of the resulting view is equal to the size of the offset range (takes an extra argument for last offset value) * ranges plus sizes for data and offsets (tiledb format) - the size of the resulting view is equal to the size given for the offset range (takes an extra argument for last offset value) * iterator pairs for data and offsets (tiledb format) - the size of the resulting view is equal to the difference between end and begin of the offsets pair (takes an extra argument for last offset value) * iterator pairs for data and offsets, with sizes (tiledb format) - the size of the resulting view is equal than the size given for the offset range (takes an extra argument for last offset value) [sc-44576] --- TYPE: NO_HISTORY DESC: Additional constructors for external sort classes.
…ex. (#4909) Migrate `Array` APIs out of `StorageManager`: `non_empty_domain_*_from_index`. This includes 3 separate APIs: * `non_empty_domain_from_index` * `non_empty_domain_var_size_from_index` * `non_empty_domain_var_from_index` --- [sc-44989] [sc-44991] [sc-44993] --- TYPE: NO_HISTORY DESC: Migrate Array APIs out of StorageManager: non_empty_domain_*_from_index.
This fixes ``` unit_iterator_facade.cc:625:22: error: missing 'typename' prior to dependent type name 'std::iterator_tra its<iterator>::value_type' using value_type = std::iterator_traits<iterator>::value_type; ``` which I got building units on my macos. [sc-46539] --- TYPE: NO_HISTORY DESC: Fix broken unit build on macos.
…e. (#4910) Migrate `Array` APIs out of `StorageManager`: `non_empty_domain_*_from_name`. This includes 3 separate APIs: * `non_empty_domain_from_name` * `non_empty_domain_var_size_from_name` * `non_empty_domain_var_from_name` --- [sc-44990] [sc-44992] [sc-44994] --- TYPE: IMPROVEMENT DESC: Migrate Array APIs out of StorageManager: non_empty_domain_*_from_name.
[SC-46300](https://app.shortcut.com/tiledb-inc/story/46300) [SC-46301](https://app.shortcut.com/tiledb-inc/story/46301) This PR moves the following C APIs (and their C++ equivalents) to stable: * `tiledb_group_*` * `tiledb_array_delete` * `tiledb_array_delete_array` * `tiledb_array_upgrade_version` The changes are simply moving code around. --- TYPE: C_API DESC: Experimental APIs related to groups, deleting arrays and upgrading the format version of arrays were moved to stable. You can use them without including `<tiledb_experimental.h>`. --- TYPE: CPP_API DESC: The experimental `Group` class was moved to stable. You can use it without including `<tiledb_experimental>`.
This change removes any confusion that in its present state the `array_schema` object library is not properly specified as a freestanding library capable of resolving all its symbols without additional external dependencies. [sc-46557] --- TYPE: NO_HISTORY DESC: Fix the documentation for the `array_schema` object library
This PR implements a `zip_view` for zipping together a set of ranges. It is intended to implement the `std::ranges::zip_view` as defined for C++23. From https://en.cppreference.com/w/cpp/ranges/zip_view: 1. A zip_view is a range adaptor that takes one or more views, and produces a view whose ith element is a tuple-like value consisting of the ith elements of all views. The size of produced view is the minimum of sizes of all adapted views. 2. `zip()` is a customization point object that constructs a `zip_view` Currently, the `zip_view` only supports zipping together ranges that are `random_access_range`s. In addition, the `size()` and `end()` functions are only provided if all of the ranges are `sized_range`s The iterator from a `zip_view` is essentially a tuple of pointers to the beginning of each of the zipped ranges, plus an index that keeps track of the iterator's location in the zipped ranges. The size of a `zip_view` is the size of the smallest range. The end iterator of a zip view is the begin iterator plus the size of the zip_view. Unit tests have similar coverage to the tests for the var length views. Tests also include zipping a var length view and iterating through with `std::for_each` and `for`. [sc-43639] --- TYPE: IMPROVEMENT DESC: Implement zip_view for external sort.
…4924) (#4940) Backport 30b8ad0 from #4924. --- TYPE: BUG DESC: Fix wrong fallback cell order for Hilbert. Co-authored-by: KiterLuc <[email protected]>
[sc-46464] TYPE: NO_HISTORY DESC: Fix nightlies after external sort merges.
This PR stacks on #4930. The actual code for implementing the PR is fairly small, but with extensive unit tests. The PR contains `swap` function for `zip_view` in order to be able to use `std::sort` on a `zip_view`. The difficulty is that dereferencing a `zip_view` iterator returns a proxy, specifically a tuple of references. In order to enable `swap` to be found for this via ADL, we write ```c++ template <class... Ts> requires(std::is_swappable<Ts>::value && ...) void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) { using std::get; using std::swap; [&]<std::size_t... i>(std::index_sequence<i...>) { (swap(get<i>(x), get<i>(y)), ...); }(std::make_index_sequence<sizeof...(Ts)>()); } ``` and then put it into the `std` namespace: ```c++ namespace std { using ::swap; } ``` The provided unit tests exercise `swap`, `iter_swap`, and `sort`, including on `zip_view`s containing `alt_var_length_view`s. (NOTE: I didn't specifically write an `item_swap` as `std::iter_swap` falls back to `swap`.) The PR also changes `alt_var_length_view` and `zip_view` to derive from `std::ranges::view_interface` rather than `std::ranges::view_base` in order to automagically get `operator[]` for the view. NOTES: 1. Not all implementations of `std::sort` use `std::swap` for all problem sizes, but will use insertion sort instead for small sizes. 2. For insertion sort to work properly with a tuple of references, i.e., a proxy, it is important to get all of the details of `value_type` and `reference` correct. There was a bug that took a while to track down of `value_type` not being defined in the `zip_iterator`. The `iterator_facade` will create the wrong default thing. --- TYPE: IMPROVEMENT DESC: Write iter_swap for zip_view for external sort.
This PR implements proxy sorting for `permutation_view`. The goal is to be able to efficiently sort a `zip_view` containing a number of ranges. With the PR, we can sort a zipped `permutation_view` as follows: ```c++ auto z = zip(u, v, w, x); auto p = permutation_view(z); sort(p); ``` resulting in `z` being sorted along `u` (then `v`, etc). Because the iterator for `permutation_view` only returns a value, with no information about the index, we can't sort a `permutation_view` directly with, e.g., `std::sort`. To enable sorting, we instead provide sorting member functions to the `permutation_view` and have free functions that are overloaded on `permutation_view` invoke the member functions. The functions added include * proxy sort member functions in `permutation_view` with `_no_init`, `stable_`, and comparison function variants. * proxy sort free functions overloaded on `permutation_view` with variants as above * Overloads of `std::sort` and `std::stable_sort`. These invoke the `no_init` member function, the assumption being that we just want to stack permutations on top of each other. Since the `proxy_sort` functions mutate the indexes in the `permutation_view`, the interfaces to the existing `proxy_sort` function were updated to take a forwarding reference. A few notes: * Sorting a `permutation_view` will mutate the underlying permutation index range. Right now, the permutation view is just a view over two containers (the one to be permuted and the permutation itself). Sorting the `permutation_view` will change the ordering of the container passed in to the constructor. * When sorting a `permutation_view` with a tuple value type, the sorting will be done with the built-in tuple comparator. We may want to add variants that let us specify which members to sort on (using indexes passed in as variadic template parameters). However, using a specialized comparator is probably good enough. [sc-44172] --- TYPE: IMPROVEMENT DESC: Implement proxy sort permutation for external sort. --------- Co-authored-by: Luc Rancourt <[email protected]>
This should fix the windows nightlies. --- TYPE: NO_HISTORY DESC: Fix bad type casts in unit_permutation_sort.cc.
This removed deprecated APIs from tests. It will be split into multiple PRs to facilitate code reviewing. [sc-46307] --- TYPE: NO_HISTORY DESC: Remove deprecated APIs from tests, part 1.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )