Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] dev from TileDB-Inc:dev #152

Merged
merged 22 commits into from
May 7, 2024
Merged

[pull] dev from TileDB-Inc:dev #152

merged 22 commits into from
May 7, 2024

Conversation

pull[bot]
Copy link

@pull pull bot commented May 2, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

lums658 and others added 16 commits May 1, 2024 09:41
This PR implements an alternate view for var length data. Instead of
maintaining an offsets range, as in the first `var_length_view`, this
class materializes all of the individual subranges over the var length
data. The associated unit tests for this PR include tests for sorting.

Advantages:
* The `alt_var_length_view` has the advantage of not returning a proxy
when accessed and, as a result, can be directly sorted. Also, not having
to create the proxy on access can result in more efficiency.
* The `alt_var_length_view` class can be constructed with either the
"Arrow" format for offsets or the "TileDB" format. In the latter case,
the constructor takes the offsets ranges, as well as the final value.

Disadvanatages:
* The `alt_var_length_view` requires more storage. An individual
`std::ranges::subrange` consumes 16 bytes, whereas a `size_t` consumes 8
bytes. On the other hand, if sorting of the original `var_length_view`
is of interest, the offsets range plus a permutation range will also
result in 16 bytes storage per var length element.

Notes:
* The `alt_var_length_view` class ends up being a fairly thin wrapper
around `std::vector<std::ranges::subrange>>`. However, it is considered
not to be good practice to inherit from `std::vector`, so it was kept as
member data.
* Because this view does maintain internal data that is not O(1), the
copy constructor and copy assignment operator are disabled.

---
TYPE: IMPROVEMENT
DESC: Implement alt var length view for external sort.

<long description>

---
TYPE: NO_HISTORY | FEATURE | BUG | IMPROVEMENT | DEPRECATION | C_API |
CPP_API | BREAKING_BEHAVIOR | BREAKING_API | FORMAT
DESC: <short description>
…4926)

get_end_of_cell_slab is dead code and can be removed.
stride should not be called for Hilbert layout.

[sc-46462]

---
TYPE: NO_HISTORY
DESC: Ensure correct cell order in stride and remove dead code in
domain.
[sc-46464]

---
TYPE: NO_HISTORY
DESC: Fix nightlies after external sort merges.
This fixes the fallback cell order for the Hilbert order on writes. When
the domain is too large and that cells cannot be differentiated by the
Hilbert number because we don't have enough bits per dimensions, we
should fallback to row major cell ordering. A missing condition made us
fallback to column order ordering.

[sc-44758]

---
TYPE: BUG
DESC: Fix wrong fallback cell order for Hilbert.
This PR introduces timing tests that run with mocked time. During the test all calls to acquire timestamp, either by using posix `time()` or modern C++ `std::chrono::clock::now()` are mocked and should return the same timestamp. This way we should be able to test sub-millisecond writes/reads.

This PR adds:

1. libfaketime port
2. timing tests target
3. modified timing tests command with proper environment variables

These tests fail (as expected) on TileDB version 2.20.0:

https://github.com/dudoslav/TileDB/actions/runs/8535232007/job/23381209269#step:14:97

---
TYPE: NO_HISTORY
DESC: Add faketime tests
Add function `tiledb::api::to_string_view()`. This function validates a
candidate C-style string and converts it to a `std::string_view`. It's
specific to the C API, with a template argument that allows an error
message to identify an erroneous argument.

Converted C API functions `tiledb_query_get_est_result_size*` to use
this function. Changed relevant functions in `class Query` and `class
Subarray` to use `string_view` and not a raw `char *`.

This is a followup to #4891, where this work was out of scope. Had the
conversion function already existed before that PR, it would have used
there.

---
TYPE: NO_HISTORY
DESC: C API support - to_string_view

---------

Co-authored-by: KiterLuc <[email protected]>
---
TYPE: NO_HISTORY
DESC: Update dev history with content from 2.22, 2.18.5 and 2.21.2.

---------

Co-authored-by: Theodore Tsirpanis <[email protected]>
---
TYPE: NO_HISTORY
DESC: Bump dev version to 2.24.0.
This PR is stacked on #4925. It adds additional constructors for the two
view classes for var length data.

The previous implementations had constructors that just took input
ranges, and constructed a var length view over the data specified by
those ranges. However, in general, we may have a range that is not
completely filled with data, and so may need to specify the data to be
viewed in some other way. The complete set of constructors take
* ranges for data and offsets (arrow format) - the size of the resulting
view is one less than the size of the offset range
* ranges plus sizes for data and offsets (arrow format) - the size of
the resulting view is one less than the size given for the offset range
* iterator pairs for data and offsets (arrow format) - the size of the
resulting view is one less than the difference between end and begin of
the offsets pair
* iterator pairs for data and offsets, with sizes (arrow format) - the
size of the resulting view is one less than the size given for the
offset range
* ranges for data and offsets (tilledb format) - the size of the
resulting view is equal to the size of the offset range (takes an extra
argument for last offset value)
* ranges plus sizes for data and offsets (tiledb format) - the size of
the resulting view is equal to the size given for the offset range
(takes an extra argument for last offset value)
* iterator pairs for data and offsets (tiledb format) - the size of the
resulting view is equal to the difference between end and begin of the
offsets pair (takes an extra argument for last offset value)
* iterator pairs for data and offsets, with sizes (tiledb format) - the
size of the resulting view is equal than the size given for the offset
range (takes an extra argument for last offset value)

[sc-44576]

---
TYPE: NO_HISTORY
DESC: Additional constructors for external sort classes.
…ex. (#4909)

Migrate `Array` APIs out of `StorageManager`:
`non_empty_domain_*_from_index`.

This includes 3 separate APIs:
* `non_empty_domain_from_index`
* `non_empty_domain_var_size_from_index`
* `non_empty_domain_var_from_index`

---
[sc-44989]
[sc-44991]
[sc-44993]

---
TYPE: NO_HISTORY
DESC: Migrate Array APIs out of StorageManager:
non_empty_domain_*_from_index.
Use of `std::copy_n`, added in #4915, requires adding the `algorithms`
header under `g++` compilation

[sc-46538]

---
TYPE: NO_HISTORY
DESC: Add `algorithms` header for `std::copy_n`
This fixes
```
unit_iterator_facade.cc:625:22: error: missing 'typename' prior to dependent type name 'std::iterator_tra
its<iterator>::value_type'                                                                                                                                    
  using value_type = std::iterator_traits<iterator>::value_type; 
 ```
which I got building units on my macos.

[sc-46539]

---
TYPE: NO_HISTORY
DESC: Fix broken unit build on macos.
…e. (#4910)

Migrate `Array` APIs out of `StorageManager`:
`non_empty_domain_*_from_name`.

This includes 3 separate APIs:
* `non_empty_domain_from_name`
* `non_empty_domain_var_size_from_name`
* `non_empty_domain_var_from_name`

---
[sc-44990]
[sc-44992]
[sc-44994]

---
TYPE: IMPROVEMENT
DESC: Migrate Array APIs out of StorageManager:
non_empty_domain_*_from_name.
[SC-46300](https://app.shortcut.com/tiledb-inc/story/46300)
[SC-46301](https://app.shortcut.com/tiledb-inc/story/46301)

This PR moves the following C APIs (and their C++ equivalents) to
stable:

* `tiledb_group_*`
* `tiledb_array_delete`
* `tiledb_array_delete_array`
* `tiledb_array_upgrade_version`

The changes are simply moving code around.

---
TYPE: C_API
DESC: Experimental APIs related to groups, deleting arrays and upgrading
the format version of arrays were moved to stable. You can use them
without including `<tiledb_experimental.h>`.

---
TYPE: CPP_API
DESC: The experimental `Group` class was moved to stable. You can use it
without including `<tiledb_experimental>`.
This change removes any confusion that in its present state the
`array_schema` object library is not properly specified as a
freestanding library capable of resolving all its symbols without
additional external dependencies.

[sc-46557]

---
TYPE: NO_HISTORY
DESC: Fix the documentation for the `array_schema` object library
This PR implements a `zip_view` for zipping together a set of ranges. It
is intended to implement the `std::ranges::zip_view` as defined for
C++23. From
 https://en.cppreference.com/w/cpp/ranges/zip_view:
1. A zip_view is a range adaptor that takes one or more views, and
produces a view whose ith element is a tuple-like value consisting of
the ith elements of all views. The size of produced view is the minimum
of sizes of all adapted views.
2. `zip()` is a customization point object that constructs a `zip_view`

Currently, the `zip_view` only supports zipping together ranges that are
`random_access_range`s. In addition, the `size()` and `end()` functions
are only provided if all of the ranges are `sized_range`s

The iterator from a `zip_view` is essentially a tuple of pointers to the
beginning of each of the zipped ranges, plus an index that keeps track
of the iterator's location in the zipped ranges. The size of a
`zip_view` is the size of the smallest range. The end iterator of a zip
view is the begin iterator plus the size of the zip_view.

Unit tests have similar coverage to the tests for the var length views.
Tests also include zipping a var length view and iterating through with
`std::for_each` and `for`.

[sc-43639]

---
TYPE: IMPROVEMENT
DESC: Implement zip_view for external sort.
@pull pull bot added the ⤵️ pull label May 2, 2024
teo-tsirpanis and others added 6 commits May 2, 2024 14:39
…4924) (#4940)

Backport 30b8ad0 from #4924.

---
TYPE: BUG
DESC: Fix wrong fallback cell order for Hilbert.

Co-authored-by: KiterLuc <[email protected]>
[sc-46464]

TYPE: NO_HISTORY
DESC: Fix nightlies after external sort merges.
This PR stacks on #4930. The actual code for implementing the PR is
fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to
use `std::sort` on a `zip_view`. The difficulty is that dereferencing a
`zip_view` iterator returns a proxy, specifically a tuple of references.
In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`,
including on `zip_view`s containing `alt_var_length_view`s.

(NOTE: I didn't specifically write an `item_swap` as `std::iter_swap`
falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from
`std::ranges::view_interface` rather than `std::ranges::view_base` in
order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all
problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e.,
a proxy, it is important to get all of the details of `value_type` and
`reference` correct. There was a bug that took a while to track down of
`value_type` not being defined in the `zip_iterator`. The
`iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
This PR implements proxy sorting for `permutation_view`. The goal is to
be able to efficiently sort a `zip_view` containing a number of ranges.

With the PR, we can sort a zipped `permutation_view` as follows:
```c++
  auto z = zip(u, v, w, x);
  auto p = permutation_view(z);
  sort(p);
```
resulting in `z` being sorted along `u` (then `v`, etc).

Because the iterator for `permutation_view` only returns a value, with
no information about the index, we can't sort a `permutation_view`
directly with, e.g., `std::sort`. To enable sorting, we instead provide
sorting member functions to the `permutation_view` and have free
functions that are overloaded on `permutation_view` invoke the member
functions.

The functions added include
* proxy sort member functions in `permutation_view` with `_no_init`,
`stable_`, and comparison function variants.
* proxy sort free functions overloaded on `permutation_view` with
variants as above
* Overloads of `std::sort` and `std::stable_sort`. These invoke the
`no_init` member function, the assumption being that we just want to
stack permutations on top of each other.

Since the `proxy_sort` functions mutate the indexes in the
`permutation_view`, the interfaces to the existing `proxy_sort` function
were updated to take a forwarding reference.

A few notes:
* Sorting a `permutation_view` will mutate the underlying permutation
index range. Right now, the permutation view is just a view over two
containers (the one to be permuted and the permutation itself). Sorting
the `permutation_view` will change the ordering of the container passed
in to the constructor.
* When sorting a `permutation_view` with a tuple value type, the sorting
will be done with the built-in tuple comparator. We may want to add
variants that let us specify which members to sort on (using indexes
passed in as variadic template parameters). However, using a specialized
comparator is probably good enough.

[sc-44172]

---
TYPE: IMPROVEMENT
DESC: Implement proxy sort permutation for external sort.

---------

Co-authored-by: Luc Rancourt <[email protected]>
This should fix the windows nightlies.

---
TYPE: NO_HISTORY
DESC: Fix bad type casts in unit_permutation_sort.cc.
This removed deprecated APIs from tests. It will be split into multiple
PRs to facilitate code reviewing.

[sc-46307]

---
TYPE: NO_HISTORY
DESC: Remove deprecated APIs from tests, part 1.
@kokizzu kokizzu merged commit a1d753f into kokizzu:dev May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants