Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] dev from TileDB-Inc:dev #143

Merged
merged 14 commits into from
Mar 12, 2024
Merged

[pull] dev from TileDB-Inc:dev #143

merged 14 commits into from
Mar 12, 2024

Conversation

pull[bot]
Copy link

@pull pull bot commented Mar 7, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Mar 7, 2024
davisp and others added 13 commits March 7, 2024 10:41
My first PR for this either had a weird merge conflict or I totally
whiffed on the commit. Regardless, the point was to remove the default
switch case so that the compiler will enforce a switch case for every
MemoryType and MemoryTrackerType.

---
TYPE: NO_HISTORY
DESC: Actually fix missing memory_tracker_type_to_str
This adds tracking for the various bitmap and other vectors on
ResultTile classes.

---
TYPE: NO_HISTORY
DESC: Add memory tracking for ResultTile bitmaps
RTREE wasn't loaded when processing aggregates on dimensions. We use the
RTREE when we aggregate full tiles so the fix is to load the rtree for
cases where we don't have ranges and we have aggregates on dimensions.

---
TYPE: BUG
DESC: Fix crash in aggregates on dimensions
New `class QueryChannelActual` to replace the previous class that was
not actually a channel. Added methods in `class Query` for it, and
rewrote the channel C API handle class to use it. Correctly initialize
the underlying channel object within the query field C API handle class,
and return a channel handle with the right lifespan. (This fixes the
defect reported.)

Reworked test fixture to use `TemporaryLocalDirectory`. Reworked tests
for isolation with SECTION. Rewrote the two tests with aggegrates to use
correct calling sequence.

Changed `ensure_query_is_not_initialized` to take a reference. Changed
`tiledb_query_get_default_channel` to return an actual channel handle
rather than a stub for one.

Reference:

https://app.shortcut.com/tiledb-inc/story/39123/tiledb-field-channel-returns-the-same-handle-and-is-prone-to-double-freeing

[sc-39123]

---
TYPE: BUG
DESC: Fix defects with query channels

---------

Co-authored-by: Luc Rancourt <[email protected]>
[SC-42734](https://app.shortcut.com/tiledb-inc/story/42734)

This PR moves including `hadoop/hdfs.h` from `hdfs_filesystem.h` to the
`hdfs_filesystem.cc` implementation file. Due to the current structure
of the code, `hdfs_filesystem.h` is included by `vfs.h`, which is
included by many files, resulting in having to specify the path to
`hadoop/hdfs.h` in all dependents, such as unit tests. A similar change
was recently made for the GCS SDK in #4777.

Thanks to @rroelke for finding this issue. Validated by successfully
building all unit tests touched by #4793 with HDFS enabled.

---
TYPE: BUILD
DESC: Fix compiling unit tests when HDFS is enabled.
This change removes all move and copy constructors from WriterTile to
help with the PMR effort. It introduced a new IndexList container that
allows to not have a move constructor but also to access items with an
index.

[sc-42467]

---
TYPE: NO_HISTORY
DESC: Remove copy and move constructors from WriterTile.
…ds (#4774)

The `Dimension` class has 20+ function pointer members which are set to
dynamically dispatch a template instantiation of a function based on the
data type of the dimension. The `Dimension` constructor sets each of
these function pointers, switching on the type for each of them. But
really the same switch case applies to all of them. It is a cleaner
encapsulation of functionality to nest the implementations within a
subclass, which also adds a small benefit of reducing the object size.

Hence this pull request introduces `DimensionDispatch`, which takes the
function pointers we previously had and makes them pure virtual methods
of a base class instead. `template <typename T> DimensionDispatchBase`
implements most of these virtual methods, and defers any exceptions to
the pattern to `DimensionFixedSize` and `DimensionVarSized`. And the
`Dimension` constructor chooses an implementation with a single function
call `set_dimension_dispatch` instead of the previous logic to set each
function pointer individually.

Closes sc-7826

---
TYPE: IMPROVEMENT
DESC: Convert dimension function pointer members to subclass methods
…Block. (#4779)

Add `tdb::pmr::list` wrapper as an alias for `std::list`. Instrument
classes `FilteredData` and `FilteredDataBlock` for memory measurement.

---
TYPE: NO_HISTORY
DESC: Instrumentation for memory measurement: FilteredData and
FilteredDataBlock. Addition of tdb::pmr::list wrapper.

---------

Co-authored-by: Luc Rancourt <[email protected]>
A user reported errors accessing assets over REST after setting
`rest.server_address` with a URL ending in `/`. This updates
`Config::sanity_check` to remove status and return a validated config
value for the given parameter.

---
TYPE: BUG
DESC: Remove trailing slash from rest.server_address.
…4798)

[SC-42786](https://app.shortcut.com/tiledb-inc/story/42786/remote-incomplete-queries-with-aggregates-fail-with-a-vague-message)

Validated with a locally built REST server.

Before: `[TileDB::REST] Error: Error submitting query to REST; server
returned no data. Curl error: Error in libcurl POST operation: libcurl
error message 'CURLE_OK'; HTTP code 400; server response data
'{"code":2208,"message":"Error serializing query: C API: unknown
exception type; no further
information","request_id":"a0f026a8-8507-4c54-b592-5166c7d4f0b6"}'.`

After: `[TileDB::REST] Error: Error submitting query to REST; server
returned no data. Curl error: Error in libcurl POST operation: libcurl
error message 'CURLE_OK'; HTTP code 400; server response data
'{"code":2208,"message":"Error serializing query:
[TileDB::Serialization] Error: Cannot serialize; exception:
[TileDB::Serialization] Error: Aggregates are not currently supported in
incomplete remote
queries","request_id":"c6638565-d80d-4c0c-88a6-52c1a748df1a"}'.`

---
TYPE: BUG
DESC: Provide a clear exception message when performing remote
incomplete queries with aggregates
@shaunrd0 previously added `ls_recursive` support for the S3 file
system. This pull request implements it for Posix, in a
hopefully-portable way (`std::filesystem::recursive_directory_iterator`)
which can be recycled for Windows.

Some notes about input/output:
- while `LsObjects` has a `std::string` component, the string is a URI.
- for posix, we include empty directories in the result set.

The recursion is implemented so that if a directory does not pass the
directory filter, then we do not descend into that directory. I have
added a unit test to check this behavior. Something like this could be
useful for S3 as well if the "list objects" request is done in
hierarchical mode.

---
TYPE: FEATURE
DESC: Support VFS `ls_recursive` API for posix filesystem.

---------

Co-authored-by: Luc Rancourt <[email protected]>
This implements aggregates support with dense dimensions.

---
TYPE: IMPROVEMENT
DESC: Implement dense dimension aggregates.
[SC-38521](https://app.shortcut.com/tiledb-inc/story/38521/update-libmagic-and-use-the-upstream-vcpkg-port)

Split from #4553.

This PR updates libmagic to version 5.45 and switches to using a vcpkg
port closer to the upstream one, which we can easily consume with
find_package(unofficial-libmagic) since microsoft/vcpkg#35274.

One complication is that the upstream port builds libmagic with its
official autotools-based build system, which is significantly slower on
Windows (on Linux it builds pretty fast). I tried to upstream the
CMake-based port I had added in #4119, but the PR was rejected.

Apart from binary caching, there is unfortunately nothing we can do
about the build performance regression. We could maintain the
CMake-based port for our own use, but it will split what we build and
what a future user of TileDB from vcpkg will build, and that port is not
without its problems (it failed for example when I tried cross-compiling
to arm64-windows, because it tried to execute the arm64 file.exe on my
x64 machine).

---
TYPE: BUILD
DESC: Update libmagic to version 5.45
@kokizzu kokizzu merged commit 7753f33 into kokizzu:dev Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants