Fix cagra_hnsw serialization when dataset is not part of index #591

tfeher · 2025-01-20T11:48:28Z

After calling build(), the CAGRA index ideally contains both the dataset and the graph. When there is not enough device memory, only the graph is returned. In that case the dataset must be passed explicitly to the serialization routines.

When serializing in HNSW format with a flat hierarchy, the dataset was not passed. This PR fixes the problem by adding an optional dataset argument to cagra::serialize_to_hnswlib.
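The fallback logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: `HostMatrix`, `ToyIndex`, and `resolve_dataset` are hypothetical stand-ins invented for this example, not the cuVS types or the actual signature of cagra::serialize_to_hnswlib.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in types for illustration; these are NOT the cuVS classes.
struct HostMatrix {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> data;  // row-major, rows * cols elements
};

struct ToyIndex {
  std::vector<std::vector<unsigned>> graph;  // one neighbor list per node
  std::optional<HostMatrix> dataset;         // empty when only the graph was kept
};

// Choose the dataset to serialize: an explicitly passed dataset wins,
// otherwise fall back to the copy stored in the index (if any).
inline const HostMatrix& resolve_dataset(const ToyIndex& index,
                                         const HostMatrix* explicit_dataset = nullptr)
{
  const HostMatrix* chosen = explicit_dataset;
  if (chosen == nullptr && index.dataset.has_value()) { chosen = &*index.dataset; }
  if (chosen == nullptr) {
    throw std::invalid_argument("index holds no dataset; pass one explicitly");
  }
  // Sanity check: the buffer must match the declared shape.
  assert(chosen->data.size() == chosen->rows * chosen->cols);
  return *chosen;
}
```

Making the explicit argument optional keeps existing call sites working when the index already owns the dataset, while graph-only indexes fail loudly instead of silently writing no vectors.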

Furthermore, to improve execution time, we now write a full row of the graph and dataset at a time instead of a single element.
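To make the difference concrete, here is a minimal sketch of element-wise versus row-wise writes to a standard output stream. The function names are hypothetical and the code is not taken from the PR; it only shows that both strategies produce identical bytes while the row-wise version issues one write call per row rather than one per element.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Baseline: one write() call per element.
inline void write_elementwise(std::ostream& os, const std::vector<float>& data)
{
  for (float v : data) {
    os.write(reinterpret_cast<const char*>(&v), sizeof(float));
  }
}

// Row-wise: one write() call per row of a row-major matrix.
inline void write_rowwise(std::ostream& os,
                          const std::vector<float>& data,
                          std::size_t rows,
                          std::size_t cols)
{
  assert(data.size() == rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    const float* row = data.data() + i * cols;  // start of row i
    os.write(reinterpret_cast<const char*>(row),
             static_cast<std::streamsize>(cols * sizeof(float)));
  }
}
```

For a matrix with `cols` columns this cuts the number of write calls by a factor of `cols`; the actual speedup depends on the stream's buffering and the storage backend.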

Additionally, this PR adds debug messages that report how long saving the data takes.

cjnolet · 2025-01-24T01:44:28Z

@tfeher the changes look good, but there's a docs build failure.


cjnolet · 2025-01-30T16:04:43Z

/merge

Fix cagra_hnsw serialization when dataset is not part of index

ca0d4a9

tfeher requested a review from a team as a code owner January 20, 2025 11:48

tfeher self-assigned this Jan 20, 2025

github-actions bot added the cpp label Jan 20, 2025

tfeher added the bug and non-breaking labels and removed the cpp label Jan 20, 2025

tfeher requested a review from divyegala January 20, 2025 11:48

Fix dim for hnsw::from_cagra in hierarchy mode

730a864

github-actions bot added the cpp label Jan 20, 2025

divyegala approved these changes Jan 20, 2025


tfeher and others added 2 commits January 21, 2025 18:50

Merge branch 'branch-25.02' into bug_cagra_hnsw_serialization

63c56f6

Merge branch 'branch-25.02' into bug_cagra_hnsw_serialization

0cbbb94

cjnolet and others added 5 commits January 25, 2025 12:03

Merge branch 'branch-25.02' into bug_cagra_hnsw_serialization

743a3c9

Merge branch 'branch-25.02' into bug_cagra_hnsw_serialization

fa6fe2b

Merge remote-tracking branch 'origin/branch-25.02' into bug_cagra_hns…

d7ef8dc

…w_serialization

fix docstring

61d013b

Merge remote-tracking branch 'origin/branch-25.02' into bug_cagra_hns…

2e2d0a3

…w_serialization

rapids-bot bot merged commit 0dd7bde into rapidsai:branch-25.02 Jan 30, 2025
61 checks passed
