Ignore NULLs (if desired) while scanning keys during index navigation #8446

dyemanov · 2025-02-22T07:05:26Z

See also #8291. This improvement completes the solution by extending the "ignore null" checks to the scan phase. The logic mostly matches btr.cpp (BTR_evaluate() and scan()).

With the same test case, results are:

before this PR:

SELECT count(*) from (SELECT ID FROM T WHERE ID < 3 ORDER BY ID);

Select Expression
    -> Aggregate
        -> Filter
            -> Table "T" Access By ID
                -> Index "IT" Range Scan (upper bound: 1/1)

                COUNT 
===================== 
                    1 

-- Fetches = 1775

after this PR:

-- Fetches = 7

hvlad · 2025-02-23T15:31:35Z

src/jrd/recsrc/IndexTableScan.cpp

+			// If we're walking in a descending index and we need to ignore NULLs
+			// then stop at the first NULL we see (only for single segment!)
+			if (descending && ignoreNulls && node.prefix == 0 &&
+				node.length >= 1 && node.data[0] == 255)


According to the comment in compress():
// Further a NULL state is always returned as 1 byte 0xFF (descending index).
So why check for >= 1 rather than == 1?
And, for consistency, please use 0xFF, not 255.
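
To make the two variants concrete, here is a minimal sketch of the check as a standalone predicate. `ToyNode`, `DESC_NULL_MARKER` and both function names are hypothetical stand-ins, not the engine's real types; only the three field names and the 0xFF marker come from the patch and the compress() comment. The thread does not say whether a longer key starting with 0xFF can actually occur, so the sketch only shows where the two conditions would diverge on such input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an index node (not the engine's type).
struct ToyNode
{
    std::size_t prefix;              // bytes shared with the previous key
    std::size_t length;              // bytes stored in this node
    std::vector<std::uint8_t> data;  // stored bytes, data.size() == length
};

// Marker quoted from the compress() comment: NULL in a descending index.
constexpr std::uint8_t DESC_NULL_MARKER = 0xFF;

// Condition as written in the patch: at least one stored byte.
inline bool looksLikeNullLoose(const ToyNode& node)
{
    return node.prefix == 0 && node.length >= 1 &&
        node.data[0] == DESC_NULL_MARKER;
}

// Condition as suggested in the review: exactly one stored byte.
inline bool looksLikeNullStrict(const ToyNode& node)
{
    return node.prefix == 0 && node.length == 1 &&
        node.data[0] == DESC_NULL_MARKER;
}
```

Both predicates agree on a one-byte 0xFF node; they differ only for a node that stores more than one byte and begins with 0xFF.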

dyemanov · 2025-02-24

This is a copy-paste from btr.cpp :) where such a check is used twice. I don't mind changing it, but I'd suggest this to be done in btr.cpp too.

I'm also wondering whether we really should check for (node.prefix == 0 && node.length == 1 && node.data[0] == 0xFF) -- i.e. the first NULL encountered -- or better for (node.prefix + node.length == 1 && node.data[0] == 0xFF) -- i.e. any NULL encountered, as more reliable. Is it theoretically possible that we somehow jump to the bucket in the middle of the NULL duplicates chain and start scanning from there, thus skipping the first NULL node?

hvlad · 2025-02-24

> I'm also wondering whether we really should check for
> (node.prefix == 0 && node.length == 1 && node.data[0] == 0xFF) -- i.e. the first NULL encountered -- or better
> for
> (node.prefix + node.length == 1 && node.data[0] == 0xFF) -- i.e. any NULL encountered, as more reliable.

The second way is not correct: when node.length == 0 we should not access node.data[0] at all.
Instead, we could look at the key contents, if necessary.
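
A toy model of the point above, under loud assumptions: `ToyNode`, `Key`, `applyNode`, `isFirstNull`, `keyIsNull` and `countNulls` are all hypothetical names, and the prefix handling is a simplified illustration rather than the engine's node format. A duplicate of the previous key is modelled as prefix == 1, length == 0, so it stores no bytes and `data[0]` must not be read; the key rebuilt from the previous one can be inspected instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an index node (not the engine's type).
struct ToyNode
{
    std::size_t prefix;              // bytes reused from the previous key
    std::size_t length;              // bytes stored in this node
    std::vector<std::uint8_t> data;  // stored bytes, data.size() == length
};

using Key = std::vector<std::uint8_t>;

// Rebuild the full key: keep `prefix` bytes of the previous key, then append
// the stored bytes. Assumes node.prefix <= key.size().
inline void applyNode(Key& key, const ToyNode& node)
{
    key.resize(node.prefix);
    key.insert(key.end(), node.data.begin(), node.data.end());
}

// Node-based check: matches only the first NULL of a chain. Short-circuit
// evaluation keeps data[0] from being read when length != 1.
inline bool isFirstNull(const ToyNode& node)
{
    return node.prefix == 0 && node.length == 1 && node.data[0] == 0xFF;
}

// Key-based check: matches every NULL and never touches node.data.
inline bool keyIsNull(const Key& key)
{
    return key.size() == 1 && key[0] == 0xFF;
}

struct NullCounts
{
    std::size_t byNode;  // nodes matched by isFirstNull()
    std::size_t byKey;   // nodes whose rebuilt key is NULL
};

// Walk the nodes in order and count matches for both checks.
inline NullCounts countNulls(const std::vector<ToyNode>& nodes)
{
    Key key;
    NullCounts counts{0, 0};
    for (const ToyNode& node : nodes)
    {
        applyNode(key, node);
        if (isFirstNull(node))
            ++counts.byNode;
        if (keyIsNull(key))
            ++counts.byKey;
    }
    return counts;
}
```

For a chain of one non-NULL key followed by a NULL and two of its duplicates, the node-based check matches once and the key-based check matches three times, which is the difference between "first NULL encountered" and "any NULL encountered".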

> Is it theoretically possible that we somehow jump to the bucket in the middle of the NULL duplicates chain and start scanning from there, thus skipping the first NULL node?

The position is based on the last key processed, so we should be safe.

hvlad · 2025-02-24

> This is a copy-paste from btr.cpp :) where such a check is used twice. I don't mind changing it, but I'd suggest this to be done in btr.cpp too.

OK, then let it stay this way.

pavel-zotov · 2025-02-25T09:37:21Z

QA note: this ticket is covered by the test for #8291 (it was enough to lower the MAX_ALLOWED_IDX_READS threshold; see the notes there).

Ignore NULLs (if desired) while scanning keys during index navigation

14530b4

dyemanov added component: engine type: improvement labels Feb 22, 2025

dyemanov requested a review from hvlad February 22, 2025 07:05

dyemanov self-assigned this Feb 22, 2025

hvlad approved these changes Feb 23, 2025

Misc

9a0cf7b

dyemanov added the fix-version: 6.0 Alpha 1 label Feb 25, 2025

dyemanov merged commit 5767b9e into master Feb 25, 2025
47 of 48 checks passed

dyemanov deleted the work/index-navigation-skip-nulls branch February 25, 2025 07:25

pavel-zotov added the qa: covered by another tests label Feb 25, 2025
