Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgstac is not FIPS compliant #348

Open
tariqksoliman opened this issue Feb 6, 2025 · 4 comments
Open

pgstac is not FIPS compliant #348

tariqksoliman opened this issue Feb 6, 2025 · 4 comments

Comments

@tariqksoliman
Copy link

Description

This issue was discovered after a RHEL 9 machine running postgres 16 and titiler-pgstac was upgraded to use FIPS which, among other things, bars the use of md5 hashes. This makes pgstac unusable on some systems.

Error

The superficial error comes from titiler-pgstac:

GET http://domain.com/titilerpgstac/collections/TimeCogs/tiles/WebMercatorQuad/6/17/39?assets=asset&datetime=2024-01-04T14:00:00Z/2025-01-27T22:05:27Z&exitwhenfull=false&skipcovered=false

"could not compute MD5 hash: disabled for FIPS\nCONTEXT: SQL function "search_hash" statement 1\nPL/pgSQL function search_query(jsonb,boolean,jsonb) line 13 during statement block local variable initialization"

I am suspecting the error is referring to this line here and its usage of the md5() function:

SELECT md5(concat(search_tohash($1)::text,$2::text));

Possible Solution

Here are relevant example SQL commands run on our FIPS postgres instance. One to highlight the limitation and the other as a possible alternative:

postgres=# SELECT md5('hello');
ERROR:  could not compute MD5 hash: disabled for FIPS
postgres=# SELECT encode(sha256('hello')::bytea, 'hex');
                              encode                              
------------------------------------------------------------------
 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
(1 row)

Specs

  • pgstac 0.8.6 (I know, not the latest but the md5()s are still in the latest)
  • pypgstac 0.8.6
  • RHEL 9
  • postgres 16
  • titiler-pgstac 1.4.0
@bitner
Copy link
Collaborator

bitner commented Feb 10, 2025

Hmmm, this would need to happen at a major version release as it would modify the keys that are stored in our tracking tables. I'm certainly open to making this very easy change, the only issue will be timing as the migration for this change on an existing stac instance could be quite substantial.

@tariqksoliman
Copy link
Author

Thank you! Should it help, we did use the following as a temporary fix and haven't seen any issues so far. I am also happy to try to make a PR and use the existing PRs as an example and with the caveat that I have never worked on a DB extension before.


Two functions to modify in-place that replaces FIPS non-compliant md5() functions with sha256().

pgstac v0.8.4

1) search_hash

Run

CREATE OR REPLACE FUNCTION pgstac.search_hash(jsonb, jsonb) RETURNS text AS $$
    SELECT encode(sha256(concat(pgstac.search_tohash($1)::text,$2::text)::bytea)::bytea, 'hex');
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

If terrible things happen, roll it back with:

CREATE OR REPLACE FUNCTION pgstac.search_hash(jsonb, jsonb) RETURNS text AS $$
    SELECT md5(concat(pgstac.search_tohash($1)::text,$2::text));
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

2) where_stats

Run

CREATE OR REPLACE FUNCTION pgstac.where_stats(inwhere text, updatestats boolean default false, conf jsonb default null) RETURNS pgstac.search_wheres AS $$
DECLARE
    t timestamptz;
    i interval;
    explain_json jsonb;
    partitions text[];
    sw pgstac.search_wheres%ROWTYPE;
    inwhere_hash text := encode(sha256(inwhere::bytea)::bytea, 'hex');
    _context text := lower(pgstac.context(conf));
    _stats_ttl interval := pgstac.context_stats_ttl(conf);
    _estimated_cost float := pgstac.context_estimated_cost(conf);
    _estimated_count int := pgstac.context_estimated_count(conf);
    ro bool := pgstac.readonly(conf);
BEGIN
    IF ro THEN
        updatestats := FALSE;
    END IF;

    IF _context = 'off' THEN
        sw._where := inwhere;
        return sw;
    END IF;

    SELECT * INTO sw FROM pgstac.search_wheres WHERE encode(sha256(_where)::bytea, 'hex')=inwhere_hash FOR UPDATE;

    -- Update statistics if explicitly set, if statistics do not exist, or statistics ttl has expired
    IF NOT updatestats THEN
        RAISE NOTICE 'Checking if update is needed for: % .', inwhere;
        RAISE NOTICE 'Stats Last Updated: %', sw.statslastupdated;
        RAISE NOTICE 'TTL: %, Age: %', _stats_ttl, now() - sw.statslastupdated;
        RAISE NOTICE 'Context: %, Existing Total: %', _context, sw.total_count;
        IF
            (
                sw.statslastupdated IS NULL
                OR (now() - sw.statslastupdated) > _stats_ttl
                OR (context(conf) != 'off' AND sw.total_count IS NULL)
            ) AND NOT ro
        THEN
            updatestats := TRUE;
        END IF;
    END IF;

    sw._where := inwhere;
    sw.lastused := now();
    sw.usecount := coalesce(sw.usecount,0) + 1;

    IF NOT updatestats THEN
        UPDATE pgstac.search_wheres SET
            lastused = sw.lastused,
            usecount = sw.usecount
        WHERE encode(sha256(_where)::bytea, 'hex') = inwhere_hash
        RETURNING * INTO sw
        ;
        RETURN sw;
    END IF;

    -- Use explain to get estimated count/cost and a list of the partitions that would be hit by the query
    t := clock_timestamp();
    EXECUTE format('EXPLAIN (format json) SELECT 1 FROM items WHERE %s', inwhere)
    INTO explain_json;
    RAISE NOTICE 'Time for just the explain: %', clock_timestamp() - t;
    i := clock_timestamp() - t;

    sw.statslastupdated := now();
    sw.estimated_count := explain_json->0->'Plan'->'Plan Rows';
    sw.estimated_cost := explain_json->0->'Plan'->'Total Cost';
    sw.time_to_estimate := extract(epoch from i);

    RAISE NOTICE 'ESTIMATED_COUNT: % < %', sw.estimated_count, _estimated_count;
    RAISE NOTICE 'ESTIMATED_COST: % < %', sw.estimated_cost, _estimated_cost;

    -- Do a full count of rows if context is set to on or if auto is set and estimates are low enough
    IF
        _context = 'on'
        OR
        ( _context = 'auto' AND
            (
                sw.estimated_count < _estimated_count
                AND
                sw.estimated_cost < _estimated_cost
            )
        )
    THEN
        t := clock_timestamp();
        RAISE NOTICE 'Calculating actual count...';
        EXECUTE format(
            'SELECT count(*) FROM items WHERE %s',
            inwhere
        ) INTO sw.total_count;
        i := clock_timestamp() - t;
        RAISE NOTICE 'Actual Count: % -- %', sw.total_count, i;
        sw.time_to_count := extract(epoch FROM i);
    ELSE
        sw.total_count := NULL;
        sw.time_to_count := NULL;
    END IF;

    IF NOT ro THEN
        INSERT INTO pgstac.search_wheres
            (_where, lastused, usecount, statslastupdated, estimated_count, estimated_cost, time_to_estimate, partitions, total_count, time_to_count)
        SELECT sw._where, sw.lastused, sw.usecount, sw.statslastupdated, sw.estimated_count, sw.estimated_cost, sw.time_to_estimate, sw.partitions, sw.total_count, sw.time_to_count
        ON CONFLICT ((encode(sha256(_where)::bytea, 'hex')))
        DO UPDATE
            SET
                lastused = sw.lastused,
                usecount = sw.usecount,
                statslastupdated = sw.statslastupdated,
                estimated_count = sw.estimated_count,
                estimated_cost = sw.estimated_cost,
                time_to_estimate = sw.time_to_estimate,
                total_count = sw.total_count,
                time_to_count = sw.time_to_count
        ;
    END IF;
    RETURN sw;
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER;

If terrible things happen, roll it back with:

CREATE OR REPLACE FUNCTION pgstac.where_stats(inwhere text, updatestats boolean default false, conf jsonb default null) RETURNS pgstac.search_wheres AS $$
DECLARE
    t timestamptz;
    i interval;
    explain_json jsonb;
    partitions text[];
    sw pgstac.search_wheres%ROWTYPE;
    inwhere_hash text := md5(inwhere);
    _context text := lower(pgstac.context(conf));
    _stats_ttl interval := pgstac.context_stats_ttl(conf);
    _estimated_cost float := pgstac.context_estimated_cost(conf);
    _estimated_count int := pgstac.context_estimated_count(conf);
    ro bool := pgstac.readonly(conf);
BEGIN
    IF ro THEN
        updatestats := FALSE;
    END IF;

    IF _context = 'off' THEN
        sw._where := inwhere;
        return sw;
    END IF;

    SELECT * INTO sw FROM pgstac.search_wheres WHERE md5(_where)=inwhere_hash FOR UPDATE;

    -- Update statistics if explicitly set, if statistics do not exist, or statistics ttl has expired
    IF NOT updatestats THEN
        RAISE NOTICE 'Checking if update is needed for: % .', inwhere;
        RAISE NOTICE 'Stats Last Updated: %', sw.statslastupdated;
        RAISE NOTICE 'TTL: %, Age: %', _stats_ttl, now() - sw.statslastupdated;
        RAISE NOTICE 'Context: %, Existing Total: %', _context, sw.total_count;
        IF
            (
                sw.statslastupdated IS NULL
                OR (now() - sw.statslastupdated) > _stats_ttl
                OR (pgstac.context(conf) != 'off' AND sw.total_count IS NULL)
            ) AND NOT ro
        THEN
            updatestats := TRUE;
        END IF;
    END IF;

    sw._where := inwhere;
    sw.lastused := now();
    sw.usecount := coalesce(sw.usecount,0) + 1;

    IF NOT updatestats THEN
        UPDATE pgstac.search_wheres SET
            lastused = sw.lastused,
            usecount = sw.usecount
        WHERE md5(_where) = inwhere_hash
        RETURNING * INTO sw
        ;
        RETURN sw;
    END IF;

    -- Use explain to get estimated count/cost and a list of the partitions that would be hit by the query
    t := clock_timestamp();
    EXECUTE format('EXPLAIN (format json) SELECT 1 FROM items WHERE %s', inwhere)
    INTO explain_json;
    RAISE NOTICE 'Time for just the explain: %', clock_timestamp() - t;
    i := clock_timestamp() - t;

    sw.statslastupdated := now();
    sw.estimated_count := explain_json->0->'Plan'->'Plan Rows';
    sw.estimated_cost := explain_json->0->'Plan'->'Total Cost';
    sw.time_to_estimate := extract(epoch from i);

    RAISE NOTICE 'ESTIMATED_COUNT: % < %', sw.estimated_count, _estimated_count;
    RAISE NOTICE 'ESTIMATED_COST: % < %', sw.estimated_cost, _estimated_cost;

    -- Do a full count of rows if context is set to on or if auto is set and estimates are low enough
    IF
        _context = 'on'
        OR
        ( _context = 'auto' AND
            (
                sw.estimated_count < _estimated_count
                AND
                sw.estimated_cost < _estimated_cost
            )
        )
    THEN
        t := clock_timestamp();
        RAISE NOTICE 'Calculating actual count...';
        EXECUTE format(
            'SELECT count(*) FROM items WHERE %s',
            inwhere
        ) INTO sw.total_count;
        i := clock_timestamp() - t;
        RAISE NOTICE 'Actual Count: % -- %', sw.total_count, i;
        sw.time_to_count := extract(epoch FROM i);
    ELSE
        sw.total_count := NULL;
        sw.time_to_count := NULL;
    END IF;

    IF NOT ro THEN
        INSERT INTO pgstac.search_wheres
            (_where, lastused, usecount, statslastupdated, estimated_count, estimated_cost, time_to_estimate, partitions, total_count, time_to_count)
        SELECT sw._where, sw.lastused, sw.usecount, sw.statslastupdated, sw.estimated_count, sw.estimated_cost, sw.time_to_estimate, sw.partitions, sw.total_count, sw.time_to_count
        ON CONFLICT ((md5(_where)))
        DO UPDATE
            SET
                lastused = sw.lastused,
                usecount = sw.usecount,
                statslastupdated = sw.statslastupdated,
                estimated_count = sw.estimated_count,
                estimated_cost = sw.estimated_cost,
                time_to_estimate = sw.time_to_estimate,
                total_count = sw.total_count,
                time_to_count = sw.time_to_count
        ;
    END IF;
    RETURN sw;
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER;

@bitner
Copy link
Collaborator

bitner commented Feb 12, 2025

Yes. The fix is indeed very easy and will work for any new searches. The problem is that anyone who uses saved searches (ie to use with titiler-pgstac) will be looking for the old md5 hash and so this will be a major breaking change.

You can certainly make the change locally on your instance, and I will definitely look to make this change on our next major version release, but I am very leery to make this change right now as I know for certain it would break all instances that have existing searches / search_wheres (which can be in the millions) that they leverage with tools like titiler-pgstac. When we make the change, we will need to create a manual migration to convert the old hash over to the new hash.

@vincentsarago
Copy link
Member

The problem is that anyone who uses saved searches (ie to use with titiler-pgstac) will be looking for the old md5 hash and so this will be a major breaking change.

I would not worry too much about this, search should always be registered or maybe in the migration step we can migrate old hash to the new format 🤷 (even if this has to be a manual step for titiler-pgstac application)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants