-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pgstac is not FIPS compliant #348
Comments
Hmmm, this would need to happen at a major version release as it would modify the keys that are stored in our tracking tables. I'm certainly open to making this very easy change, the only issue will be timing as the migration for this change on an existing stac instance could be quite substantial. |
Thank you! Should it help, we did use the following as a temporary fix and haven't seen any issues so far. I am also happy to try to make a PR and use the existing PRs as an example and with the caveat that I have never worked on a DB extension before. Two functions to modify in-place that replaces FIPS non-compliant md5() functions with sha256(). pgstac v0.8.4 1) search_hashRunCREATE OR REPLACE FUNCTION pgstac.search_hash(jsonb, jsonb) RETURNS text AS $$
SELECT encode(sha256(concat(pgstac.search_tohash($1)::text,$2::text)::bytea)::bytea, 'hex');
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE; If terrible things happen, roll it back with:CREATE OR REPLACE FUNCTION pgstac.search_hash(jsonb, jsonb) RETURNS text AS $$
SELECT md5(concat(pgstac.search_tohash($1)::text,$2::text));
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE; 2) where_statsRunCREATE OR REPLACE FUNCTION pgstac.where_stats(inwhere text, updatestats boolean default false, conf jsonb default null) RETURNS pgstac.search_wheres AS $$
DECLARE
t timestamptz;
i interval;
explain_json jsonb;
partitions text[];
sw pgstac.search_wheres%ROWTYPE;
inwhere_hash text := encode(sha256(inwhere::bytea)::bytea, 'hex');
_context text := lower(pgstac.context(conf));
_stats_ttl interval := pgstac.context_stats_ttl(conf);
_estimated_cost float := pgstac.context_estimated_cost(conf);
_estimated_count int := pgstac.context_estimated_count(conf);
ro bool := pgstac.readonly(conf);
BEGIN
IF ro THEN
updatestats := FALSE;
END IF;
IF _context = 'off' THEN
sw._where := inwhere;
return sw;
END IF;
SELECT * INTO sw FROM pgstac.search_wheres WHERE encode(sha256(_where)::bytea, 'hex')=inwhere_hash FOR UPDATE;
-- Update statistics if explicitly set, if statistics do not exist, or statistics ttl has expired
IF NOT updatestats THEN
RAISE NOTICE 'Checking if update is needed for: % .', inwhere;
RAISE NOTICE 'Stats Last Updated: %', sw.statslastupdated;
RAISE NOTICE 'TTL: %, Age: %', _stats_ttl, now() - sw.statslastupdated;
RAISE NOTICE 'Context: %, Existing Total: %', _context, sw.total_count;
IF
(
sw.statslastupdated IS NULL
OR (now() - sw.statslastupdated) > _stats_ttl
OR (context(conf) != 'off' AND sw.total_count IS NULL)
) AND NOT ro
THEN
updatestats := TRUE;
END IF;
END IF;
sw._where := inwhere;
sw.lastused := now();
sw.usecount := coalesce(sw.usecount,0) + 1;
IF NOT updatestats THEN
UPDATE pgstac.search_wheres SET
lastused = sw.lastused,
usecount = sw.usecount
WHERE encode(sha256(_where)::bytea, 'hex') = inwhere_hash
RETURNING * INTO sw
;
RETURN sw;
END IF;
-- Use explain to get estimated count/cost and a list of the partitions that would be hit by the query
t := clock_timestamp();
EXECUTE format('EXPLAIN (format json) SELECT 1 FROM items WHERE %s', inwhere)
INTO explain_json;
RAISE NOTICE 'Time for just the explain: %', clock_timestamp() - t;
i := clock_timestamp() - t;
sw.statslastupdated := now();
sw.estimated_count := explain_json->0->'Plan'->'Plan Rows';
sw.estimated_cost := explain_json->0->'Plan'->'Total Cost';
sw.time_to_estimate := extract(epoch from i);
RAISE NOTICE 'ESTIMATED_COUNT: % < %', sw.estimated_count, _estimated_count;
RAISE NOTICE 'ESTIMATED_COST: % < %', sw.estimated_cost, _estimated_cost;
-- Do a full count of rows if context is set to on or if auto is set and estimates are low enough
IF
_context = 'on'
OR
( _context = 'auto' AND
(
sw.estimated_count < _estimated_count
AND
sw.estimated_cost < _estimated_cost
)
)
THEN
t := clock_timestamp();
RAISE NOTICE 'Calculating actual count...';
EXECUTE format(
'SELECT count(*) FROM items WHERE %s',
inwhere
) INTO sw.total_count;
i := clock_timestamp() - t;
RAISE NOTICE 'Actual Count: % -- %', sw.total_count, i;
sw.time_to_count := extract(epoch FROM i);
ELSE
sw.total_count := NULL;
sw.time_to_count := NULL;
END IF;
IF NOT ro THEN
INSERT INTO pgstac.search_wheres
(_where, lastused, usecount, statslastupdated, estimated_count, estimated_cost, time_to_estimate, partitions, total_count, time_to_count)
SELECT sw._where, sw.lastused, sw.usecount, sw.statslastupdated, sw.estimated_count, sw.estimated_cost, sw.time_to_estimate, sw.partitions, sw.total_count, sw.time_to_count
ON CONFLICT ((encode(sha256(_where)::bytea, 'hex')))
DO UPDATE
SET
lastused = sw.lastused,
usecount = sw.usecount,
statslastupdated = sw.statslastupdated,
estimated_count = sw.estimated_count,
estimated_cost = sw.estimated_cost,
time_to_estimate = sw.time_to_estimate,
total_count = sw.total_count,
time_to_count = sw.time_to_count
;
END IF;
RETURN sw;
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER; If terrible things happen, roll it back with:CREATE OR REPLACE FUNCTION pgstac.where_stats(inwhere text, updatestats boolean default false, conf jsonb default null) RETURNS pgstac.search_wheres AS $$
DECLARE
t timestamptz;
i interval;
explain_json jsonb;
partitions text[];
sw pgstac.search_wheres%ROWTYPE;
inwhere_hash text := md5(inwhere);
_context text := lower(pgstac.context(conf));
_stats_ttl interval := pgstac.context_stats_ttl(conf);
_estimated_cost float := pgstac.context_estimated_cost(conf);
_estimated_count int := pgstac.context_estimated_count(conf);
ro bool := pgstac.readonly(conf);
BEGIN
IF ro THEN
updatestats := FALSE;
END IF;
IF _context = 'off' THEN
sw._where := inwhere;
return sw;
END IF;
SELECT * INTO sw FROM pgstac.search_wheres WHERE md5(_where)=inwhere_hash FOR UPDATE;
-- Update statistics if explicitly set, if statistics do not exist, or statistics ttl has expired
IF NOT updatestats THEN
RAISE NOTICE 'Checking if update is needed for: % .', inwhere;
RAISE NOTICE 'Stats Last Updated: %', sw.statslastupdated;
RAISE NOTICE 'TTL: %, Age: %', _stats_ttl, now() - sw.statslastupdated;
RAISE NOTICE 'Context: %, Existing Total: %', _context, sw.total_count;
IF
(
sw.statslastupdated IS NULL
OR (now() - sw.statslastupdated) > _stats_ttl
OR (pgstac.context(conf) != 'off' AND sw.total_count IS NULL)
) AND NOT ro
THEN
updatestats := TRUE;
END IF;
END IF;
sw._where := inwhere;
sw.lastused := now();
sw.usecount := coalesce(sw.usecount,0) + 1;
IF NOT updatestats THEN
UPDATE pgstac.search_wheres SET
lastused = sw.lastused,
usecount = sw.usecount
WHERE md5(_where) = inwhere_hash
RETURNING * INTO sw
;
RETURN sw;
END IF;
-- Use explain to get estimated count/cost and a list of the partitions that would be hit by the query
t := clock_timestamp();
EXECUTE format('EXPLAIN (format json) SELECT 1 FROM items WHERE %s', inwhere)
INTO explain_json;
RAISE NOTICE 'Time for just the explain: %', clock_timestamp() - t;
i := clock_timestamp() - t;
sw.statslastupdated := now();
sw.estimated_count := explain_json->0->'Plan'->'Plan Rows';
sw.estimated_cost := explain_json->0->'Plan'->'Total Cost';
sw.time_to_estimate := extract(epoch from i);
RAISE NOTICE 'ESTIMATED_COUNT: % < %', sw.estimated_count, _estimated_count;
RAISE NOTICE 'ESTIMATED_COST: % < %', sw.estimated_cost, _estimated_cost;
-- Do a full count of rows if context is set to on or if auto is set and estimates are low enough
IF
_context = 'on'
OR
( _context = 'auto' AND
(
sw.estimated_count < _estimated_count
AND
sw.estimated_cost < _estimated_cost
)
)
THEN
t := clock_timestamp();
RAISE NOTICE 'Calculating actual count...';
EXECUTE format(
'SELECT count(*) FROM items WHERE %s',
inwhere
) INTO sw.total_count;
i := clock_timestamp() - t;
RAISE NOTICE 'Actual Count: % -- %', sw.total_count, i;
sw.time_to_count := extract(epoch FROM i);
ELSE
sw.total_count := NULL;
sw.time_to_count := NULL;
END IF;
IF NOT ro THEN
INSERT INTO pgstac.search_wheres
(_where, lastused, usecount, statslastupdated, estimated_count, estimated_cost, time_to_estimate, partitions, total_count, time_to_count)
SELECT sw._where, sw.lastused, sw.usecount, sw.statslastupdated, sw.estimated_count, sw.estimated_cost, sw.time_to_estimate, sw.partitions, sw.total_count, sw.time_to_count
ON CONFLICT ((md5(_where)))
DO UPDATE
SET
lastused = sw.lastused,
usecount = sw.usecount,
statslastupdated = sw.statslastupdated,
estimated_count = sw.estimated_count,
estimated_cost = sw.estimated_cost,
time_to_estimate = sw.time_to_estimate,
total_count = sw.total_count,
time_to_count = sw.time_to_count
;
END IF;
RETURN sw;
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER; |
Yes. The fix is indeed very easy and will work for any new searches. The problem is that anyone who uses saved searches (ie to use with titiler-pgstac) will be looking for the old md5 hash and so this will be a major breaking change. You can certainly make the change locally on your instance, and I will definitely look to make this change on our next major version release, but I am very leery to make this change right now as I know for certain it would break all instances that have existing searches / search_wheres (which can be in the millions) that they leverage with tools like titiler-pgstac. When we make the change, we will need to create a manual migration to convert the old hash over to the new hash. |
I would not worry too much about this, search should always be registered or maybe in the migration step we can migrate old hash to the new format 🤷 (even if this has to be a manual step for titiler-pgstac application) |
Description
This issue was discovered after a RHEL 9 machine running postgres 16 and titiler-pgstac was upgraded to use FIPS which, among other things, bars the use of md5 hashes. This makes pgstac unusable on some systems.
Error
The superficial error comes from titiler-pgstac:
I am suspecting the error is referring to this line here and its usage of the
md5()
function:pgstac/src/pgstac/pgstac.sql
Line 3405 in 6da165b
Possible Solution
Here are relevant example SQL commands run on our FIPS postgres instance. One to highlight the limitation and the other as a possible alternative:
Specs
md5()
s are still in the latest)The text was updated successfully, but these errors were encountered: