Use temp tables for all sources #23

Open
marco44 opened this issue Jan 30, 2025 · 8 comments · May be fixed by #24

marco44 commented Jan 30, 2025

Right now, the collector fetches all the records to insert for a data source into a single data_source_name_src_tmp table. The problem is that this table gets a lot of inserts and deletes, in big batches, and autovacuum has a hard time keeping up. So the table gets large, and then getting all records for a srvid to coalesce them gets slower (as the full scan gets slower), making the whole situation snowball.

This happens only when there are quite a few servers, but it's problematic, as the IO load gets very high.

A solution would be to have one table per srvid and data source. That could be done with temporary tables. We could then simply truncate them instead of deleting from them, so there would be no need to vacuum those anymore, nor to run the deletes; we'd save a lot of IO.

The downside is some catalog bloat, but it's going to be much easier to vacuum the catalog tables than the _src_tmp tables.

So what I'm proposing is to have a schema belonging to the powa user, and create temp tables with the exact same name as those currently used (data_source_name_src_tmp) in that schema, so we can transparently overload the table names with the temporary ones.

Those temporary tables would be created using CREATE TEMP TABLE data_source_name_src_tmp (LIKE app_schema.data_source_name_src_tmp) or something similar.

Then we would fill them, and take our snapshots using the exact same code as currently. The current powa-archivist would also delete the content of those temp tables.
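A minimal sketch of the idea (the powa_statements data source and the powa schema name are used here as placeholders):

```sql
-- Run once per collector session, before the first snapshot.
-- The temp table lives in pg_temp, which is searched before the regular
-- schema, so existing snapshot code picks it up transparently.
CREATE TEMP TABLE powa_statements_src_tmp
    (LIKE powa.powa_statements_src_tmp INCLUDING DEFAULTS);

-- After each snapshot, TRUNCATE instead of DELETE: no dead tuples,
-- nothing for autovacuum to do.
TRUNCATE powa_statements_src_tmp;
```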

For a later iteration of powa-archivist, we could replace this delete with a truncate: the table will never be shared by several servers anymore, so this code

    IF (_srvid != 0) THEN
        DELETE FROM @extschema@.powa_statements_src_tmp WHERE srvid = _srvid;
    END IF;

could be a truncate instead. The problem is that we may have a recent powa-archivist with an old powa-collector that isn't using the temp-table method, in which case this table would contain records for different srvids.

The only solution I see for this would be to be able to tell this function to perform either a delete or a truncate. This could be done by adding a default parameter to this function, defaulting to "delete", which we could set to truncate. Old calls would work as before, and a recent collector could ask for a truncate instead. Or it could do nothing and let the collector truncate by itself.
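One possible shape for that signature change (the parameter name and its values are illustrative assumptions, not the actual API):

```sql
CREATE OR REPLACE FUNCTION powa_statements_snapshot(_srvid integer,
                                                    _cleanup text DEFAULT 'delete')
RETURNS void AS $$
BEGIN
    -- ... existing snapshot logic ...
    IF (_srvid != 0) THEN
        IF (_cleanup = 'truncate') THEN
            -- new collector: one table per server, safe to truncate
            TRUNCATE @extschema@.powa_statements_src_tmp;
        ELSIF (_cleanup = 'delete') THEN
            -- old collector: shared table, only remove this server's rows
            DELETE FROM @extschema@.powa_statements_src_tmp
             WHERE srvid = _srvid;
        END IF;
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Old callers hit the default and get the exact same behavior as today.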


rjuju commented Jan 31, 2025

> So what I'm proposing is to have a schema belonging to the powa user, and create temp tables with the exact same name as those currently used (data_source_name_src_tmp) in that schema, so we can transparently overload the table names with the temporary ones.
>
> Those temporary tables would be created using CREATE TEMP TABLE data_source_name_src_tmp (LIKE app_schema.data_source_name_src_tmp) or something similar.

I'm a bit confused by this part. You can't create a temporary table in any schema except pg_temp, but your example also doesn't try to do that, and uses the schema of the existing unlogged tables (which should be in the same schema as powa).

> For a later iteration of powa-archivist, we could replace this delete with a truncate: the table will never be shared by several servers anymore

If you want to avoid sharing a temp table between multiple remote servers, you will need to create multiple temp tables, so you need to have the server id as part of the table name or something like that, and therefore make the query that reads from it dynamic.
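That dynamic read could look something like this fragment of PL/pgSQL (the table naming scheme is an assumption):

```sql
-- Inside the snapshot function, the read side can no longer be a plain
-- static query, since the table name depends on the server id:
RETURN QUERY EXECUTE format(
    'SELECT * FROM %I',
    'powa_statements_src_tmp_' || _srvid);
```

format()'s %I quotes the identifier safely, which matters since the name is built at run time.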


rjuju commented Jan 31, 2025

The best solution I could come up with is to do something like

CREATE TEMP TABLE powa_XXX_src_tmp_$srvid (CHECK (srvid = $srvid)) INHERITS (powa_XXX_src_tmp);

This way it would not require any change and would be transparent on the powa-archivist side. If you have an old collector you just do the same as before, and if you have the new collector you will still get the correct values, but hopefully better performance, provided that constraint_exclusion is enabled.

We can still do the delete as before, but transform it into a DELETE FROM ONLY ..., which will be a no-op if you have the new collector (and otherwise will clean up the data with the old collector), and the new collector can be responsible for the TRUNCATE. If you happen to have the old powa-archivist, then the DELETE will not be a no-op and will be a bit slower, but at least it won't bloat as before.
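Putting the two pieces together, a sketch of both sides for a hypothetical server 42 (the powa schema qualification is an assumption):

```sql
-- Collector side: one child temp table per remote server.  A temp table
-- is allowed to inherit from a permanent parent, and the CHECK
-- constraint lets constraint_exclusion skip it in unrelated scans.
CREATE TEMP TABLE powa_statements_src_tmp_42
    (CHECK (srvid = 42))
    INHERITS (powa.powa_statements_src_tmp);

-- Archivist side: ONLY restricts the delete to the parent table, so it
-- is a no-op with a new collector (the data sits in the child), while
-- an old collector still writing to the parent gets cleaned up as before.
DELETE FROM ONLY powa.powa_statements_src_tmp WHERE srvid = 42;

-- A new collector then cleans up its own child table cheaply:
TRUNCATE powa_statements_src_tmp_42;
```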


marco44 commented Jan 31, 2025

> > So what I'm proposing is to have a schema belonging to the powa user, and create temp tables with the exact same name as those currently used (data_source_name_src_tmp) in that schema, so we can transparently overload the table names with the temporary ones.
> >
> > Those temporary tables would be created using CREATE TEMP TABLE data_source_name_src_tmp (LIKE app_schema.data_source_name_src_tmp) or something similar.
>
> I'm a bit confused by this part. You can't create a temporary table in any schema except pg_temp, but your example also doesn't try to do that, and uses the schema of the existing unlogged tables (which should be in the same schema as powa).

Yeah, so:

  • One error on my part: you can't specify a schema, but you don't need to anyway. If you create a temp table with the same name as a regular table, it will be used instead of the regular table, since pg_temp comes earlier in the search path.
  • Yeah, we need to find a way around the schema… couldn't we instead set a search_path in the function?
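For reference, pg_temp is already implicitly first in the search path for relation lookup, but it can also be pinned explicitly on the function if we want to be defensive (the function signature and powa schema name are assumptions):

```sql
-- Make relation lookup inside the function resolve temp tables first,
-- then the extension's schema, regardless of the caller's search_path.
ALTER FUNCTION powa_statements_snapshot(integer)
    SET search_path = pg_temp, powa;
```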

> > For a later iteration of powa-archivist, we could replace this delete with a truncate: the table will never be shared by several servers anymore
>
> If you want to avoid sharing a temp table between multiple remote servers, you will need to create multiple temp tables, so you need to have the server id as part of the table name or something like that, and therefore make the query that reads from it dynamic.

Aren't all the powa-collector workers using their own dedicated session to the central powa database?

> The best solution I could come up with is to do something like
>
> CREATE TEMP TABLE powa_XXX_src_tmp_$srvid (CHECK (srvid = $srvid)) INHERITS (powa_XXX_src_tmp);
>
> This way it would not require any change and would be transparent on the powa-archivist side. If you have an old collector you just do the same as before, and if you have the new collector you will still get the correct values, but hopefully better performance, provided that constraint_exclusion is enabled.
>
> We can still do the delete as before, but transform it into a DELETE FROM ONLY ..., which will be a no-op if you have the new collector (and otherwise will clean up the data with the old collector), and the new collector can be responsible for the TRUNCATE. If you happen to have the old powa-archivist, then the DELETE will not be a no-op and will be a bit slower, but at least it won't bloat as before.

Yeah, that may even be easier, you're right.


rjuju commented Jan 31, 2025

> Aren't all the powa-collector workers using their own dedicated session to the central powa database?

They do, but I think that @frost242 had to use some connection pooler to avoid having too many concurrent snapshots. That's an easy solution if you have more than a few dozen remote servers, and we should avoid breaking that compatibility.


marco44 commented Jan 31, 2025

OK, then it means they can't be temp tables I suppose… or we need to recreate them on each collect.


rjuju commented Jan 31, 2025

Why not? If we use the srvid in the temp table name, there shouldn't be any problem.


marco44 commented Jan 31, 2025

Because we're not sure that the next collection for the same srvid will come from the same session?


rjuju commented Jan 31, 2025

We can just check whether the table exists before each snapshot rather than relying on some thread state. That will be even easier to do, and not that expensive compared to the snapshot itself. It will increase the number of pgss entries a bit, but they should stay somewhat stable, as long as you don't keep restarting postgres, the pooler or the collector.
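That per-snapshot check could be as cheap as a to_regclass() lookup; a sketch for a hypothetical server 42 (table and schema names assumed, following the inheritance idea above):

```sql
-- At the start of each snapshot, (re)create the temp table if this
-- session doesn't already have it.  to_regclass() returns NULL instead
-- of erroring when the relation doesn't exist.
DO $$
BEGIN
    IF to_regclass('pg_temp.powa_statements_src_tmp_42') IS NULL THEN
        CREATE TEMP TABLE powa_statements_src_tmp_42
            (CHECK (srvid = 42))
            INHERITS (powa.powa_statements_src_tmp);
    END IF;
END;
$$;
```

This makes the collector robust to a pooler handing it a different session between snapshots.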

marco44 added a commit that referenced this issue Feb 3, 2025
Closes #23

If this goes according to plan, then we can just slightly patch powa-archivist to DELETE FROM ONLY so it doesn't waste time on those temp tables. They'll just get purged at disconnection.

The only issue I see that this could bring is that we'll have lots of distinct queryids on the collector itself, if we decide to monitor it via powa.

We could make disconnection from the database optional so that we can keep the temp tables alive, but I really think this temp-table mechanism isn't a big price to pay for the dramatic IO reduction it should bring.

Anyway, comments are welcome
@marco44 marco44 self-assigned this Feb 3, 2025
@marco44 marco44 linked a pull request Feb 3, 2025 that will close this issue