Use temp tables for all sources #23
I'm a bit confused by this part. You can't create a temporary table in any schema except pg_temp, but your example also doesn't try to do that and uses the schema of the existing unlogged tables (which should be in the same schema as powa).
If you want to avoid sharing a temp table between multiple remote servers, you will need to create multiple temp tables, so you need to have the server id as part of the table name or something like that, and therefore make the query that reads from it dynamic.
The best solution I could come up with is to do something like `CREATE TEMP TABLE powa_XXX_src_tmp_$srvid (CHECK (srvid = $srvid)) INHERITS (powa_XXX_src_tmp);`. This way it would not require any change and would be transparent on the powa-archivist side. If you have an old collector you just do the same as before, and if you have the new collector you will still get the correct value, but hopefully better performance. We can still do the delete as before, but transform it to a `DELETE FROM ONLY`.
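A minimal sketch of this inheritance idea, assuming a made-up data source name (`powa_XXX`) and server id (`1`); the column list is purely illustrative:

```sql
-- The existing shared table, as powa-archivist already knows it
-- (illustrative columns only):
CREATE UNLOGGED TABLE powa_XXX_src_tmp (
    srvid integer NOT NULL,
    ts    timestamptz NOT NULL
);

-- Per-server temp child: rows land here, but a plain query on the
-- parent still sees them through inheritance within this session.
CREATE TEMP TABLE powa_XXX_src_tmp_1
    (CHECK (srvid = 1))
    INHERITS (powa_XXX_src_tmp);

-- An unpatched archivist deletes through the parent and keeps working:
DELETE FROM powa_XXX_src_tmp WHERE srvid = 1;

-- A patched archivist can skip descending into the temp children:
DELETE FROM ONLY powa_XXX_src_tmp WHERE srvid = 1;
```

Note that a temporary table can inherit from a permanent one (the reverse is not allowed), and other sessions querying the parent simply won't see this session's temp child.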
Aren't all the powa-collector workers using their own dedicated session to the central powa database?
Yeah, that may even be easier, you're right
They do, but I think that @frost242 had to use some connection pooler to avoid having too many concurrent snapshots. That's an easy solution if you have more than a few dozen remote servers, and we should avoid breaking that compatibility.
Ok, then it means they can't be temp tables I suppose… or we need to recreate them on each collect |
why not? if we use the srvid in the temp table name there shouldn't be any problem. |
Because we're not sure that the next collection for the same srvid will come from the same session? |
we can just check if the table exists before each snapshot rather than relying on some thread state. That will be even easier to do and not that expensive compared to the snapshot itself. It will increase the number of pgss entries a bit, but they should stay somewhat stable as long as you don't keep restarting postgres, the pooler or the collector.
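One way to sketch that existence check, assuming an illustrative per-server temp table name (`powa_XXX_src_tmp_1`):

```sql
-- to_regclass() returns NULL when no visible relation matches, so the
-- collector can test for its session's temp table before each snapshot
-- without keeping any thread-local state:
SELECT to_regclass('pg_temp.powa_XXX_src_tmp_1') IS NOT NULL AS tmp_exists;
```

If `tmp_exists` is false, the collector creates the temp table before proceeding; otherwise it reuses the one already attached to the pooled session.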
Closes #23. If this goes according to plan, then we can just slightly patch powa-archivist to `DELETE FROM ONLY` so as not to waste time on those temp tables; they'll just get purged at disconnection. The only issue I can see with this is that we'll have lots of distinct queryids on the collector itself, if we decide to monitor it via powa. We could make disconnection from the database optional so that we can keep the temp tables alive, but I really think this temp table mechanism isn't a big price to pay for the dramatic IO reduction it should bring. Anyway, comments are welcome.
Right now, the collector fetches all the records to insert for a data source into a single `data_source_name_src_tmp` table. The problem is that this table gets a lot of inserts and deletes, in big batches, and autovacuum has a hard time keeping up. So the table gets large, and then getting all the records for a srvid to coalesce them gets slower (as the full scan gets slower), making the whole situation snowball. This only happens when there are quite a few servers, but it's problematic, as the IO load gets very high.
A solution would be to have one table per srvid and data source. That could be done with temporary tables. We could then simply truncate them instead of deleting from them, so no need to vacuum those anymore, nor to do the deletes; we'd save a lot of IO.
The downside is some catalog bloat, but it's going to be much easier to vacuum the catalog tables than the `_src_tmp` tables.

So what I'm proposing is to have a schema belonging to the powa user, and to create temp tables with the exact same name as those currently used (`data_source_name_src_tmp`), so they transparently shadow the permanent table names. Those temporary tables would be created using `CREATE TEMP TABLE data_source_name_src_tmp (LIKE app_schema.data_source_name_src_tmp)` or something similar. Then we would fill them, and take our snapshots using the exact same code as currently. The current powa-archivist would also delete the content of those temp tables.
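A minimal sketch of that shadowing approach, assuming illustrative names (`powa_XXX_src_tmp`, `app_schema`):

```sql
-- pg_temp is searched before any other schema when resolving relation
-- names, so once this temp table exists, the unqualified name in the
-- existing snapshot queries resolves to it transparently:
CREATE TEMP TABLE powa_XXX_src_tmp
    (LIKE app_schema.powa_XXX_src_tmp);

-- The collector then fills and reads it with the unchanged queries;
-- cleanup is a cheap truncate (or simply session disconnection), with
-- no dead tuples and no vacuum cost on the permanent table:
TRUNCATE powa_XXX_src_tmp;
```

The `LIKE` clause copies the column definitions from the permanent table, so the temp table stays in sync with whatever schema powa-archivist ships.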
For a later iteration of powa-archivist, we could replace this delete with a truncate: the table will never be shared by several servers anymore, so this code could do a truncate instead. The problem is that we may have a recent powa-archivist with an old powa-collector that doesn't use the temp table method, in which case the table would still contain records for different srvid values.
The only solution I see for this would be to be able to tell this function to perform either a delete or a truncate. This could be done by adding a parameter to the method that defaults to "delete": old calls would work as before, and a recent collector could ask for a truncate instead. Or do nothing and let the collector truncate by itself.
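A hedged sketch of that backward-compatible parameter; the function name (`powa_XXX_src_cleanup`) and its signature are invented for illustration and are not powa-archivist's actual API:

```sql
CREATE FUNCTION powa_XXX_src_cleanup(
    _srvid integer,
    _mode  text DEFAULT 'delete')
RETURNS void AS $$
BEGIN
    IF _mode = 'truncate' THEN
        -- New collectors: the temp table holds a single srvid,
        -- so truncation is safe and generates no dead tuples.
        TRUNCATE powa_XXX_src_tmp;
    ELSE
        -- Old collectors: the shared table may hold several srvid,
        -- so keep the original targeted delete.
        DELETE FROM powa_XXX_src_tmp WHERE srvid = _srvid;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Old call sites keep working unchanged:
--   SELECT powa_XXX_src_cleanup(1);
-- A new collector opts in to truncation:
--   SELECT powa_XXX_src_cleanup(1, 'truncate');
```

Because the new parameter has a default, the function keeps the same call signature for existing callers, which is exactly the compatibility property the thread is after.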