Missing indexes for srvid in tables? High CPU usage #194
After some time, I received the following errors in PoWA Web
After adding the indexes from the initial post, I was able to get a functional gathering of data from 8 remote servers. However, the high-CPU issue still remains: every time data gathering is initiated, I see queries running in parallel. I will try to increase the snapshot interval from 2 minutes back to the default 5 minutes.
Additional update: it looks like going back to the 5-minute interval has given enough time for syncing, and I'm no longer seeing high CPU (80-90%), only some spikes up to 10-20%.
Hi, thanks for the report. The original reason why we avoid creating indexes on the *_src_tmp tables is that they're mostly used as temporary tables, so adding indexes makes the snapshot operation more expensive. In general you shouldn't have many rows in those tables, since they're only used temporarily and the data should be removed almost immediately. If reducing the snapshot frequency solves the problem, it's likely that the actual problem is autovacuum not being aggressive enough. Maybe you could try tuning that, to see if it lowers the CPU usage enough even with more frequent snapshots?
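A minimal sketch of what "more aggressive autovacuum" could look like for one of the tables named in this thread; the table choice and the exact values are illustrative assumptions, not maintainer recommendations:

```sql
-- Illustrative per-table override: trigger autovacuum much sooner on a
-- frequently-rewritten staging table (table name taken from this thread;
-- thresholds are example values only, tune to your workload).
ALTER TABLE public.powa_catalog_class_src_tmp SET (
    autovacuum_vacuum_scale_factor = 0.0,  -- ignore relative table size...
    autovacuum_vacuum_threshold    = 1000  -- ...vacuum after ~1000 dead rows
);
```

Per-table storage parameters like these override the global autovacuum settings only for the table in question, so the rest of the cluster is unaffected.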
Hi, I did see that autovacuum processes were running in parallel with
Below are the autovacuum settings that I have on the local machine where the collector is running.
I will try to re-enable autovacuum and see if there is a difference.
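For reference, the effective autovacuum configuration can be listed with a standard catalog query on the repository server:

```sql
-- List every autovacuum-related GUC with its current value and unit.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;
```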
Well, disabling autovacuum was a horrible idea (although I was very aware of the consequences). The powa_all_tables_history_current table size went up to ~800GB. I deleted the whole DB to release the disk space, recreated it, and registered the servers again. This time I kept autovacuum enabled per table (default settings), reduced the retention interval, and lowered the snapshot frequency from every 5 to every 10 minutes. I have also recreated all the indexes from the initial post. Will let you know the results.
Hi, after all of the mentioned changes, the CPU is more or less idle, and disk usage is around ~20GB with a retention interval of 6 hours. I assume that I initially hit a worst-case scenario: highly active remote servers combined with an intense polling frequency, which caused the CPU issues. Nevertheless, I think these indexes can be useful on the temp tables, mostly depending on the activity of the remote servers.
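One generic way to keep an eye on the on-disk size of the PoWA history tables over time (a plain catalog query, not PoWA-specific tooling):

```sql
-- Show the total on-disk size (table + indexes + TOAST) of the PoWA
-- history tables, largest first, to verify retention and vacuum are working.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname LIKE 'powa_%history%'
ORDER BY pg_total_relation_size(oid) DESC;
```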
Hello, same for me: very high CPU usage in PoWA 5. This was not the case in PoWA 4.
Thanks @ikalafat for all the details. We could of course add those indexes, but if we can avoid it, all the better, as they would make the whole snapshot process considerably more expensive. @BornTKill, do you have more details on your use case so we can try to investigate? At least the following would be useful:
Hi @rjuju, thank you for your answer.
pg_qualstats | ✓ | ✓ | ✓ | 2.1.0
Two more cents from my side. @BornTKill, what is the sampling rate/frequency on your end? Can you try adding the servers 5 at a time, for example, with some pause in between?
Thanks for the extra details. I think the common denominator here is a large number of remote servers all added at the same time. That scenario is not well tested for now and can indeed have some side effects; @frost242 already started some discussions about it in various issues. The main problem with adding a lot of servers at the same time is that almost everything happens concurrently. We added a mechanism to try to spread the snapshots themselves (I think it should kick in in this case too; you could confirm by comparing the snapshot times for each remote server), but other problems remain: all the catalog snapshots will still happen mostly at the same time (the first snapshot, and then one every month), and the same goes for the coalesce and the purge. You could try to apply locally the patch merged at powa-team/powa-archivist#92 to see if it helps, and if so, whether some other problems remain after that.
Hello, thank you for your answer. Will try the patch today and let you know.
Can you explain how to load this SQL?
Do I have to put the file here?
OK, I put the file in /usr/share/postgresql/16/extension/
Sorry, I should have been clearer. The idea was to copy the new CREATE OR REPLACE FUNCTION command and apply it on your PoWA repository, i.e. this part: https://github.com/powa-team/powa-archivist/blob/master/powa--5.0.2.sql#L3345-L3605. The mentioned changes have been merged as version 5.0.2, so if you only took the
@rjuju After one day, I still have high CPU after all.
Sorry to hijack a bit, but autovacuum_vacuum_cost_delay at 20ms is extremely high. The default is 2ms on recent PostgreSQL versions (and that's already tuned for smallish installations). Mine is at 1ms, so my autovacuum is tuned to clean up blocks 20 times faster than yours (as you don't seem to have touched vacuum_cost_limit). So it's very likely your autovacuum can't keep up, and your _tmp tables are not cleaned fast enough. That then makes everything else pile up, as you have to scan very bloated tables, which makes the scans even slower, and everything snowballs from there. If you try changing the tuning, also increase the number of autovacuum workers (to 8, for example), so that smaller tables get vacuumed in a timely fashion.
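The advice above could translate into something like the following on the repository server; the values simply mirror the comment and should be adjusted to the hardware at hand:

```sql
-- Illustrative settings based on the advice above; requires superuser.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';  -- down from 20ms
ALTER SYSTEM SET autovacuum_max_workers = 8;  -- takes effect after a restart
SELECT pg_reload_conf();  -- applies the reloadable settings immediately
```

Note that autovacuum_max_workers is only read at server start, so that particular change needs a restart rather than a reload.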
@marco44 Thanks for the heads-up. Will give it a shot today.
Just a quick update: I have applied the new settings related to autovacuum,
to be more precise
have been reduced significantly
I have also (re)created the extension with version 5.0.2 (copied the files to the extension directory mentioned above). If this goes well, I can try to drop the indexes to see if there is a difference. I am also considering reducing the
Similar thing on my end. I am additionally seeing weird behavior related to disk usage: it looks like retention isn't working well, and disk usage then grows to 800GB for 8 servers with 6-hour retention. (CPU graph) (Disk graph) On the 1st of February around 11:45, I deleted the database, recreated it, and added the servers back (the disk was depleted).
Hi,
I'm not sure if this report should be in the powa-collector repo. Please accept my apologies if I picked the wrong repo.

I am seeing a lot of CPU pressure on the local machine (where powa-collector is running), for example on the following query:

```sql
SELECT public.powa_take_snapshot(5)
```

with comments/application names such as powa_all_tables_snapshot and all_indexes_snapshot. I was also seeing timeouts on queries like this one:

```sql
SELECT 1 FROM public.powa_catalog_class_src_tmp WHERE srvid = 2 LIMIT 1
```
Based on that, I think that some of these tables should have indexes, at least on the srvid column.
For example, these are the ones that I have executed on my machine:
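The exact statements aren't preserved in this thread; a hedged sketch of what such an index might look like, limited to the one table explicitly named above:

```sql
-- Hypothetical reconstruction: only powa_catalog_class_src_tmp is named
-- explicitly in this thread; the index name is an arbitrary choice.
CREATE INDEX IF NOT EXISTS powa_catalog_class_src_tmp_srvid_idx
    ON public.powa_catalog_class_src_tmp (srvid);
```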
I am seeing a consistent increase in table size for these tables (which absolutely makes sense); however, it looks like, due to the missing indexes, PostgreSQL uses sequential scans, which burn a lot of CPU.
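A sequential scan can be confirmed directly with EXPLAIN on the slow query (using server ID 2, as in the timeout above):

```sql
-- Shows whether the srvid filter is satisfied by a Seq Scan or an Index
-- Scan, along with actual row counts and the buffers touched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1 FROM public.powa_catalog_class_src_tmp WHERE srvid = 2 LIMIT 1;
```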
For clarity, the remote machines (4 clusters, each with a master and a replica server) are beefy (8-16 vCPUs, with 64 to 128GB of RAM) and carry quite a lot of load.
If I have only one cluster (2 machines) registered in PoWA, then I'm not seeing CPU issues or timeouts.
The local machine where powa-collector and PostgreSQL are running has 16 vCPUs and 64GB of RAM.
Any hints on how I can reduce the CPU load?
Thank you,