Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/mirror/status/ is slow #517

Open
jelly opened this issue Jul 22, 2024 · 4 comments
Open

/mirror/status/ is slow #517

jelly opened this issue Jul 22, 2024 · 4 comments
Labels

Comments

@jelly
Copy link
Member

jelly commented Jul 22, 2024

Loading /mirrors/status/ takes ~ 2 seconds on our production env. this needs some optimisation.

@jelly jelly added the bug label Jul 22, 2024
@christian-heusel
Copy link
Member

Can we just precompute it via cronjob every idk 20 min and serve the result? 😆

@jelly
Copy link
Member Author

jelly commented Jul 22, 2024

So some analysis, locally nothing is shown because I need to tweak the DEFAULT_CUTOFF to DEFAULT_CUTOFF = timedelta(hours=100) in mirrors/utils.py`

SELECT "mirrors_mirrorlog"."url_id",
       "mirrors_mirrorlog"."error",
       COUNT("mirrors_mirrorlog"."error") AS "error_count",
       MAX("mirrors_mirrorlog"."check_time") AS "last_occurred"
  FROM "mirrors_mirrorlog"
 INNER JOIN "mirrors_mirrorurl"
    ON ("mirrors_mirrorlog"."url_id" = "mirrors_mirrorurl"."id")
 INNER JOIN "mirrors_mirror"
    ON ("mirrors_mirrorurl"."mirror_id" = "mirrors_mirror"."id")
 WHERE ("mirrors_mirrorlog"."check_time" >= '2024-07-18T14:12:56.565672+00:00'::timestamptz AND NOT "mirrors_mirrorlog"."is_success" AND "mirrors_mirror"."public" AND "mirrors_mirrorurl"."active" AND "mirrors_mirror"."active" AND "mirrors_mirror"."public")
 GROUP BY "mirrors_mirrorlog"."url_id",
          "mirrors_mirrorlog"."error"
 ORDER BY 4 DESC,
          3 DESC

This query locally takes 500 ms(!), on our live env the page consistenly loads in ~ 2 seconds.

                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=525187.86..525234.16 rows=18518 width=31)
   Sort Key: (max(mirrors_mirrorlog.check_time)) DESC, (count(mirrors_mirrorlog.error)) DESC
   ->  GroupAggregate  (cost=413.78..523875.25 rows=18518 width=31)
         Group Key: mirrors_mirrorlog.url_id, mirrors_mirrorlog.error
         ->  Incremental Sort  (cost=413.78..523504.89 rows=18518 width=23)
               Sort Key: mirrors_mirrorlog.url_id, mirrors_mirrorlog.error
               Presorted Key: mirrors_mirrorlog.url_id
               ->  Nested Loop  (cost=1.13..522889.88 rows=18518 width=23)
                     ->  Nested Loop  (cost=0.56..889.93 rows=1027 width=4)
                           ->  Index Scan using mirrors_mirrorurl_pkey on mirrors_mirrorurl  (cost=0.28..310.97 rows=1685 width=8)
                                 Filter: active
                           ->  Memoize  (cost=0.29..0.75 rows=1 width=4)
                                 Cache Key: mirrors_mirrorurl.mirror_id
                                 Cache Mode: logical
                                 ->  Index Scan using mirrors_mirror_pkey on mirrors_mirror  (cost=0.28..0.74 rows=1 width=4)
                                       Index Cond: (id = mirrors_mirrorurl.mirror_id)
                                       Filter: (public AND active AND public)
                     ->  Index Scan using mirrors_mirrorlog_url_id_check_time on mirrors_mirrorlog  (cost=0.56..508.02 rows=26 width=23)
                           Index Cond: ((url_id = mirrors_mirrorurl.id) AND (check_time >= '2024-07-18 16:12:56.565672+02'::timestamp with time zone))
                           Filter: (NOT is_success)
 JIT:
   Functions: 24
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(23 rows)

This is get_mirror_errors which is not cached by us and thus explains why the page loads slowly.

get_mirror_statuses is also slow ~ 300 ms locally:

archweb=> explain SELECT l.url_id,
       u.mirror_id,
       COUNT(l.id) AS check_count,
       COUNT(l.last_sync) AS success_count,
       MAX(l.last_sync) AS last_sync,
       MAX(l.check_time) AS last_check,
       AVG(l.duration) AS duration_avg,
       STDDEV(l.duration) AS duration_stddev,
       AVG(check_time - last_sync) AS delay
  FROM mirrors_mirrorlog l
  JOIN mirrors_mirrorurl u
    ON u.id = l.url_id
 WHERE l.check_time >= '2024-07-18T14:12:56.193089+00:00'::timestamptz
 GROUP BY l.url_id,
          u.mirror_id;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=617361.15..623734.47 rows=192402 width=72)
   Group Key: l.url_id, u.mirror_id
   Planned Partitions: 8
   ->  Nested Loop  (cost=0.85..597158.94 rows=192402 width=36)
         ->  Index Scan using mirrors_mirrorlog_check_time on mirrors_mirrorlog l  (cost=0.56..591390.08 rows=192402 width=32)
               Index Cond: (check_time >= '2024-07-18 16:12:56.193089+02'::timestamp with time zone)
         ->  Memoize  (cost=0.29..0.77 rows=1 width=8)
               Cache Key: l.url_id
               Cache Mode: logical
               ->  Index Scan using mirrors_mirrorurl_pkey on mirrors_mirrorurl u  (cost=0.28..0.76 rows=1 width=8)
                     Index Cond: (id = l.url_id)
 JIT:
   Functions: 14
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(14 rows)

@jelly
Copy link
Member Author

jelly commented Jul 22, 2024

Applied a workaround for now in 6edcaaa

@HBwilliam
Copy link

HBwilliam commented Jul 22, 2024

have you tried

SELECT "mirrors_mirrorlog"."url_id",
       "mirrors_mirrorlog"."error",
       COUNT("mirrors_mirrorlog"."error") AS "error_count",
       MAX("mirrors_mirrorlog"."check_time") AS "last_occurred"
  FROM "mirrors_mirrorlog"
 INNER JOIN "mirrors_mirrorurl"
    ON ("mirrors_mirrorlog"."url_id" = "mirrors_mirrorurl"."id" AND "mirrors_mirrorurl"."active")
 INNER JOIN "mirrors_mirror"
    ON ("mirrors_mirrorurl"."mirror_id" = "mirrors_mirror"."id" AND "mirrors_mirror"."public" AND "mirrors_mirror"."active")
 WHERE "mirrors_mirrorlog"."check_time" >= '2024-07-18T14:12:56.565672+00:00'::timestamptz
       AND NOT "mirrors_mirrorlog"."is_success"
 GROUP BY "mirrors_mirrorlog"."url_id",
          "mirrors_mirrorlog"."error"
 ORDER BY 4 DESC,
          3 DESC

filtering public and active earlier should help, fingers crossed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants