Replace LIMIT/OFFSET with Psycopg2 server-side cursor #140

anthonyfok · 2021-10-22T18:17:50Z

Fixes #138

(Draft, to be tested in full stack run.)

drotheram · 2021-10-22T18:27:17Z

Nice work. Would be interested to see some performance comparisons. LIMIT/OFFSET is also used extensively in many of the *_postgres2es.py scripts.
Would like to make sure that changing the pagination method is at least:

more efficient than LIMIT/OFFSET
as robust in making sure that all records and no duplicates are copied from PostGIS to ES
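
For reference, the server-side cursor approach under discussion can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name `iter_batches`, the cursor name `postgres2es_export`, and the batch size are assumptions made for the example. It relies only on the psycopg2 behaviour that passing `name=` to `connection.cursor()` creates a server-side (named) cursor.

```python
from typing import Any, Iterator, List, Tuple


def iter_batches(conn: Any, query: str, batch_size: int = 10000) -> Iterator[List[Tuple]]:
    """Yield lists of rows from a psycopg2 named (server-side) cursor.

    `conn` is expected to be a psycopg2 connection. The cursor name and
    the default batch size are hypothetical values chosen for this sketch.
    """
    # Passing name= asks psycopg2 for a server-side cursor, so the result
    # set stays on the PostgreSQL server and is transferred in chunks.
    # Named cursors need an open transaction unless created with
    # withhold=True, so this sketch assumes autocommit is off.
    with conn.cursor(name="postgres2es_export") as cur:
        cur.itersize = batch_size  # rows per round trip when iterating the cursor
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield rows
```

Because the query runs once and the cursor only moves forward, each row should be delivered exactly once, whereas LIMIT/OFFSET re-runs the query per page and depends on a stable ORDER BY to avoid gaps or duplicates. A call would look like `for rows in iter_batches(psycopg2.connect(dsn), "SELECT ...")`, with `dsn` and the query supplied by the caller.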

anthonyfok · 2021-10-22T23:18:18Z

Thank you @drotheram for the great ideas! You have hit the nail on the head, and indeed, my first stack run has already failed. The failure is related to #137: I used a wrong search-and-replace regex that changed python3 to pypy3 in too many places, and Debian's pypy3 and python3-numpy do not work together. Easy to fix, but it does show how fragile these changes can be, especially when hurriedly committed in a somewhat sleepy state. 😅

But yeah, there are three main changes in this series of pull requests:

New base OS image for python-env, and slightly updated Python libraries
The switch from CPython to PyPy for some scripts
LIMIT/OFFSET vs server-side cursor

and each of them can introduce unintended changes to the data. I have been wondering, for example, how to verify that the actual data exported to Elasticsearch are identical: maybe dump the GeoJSON data for every 10000 rows as individual files to compare? Maybe use a separate script to query the Elasticsearch server?
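
One possible way to compare exports between runs, sketched here under assumptions: instead of diffing dumped GeoJSON files by eye, each feature is hashed in a canonical JSON form, and the sorted per-feature hashes are combined into one digest per run. The function name `summarize_export` and the `id` key are hypothetical; the real documents may identify features differently.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize_export(
    features: Iterable[Dict[str, Any]], id_key: str = "id"
) -> Tuple[str, int, List[Any]]:
    """Return (digest, feature_count, duplicate_ids) for exported features.

    The digest does not depend on the order of the features, so two runs
    that export the same features in a different order still match.
    """
    per_feature: List[str] = []
    id_counts: Counter = Counter()
    for feature in features:
        # sort_keys gives a canonical form, so dict key order does not matter.
        canonical = json.dumps(feature, sort_keys=True, separators=(",", ":"))
        per_feature.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
        id_counts[feature.get(id_key)] += 1
    combined = hashlib.sha256("".join(sorted(per_feature)).encode("ascii")).hexdigest()
    duplicates = [key for key, count in id_counts.items() if count > 1]
    return combined, len(per_feature), duplicates
```

Two runs would then be considered equivalent when their digests and counts match and the duplicate list is empty. Floating-point values that differ in the last digits between CPython and PyPy or between library versions would change the digest, so rounding before hashing may be needed.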

Ideally, we probably need to run 5 test cases:

1. v1.2.0 as control
2. with the new debian:sid-20201012-slim-based python-env OS image, but nothing else changed. (CPython + LIMIT/OFFSET)
3. No. 2 + PyPy + LIMIT/OFFSET
4. No. 2 + CPython + server-side cursor
5. No. 2 + PyPy + server-side cursor

Ditto for the benchmark, and in particular, I am interested to find out whether the LIMIT/OFFSET method does get slower and slower when OFFSET gets really big. Yes, let's record the timestamps and plot them! And maybe have these tests and benchmarks as part of the Python scripts so we can monitor data integrity and performance over time.
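
Recording per-batch timings could look something like the sketch below. The helper name `time_batches` is hypothetical; it wraps any iterable of row batches, whether produced by LIMIT/OFFSET queries or by a server-side cursor, and measures how long each batch took to arrive. Plotting is left out.

```python
import time
from typing import Iterable, List, Sequence, Tuple


def time_batches(batches: Iterable[Sequence]) -> List[Tuple[int, int, float]]:
    """Return (batch_index, row_count, seconds_to_fetch) for each batch.

    Only the time spent waiting for the next batch is measured. Nothing
    is done with the rows here, so a real export would add its own
    processing inside the loop and still get fetch-only timings.
    """
    timings: List[Tuple[int, int, float]] = []
    start = time.perf_counter()
    for index, rows in enumerate(batches):
        fetched = time.perf_counter()
        timings.append((index, len(rows), fetched - start))
        # Restart the clock just before asking for the next batch.
        start = time.perf_counter()
    return timings
```

If LIMIT/OFFSET does slow down as OFFSET grows, the third column should trend upward with the batch index for that method and stay roughly flat for the server-side cursor.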

Sounds a bit scary but at the same time exciting to me, but yeah, this would be something that lets us monitor the health and performance of our stack. Probably something that CI/CD with GitHub Actions could do.

And before I forget, I am also starting to think that in the future we could have the Docker Compose logs, test and benchmark results semi-automatically uploaded to an S3 bucket for record-keeping and analysis. (Privacy alert!) With the end user's consent, of course. And also collect e.g. CPU, RAM, and hard disk information too... kind of like CPU-ID or Passmark... Or maybe not that good of an idea. 😅

anthonyfok added the Task label Oct 22, 2021

anthonyfok added this to the Sprint 45 milestone Oct 22, 2021

anthonyfok self-assigned this Oct 22, 2021

anthonyfok force-pushed the use-psycopg2-server-side-cursor branch from d3f2c80 to e54dfba Compare October 22, 2021 18:20

Replace LIMIT/OFFSET with Psycopg2 server-side cursor

fa2778e

Fixes OpenDRR#138

anthonyfok force-pushed the use-psycopg2-server-side-cursor branch from e54dfba to fa2778e Compare October 22, 2021 18:25

anthonyfok modified the milestones: Sprint 45, Sprint 46 Nov 8, 2021

anthonyfok modified the milestones: Sprint 46, Sprint 47 Nov 22, 2021

anthonyfok removed this from the Sprint 47 milestone Jan 17, 2022

anthonyfok force-pushed the master branch from 936f602 to 9942217 Compare April 2, 2024 08:41
