Optimize repo open pipeline #5

amitsr4 · 2024-12-09T09:21:49Z

The current implementation does only limited prefetching between the main thread and the worker thread. To improve performance, I've changed the architecture to use a ring buffer shared between the two threads. It is a shared buffer pool with three sections (30MB each) that rotate through the states EMPTY → WORKER_PROCESSING → READY_FOR_MAIN → MAIN_PROCESSING → EMPTY, so that reading and processing can happen simultaneously.
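To make the state rotation concrete, here is a minimal TypeScript sketch of the per-section state machine. It is not the code from this change: `SectionRing` and its method names are hypothetical, the 30MB data sections are omitted, and it assumes the states live in a `SharedArrayBuffer` and are advanced with `Atomics.compareExchange` so that only one thread can claim a section at a time.

```typescript
// Hypothetical sketch; names are illustrative and not taken from the actual change.
const SectionState = {
  EMPTY: 0,
  WORKER_PROCESSING: 1,
  READY_FOR_MAIN: 2,
  MAIN_PROCESSING: 3,
} as const;

const SECTION_COUNT = 3; // three sections, as described in the issue

class SectionRing {
  // One Int32 state slot per section, backed by shared memory so that
  // both the main thread and the worker can observe it.
  readonly states: Int32Array;

  constructor(shared?: SharedArrayBuffer) {
    const buf =
      shared ?? new SharedArrayBuffer(SECTION_COUNT * Int32Array.BYTES_PER_ELEMENT);
    this.states = new Int32Array(buf);
  }

  // Atomically move section `i` from state `from` to state `to`.
  // Returns false (and changes nothing) if the section was not in `from`.
  private transition(i: number, from: number, to: number): boolean {
    return Atomics.compareExchange(this.states, i, from, to) === from;
  }

  // Worker side: claim an empty section, fill it, then publish it.
  workerAcquire(i: number): boolean {
    return this.transition(i, SectionState.EMPTY, SectionState.WORKER_PROCESSING);
  }
  workerPublish(i: number): boolean {
    return this.transition(i, SectionState.WORKER_PROCESSING, SectionState.READY_FOR_MAIN);
  }

  // Main side: claim a ready section, parse and persist it, then recycle it.
  mainAcquire(i: number): boolean {
    return this.transition(i, SectionState.READY_FOR_MAIN, SectionState.MAIN_PROCESSING);
  }
  mainRelease(i: number): boolean {
    return this.transition(i, SectionState.MAIN_PROCESSING, SectionState.EMPTY);
  }
}
```

In a sketch like this, each thread would walk the sections in order (`i = (i + 1) % SECTION_COUNT`) and wait whenever its acquire call returns `false`. How the real implementation waits or signals is not described here.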

amitsr4 · 2024-12-09T14:22:15Z

On my machine (MacBook Pro M1, 16GB RAM), the worker thread reads the file and identifies JSON object positions (~35ms per section), while the main thread handles parsing and data persistence (~250ms per section). Each section holds approximately 39,500 objects. The worker prepares upcoming sections while the main thread processes the current one, which keeps data flowing continuously. However, performance testing showed that this architectural change did not yield the expected improvement: a JSON log file containing 1 million commits still takes approximately 7 seconds to process, similar to the original implementation. Further investigation and optimization are needed to achieve a meaningful speedup.
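A rough back-of-the-envelope check of these numbers, as a TypeScript sketch. It assumes that one commit corresponds to one JSON object, that per-section timings are constant, and that a two-stage pipeline is paced by its slower stage. The function name `estimatePipeline` is hypothetical. Under those assumptions the main thread's ~250ms per section dominates, so overlapping the ~35ms read can only save a small fraction of the total.

```typescript
// Hypothetical estimate using only the figures quoted in this issue.
interface PipelineEstimate {
  sections: number;
  serialMs: number;    // read and process strictly one after the other
  pipelinedMs: number; // read of section n+1 overlaps processing of section n
}

function estimatePipeline(
  totalObjects: number,
  objectsPerSection: number,
  workerMs: number,
  mainMs: number,
): PipelineEstimate {
  const sections = Math.ceil(totalObjects / objectsPerSection);
  const serialMs = sections * (workerMs + mainMs);
  // Ideal two-stage pipeline: the faster stage is paid once on its own,
  // after that the slower stage sets the pace for every section.
  const pipelinedMs =
    Math.min(workerMs, mainMs) + sections * Math.max(workerMs, mainMs);
  return { sections, serialMs, pipelinedMs };
}

// Assumption: 1 million commits ≈ 1 million JSON objects.
const estimate = estimatePipeline(1_000_000, 39_500, 35, 250);
console.log(estimate);
```

With these inputs the model gives 26 sections, about 7.4 seconds fully serial and about 6.5 seconds ideally pipelined. Both are in the same range as the ~7 seconds measured, which suggests that main-thread parsing and persistence is where further optimization would need to focus.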

ofriw added the "enhancement" label (New feature or request) on Dec 10, 2024
