Add Database Dump Feature with R2 Storage and Chunked Processing #93

Kunal-Darekar · 2025-03-07T20:26:40Z

/claim #59

Pull Request: Enhanced Database Dump System with Chunked Processing

Fixes: #59

Purpose

Database Dump Enhancement

This PR implements a robust solution for handling large database dumps that exceed the 30-second request timeout limit and memory constraints.

Problem Solved

The current implementation has two critical limitations:

Memory exhaustion when loading large datasets
Request timeouts for operations exceeding 30 seconds

This solution implements chunked processing with R2 storage to handle databases up to 10GB in size.

Solution Architecture

Chunked Processing
- Data is processed in configurable chunks (default: 1000 rows)
- Memory usage remains constant regardless of database size
- Configurable chunk size via API
R2 Storage Integration
- Dump files stored in R2 buckets
- Automatic file naming: dump_YYYYMMDD-HHMMSS.{format}
- Supports SQL, CSV, and JSON formats
Processing Control
- Breathing intervals every 25 seconds
- 5-second pauses to prevent database locking
- Durable Object alarms for continuation
Progress Tracking
- Real-time status monitoring
- Callback notifications on completion
- Error reporting and recovery

Configuration Setup

1. R2 Bucket Configuration

Add to your wrangler.toml:

[[r2_buckets]]
binding = "DATABASE_DUMPS"
bucket_name = "your-database-dumps-bucket"
preview_bucket_name = "your-test-bucket" # Optional: for local testing

2. Environment Variables

[vars]
DUMP_CHUNK_SIZE = "1000"          # Optional: Default chunk size
DUMP_BREATHING_INTERVAL = "5000"  # Optional: Pause duration in ms
MAX_EXECUTION_TIME = "25000"      # Optional: Time before breathing

Usage Instructions

1. Initiating a Database Dump

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump' \
--header 'Authorization: Bearer YOUR-TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "format": "sql",                                    # Required: sql|csv|json
    "callbackUrl": "https://your-callback-url.com/notify", # Optional
    "chunkSize": 1000,                                 # Optional: Override default
    "includeSchema": true                              # Optional: Include CREATE TABLE statements
}'

Response:

{
    "status": "accepted",
    "progressKey": "dump_20240315-123456",
    "message": "Dump process started"
}

2. Checking Dump Status

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump/status/dump_20240315-123456' \
--header 'Authorization: Bearer YOUR-TOKEN'

Response:

{
    "status": "processing",
    "progress": {
        "totalRows": 1000000,
        "processedRows": 250000,
        "percentComplete": 25,
        "startedAt": "2024-03-15T12:34:56Z",
        "estimatedCompletion": "2024-03-15T12:45:00Z"
    }
}

3. Downloading a Completed Dump

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump/download/dump_20240315-123456.sql' \
--header 'Authorization: Bearer YOUR-TOKEN' \
--output database_dump.sql

4. Callback Notification Format

When the dump completes, your callback URL will receive:

{
    "status": "completed",
    "dumpId": "dump_20240315-123456",
    "downloadUrl": "https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump/download/dump_20240315-123456.sql",
    "format": "sql",
    "size": 1048576,
    "completedAt": "2024-03-15T12:45:00Z"
}

Testing Guidelines

1. Small Database Tests

Database size: < 100MB
Expected behavior: Complete within initial request
Test command:

npm run test:dump small

2. Large Database Tests

Database size: > 1GB
Verify continuation after timeout
Test command:

npm run test:dump large

3. Breathing Interval Tests

Monitor database locks
Verify request processing during dumps
Test command:

npm run test:dump breathing

4. Format Support Tests

Run for each format:

npm run test:dump format sql
npm run test:dump format csv
npm run test:dump format json

5. Error Handling Tests

Test scenarios:

Network interruptions
R2 storage failures
Invalid callback URLs
Malformed requests

npm run test:dump errors

Security Considerations

R2 bucket permissions are least-privilege
Authorization tokens required for all endpoints
Callback URLs must be HTTPS
Rate limiting applied to dump requests

Performance Impact

Memory usage: ~100MB max per dump process
CPU usage: Peaks at 25% during processing
Network: ~10MB/s during dumps
R2 operations: ~1 operation per chunk

Testing
Small database exports (< 30 seconds)
Large database exports (> 30 seconds)
Different formats (SQL, CSV, JSON)
Callback notifications
Error handling
Resumption after interruption

Testing Done
Unit tests for all components
Integration tests for the full export flow
Manual testing with databases of various siz

Kunal-Darekar · 2025-03-08T17:53:08Z

Apologies, I mistakenly deleted my original PR while making some update .I’ve created a new PR and included the necessary documentation, including the wrangler.toml variables and the cURL command/endpoint for triggering. Everything should be clear and ready for testing. Could you please check? Thanks for your patience!

Added database dump feature

dd9978f

algora-pbc bot mentioned this pull request Mar 7, 2025

Database dumps do not work on large databases #59

Open

algora-pbc bot added the 🙋 Bounty claim label Mar 7, 2025

Kunal-Darekar added 2 commits March 8, 2025 15:08

Updated README.md

282cbe2

Add database dump enhancement

9e21699

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Database Dump Feature with R2 Storage and Chunked Processing #93

Add Database Dump Feature with R2 Storage and Chunked Processing #93

Kunal-Darekar commented Mar 7, 2025 •

edited

Loading

Kunal-Darekar commented Mar 8, 2025 •

edited

Loading

Add Database Dump Feature with R2 Storage and Chunked Processing #93

Are you sure you want to change the base?

Add Database Dump Feature with R2 Storage and Chunked Processing #93

Conversation

Kunal-Darekar commented Mar 7, 2025 • edited Loading

Pull Request: Enhanced Database Dump System with Chunked Processing

Purpose

Database Dump Enhancement

Problem Solved

Solution Architecture

Configuration Setup

1. R2 Bucket Configuration

2. Environment Variables

Usage Instructions

1. Initiating a Database Dump

2. Checking Dump Status

3. Downloading a Completed Dump

4. Callback Notification Format

Testing Guidelines

1. Small Database Tests

2. Large Database Tests

3. Breathing Interval Tests

4. Format Support Tests

5. Error Handling Tests

Security Considerations

Performance Impact

Kunal-Darekar commented Mar 8, 2025 • edited Loading

Kunal-Darekar commented Mar 7, 2025 •

edited

Loading

Kunal-Darekar commented Mar 8, 2025 •

edited

Loading