refactor(v2): reduce leaf I/O #1044

Open

wants to merge 5 commits into base: master

Conversation

kocubinski
Member

@kocubinski kocubinski commented Feb 13, 2025

Part 1 of #1043

  • separate branch and leaf sequence numbers into different key spaces, reducing I/O (~10% faster in bench test); see the sketch after this list
  • refactor tests (all tests in v2/ now passing)
  • pull bench test into CLI command from unit test
  • refactor metrics to allow pluggable backends

Summary by CodeRabbit

  • New Features

    • Introduced a command-line benchmarking tool that supports snapshot loading, CPU profiling, and real-time performance monitoring.
    • Added a debug logging mode to provide detailed runtime diagnostics.
  • Bug Fixes

    • Improved error handling and metrics tracking in various database operations.
  • Refactor

    • Enhanced metrics integration to offer improved monitoring of performance and operational data.
    • Streamlined tree operations and database interactions for better efficiency and reliability.
    • Updated test logic for improved metrics handling and tree building.
  • Chores

    • Updated Go version and toolchain specifications in the project configuration.

@kocubinski kocubinski requested a review from a team as a code owner February 13, 2025 18:27

coderabbitai bot commented Feb 13, 2025

Walkthrough

This pull request introduces a new benchmarking command for the IAVL tree along with extensive updates across the codebase. A new Cobra-based command in the bench package enables standard development benchmarks with environment configuration, CPU profiling, and Prometheus metrics integration. Additional modifications include refactoring tests to use default tree options, improvements to metrics handling via new types, enhanced logging through a debug logger, and streamlined tree and SQLite operations. A deprecated NodePool test was removed, and concurrency tests were added for SQLite operations.
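
For readers who have not written a Cobra command before, the rough shape of such a bench command is sketched below. This is a hedged sketch, not the PR's actual file: the flag names ("db", "cpu-profile"), the RunE body, and the omitted tree-building call are assumptions; only the cmd/bench package path and the exported Command() constructor are taken from the change summary.

package bench

import (
	"os"
	"runtime/pprof"

	"github.com/spf13/cobra"
)

// Command returns a hypothetical benchmark command; the wiring is illustrative only.
func Command() *cobra.Command {
	var dbPath, cpuProfile string
	cmd := &cobra.Command{
		Use:   "bench",
		Short: "run a standard development benchmark against an IAVL v2 tree",
		RunE: func(cmd *cobra.Command, args []string) error {
			if cpuProfile != "" {
				f, err := os.Create(cpuProfile)
				if err != nil {
					return err
				}
				if err := pprof.StartCPUProfile(f); err != nil {
					return err
				}
				defer func() {
					pprof.StopCPUProfile()
					_ = f.Close()
				}()
			}
			// ... open the snapshot at dbPath, replay the changelog, report metrics ...
			_ = dbPath
			return nil
		},
	}
	cmd.Flags().StringVar(&dbPath, "db", "", "path to the sqlite database")
	cmd.Flags().StringVar(&cpuProfile, "cpu-profile", "", "write a CPU profile to this file")
	return cmd
}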

Changes

File(s) Change Summary
v2/cmd/bench/bench.go, v2/cmd/root.go Added a new benchmarking command with a Cobra structure; integrated the bench command into the root command.
v2/iterator_test.go, v2/sqlite_test.go, v2/tree_test.go Updated tests to use default tree options, removed obsolete functions, and added concurrent database operation tests.
v2/logger.go Introduced NewDebugLogger using slog to support debug-level logging.
v2/metrics/metrics.go, v2/multitree.go, v2/sqlite.go, v2/sqlite_writer.go, v2/tree.go Enhanced metrics handling with new structures and methods; refactored tree creation, multi-tree operations, and SQLite interactions for better metrics integration and error handling.
v2/pool_test.go Removed obsolete test for NodePool functionality.
v2/migrate/go.mod Updated Go version to 1.21.0 and added toolchain specification for Go 1.23.6.
v2/sqlite_batch.go Added metrics field to sqliteBatch struct and modified return value of saveLeaves method.

Suggested labels

backport/v1.x

Suggested reviewers

  • tac0turtle
  • julienrbrt

Poem

I'm a rabbit, hopping with glee,
New benchmarks and metrics for all to see.
My code carrots crunch with each run,
Logging and testing—oh what fun!
Cheers from CodeRabbit, happy as can be!
🐇🥕✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 golangci-lint (1.62.2)

level=warning msg="The linter 'exportloopref' is deprecated (since v1.60.2) due to: Since Go1.22 (loopvar) this linter is no longer relevant. Replaced by copyloopvar."
level=warning msg="[runner] Can't run linter goanalysis_metalinter: buildir: failed to load package sqlite3: could not load export data: no export data for "github.com/bvinc/go-sqlite-lite/sqlite3""
level=error msg="Running error: can't run linter goanalysis_metalinter\nbuildir: failed to load package sqlite3: could not load export data: no export data for "github.com/bvinc/go-sqlite-lite/sqlite3""


@kocubinski kocubinski mentioned this pull request Feb 13, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
v2/sqlite_writer.go (1)

443-464: 🛠️ Refactor suggestion

Consider adding transaction rollback on error.

The save operation should consider rolling back the transaction if an error occurs during the save process.

 func (w *sqlWriter) saveTree(tree *Tree) error {
     defer tree.metrics.MeasureSince(time.Now(), metricsNamespace, "db_write")
+    var rollback bool
+    defer func() {
+        if rollback {
+            if err := tree.sql.RollbackTx(); err != nil {
+                tree.logger.Error("failed to rollback transaction", "error", err)
+            }
+        }
+    }()
     
     batch := &sqliteBatch{
         sql:    tree.sql,
         tree:   tree,
         size:   200_000,
         logger: w.sql.logger,
     }
     saveSig := &saveSignal{batch: batch, root: tree.root, version: tree.version, wantCheckpoint: tree.shouldCheckpoint}
     w.treeCh <- saveSig
     w.leafCh <- saveSig
     treeResult := <-w.treeResult
     leafResult := <-w.leafResult
     tree.metrics.IncrCounter(float32(len(tree.leaves)), metricsNamespace, "db_write_leaf")
 
     err := errors.Join(treeResult.err, leafResult.err)
+    if err != nil {
+        rollback = true
+    }
     return err
 }
🧹 Nitpick comments (13)
v2/metrics/metrics.go (3)

48-83: **Silent ignoring of unexpected key arguments **

The IncrCounter method returns immediately if len(keys) != 2, silently ignoring any extra or fewer arguments. This might unintentionally hide incorrect usage. Consider at least logging a warning in that branch or documenting the requirement for exactly two keys.

 if len(keys) != 2 {
-    return
+    // Consider logging or handling the unexpected case:
+    // log.Printf("IncrCounter called with %d keys, expected 2", len(keys))
    return
 }

145-178: **Automatic reset of metrics in QueryReport **

Calling s.SetQueryZero() within QueryReport unexpectedly clears metrics data right after reporting. If consumers prefer to preserve historical stats, consider making the reset optional or moving it outside QueryReport.

 func (s *StructMetrics) QueryReport(bins int) error {
     ...
     // after printing stats
-    s.SetQueryZero()
+    // optionally reset metrics or rely on caller to reset
     return nil
 }

191-202: **Add method concurrency caution **

The Add method appends slices and updates totals, which can lead to data races if multiple goroutines call it concurrently. Consider adding a mutex or clarifying single-thread access.

v2/sqlite_test.go (4)

12-12: **Experimental golang.org/x/exp/rand usage **

You're using "golang.org/x/exp/rand" for randomness. Although it can be suitable for certain cases, note that it's an experimental package. Consider "math/rand" unless specific "exp/rand" features are required.


49-63: **Transaction batch insertion logic **

Inserting data in large batches (200,000 records) helps performance. If concurrency is expanded in the future, ensure each goroutine uses its own transaction or synchronization protocol. Otherwise, no immediate concerns here.


158-211: **Test_ConcurrentDetach concurrency issue **

This test is skipped, but it demonstrates concurrency with DETACH. The panic is known. If solving it in production is important, consider rearchitecting the detach or restricting concurrent reads. Let us know if you’d like assistance investigating further or proposing a fix.


213-267: **Test_ConcurrentQuery is skipped due to panic **

Similar concurrency/panic scenario. If concurrency usage is needed, we can research defensive query patterns or alternative concurrency-safe solutions. Let us know if you’d like further help.

v2/sqlite.go (2)

313-316: Adequate instrumentation in getLeaf.

Measuring timings with MeasureSince and incrementing counters is a solid approach to monitor database “get leaf” performance. Ensure concurrency safety if multiple goroutines may invoke the same statements (or at least confirm the library’s concurrency guarantees).
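
The pattern being praised here is the same defer-based timing used elsewhere in this PR (for example in the saveTree snippet above). The standalone toy below shows the mechanics with stand-in types; the proxy method signatures match the ones quoted in this review, but the concrete types and metric keys are assumptions.

package main

import (
	"fmt"
	"time"
)

const metricsNamespace = "iavl_v2" // stand-in value

// proxy mirrors the metric methods used throughout this PR.
type proxy interface {
	IncrCounter(val float32, keys ...string)
	MeasureSince(start time.Time, keys ...string)
}

type printProxy struct{}

func (printProxy) IncrCounter(val float32, keys ...string) { fmt.Println("incr", keys, val) }
func (printProxy) MeasureSince(start time.Time, keys ...string) {
	fmt.Println("took", keys, time.Since(start))
}

type db struct{ metrics proxy }

// getLeaf shows the instrumentation shape only: time the whole call with a
// deferred MeasureSince and count the access with IncrCounter.
func (d *db) getLeaf(key []byte) ([]byte, error) {
	defer d.metrics.MeasureSince(time.Now(), metricsNamespace, "db_get")
	d.metrics.IncrCounter(1, metricsNamespace, "db_get")
	// ... prepared-statement lookup elided ...
	return nil, nil
}

func main() {
	d := &db{metrics: printProxy{}}
	_, _ = d.getLeaf([]byte("k"))
}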


710-712: Bitwise approach to detect leaf sequences.

Using seq&(1<<31) != 0 is a concise way to differentiate leaf nodes by sequence number, but remember to document this bitwise approach so it’s clear why bit 31 marks leaves.
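
Since the review asks for the bitwise check to be documented, one possible doc comment is sketched below. The function name isLeafSeq appears earlier in this review; the parameter type and exact wording are assumptions.

// isLeafSeq reports whether seq falls in the leaf key space. Leaf sequence
// numbers are allocated starting at leafSequenceStart (1 << 31), so a set
// bit 31 identifies a leaf without any extra lookup; branch sequences stay
// below that boundary.
func isLeafSeq(seq uint32) bool {
	return seq&(1<<31) != 0
}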

v2/cmd/bench/bench.go (2)

48-61: Consider handling deferred cleanup in a more robust way.

The CPU profile cleanup could be more robust by checking for errors during file closure.

-				defer func() {
-					pprof.StopCPUProfile()
-					f.Close()
-				}()
+				defer func() {
+					pprof.StopCPUProfile()
+					if err := f.Close(); err != nil {
+						logger.Error("failed to close CPU profile file", "error", err)
+					}
+				}()

147-148: Implement missing metrics methods.

The IncrCounter and MeasureSince methods are currently empty stubs. Consider implementing them to provide complete metrics functionality.

Would you like me to help implement these methods to track additional metrics like operation counts and durations?

Also applies to: 160-160
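
If those stubs were ever filled in, one possible shape is sketched below. This is not the PR's code: the metric names, the counters/durations fields, and the promauto wiring are assumptions; only the method signatures and the keys[1] convention come from this review.

package bench

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// prometheusMetricsProxySketch is a hypothetical variant of the proxy in this
// PR, extended with a counter vector and a histogram vector keyed by metric name.
type prometheusMetricsProxySketch struct {
	counters  *prometheus.CounterVec
	durations *prometheus.HistogramVec
}

func newPrometheusMetricsProxySketch() *prometheusMetricsProxySketch {
	return &prometheusMetricsProxySketch{
		counters: promauto.NewCounterVec(
			prometheus.CounterOpts{Name: "iavl_v2_counter"}, []string{"key"}),
		durations: promauto.NewHistogramVec(
			prometheus.HistogramOpts{Name: "iavl_v2_duration_seconds"}, []string{"key"}),
	}
}

func (p *prometheusMetricsProxySketch) IncrCounter(val float32, keys ...string) {
	if len(keys) < 2 {
		return // same two-key convention as the rest of the PR
	}
	p.counters.WithLabelValues(keys[1]).Add(float64(val))
}

func (p *prometheusMetricsProxySketch) MeasureSince(start time.Time, keys ...string) {
	if len(keys) < 2 {
		return
	}
	p.durations.WithLabelValues(keys[1]).Observe(time.Since(start).Seconds())
}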

v2/multitree.go (2)

123-138: Consider adding connection pooling for better resource management.

The current implementation creates a new connection for each tree. Consider implementing connection pooling to manage database connections more efficiently.


313-456: Consider breaking down the TestBuild method for better maintainability.

The TestBuild method is quite long (>140 lines) and handles multiple responsibilities. Consider extracting the reporting and metrics collection logic into separate methods.

+func (mt *MultiTree) generateReport(workingBytes uint64, workingSize, writeLeaves int64, writeTime time.Duration, hashCount int64) error {
+    // ... reporting logic ...
+}

+func (mt *MultiTree) collectMetrics(tr *Tree) (uint64, int64, int64, time.Duration, int64) {
+    // ... metrics collection logic ...
+}

 func (mt *MultiTree) TestBuild(opts *testutil.TreeBuildOptions) (int64, error) {
     // ... existing setup code ...
     
-    report := func() error {
-        // ... current report logic ...
-    }
+    report := mt.generateReport
     // ... rest of the method ...
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bce70d and cccee9e.

⛔ Files ignored due to path filters (1)
  • v2/cmd/gen/gen.go is excluded by !**/gen/**
📒 Files selected for processing (12)
  • v2/cmd/bench/bench.go (1 hunks)
  • v2/cmd/root.go (2 hunks)
  • v2/iterator_test.go (3 hunks)
  • v2/logger.go (2 hunks)
  • v2/metrics/metrics.go (4 hunks)
  • v2/multitree.go (7 hunks)
  • v2/pool_test.go (0 hunks)
  • v2/sqlite.go (12 hunks)
  • v2/sqlite_test.go (7 hunks)
  • v2/sqlite_writer.go (3 hunks)
  • v2/tree.go (16 hunks)
  • v2/tree_test.go (11 hunks)
💤 Files with no reviewable changes (1)
  • v2/pool_test.go
🧰 Additional context used
🪛 golangci-lint (1.62.2)
v2/cmd/bench/bench.go

62-62: undefined: iavl

(typecheck)


73-73: undefined: iavl

(typecheck)


76-76: undefined: iavl

(typecheck)


81-81: undefined: iavl

(typecheck)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Test
  • GitHub Check: Analyze (go)
  • GitHub Check: Benchmarks
🔇 Additional comments (57)
v2/metrics/metrics.go (7)

23-26: **Interface assignment for compile-time checks **

These lines confirm that both StructMetrics and NilMetrics correctly implement the Proxy interface at compile time. No issues here.


28-35: **NilMetrics no-op approach is reasonable **

The NilMetrics struct provides no-op methods, ensuring that metric calls can be safely ignored when metrics are disabled. This is a straightforward and useful pattern for conditional metric collection.


36-39: **StructMetrics embedding design **

StructMetrics embeds both TreeMetrics and DbMetrics as pointers, which allows aggregated metric handling for tree- and database-related operations. This design is concise and clear.


41-46: **Ensure pointers are properly initialized **

The constructor NewStructMetrics correctly initializes the embedded pointers to TreeMetrics and DbMetrics. This is a sound approach that reduces the chance of nil-pointer references.


85-85: **SetGauge method is intentionally a no-op **

Having a no-op for SetGauge is acceptable if gauge metrics are not needed yet. This preserves the Proxy interface without forcing the user to handle gauge-based metrics.


114-114: **Added TreeHash metric **

TreeHash is now tracked as an int64. Ensure that it’s updated or reported consistently if needed. No immediate concerns here.


182-189: **SetQueryZero concurrency note **

SetQueryZero resets various slice fields to nil and integer fields to 0. If other goroutines are updating these metrics at the same time, this might lead to data races. Ensure this reset call is guarded by locks or that usage is strictly single-threaded.

v2/sqlite_test.go (8)

4-4: **sync import introduced for concurrency usage **

Importing "sync" is appropriate given the wait groups and goroutines below.


16-17: **Initialization code for TestBuildSqlite **

Using t.TempDir() for test isolation and calling sql.nextShard(1) as part of setup is logical. Requiring no error is good practice. No issues spotted.

Also applies to: 31-34


35-46: **Schema creation and indexing approach **

Creating tables and indexes up front is a sound approach for a large data load test. The commented-out alternative index suggests awareness of performance trade-offs. This is well-structured.


98-102: **Commit and new transaction in insertion loop **

Frequent commits and re-begins manage large data sets without massive transaction overhead. The approach is valid.


114-114: **Final commit for the insertion process **

Wrapping up the final commit ensures all data is persisted. No issues spotted.


125-125: **TestNodeKeyFormat checks **

Ensuring nk is non-nil and logging the computed key is a straightforward debug test. Good addition for clarity.

Also applies to: 127-127


148-148: **Logging mmap_size result **

Logging the result of PRAGMA mmap_size can help diagnose memory-mapped configurations. No concerns here.


269-324: **Test_ConcurrentIndexRead concurrency approach **

Queries and CREATE INDEX run in parallel. There is potential for a data race or a locked database if reads and writes overlap incorrectly. Verify that the test passes reliably under repeated runs, and consider retry logic or locking if sporadic failures occur (see the sketch below).
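
If sporadic lock failures do appear, a small retry wrapper of the kind below can help in tests. It is purely illustrative and not part of this PR: it is driver-agnostic and matches on error text only, because the precise busy/locked error type depends on the SQLite bindings in use.

package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// retryBusy retries op up to attempts times, backing off briefly whenever the
// error looks like SQLite contention ("busy" or "locked"); other errors fail fast.
func retryBusy(attempts int, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		msg := err.Error()
		if !strings.Contains(msg, "busy") && !strings.Contains(msg, "locked") {
			return err
		}
		time.Sleep(time.Duration(i+1) * 10 * time.Millisecond)
	}
	return err
}

func main() {
	calls := 0
	err := retryBusy(5, func() error {
		calls++
		if calls < 3 {
			return errors.New("database is locked")
		}
		return nil // succeeds on the third attempt
	})
	fmt.Println(calls, err) // 3 <nil>
}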

v2/tree_test.go (12)

32-34: **Adding MetricsProxy to TreeOptions **

Enabling metrics via NewStructMetrics is consistent with the rest of the PR. This ensures tree operations can be tracked.


43-44: **Using TestBuild on multiTree **

Swapping in multiTree.TestBuild allows a more centralized build path. The usage here is straightforward.


53-57: **Creating multiTree with metrics and checkpoint intervals **

Providing checkpoint intervals and a metrics proxy is a clean approach for frequent states and metric tracking. No concerns spotted.


64-65: **Capturing build errors **

Requiring no error after multiTree.TestBuild ensures correctness. Straightforward test coverage.


73-73: **ImportMultiTree usage **

ImportMultiTree fosters continuity by importing a prior snapshot. Good approach to test snapshot-based reinitializations.


81-83: **Completing the new build and closing **

Verifying correctness on the final build and closing the multiTree helps ensure no resource leaks. Looks good.


98-98: **NewTree creation in sqlite test variant **

This snippet ensures a minimal Tree is created with default options. No issues found.


110-110: **No-db Tree creation variant **

Creating the Tree without a DB is valid for memory-only or ephemeral tests. This expands test coverage.


162-165: **Using TempDir and NewSqliteDb for an empty tree test **

Consistent use of t.TempDir plus the default TreeOptions for an empty tree scenario is a good approach to verifying basic usage.


217-218: **Adjusting the default TreeOptions **

Updating CheckpointInterval to 100 for replay tests aligns with smaller incremental snapshots. No concerns.


278-278: **Reloading and reusing Tree with new DB instances **

Re-opening the DB across versions ensures the replay logic is robust. The approach is consistent with the partial ingestion pattern.

Also applies to: 286-286


335-335: **Logging version and hash for debugging **

Printing version and hash is helpful for verifying prune logic and incremental states. Looks fine.

v2/sqlite.go (16)

28-29: Introduce logger and metrics fields together.

Adding both Logger and Metrics fields in SqliteDbOptions helps standardize logging and metrics usage across the database. This centralizes configuration and avoids scattered handling in multiple places.


69-71: Use of NilMetrics as a safe default.

Falling back to metrics.NilMetrics{} when opts.Metrics is nil ensures the system won't break if metrics aren't explicitly configured. This is a clean approach that prevents nil references.


74-76: Graceful logger fallback.

Defaulting to a no-op logger when opts.Logger is unset avoids nil checks throughout the code and maintains robust logging behavior.


351-351: Updated getNode signature.

Removing the *sqlite3.Stmt parameter from getNode helps encapsulate query logic, simplifying usage. Verify that all callers have been updated accordingly.


353-356: Shard-based query retrieval.

Fetching the database statement via sql.getShardQuery(...) clarifies how nodes map to sharded tables. It looks correct and aligns with the new design.


357-360: Branch metrics instrumentation.

Similar to getLeaf, measuring durations and incrementing counters provides valuable insights into performance for branch lookups.


715-717: Prevent child retrieval from leaf nodes (right child).

Stopping attempts at child retrieval on leaf nodes helps avoid invalid lookups.


719-723: Conditional retrieval of node’s right child.

Shifting the logic between getLeaf and getNode based on the leaf bit is consistent with the approach in isLeafSeq. This clarifies node type resolution.


724-725: Check for missing right child.

Explicitly erroring out if node.rightNode is still nil offers an immediate signal of data corruption or missing references.


735-737: Prevent child retrieval from leaf nodes (left child).

Similar check for the left child ensures consistent enforcement that leaf nodes have no children.


739-743: Conditional retrieval of node’s left child.

Mirroring the logic from getRightNode, gating between getLeaf and getNode is a straightforward, uniform pattern.


744-745: Check for missing left child.

Same approach as the right side. Quick erroring if node.leftNode == nil is helpful for diagnosing incomplete data.


993-993: Ensure sequences reset before replay iteration.

Calling tree.resetSequences() after stepping versions helps the tree remain consistent. Confirm that external references to sequence counters are safely updated in the process.


1005-1007: Validate leafSequence upon set.

Double-checking the tree.leafSequence aligns with the persisted sequence prevents mismatched data from creeping into the tree. The error message is descriptive and helpful.


1016-1016: Validate leafSequence upon delete.

Same approach for delete operations; nice consistency.


1030-1030: Final sequence reset.

Resetting sequences once the replay completes helps keep the tree in a known state.

v2/cmd/root.go (1)

4-4: Add bench command support.

Importing "github.com/cosmos/iavl/v2/cmd/bench" and registering bench.Command() extends the CLI with a new benchmarking command. This is a clear improvement for performance testing.

Also applies to: 23-23

v2/logger.go (2)

3-6: Added imports for debug logging.

Importing log/slog and os is necessary for the new debug logger, which writes debug-level logs to stdout.


45-47: New debug logger.

Providing NewDebugLogger() for debug-level logging is beneficial for diagnostics. This is a straightforward and unobtrusive addition.
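
For context, a debug-level slog logger writing to stdout, as described here, can be built like this. The sketch is consistent with the description but is not necessarily the PR's exact implementation; in particular, the real NewDebugLogger may return an iavl-specific Logger interface rather than *slog.Logger.

package main

import (
	"log/slog"
	"os"
)

// newDebugLogger returns a logger that emits Debug-level records and above to stdout.
func newDebugLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelDebug,
	}))
}

func main() {
	logger := newDebugLogger()
	logger.Debug("replaying changelog", "version", 42)
}

This relies only on the standard library log/slog package, which lines up with the Go 1.21 bump noted in the go.mod change.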

v2/iterator_test.go (1)

16-17: LGTM! Good use of default options.

Using DefaultTreeOptions() improves consistency and maintainability by centralizing tree configuration.

v2/multitree.go (1)

34-47: LGTM! Good defensive programming with metrics initialization.

The nil check for MetricsProxy ensures safe operation by providing a default no-op implementation.

v2/sqlite_writer.go (1)

44-58: LGTM! Good defensive programming with nil checks.

The addition of the nil check for SQL before accessing the logger improves robustness.

v2/tree.go (8)

14-17: LGTM! Effective sequence space partitioning.

The separation of branch and leaf sequences using leafSequenceStart = 1 << 31 effectively partitions the sequence space, making it easier to distinguish between branch and leaf nodes.


29-29: LGTM! Enhanced metrics handling and sequence management.

The changes improve the code in several ways:

  • More flexible metrics handling through metrics.Proxy
  • Clear separation of branch and leaf sequences
  • Better field naming for improved readability

Also applies to: 50-58


579-598: LGTM! Robust sequence management implementation.

The sequence management methods are well-implemented with:

  • Clear separation of branch and leaf sequences
  • Protection against leaf sequence underflow
  • Proper sequence initialization

606-610: LGTM! Consistent sequence assignment.

The method correctly assigns sequences based on node type, maintaining the sequence space partitioning.


292-292: LGTM! Consistent metrics naming.

All metrics now use the metricsNamespace constant, ensuring consistent naming across operations.

Also applies to: 311-311, 337-337, 344-344, 346-346, 374-374, 388-388, 473-473


646-670: LGTM! Clear method naming and consistent sequence usage.

The method rename to NewLeafNode better reflects its purpose, and it correctly uses the leaf sequence space.


640-640: LGTM! Appropriate sequence usage for deletes.

Using leaf sequence for delete operations is consistent with the sequence space partitioning.


1-717: LGTM! High-quality implementation of sequence space partitioning.

The changes effectively implement sequence space partitioning between branch and leaf nodes, which should help reduce I/O operations as intended in the PR objectives. The implementation is consistent, well-documented, and follows best practices.

Comment on lines +87 to +102
func (s *StructMetrics) MeasureSince(start time.Time, keys ...string) {
dur := time.Since(start)
if len(keys) != 2 {
return
}
k := keys[1]
switch k {
case "db_get":
s.QueryDurations = append(s.QueryDurations, dur)
s.QueryTime += dur
s.QueryCount++
case "db_write":
s.WriteDurations = append(s.WriteDurations, dur)
s.WriteTime += dur
}
}

🛠️ Refactor suggestion

**Potential concurrency issue with appending to slices **

MeasureSince appends durations to slices like QueryDurations and WriteDurations. If multiple goroutines call this method concurrently, it may cause data races or slice growth conflicts. Consider protecting these fields with synchronization or moving to a thread-safe structure (e.g., sync.Mutex or channels).

 func (s *StructMetrics) MeasureSince(start time.Time, keys ...string) {
+    // Potential concurrency: protect with mutex if called from multiple goroutines
     dur := time.Since(start)
     ...
 }

Comment on lines +748 to +749
return nil, fmt.Errorf("failed to get right node node_key=%s height=%d path=%s: %w",
node.leftNodeKey, node.subtreeHeight, sql.opts.Path, err)

⚠️ Potential issue

Mismatch in error message.

The error string says "failed to get right node" but references node.leftNodeKey. This likely leads to confusion. Update the error text to accurately describe the left node.

Proposed fix:

-	return nil, fmt.Errorf("failed to get right node node_key=%s height=%d path=%s: %w",
-		node.leftNodeKey, node.subtreeHeight, sql.opts.Path, err)
+	return nil, fmt.Errorf("failed to get left node node_key=%s height=%d path=%s: %w",
+		node.leftNodeKey, node.subtreeHeight, sql.opts.Path, err)

Comment on lines +137 to +143
http.Handle("/metrics", promhttp.Handler())
go func() {
err := http.ListenAndServe(":2112", nil)
if err != nil {
panic(err)
}
}()

⚠️ Potential issue

Handle HTTP server errors more gracefully.

The current implementation panics on HTTP server errors, which is not ideal for a long-running service.

-	go func() {
-		err := http.ListenAndServe(":2112", nil)
-		if err != nil {
-			panic(err)
-		}
-	}()
+	go func() {
+		if err := http.ListenAndServe(":2112", nil); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			logger.Error("metrics server failed", "error", err)
+		}
+	}()

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
v2/cmd/bench/bench.go (1)

137-143: ⚠️ Potential issue

Handle HTTP server errors more gracefully.

The current implementation panics on HTTP server errors, which is not ideal for a long-running service.

 	http.Handle("/metrics", promhttp.Handler())
 	go func() {
-		err := http.ListenAndServe(":2112", nil)
-		if err != nil {
-			panic(err)
-		}
+		if err := http.ListenAndServe(":2112", nil); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			logger.Error("metrics server failed", "error", err)
+		}
 	}()
🧹 Nitpick comments (3)
v2/cmd/bench/bench.go (3)

63-67: Make tree options configurable via flags.

The tree options contain hardcoded values that should be configurable through command-line flags for better flexibility.

Add these flags to the command:

 func benchCommand() *cobra.Command {
 	var (
 		dbPath        string
 		changelogPath string
 		loadSnapshot  bool
 		usePrometheus bool
 		cpuProfile    string
+		checkpointInterval int
+		evictionDepth     int
 	)
 	// ... command setup ...
 	cmd.Flags().IntVar(&checkpointInterval, "checkpoint-interval", 80, "checkpoint interval for the tree")
 	cmd.Flags().IntVar(&evictionDepth, "eviction-depth", 22, "eviction depth for the tree")
 	// ... in RunE ...
 	treeOpts := iavl.DefaultTreeOptions()
-	treeOpts.CheckpointInterval = 80
+	treeOpts.CheckpointInterval = checkpointInterval
 	treeOpts.StateStorage = true
 	treeOpts.HeightFilter = 1
-	treeOpts.EvictionDepth = 22
+	treeOpts.EvictionDepth = evictionDepth

96-99: Remove or document commented code.

The commented-out code appears to be alternative configuration values. If these are important test cases, they should be documented or moved to a separate configuration file.

Either remove the commented code or document its purpose:

-			// opts.Until = 1_000
-			// opts.UntilHash = "557663181d9ab97882ecfc6538e3b4cfe31cd805222fae905c4b4f4403ca5cda"
+			// Test case for larger dataset:
+			// opts.Until = 1_000
+			// opts.UntilHash = "557663181d9ab97882ecfc6538e3b4cfe31cd805222fae905c4b4f4403ca5cda"

147-148: Document empty method implementations.

Empty method implementations should be documented to explain why they're not implemented.

+// IncrCounter is not implemented as it's not needed for the current metrics collection
 func (p *prometheusMetricsProxy) IncrCounter(_ float32, _ ...string) {
 }

+// MeasureSince is not implemented as it's not needed for the current metrics collection
 func (p *prometheusMetricsProxy) MeasureSince(_ time.Time, _ ...string) {}

Also applies to: 160-160

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cccee9e and 10fffef.

📒 Files selected for processing (1)
  • v2/cmd/bench/bench.go (1 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
v2/cmd/bench/bench.go

62-62: undefined: iavl

(typecheck)


73-73: undefined: iavl

(typecheck)


76-76: undefined: iavl

(typecheck)


81-81: undefined: iavl

(typecheck)

Comment on lines +150 to +158
func (p *prometheusMetricsProxy) SetGauge(val float32, keys ...string) {
k := keys[1]
switch k {
case "working_size":
p.workingSize.Set(float64(val))
case "working_bytes":
p.workingBytes.Set(float64(val))
}
}

🛠️ Refactor suggestion

Make SetGauge more robust.

The current implementation assumes the keys slice has at least 2 elements and doesn't validate the key value.

 func (p *prometheusMetricsProxy) SetGauge(val float32, keys ...string) {
+	if len(keys) < 2 {
+		return
+	}
 	k := keys[1]
 	switch k {
 	case "working_size":
 		p.workingSize.Set(float64(val))
 	case "working_bytes":
 		p.workingBytes.Set(float64(val))
+	default:
+		// Log unexpected key for debugging
+		logger.Debug("unexpected metric key", "key", k)
 	}
 }

Comment on lines +111 to +116
if err := cmd.MarkFlagRequired("changelog"); err != nil {
panic(err)
}
if err := cmd.MarkFlagRequired("db"); err != nil {
panic(err)
}

🛠️ Refactor suggestion

Replace panic with proper error handling in flag validation.

Using panic for flag validation is not a good practice. Since this is happening during command initialization, return an error instead.

-	if err := cmd.MarkFlagRequired("changelog"); err != nil {
-		panic(err)
-	}
-	if err := cmd.MarkFlagRequired("db"); err != nil {
-		panic(err)
-	}
+	for _, flag := range []string{"changelog", "db"} {
+		if err := cmd.MarkFlagRequired(flag); err != nil {
+			return nil
+		}
+	}

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
v2/metrics/metrics.go (1)

89-104: ⚠️ Potential issue

Potential concurrency issue with slice operations.

The MeasureSince method appends to slices (QueryDurations and WriteDurations) without synchronization. This could lead to data races if called from multiple goroutines.

Consider protecting the slice operations with a mutex:

+type StructMetrics struct {
+    *TreeMetrics
+    *DbMetrics
+    mu sync.Mutex
+}

 func (s *StructMetrics) MeasureSince(start time.Time, keys ...string) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
     dur := time.Since(start)
     if len(keys) != 2 {
         return
     }
     k := keys[1]
     switch k {
     case "db_get":
         s.QueryDurations = append(s.QueryDurations, dur)
         s.QueryTime += dur
         s.QueryCount++
     case "db_write":
         s.WriteDurations = append(s.WriteDurations, dur)
         s.WriteTime += dur
     }
 }
🧹 Nitpick comments (7)
v2/sqlite_batch.go (1)

227-230: Improve error handling in saveLeaves method.

The method now correctly returns the leaf count instead of 0 when the index creation fails. However, consider adding error wrapping to provide more context about where the error occurred.

-		return b.leafCount, err
+		return b.leafCount, fmt.Errorf("failed to create leaf index: %w", err)
v2/sqlite_writer.go (1)

467-471: Consider addressing the TODO comment.

There's an open TODO comment about unifying the delete approach between tree and leaf operations. This could be an opportunity for optimization.

Would you like me to analyze both approaches and propose a unified solution that optimizes the delete operations?

v2/cmd/bench/bench.go (4)

63-67: Make tree options configurable via flags.

The tree options are currently hardcoded. Consider making these configurable via command-line flags for better flexibility.

+	cmd.Flags().IntVar(&checkpointInterval, "checkpoint-interval", 80, "checkpoint interval")
+	cmd.Flags().BoolVar(&stateStorage, "state-storage", true, "enable state storage")
+	cmd.Flags().IntVar(&heightFilter, "height-filter", 1, "height filter")
+	cmd.Flags().IntVar(&evictionDepth, "eviction-depth", 22, "eviction depth")

96-99: Remove commented out code.

The commented out code should be removed as it adds noise to the codebase. If these values are important, consider making them configurable via flags.


48-103: Refactor long RunE function for better maintainability.

The RunE function is handling multiple responsibilities. Consider breaking it down into smaller, focused functions:

  • setupCPUProfiling
  • createMultiTree
  • setupBenchmarkOptions
🧰 Tools
🪛 golangci-lint (1.62.2)

62-62: undefined: iavl

(typecheck)


73-73: undefined: iavl

(typecheck)


76-76: undefined: iavl

(typecheck)


81-81: undefined: iavl

(typecheck)


147-148: Document empty method implementations.

The empty implementations of IncrCounter and MeasureSince should be documented to explain why they are not implemented.

+// IncrCounter is not implemented as it's not needed for the current metrics collection
 func (p *prometheusMetricsProxy) IncrCounter(_ float32, _ ...string) {
 }

+// MeasureSince is not implemented as it's not needed for the current metrics collection
 func (p *prometheusMetricsProxy) MeasureSince(_ time.Time, _ ...string) {}

Also applies to: 160-160

v2/multitree.go (1)

313-457: Consider adding progress tracking and graceful shutdown.

The TestBuild method performs long-running operations but lacks:

  1. Progress tracking for partial completions
  2. Context for cancellation
  3. Graceful shutdown mechanism

Consider refactoring to add these features:

-func (mt *MultiTree) TestBuild(opts *testutil.TreeBuildOptions) (int64, error) {
+func (mt *MultiTree) TestBuild(ctx context.Context, opts *testutil.TreeBuildOptions) (int64, error) {
     var (
         version  int64
         err      error
         cnt      = int64(1)
     )
     
+    // Track progress
+    progress := &atomic.Int64{}
+    defer func() {
+        mt.logger.Info("Build progress", "completed", progress.Load())
+    }()
     
     for ; itr.Valid(); err = itr.Next() {
+        select {
+        case <-ctx.Done():
+            return cnt, ctx.Err()
+        default:
+        }
         
         if err != nil {
             return cnt, err
         }
         // ... rest of the code ...
+        progress.Add(1)
     }
     return cnt, nil
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10fffef and 404c50e.

📒 Files selected for processing (6)
  • v2/cmd/bench/bench.go (1 hunks)
  • v2/metrics/metrics.go (4 hunks)
  • v2/migrate/go.mod (2 hunks)
  • v2/multitree.go (7 hunks)
  • v2/sqlite_batch.go (2 hunks)
  • v2/sqlite_writer.go (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • v2/migrate/go.mod
🧰 Additional context used
🪛 golangci-lint (1.62.2)
v2/cmd/bench/bench.go

62-62: undefined: iavl

(typecheck)


73-73: undefined: iavl

(typecheck)


76-76: undefined: iavl

(typecheck)


81-81: undefined: iavl

(typecheck)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Benchmarks
  • GitHub Check: Analyze (go)
  • GitHub Check: Test
🔇 Additional comments (8)
v2/sqlite_batch.go (1)

13-17: LGTM! Good addition of metrics field.

The addition of the metrics field to the sqliteBatch struct is well-placed and follows the pattern of other fields. This enables proper metrics collection for batch operations.

v2/sqlite_writer.go (2)

44-58: LGTM! Improved nil safety in logger initialization.

The changes add proper nil checking before assigning the logger, which prevents potential nil pointer dereferences. This is a good defensive programming practice.


443-461: LGTM! Enhanced metrics collection.

The changes improve metrics collection by:

  1. Using deferred measurement for overall DB write time
  2. Adding separate counters for leaf and branch writes
  3. Using consistent metric namespacing
v2/cmd/bench/bench.go (3)

111-116: Replace panic with proper error handling in flag validation.

Using panic for flag validation is not a good practice. Since this is happening during command initialization, return an error instead.


137-143: Handle HTTP server errors more gracefully.

The current implementation panics on HTTP server errors, which is not ideal for a long-running service.


150-158: Make SetGauge more robust.

The current implementation assumes the keys slice has at least 2 elements and doesn't validate the key value.

v2/multitree.go (2)

34-37: LGTM! Good defensive programming.

The initialization of MetricsProxy with NilMetrics when it's nil is a good practice to avoid nil pointer dereferences.


123-138:

❌ Incorrect review comment

Consider adding error handling for pool initialization.

The newTree method creates a new node pool but doesn't handle potential initialization errors.

Let's verify if NewNodePool can return an error:


🏁 Script executed:

#!/bin/bash
# Description: Check if NewNodePool has error handling
ast-grep --pattern 'func NewNodePool() (*NodePool, error)'

Length of output: 60


Node pool initialization error handling is not needed

After verifying, it appears that the NewNodePool function does not return an error (its signature is likely func NewNodePool() *NodePool), so there's no possibility of an error that would need handling in the newTree method. Please disregard the suggestion to add error handling for pool initialization.

Likely an incorrect or invalid review comment.

Comment on lines +194 to 206
func (s *StructMetrics) Add(os *StructMetrics) {
s.WriteDurations = append(s.WriteDurations, os.WriteDurations...)
s.WriteTime += os.WriteTime
s.WriteLeaves += os.WriteLeaves
s.WriteBranch += os.WriteBranch

s.QueryDurations = append(s.QueryDurations, os.QueryDurations...)
s.QueryTime += os.QueryTime
s.QueryCount += os.QueryCount
s.QueryLeafMiss += os.QueryLeafMiss
s.QueryLeafCount += os.QueryLeafCount
s.QueryBranchCount += os.QueryBranchCount
}

⚠️ Potential issue

Add synchronization to prevent data races.

The Add method performs concurrent slice operations and counter updates without synchronization.

Apply mutex protection to prevent data races:

 func (s *StructMetrics) Add(os *StructMetrics) {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+    os.mu.Lock()
+    defer os.mu.Unlock()
     s.WriteDurations = append(s.WriteDurations, os.WriteDurations...)
     s.WriteTime += os.WriteTime
     s.WriteLeaves += os.WriteLeaves
     s.WriteBranch += os.WriteBranch

     s.QueryDurations = append(s.QueryDurations, os.QueryDurations...)
     s.QueryTime += os.QueryTime
     s.QueryCount += os.QueryCount
     s.QueryLeafMiss += os.QueryLeafMiss
     s.QueryLeafCount += os.QueryLeafCount
     s.QueryBranchCount += os.QueryBranchCount
 }
