Releases: streamingfast/substreams-sink-sql

v2.5.1

01 Sep 18:43

This is a bug fix release containing a fix for inserting rows into a table for which no primary key constraint exists. For now, we still require internally that you provide an id for your row in your DatabaseChange; a future update will lift that limitation.

v2.5.0

01 Sep 01:51

Highlights

This release improves the progress messages reported while your Substreams executes, which should greatly enhance progression tracking.

Note

Stay tuned, we are planning even more useful progression tracking now that we've updated progression data sent back to the client!

This release also introduces a new mode to dump data into the database at high speed, useful for inserting large amounts of data.

Substreams Progress Messages

Bumped substreams-sink to v0.3.1 and substreams to v1.1.12 to support the new progress message format. Progression now relates to stages instead of modules. You can get stage information using the substreams info command starting from version v1.1.12.

Important

This client only supports progress messages sent from a server using substreams version >=v1.1.12.

Changed Prometheus Metrics

  • substreams_sink_progress_message removed in favor of substreams_sink_progress_message_total_processed_blocks
  • substreams_sink_progress_message_last_end_block removed in favor of substreams_sink_progress_message_last_block (per stage)

Added Prometheus Metrics

  • Added substreams_sink_progress_message_last_contiguous_block (per stage)
  • Added substreams_sink_progress_message_running_jobs (per stage)

New injection method

A new injection method has been added to this substreams-sink-postgres release. It's a two-step method that leverages COPY FROM SQL operations to inject a large quantity of data at high speed.

Note

This method will be useful if you insert a lot of data into the database. If the standard ingestion speed satisfies your needs, continue to use it; the new feature is an advanced use case.

See the High Throughput Injection section of the README.md file to check how to use it.
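
The two-step idea can be sketched as follows, in Python for brevity. This is only an illustration and not the sink's implementation: the rows_to_csv and copy_statement helpers, the table name and the column names are all hypothetical. Only the use of CSV files and of the Postgres COPY ... FROM operation comes from the notes above; the README remains the reference for the actual commands.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Step 1 (hypothetical helper): serialize rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Emit values in a fixed column order so they match the COPY column list.
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def copy_statement(table, columns):
    """Step 2 (hypothetical helper): build the COPY statement that bulk-loads the CSV.

    Identifiers are simply wrapped in double quotes; this sketch assumes they
    contain no double quotes themselves.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)'
```

A database driver would then stream the CSV text to the server using the generated statement; that part is omitted here because it depends on the driver in use.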

Added

  • Added a newer method of populating the database via CSV (thanks @gusinacio!).

    Newer commands:

    • generate-csv: Generates CSVs for each table
    • inject-csv: Injects generated CSV rows for <table>

v2.4.0

20 Jul 20:17

Changed

  • gRPC InvalidArgument error(s) are not retried anymore, for example when specifying an invalid start block or argument in your request.

  • Breaking: Flag shorthand -p for --plaintext has been re-assigned to the Substreams params definition, to align with substreams run/gui on that aspect. There is no shorthand anymore for --plaintext.

    If you were using -p before, please switch to --plaintext.

    Note: We expect this to affect very few users, as --plaintext is usually used only on developer machines.
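
The retry change in the first item above can be pictured with a small sketch. It is not the sink's code: the RpcError class, the NON_RETRYABLE set and the call_with_retry helper are hypothetical names, and status codes are represented as plain strings for simplicity.

```python
class RpcError(Exception):
    """Hypothetical error type carrying a gRPC status code name."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


# Codes for which retrying cannot help because the request itself is wrong.
NON_RETRYABLE = {"InvalidArgument"}


def call_with_retry(fn, max_attempts=3):
    """Call fn, retrying on RpcError unless its code is non-retryable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RpcError as err:
            # Give up immediately on non-retryable codes, or on the last attempt.
            if err.code in NON_RETRYABLE or attempt == max_attempts:
                raise
```

With this shape, a request rejected with InvalidArgument (for example an invalid start block) fails on the first attempt instead of being retried.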

Added

  • Added support for --params, -p (can be repeated multiple times) in the form -p <module>=<value>.
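
As an illustration of the <module>=<value> form, here is a minimal Python sketch of how a repeatable flag of this shape can be collected. The parse_params helper and the module names in the usage example are hypothetical; the sink's own parsing may differ, for instance in how it treats a value that itself contains "=".

```python
import argparse


def parse_params(argv):
    """Collect repeated -p/--params flags into a {module: value} dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--params", action="append", default=[])
    args = parser.parse_args(argv)

    params = {}
    for item in args.params:
        # Split on the first "=" only, so the value may contain "=" itself
        # (an assumption of this sketch).
        module, sep, value = item.partition("=")
        if not sep or not module:
            raise ValueError(f"expected <module>=<value>, got {item!r}")
        params[module] = value
    return params
```

For example, parse_params(["-p", "map_a=0x01", "--params", "map_b=x=y"]) yields one entry per module, with "x=y" kept whole as the value for map_b.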

v2.3.4

13 Jul 12:07

Added

  • Added logging of new Session received values (linear_handoff_block, max_parallel_workers and resolved_start_block).

  • Added --header, -H (can be repeated multiple times) flag to pass extra headers to the server.

Changed

  • Now reporting available columns when an unknown column is encountered.

v2.3.3

26 Jun 15:13

Fixed

  • Batches written to the database now respect the insertion ordering as received from your Substreams. This fixes, for example, auto-increment values so that they follow the order defined on the chain.

v2.3.2

14 Jun 19:57
  • Fixed a problem where a string containing a Unicode character caused pq: invalid message format.

v2.3.1

13 Jun 18:03

Fixed

  • The substreams-sink-postgres setup command has been fixed to use the correct schema defined by the DSN.

  • The cursors table suggestion shown when the table is not found has been updated to be in sync with the table used in substreams-sink-postgres setup.

Changed

v2.3.0

09 Jun 14:16

Added

  • Added composite key support following the update in substreams-database-change

    The code was updated to use oneOf primary keys (pk and composite) to keep backward compatibility. Therefore, Substreams using older versions of DatabaseChange can still use newer versions of postgres-sink without problems. To use composite keys, define your schema to use Postgres composite keys, update to the latest version of substreams-database-change, and update your code to send a CompositePrimaryKey key object for the primary_key field of the TableChange message.

  • Added escaping of values when the postgres data type is BYTES. We now escape the byte array.
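
To make the pk versus composite distinction from the first item concrete, here is a small Python sketch of how a consumer could treat both key forms uniformly. It is not taken from the sink: the helper names, the "id" default column and the use of a plain string or dict to stand in for the two Protobuf variants are all assumptions made for illustration.

```python
def primary_key_columns(primary_key, default_column="id"):
    """Normalize either key form into a {column: value} mapping."""
    if isinstance(primary_key, str):
        # Single-value key (the `pk` case); the column name is an assumption.
        return {default_column: primary_key}
    if isinstance(primary_key, dict) and primary_key:
        # Composite key: several columns together identify the row.
        return dict(primary_key)
    raise ValueError("primary key must be a string or a non-empty mapping")


def where_clause(primary_key):
    """Build a parameterized WHERE clause and its argument list."""
    columns = primary_key_columns(primary_key)
    conditions = [f'"{col}" = ${i}' for i, col in enumerate(columns, start=1)]
    return " AND ".join(conditions), list(columns.values())
```

Because a single key is normalized into the same mapping shape as a composite one, older producers that send only a single key value keep working with the same code path.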

Fixed

  • Added back support for old Substreams Database Change Protobuf package id sf.substreams.database.v1.DatabaseChanges.

v2.2.1

30 May 15:05

Changed

  • Reduced the amount of allocations and escaping performed, which should increase ingestion speed. This will be more visible for Substreams where a lot of entities and columns are processed.

Fixed

  • The schema is now correctly respected for the cursors table.

v2.2.0

29 May 19:39

Highlights

Cursor Bug Fix

It appeared that the cursor was not saved properly until the first graceful shutdown of substreams-sink-postgres. Furthermore, the save on exit was itself wrong because it saved the cursor without flushing accumulated data (for example, we could have N unflushed blocks in memory along with a cursor, and we would save that cursor to the database without having flushed the in-memory data).

This bug was introduced in v2.0.0 by mistake, which means that if you synced a new database with v2.0.0+, there is a good chance you are actually missing some data in your database. It's highly recommended that you re-synchronize your database from scratch.

Note: If you re-synchronize with the same .spkg that you are using right now, database ingestion from scratch should go at very high speed because you will be reading from previously cached output, so the bottleneck should be the network and the database write performance.
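
The invariant behind this fix (never persist a cursor ahead of the data it covers) can be sketched with a toy in-memory sink. The BufferedSink class and its attributes are hypothetical and written in Python for brevity. Plain lists stand in for the database, and a real sink would presumably write rows and cursor in a single transaction.

```python
class BufferedSink:
    """Toy sink that buffers rows and persists them together with a cursor."""

    def __init__(self, flush_interval=1000):
        self.flush_interval = flush_interval
        self.pending = []          # rows received but not yet written
        self.last_cursor = None    # cursor of the last block received
        self.blocks_seen = 0
        self.stored_rows = []      # stands in for the database tables
        self.stored_cursor = None  # stands in for the cursors table

    def handle_block(self, rows, cursor):
        self.pending.extend(rows)
        self.last_cursor = cursor
        self.blocks_seen += 1
        if self.blocks_seen % self.flush_interval == 0:
            self.flush()

    def flush(self):
        """Write pending rows first, then the cursor that covers them."""
        if self.last_cursor is None:
            return
        self.stored_rows.extend(self.pending)
        self.pending.clear()
        self.stored_cursor = self.last_cursor

    def close(self):
        # On shutdown, flush instead of saving the cursor on its own.
        self.flush()
```

Saving last_cursor directly in close() without calling flush() would reproduce the bug described above: the stored cursor would point past rows that were never written.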

Behavior on .spkg update

In this release, we changed a bit how the cursor is associated with the <module>'s hash in the database and how it's stored.

Prior to this version, when loading the cursor back from the database on restart, we retrieved the cursor associated with the <module>'s hash received by substreams-sink-postgres run. As a consequence, if you changed the .spkg version you were sinking with, on restart we would find no cursor because the module's hash in the new .spkg would have changed, which means a full re-sync would happen because we would start without a cursor.

This silent behavior is problematic because it could seem like the cursor was lost somehow while actually, we just started a new one from scratch because the .spkg changed.

This release brings in a new flag, substreams-sink-postgres run --on-module-hash-mistmatch=error (default value shown), which controls how we react to a change in the module's hash since the last run.

  • If error is used (default), it will exit with an error explaining the problem and how to fix it.
  • If warn is used, it does the same as 'ignore' but it will log a warning message when it happens.
  • If ignore is set, we pick the cursor at the highest block number and use it as the starting point. Subsequent updates to the cursor will overwrite the module hash in the database.

It is possible that multiple cursors exist in your database, which is why we pick the one with the highest block. If that is the case, you will be warned that multiple cursors exist. You can run substreams-sink-postgres tools cursor cleanup <manifest> <module> --dsn=<dsn>, which will delete the now useless cursors.

The ignore value can be used to switch to a new .spkg while retaining the previous data in the database; the database schema will start to differ after the point where the new .spkg became active.
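
The three policies can be summarized with a short Python sketch. It is a simplified model, not the sink's code: the pick_cursor helper and the dictionary of stored cursors are hypothetical, and the rule that a cursor stored under the current hash is used directly is an assumption.

```python
import logging


def pick_cursor(cursors, module_hash, on_mismatch="error"):
    """Select the cursor to resume from.

    cursors maps a module hash to a (block_number, cursor) tuple.
    """
    if not cursors:
        return None  # nothing stored yet: start from scratch
    if module_hash in cursors:
        # Assumption: a cursor stored under the current hash is used as-is.
        return cursors[module_hash]
    if on_mismatch == "error":
        raise RuntimeError("module hash changed since last run")
    if on_mismatch not in ("warn", "ignore"):
        raise ValueError(f"unknown policy {on_mismatch!r}")
    if on_mismatch == "warn":
        logging.warning("module hash changed, resuming from highest cursor")
    # 'warn' and 'ignore' both resume from the cursor at the highest block.
    return max(cursors.values(), key=lambda entry: entry[0])
```

With two stored cursors at blocks 100 and 250 and an unknown module hash, the error policy raises, while warn and ignore both resume from block 250.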

Added

  • Added substreams-sink-postgres run --on-module-hash-mistmatch=error to control how a change in the module's hash should be handled.

Changed

  • Changed how cursors are retrieved on restart.

Fixed

  • Fixed cursor not being saved correctly until the binary exits.

  • Fixed wrong handling of updating the cursor; we were not checking whether a row was updated when doing the flush operation.

  • Fixed a bug where, if the sink was terminating, it was possible to write a cursor for data not yet flushed. This happened if substreams-sink-postgres run was stopped before we had ever written a cursor, which normally happens every 1000 blocks. We don't expect anybody to have been hit by this, but if you are unsure, you should check the data for the first 1000 blocks of your sink (for example from 11 000 000 to 11 001 000 if your module start block was 11 000 000).