Trino v2 client protocol

Praveen Krishna edited this page Jan 3, 2021 · 2 revisions

WARNING! THIS IS A DRAFT.

This is a draft proposal and will change without notice. Do not implement clients against this protocol. Continue to use the v1 protocol.

Considerations

  • Ability to get progress independently from data
  • Binary format for payload
  • Parallel reads from client
  • Reads from multiple clients
  • Generalized types
  • Authentication (username+password, Kerberos, etc.)
  • Transport encryption
  • Column and row oriented formats (or a format that is easy to pivot)
  • Multiple representations of a value (e.g. HyperLogLog could have cardinality as an integer, a description of the struct as a string and the full data structure as binary)
  • Multiple result sets?

Key differences between v1 (current) and v2 (new) client protocols:

  • Control information, such as properties and prepared statements, is now passed as part of the JSON request. Previously it was passed in HTTP headers, which imposed size limits.
  • Query stats and data now come through different endpoints, possibly on different nodes (see the next point). Previously they were part of the same JSON response sent by the coordinator.
  • Data is now downloaded from the worker(s) rather than through the coordinator.
  • Data can be downloaded in parallel. Previously there was only one download channel, between the client and the coordinator. The new protocol supports downloading from several endpoints on different workers.
  • The new protocol supports message types other than JSON for creating queries and downloading data, although the current implementation supports only JSON.
  • The "Update count" field in the result will no longer be populated.

Protocol description

Submit a query

Issue a POST request to /v2/statement. The request must include a Content-Type header set to application/json, and the Accept header must also be set to application/json.

Request JSON structure:

{
    "session": {
        "user": "bob",
        "source": "cli",
        "clientInfo": "junk",
        "catalog": "tpch",
        "schema": "sf10",
        "timeZone": "UTC",
        "language": "en-US",
        "transactionId": "tx123",
        "properties": {
            "query_max_run_time": "30s",
            "hive.bucket_execution_enabled": "true"
        },
        "preparedStatements": {
            "abc": "data",
            "xyz": "data"
        }
    },
    "query": "SELECT 1"
}
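
As an illustration, the request above could be assembled with Python's standard library roughly as follows. This is only a sketch: the helper name build_statement_request and the coordinator address http://localhost:8080 are assumptions made for the example, not part of the proposal, and the request is built but not sent.

```python
import json
import urllib.request


def build_statement_request(base_url, session, query):
    """Build (but do not send) a POST request for /v2/statement.

    base_url is a hypothetical coordinator address supplied by the caller.
    """
    body = json.dumps({"session": session, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/statement",
        data=body,
        method="POST",
        headers={
            # Both headers are required by the draft.
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_statement_request(
    "http://localhost:8080",  # assumed address, for illustration only
    {"user": "bob", "catalog": "tpch", "schema": "sf10"},
    "SELECT 1",
)
# Sending it would be: urllib.request.urlopen(request)
```

Because the session and the query travel in the body, properties and prepared statements are not constrained by HTTP header size limits.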

Response JSON structure:

{
    "id": "query12345",
    "infoUri": "http://...",
    "nextUri": "http://...",
    "nextUriDone": true,
    "finalUri": "..send delete to..",
    "columns": [...],
    "dataUris": [...],
    "actions": {
        "setTransactionId": "tx123",
        "clearTransactionId": true,
        "setSessionProperties": {
            "query_max_run_time": "30s",
            "hive.bucket_execution_enabled": "true"
        },
        "clearSessionProperties": [
            "query_max_run_time",
            "hive.bucket_execution_enabled"
        ],
        "addPreparedStatements": {
            "abc": "data",
            "xyz": "data"
        },
        "deallocatePreparedStatements": [
            "abc",
            "xyz"
        ]
    }
}

Compared to the v1 protocol, the new fields are "dataUris" and "actions".
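
The draft does not spell out how a client should apply "actions", so the following is only one possible reading, inferred from the field names. The function name apply_actions, the mapping of actions onto the request's "session" fields, and the order in which set and clear operations are applied are all assumptions.

```python
def apply_actions(session, actions):
    """Return a copy of the session updated according to "actions".

    Semantics are inferred from the field names; the draft does not define them.
    """
    updated = dict(session)
    properties = dict(session.get("properties", {}))
    prepared = dict(session.get("preparedStatements", {}))

    # Assumed order: apply "set"/"add" first, then "clear"/"deallocate".
    if "setTransactionId" in actions:
        updated["transactionId"] = actions["setTransactionId"]
    if actions.get("clearTransactionId"):
        updated.pop("transactionId", None)

    properties.update(actions.get("setSessionProperties", {}))
    for name in actions.get("clearSessionProperties", []):
        properties.pop(name, None)

    prepared.update(actions.get("addPreparedStatements", {}))
    for name in actions.get("deallocatePreparedStatements", []):
        prepared.pop(name, None)

    updated["properties"] = properties
    updated["preparedStatements"] = prepared
    return updated
```

The updated session would then be sent in the "session" object of the next request, in place of the response headers that v1 uses for the same purpose.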

Get status

A client must follow the nextUri link until it becomes unavailable. The status structure is the same as the response structure described above. Note that the response contains only status information and no data. Once the dataUris field is available, the client needs to send GET requests to the specified URLs. The dataUris field might be sent several times, as new URIs become available. Each time the list is cumulative, so it contains the URIs sent in previous responses, and the client needs to de-duplicate them. When nextUri becomes unavailable, the client must explicitly close the query by sending a DELETE request to the last available nextUri link.
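
The status loop described above might be sketched as follows. The callables fetch_json and delete are hypothetical stand-ins for whatever HTTP client is in use; they are injected so that the loop logic stays independent of transport details. A real client would probably start downloading from each data URI as soon as it appears, whereas this sketch only collects them.

```python
def follow_status(first_response, fetch_json, delete):
    """Follow nextUri until it disappears and collect de-duplicated dataUris.

    fetch_json(uri) -> dict and delete(uri) -> None are hypothetical
    callables supplied by the caller.
    """
    data_uris = []  # kept in first-seen order
    seen = set()
    last_next_uri = None
    response = first_response
    while True:
        # dataUris is cumulative, so skip anything already recorded.
        for uri in response.get("dataUris") or []:
            if uri not in seen:
                seen.add(uri)
                data_uris.append(uri)
        next_uri = response.get("nextUri")
        if not next_uri:
            break
        last_next_uri = next_uri
        response = fetch_json(next_uri)
    # Close the query with a DELETE to the last nextUri that was available.
    if last_next_uri is not None:
        delete(last_next_uri)
    return data_uris
```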

Get data

When issuing a GET request to a data URL, the client must set the Accept header to application/json. The data response has the following structure:

{
    "data": []
}

The next URI is returned in the X-Presto-Data-Next-Uri header. The client must follow the next URI until it becomes unavailable. No special handling is necessary to close the data download: once all data is downloaded, the next URI points the client to a closing URL, which is a protocol implementation detail.
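
A sketch of the download loop, including a parallel variant over several data URIs, could look like the following. The fetch callable is hypothetical: it is assumed to perform a GET with the given request headers and return a pair of (response headers as a dict, parsed JSON body). The header lookup also assumes fetch preserves the header name exactly as written here.

```python
from concurrent.futures import ThreadPoolExecutor

NEXT_URI_HEADER = "X-Presto-Data-Next-Uri"


def download_data(data_uri, fetch):
    """Follow one chain of data URIs and return all rows from it.

    fetch(uri, request_headers) -> (response_headers, body) is a
    hypothetical callable supplied by the caller.
    """
    rows = []
    uri = data_uri
    while uri:
        headers, body = fetch(uri, {"Accept": "application/json"})
        rows.extend(body.get("data", []))
        # Stop once the server no longer supplies a next URI.
        uri = headers.get(NEXT_URI_HEADER)
    return rows


def download_all(data_uris, fetch, max_workers=4):
    """Download several data URI chains in parallel, one thread per chain."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of data_uris.
        chunks = pool.map(lambda uri: download_data(uri, fetch), data_uris)
        return [row for chunk in chunks for row in chunk]
```

The closing URL needs no special case here: it is simply the last link in the chain, after which the header is absent and the loop ends.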

Usage in Trino CLI

To use the v2 protocol in Trino, the following config property needs to be set:

experimental.enable-client-protocol-v2=true

When launching the CLI, pass the following command-line argument:

trino --protocol-version v2