Trino v2 client protocol
This is a draft proposal and will change without notice. Do not implement clients against this protocol. Continue to use the v1 protocol.
- Ability to get progress independently from data
- Binary format for payload
- Parallel reads from client
- Reads from multiple clients
- Generalized types
- Authentication (username and password, Kerberos, etc.)
- Transport encryption
- Column and row oriented formats (or a format that is easy to pivot)
- Multiple representations of a value (e.g. HyperLogLog could have cardinality as an integer, a description of the struct as a string and the full data structure as binary)
- Multiple result sets?
- Control information, such as session properties and prepared statements, is now passed as part of the JSON request. Previously it was passed in HTTP headers, which imposed size limits.
- Query stats and data now come through different endpoints, possibly on different nodes (see the next point). Previously, they were part of the same JSON response sent from the coordinator.
- Data is now downloaded from the worker(s) rather than through the coordinator.
- Data can be downloaded in parallel. Previously there was only one download channel, from the client to the coordinator. The new protocol supports downloading from several endpoints on different workers.
- The new protocol supports message types other than JSON for creating queries and downloading data. The current implementation supports only JSON, though.
- The "Update count" field in the result is no longer populated.
Issue a POST request to /v2/statement. The request must include a Content-Type header set to application/json. The Accept header must also be set to application/json.
Request JSON structure:
{
  "session": {
    "user": "bob",
    "source": "cli",
    "clientInfo": "junk",
    "catalog": "tpch",
    "schema": "sf10",
    "timeZone": "UTC",
    "language": "en-US",
    "transactionId": "tx123",
    "properties": {
      "query_max_run_time": "30s",
      "hive.bucket_execution_enabled": "true"
    },
    "preparedStatements": {
      "abc": "data",
      "xyz": "data"
    }
  },
  "query": "SELECT 1"
}
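The request described above can be sketched in Python using only the standard library. This is an illustrative sketch, not a reference client: the coordinator address `http://localhost:8080` and the helper name `build_statement_request` are assumptions, and the proposal does not state which session fields are required, so the subset used here is only an example.

```python
import json
import urllib.request


def build_statement_request(base_url, query, session):
    """Build (but do not send) the POST request that creates a v2 query."""
    body = json.dumps({"session": session, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/statement",
        data=body,
        method="POST",
        headers={
            # Both headers are required by the proposal.
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical coordinator address, for illustration only.
request = build_statement_request(
    "http://localhost:8080",
    "SELECT 1",
    {"user": "bob", "source": "cli", "catalog": "tpch", "schema": "sf10"},
)
# Sending it would be: urllib.request.urlopen(request)
```

Because session properties and prepared statements travel in the body, they are not subject to the header size limits of the v1 protocol.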
Response JSON structure:
{
  "id": "query12345",
  "infoUri": "http://...",
  "nextUri": "http://...",
  "nextUriDone": true,
  "finalUri": "..send delete to..",
  "columns": [...],
  "dataUris": [...],
  "actions": {
    "setTransactionId": "tx123",
    "clearTransactionId": true,
    "setSessionProperties": {
      "query_max_run_time": "30s",
      "hive.bucket_execution_enabled": "true"
    },
    "clearSessionProperties": [
      "query_max_run_time",
      "hive.bucket_execution_enabled"
    ],
    "addPreparedStatements": {
      "abc": "data",
      "xyz": "data"
    },
    "deallocatePreparedStatements": [
      "abc",
      "xyz"
    ]
  }
}
Compared to the v1 protocol, the new fields are "dataUris" and "actions".
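One possible way for a client to apply the "actions" object to its local session state is sketched below. The proposal does not spell out the semantics of each action, so this assumes that every action is optional, that the "set"/"add" actions merge into the session's "properties" and "preparedStatements" maps, and that "set" is applied before "clear". The function name `apply_actions` is hypothetical.

```python
def apply_actions(session, actions):
    """Return a copy of `session` updated according to a response's `actions`."""
    updated = dict(session)
    properties = dict(session.get("properties", {}))
    prepared = dict(session.get("preparedStatements", {}))

    # Transaction handling (assumed: clear wins if both are present).
    if "setTransactionId" in actions:
        updated["transactionId"] = actions["setTransactionId"]
    if actions.get("clearTransactionId"):
        updated.pop("transactionId", None)

    # Session properties: merge new values, then drop cleared names.
    properties.update(actions.get("setSessionProperties", {}))
    for name in actions.get("clearSessionProperties", []):
        properties.pop(name, None)

    # Prepared statements: add new ones, then drop deallocated names.
    prepared.update(actions.get("addPreparedStatements", {}))
    for name in actions.get("deallocatePreparedStatements", []):
        prepared.pop(name, None)

    updated["properties"] = properties
    updated["preparedStatements"] = prepared
    return updated
```

The updated session would then be sent in the "session" object of the next request.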
A client must follow the nextUri link until it becomes unavailable. The status structure is the same as the "response" structure described above. Note that the response contains only status information and no data. Once the dataUris field is available, the client needs to send GET requests to the specified URLs. Note that the dataUris field might be sent several times, as new URIs become available. Each time the list is cumulative, so it contains the URIs sent in previous responses, and the client needs to de-duplicate them.
When nextUri becomes unavailable, the client must explicitly close the query by sending a DELETE request to the last available nextUri link.
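The status loop described above might look like the following sketch. It assumes that "unavailable" means the nextUri field is absent from a response, and it takes the HTTP operations as caller-supplied callables (`http_get` and `http_delete` are hypothetical names) so the control flow is visible without committing to an HTTP library. It follows the prose in sending DELETE to the last available nextUri; how that relates to the "finalUri" field is not specified in this draft.

```python
def run_query_status_loop(first_response, http_get, http_delete):
    """Follow nextUri, collect de-duplicated dataUris, then close the query.

    `http_get(url)` must return the parsed JSON status as a dict and
    `http_delete(url)` must send a DELETE request; both are hypothetical
    callables supplied by the caller.
    """
    data_uris = []  # unique URIs in first-seen order
    seen = set()
    response = first_response
    last_next_uri = None

    while True:
        # dataUris is cumulative, so skip anything already recorded.
        for uri in response.get("dataUris", []):
            if uri not in seen:
                seen.add(uri)
                data_uris.append(uri)

        next_uri = response.get("nextUri")
        if next_uri is None:
            break
        last_next_uri = next_uri
        response = http_get(next_uri)

    # Explicitly close the query once nextUri is no longer available.
    if last_next_uri is not None:
        http_delete(last_next_uri)
    return data_uris
```

A real client would likely start downloading from each data URI as soon as it appears rather than collecting them all first; the sketch only shows the polling, de-duplication, and close steps.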
When issuing a GET request to a data URL, the client must set the Accept header to application/json. The structure of the data response is the following:
{
  "data": []
}
The next URI is set in the X-Presto-Data-Next-Uri header. The client must follow the next URI until it becomes unavailable.
Note that no special handling is necessary to close a data download. Once all data is downloaded, the next URI points the client to a closing URL, which is a protocol implementation detail.
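A data download following these rules could be sketched as below. The HTTP call is again a caller-supplied callable: `http_get(url, headers)` is a hypothetical function returning a `(body_dict, response_headers_dict)` pair, assumed to return an empty dict for a response with no JSON body (such as the closing URL) and to expose the header under the exact name shown. Downloading several data URIs in parallel with a thread pool is one option; the proposal does not define any ordering of rows across data URIs, so the concatenation order here is only a convenience.

```python
from concurrent.futures import ThreadPoolExecutor

NEXT_URI_HEADER = "X-Presto-Data-Next-Uri"


def download_data_uri(start_uri, http_get):
    """Follow one data URI chain and return all rows it yields."""
    rows = []
    uri = start_uri
    while uri is not None:
        body, headers = http_get(uri, {"Accept": "application/json"})
        rows.extend(body.get("data", []))
        # A missing header means there is nothing further to follow.
        uri = headers.get(NEXT_URI_HEADER)
    return rows


def download_all(data_uris, http_get, max_workers=4):
    """Download every data URI chain in parallel and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `data_uris`.
        chunks = pool.map(lambda uri: download_data_uri(uri, http_get), data_uris)
        return [row for chunk in chunks for row in chunk]
```

Since the last next URI is the closing URL, simply following the header to the end also closes the download, with no separate DELETE.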
To use the v2 protocol in Trino, set the following config property:
experimental.enable-client-protocol-v2=true
When launching the CLI, pass the following command line argument:
trino --protocol-version v2