Feat!: Add support for virtual statements to be executed post update #3524

Merged 15 commits on Jan 9, 2025
2 changes: 1 addition & 1 deletion docs/concepts/macros/macro_variables.md
@@ -128,6 +128,6 @@ SQLMesh provides two other predefined variables used to modify model behavior ba
* 'evaluating' - The model query logic is being evaluated.
* 'testing' - The model query logic is being evaluated in the context of a unit test.
* @gateway - A string value containing the name of the current [gateway](../../guides/connections.md).
- * @this_model - A string value containing the name of the physical table the model view selects from. Typically used to create [generic audits](../audits.md#generic-audits).
+ * @this_model - A string value containing the name of the physical table the model view selects from. Typically used to create [generic audits](../audits.md#generic-audits). In the case of [on_virtual_update statements](../models/sql_models.md#optional-on-virtual-update-statements), it contains the qualified view name instead.
* Can be used in model definitions when SQLGlot cannot fully parse a statement and you need to reference the model's underlying physical table directly.
* Can be passed as an argument to macros that access or interact with the underlying physical table.
35 changes: 35 additions & 0 deletions docs/concepts/models/python_models.md
@@ -164,6 +164,41 @@ def execute(
context.fetchdf("CREATE INDEX idx ON example.pre_post_statements (id);")
```

## Optional on-virtual-update statements

The optional on-virtual-update statements allow you to execute SQL commands after the completion of the [Virtual Update](#virtual-update).

These can be used, for example, to grant privileges on views of the virtual layer.

Similar to pre/post-statements, you can set the `on_virtual_update` argument in the `@model` decorator to a list of SQL strings, SQLGlot expressions, or macro calls.

``` python linenums="1" hl_lines="8"
@model(
    "db.test_model",
    kind="full",
    columns={
        "id": "int",
        "name": "text",
    },
    on_virtual_update=["GRANT SELECT ON VIEW @this_model TO ROLE dev_role"],
)
def execute(
    context: ExecutionContext,
    start: datetime,
    end: datetime,
    execution_time: datetime,
    **kwargs: t.Any,
) -> pd.DataFrame:

    return pd.DataFrame([
        {"id": 1, "name": "name"}
    ])
```

!!! note

Table resolution for these statements occurs at the virtual layer. This means that table names, including the `@this_model` macro, are resolved to their qualified view names. For instance, when running the plan in an environment named `dev`, `db.test_model` and `@this_model` would resolve to `db__dev.test_model` and not to the physical table name.
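
The naming rule in the note can be illustrated with a small sketch. The helper `qualified_view_name` is hypothetical and not part of SQLMesh; it assumes a two-part `schema.table` model name, a `__<environment>` schema suffix as in the `db__dev.test_model` example, and no suffix for `prod`.

```python
def qualified_view_name(model_name: str, environment: str) -> str:
    """Return the virtual-layer view name for a model in an environment.

    Hypothetical helper for illustration only. Assumes `schema.table`
    model names and that `prod` views keep the unsuffixed schema.
    """
    schema, table = model_name.split(".")
    if environment == "prod":
        return f"{schema}.{table}"
    # Non-prod environments get the environment name appended to the schema.
    return f"{schema}__{environment}.{table}"


print(qualified_view_name("db.test_model", "dev"))
```

Under these assumptions, the `dev` call yields the same `db__dev.test_model` name used in the note, which is what `@this_model` stands for inside `on_virtual_update`.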

## Dependencies
In order to fetch data from an upstream model, you first get the table name using `context`'s `resolve_table` method. This returns the appropriate table name for the current runtime [environment](../environments.md):

29 changes: 29 additions & 0 deletions docs/concepts/models/seed_models.md
@@ -194,3 +194,32 @@ ALTER SESSION SET TIMEZONE = 'UTC';
-- These are post-statements
ALTER SESSION SET TIMEZONE = 'PST';
```

## On-virtual-update statements

Seed models also support on-virtual-update statements, which are executed after the completion of the [Virtual Update](#virtual-update).

These must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block:

```sql linenums="1" hl_lines="8-13"
MODEL (
  name test_db.national_holidays,
  kind SEED (
    path 'national_holidays.csv'
  )
);

ON_VIRTUAL_UPDATE_BEGIN;
GRANT SELECT ON VIEW @this_model TO ROLE dev_role;
JINJA_STATEMENT_BEGIN;
GRANT SELECT ON VIEW {{ this_model }} TO ROLE admin_role;
JINJA_END;
ON_VIRTUAL_UPDATE_END;
```


[Jinja expressions](../macros/jinja_macros.md) can also be used within on-virtual-update statements, as demonstrated in the example above. These expressions must be nested within a `JINJA_STATEMENT_BEGIN;` ... `JINJA_END;` block.

!!! note

Table resolution for these statements occurs at the virtual layer. This means that table names, including the `@this_model` macro, are resolved to their qualified view names. For instance, when running the plan in an environment named `dev`, `test_db.national_holidays` and `@this_model` would resolve to `test_db__dev.national_holidays` and not to the physical table name.
38 changes: 36 additions & 2 deletions docs/concepts/models/sql_models.md
@@ -10,6 +10,7 @@ The SQL-based definition of SQL models is the most common one, and consists of t
* Optional pre-statements
* A single query
* Optional post-statements
* Optional on-virtual-update statements

These models are designed to look and feel like you're simply using SQL, but they can be customized for advanced use cases.

@@ -90,6 +91,38 @@ MODEL (

Note that the SQL command `UNCACHE TABLE countries` inside the `@IF()` macro does **not** end with a semi-colon. Instead, the semi-colon comes after the `@IF()` macro's closing parenthesis.

### Optional on-virtual-update statements

The optional on-virtual-update statements allow you to execute SQL commands after the completion of the [Virtual Update](#virtual-update).

These can be used, for example, to grant privileges on views of the virtual layer.

These SQL statements must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block like this:

```sql linenums="1" hl_lines="10-15"
MODEL (
  name db.customers,
  kind FULL
);

SELECT
  r.id::INT
FROM raw.restaurants AS r;

ON_VIRTUAL_UPDATE_BEGIN;
GRANT SELECT ON VIEW @this_model TO ROLE role_name;
JINJA_STATEMENT_BEGIN;
GRANT SELECT ON VIEW {{ this_model }} TO ROLE admin;
JINJA_END;
ON_VIRTUAL_UPDATE_END;
```

[Jinja expressions](../macros/jinja_macros.md) can also be used within on-virtual-update statements, as demonstrated in the example above. These expressions must be nested within a `JINJA_STATEMENT_BEGIN;` ... `JINJA_END;` block.

!!! note

Table resolution for these statements occurs at the virtual layer. This means that table names, including the `@this_model` macro, are resolved to their qualified view names. For instance, when running the plan in an environment named `dev`, `db.customers` and `@this_model` would resolve to `db__dev.customers` and not to the physical table name.
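
To make the note concrete, the following sketch shows both reference styles from the example (`@this_model` and `{{ this_model }}`) ending up as the same view name. It is a plain string substitution, assumed here purely for illustration; `render_on_virtual_update` is a hypothetical name and does not reflect how SQLMesh's macro evaluator or Jinja renderer actually work.

```python
import re


def render_on_virtual_update(statement: str, view_name: str) -> str:
    """Replace `@this_model` and `{{ this_model }}` with the view name.

    Hypothetical, string-based stand-in for macro and Jinja rendering.
    """
    # Jinja-style reference: {{ this_model }}
    statement = re.sub(r"\{\{\s*this_model\s*\}\}", lambda _: view_name, statement)
    # SQLMesh macro-style reference: @this_model
    return re.sub(r"@this_model\b", lambda _: view_name, statement)


statements = [
    "GRANT SELECT ON VIEW @this_model TO ROLE role_name",
    "GRANT SELECT ON VIEW {{ this_model }} TO ROLE admin",
]
# View name taken from the note's `dev` environment example.
rendered = [render_on_virtual_update(s, "db__dev.customers") for s in statements]
for line in rendered:
    print(line)
```

Both rendered statements target the `db__dev.customers` view, which is why grants issued here apply to the virtual layer and not to the physical table.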

### The model query

The model must contain a standalone query, which can be a single `SELECT` expression, or multiple `SELECT` expressions combined with the `UNION`, `INTERSECT`, or `EXCEPT` operators. The result of this query will be used to populate the model's table or view.
@@ -98,7 +131,7 @@ The model must contain a standalone query, which can be a single `SELECT` expres

The Python-based definition of SQL models consists of a single python function, decorated with SQLMesh's `@model` [decorator](https://wiki.python.org/moin/PythonDecorators). The decorator is required to have the `is_sql` keyword argument set to `True` to distinguish it from [Python models](./python_models.md) that return DataFrame instances.

- This function's return value serves as the model's query, and it must be either a SQL string or a [SQLGlot expression](https://github.com/tobymao/sqlglot/blob/main/sqlglot/expressions.py). The `@model` decorator is used to define the model's [metadata](#MODEL-DDL) and, optionally its pre/post-statements that are also in the form of SQL strings or SQLGlot expressions.
+ This function's return value serves as the model's query, and it must be either a SQL string or a [SQLGlot expression](https://github.com/tobymao/sqlglot/blob/main/sqlglot/expressions.py). The `@model` decorator is used to define the model's [metadata](#MODEL-DDL) and, optionally, its pre/post-statements or on-virtual-update statements, which are also in the form of SQL strings or SQLGlot expressions.

Defining a SQL model using Python can be beneficial in cases where its query is too complex to express cleanly in SQL, for example due to having many dynamic components that would require heavy use of [macros](../macros/overview/). Since Python-based models generate SQL, they support the same features as regular SQL models, such as column-level [lineage](../glossary/#lineage).

@@ -120,6 +153,7 @@ from sqlmesh.core.macros import MacroEvaluator
kind="FULL",
pre_statements=["CACHE TABLE countries AS SELECT * FROM raw.countries"],
post_statements=["UNCACHE TABLE countries"],
on_virtual_update=["GRANT SELECT ON VIEW @this_model TO ROLE dev_role"],
)
def entrypoint(evaluator: MacroEvaluator) -> str | exp.Expression:
return (
@@ -139,7 +173,7 @@ One could also define this model by simply returning a string that contained the

The `@model` decorator is the Python equivalent of the `MODEL` DDL.

- In addition to model metadata and configuration information, one can also set the keyword arguments `pre_statements` and `post_statements` to a list of SQL strings and/or SQLGlot expressions to define the pre/post-statements of the model, respectively.
+ In addition to model metadata and configuration information, one can also set the keyword arguments `pre_statements`, `post_statements`, and `on_virtual_update` to lists of SQL strings and/or SQLGlot expressions to define the model's pre-statements, post-statements, and on-virtual-update statements, respectively.

!!! note

115 changes: 99 additions & 16 deletions sqlmesh/core/dialect.py
@@ -61,6 +61,10 @@ class JinjaStatement(Jinja):
pass


class VirtualUpdateStatement(exp.Expression):
arg_types = {"expressions": True}


class ModelKind(exp.Expression):
arg_types = {"this": True, "expressions": False}

@@ -772,6 +776,8 @@ def _is_command_statement(command: str, tokens: t.List[Token], pos: int) -> bool
JINJA_QUERY_BEGIN = "JINJA_QUERY_BEGIN"
JINJA_STATEMENT_BEGIN = "JINJA_STATEMENT_BEGIN"
JINJA_END = "JINJA_END"
ON_VIRTUAL_UPDATE_BEGIN = "ON_VIRTUAL_UPDATE_BEGIN"
ON_VIRTUAL_UPDATE_END = "ON_VIRTUAL_UPDATE_END"


def _is_jinja_statement_begin(tokens: t.List[Token], pos: int) -> bool:
@@ -794,10 +800,24 @@ def jinja_statement(statement: str) -> JinjaStatement:
return JinjaStatement(this=exp.Literal.string(statement.strip()))


def _is_virtual_statement_begin(tokens: t.List[Token], pos: int) -> bool:
return _is_command_statement(ON_VIRTUAL_UPDATE_BEGIN, tokens, pos)


def _is_virtual_statement_end(tokens: t.List[Token], pos: int) -> bool:
return _is_command_statement(ON_VIRTUAL_UPDATE_END, tokens, pos)


def virtual_statement(statements: t.List[exp.Expression]) -> VirtualUpdateStatement:
return VirtualUpdateStatement(expressions=statements)


class ChunkType(Enum):
JINJA_QUERY = auto()
JINJA_STATEMENT = auto()
SQL = auto()
VIRTUAL_STATEMENT = auto()
VIRTUAL_JINJA_STATEMENT = auto()


def parse_one(
@@ -837,9 +857,15 @@ def parse(
total = len(tokens)

pos = 0
virtual = False
while pos < total:
token = tokens[pos]
if _is_jinja_end(tokens, pos) or (
if _is_virtual_statement_end(tokens, pos):
chunks[-1][0].append(token)
virtual = False
chunks.append(([], ChunkType.SQL))
pos += 2
elif _is_jinja_end(tokens, pos) or (
chunks[-1][1] == ChunkType.SQL
and token.token_type == TokenType.SEMICOLON
and pos < total - 1
@@ -850,36 +876,93 @@ def parse(
# Jinja end statement
chunks[-1][0].append(token)
pos += 2
chunks.append(([], ChunkType.SQL))
chunks.append(
(
[],
ChunkType.VIRTUAL_STATEMENT
if virtual and tokens[pos] != ON_VIRTUAL_UPDATE_END
else ChunkType.SQL,
)
)
elif _is_jinja_query_begin(tokens, pos):
chunks.append(([token], ChunkType.JINJA_QUERY))
pos += 2
elif _is_jinja_statement_begin(tokens, pos):
chunks.append(([token], ChunkType.JINJA_STATEMENT))
pos += 2
elif _is_virtual_statement_begin(tokens, pos):
chunks.append(([token], ChunkType.VIRTUAL_STATEMENT))
pos += 2
virtual = True
else:
chunks[-1][0].append(token)
pos += 1

parser = dialect.parser()
expressions: t.List[exp.Expression] = []

for chunk, chunk_type in chunks:
if chunk_type == ChunkType.SQL:
parsed_expressions: t.List[t.Optional[exp.Expression]] = (
parser.parse(chunk, sql) if into is None else parser.parse_into(into, chunk, sql)
)
for expression in parsed_expressions:
if expression:
def parse_sql_chunk(chunk: t.List[Token], meta_sql: bool = True) -> t.List[exp.Expression]:
parsed_expressions: t.List[t.Optional[exp.Expression]] = (
parser.parse(chunk, sql) if into is None else parser.parse_into(into, chunk, sql)
)
expressions = []
for expression in parsed_expressions:
if expression:
if meta_sql:
expression.meta["sql"] = parser._find_sql(chunk[0], chunk[-1])
expressions.append(expression)
else:
start, *_, end = chunk
segment = sql[start.end + 2 : end.start - 1]
factory = jinja_query if chunk_type == ChunkType.JINJA_QUERY else jinja_statement
expression = factory(segment.strip())
expressions.append(expression)
return expressions

def parse_jinja_chunk(chunk: t.List[Token], meta_sql: bool = True) -> exp.Expression:
start, *_, end = chunk
segment = sql[start.end + 2 : end.start - 1]
factory = jinja_query if chunk_type == ChunkType.JINJA_QUERY else jinja_statement
expression = factory(segment.strip())
if meta_sql:
expression.meta["sql"] = sql[start.start : end.end + 1]
expressions.append(expression)
return expression

def parse_virtual_statement(
chunks: t.List[t.Tuple[t.List[Token], ChunkType]], pos: int
) -> t.Tuple[t.List[exp.Expression], int]:
# For virtual statements we need to handle both SQL and Jinja nested blocks within the chunk
virtual_update_statements = []
start = chunks[pos][0][0].start

while (
chunks[pos - 1][0] == [] or chunks[pos - 1][0][-1].text.upper() != ON_VIRTUAL_UPDATE_END
):
chunk, chunk_type = chunks[pos]
if chunk_type == ChunkType.JINJA_STATEMENT:
virtual_update_statements.append(parse_jinja_chunk(chunk, False))
else:
virtual_update_statements.extend(
parse_sql_chunk(
chunk[int(chunk[0].text.upper() == ON_VIRTUAL_UPDATE_BEGIN) : -1], False
),
)
pos += 1

if virtual_update_statements:
statements = virtual_statement(virtual_update_statements)
end = chunk[-1].end + 1
statements.meta["sql"] = sql[start:end]
return [statements], pos

return [], pos

pos = 0
total_chunks = len(chunks)
while pos < total_chunks:
chunk, chunk_type = chunks[pos]
if chunk_type == ChunkType.VIRTUAL_STATEMENT:
virtual_expression, pos = parse_virtual_statement(chunks, pos)
expressions.extend(virtual_expression)
elif chunk_type == ChunkType.SQL:
expressions.extend(parse_sql_chunk(chunk))
else:
expressions.append(parse_jinja_chunk(chunk))
pos += 1

return expressions
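
The grouping that the parser changes above perform can be sketched in a much simpler form. This is not the SQLMesh implementation: it splits on semicolons in plain text instead of working on tokens, ignores Jinja blocks and semicolons inside string literals, and uses the hypothetical names `split_virtual_statements`, `BEGIN`, and `END`. It only illustrates how statements between the two markers are collected separately from the rest of the model definition.

```python
from typing import List, Tuple

BEGIN = "ON_VIRTUAL_UPDATE_BEGIN"
END = "ON_VIRTUAL_UPDATE_END"


def split_virtual_statements(sql: str) -> Tuple[List[str], List[str]]:
    """Split SQL text into regular and on-virtual-update statements.

    Simplified sketch: a flag tracks whether we are inside the
    ON_VIRTUAL_UPDATE_BEGIN / ON_VIRTUAL_UPDATE_END block.
    """
    regular: List[str] = []
    virtual_update: List[str] = []
    in_virtual_block = False

    for raw in sql.split(";"):
        statement = raw.strip()
        if not statement:
            continue
        if statement.upper() == BEGIN:
            in_virtual_block = True
        elif statement.upper() == END:
            in_virtual_block = False
        elif in_virtual_block:
            virtual_update.append(statement)
        else:
            regular.append(statement)

    return regular, virtual_update


sql = """
SELECT 1 AS id;
ON_VIRTUAL_UPDATE_BEGIN;
GRANT SELECT ON VIEW @this_model TO ROLE dev_role;
ON_VIRTUAL_UPDATE_END;
"""
regular, virtual_update = split_virtual_statements(sql)
print(regular)
print(virtual_update)
```

In the actual change, the statements collected inside the block are wrapped in a single `VirtualUpdateStatement` expression; the sketch returns them as a separate list to keep the idea visible.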

1 change: 1 addition & 0 deletions sqlmesh/core/model/common.py
@@ -287,6 +287,7 @@ def depends_on(cls: t.Type, v: t.Any, values: t.Dict[str, t.Any]) -> t.Optional[
"expressions_",
"pre_statements_",
"post_statements_",
"on_virtual_update_",
"unique_key",
mode="before",
check_fields=False,
2 changes: 1 addition & 1 deletion sqlmesh/core/model/decorator.py
@@ -135,7 +135,7 @@ def model(
**self.kwargs,
}

- for key in ("pre_statements", "post_statements"):
+ for key in ("pre_statements", "post_statements", "on_virtual_update"):
statements = common_kwargs.get(key)
if statements:
common_kwargs[key] = [