Add experimental_allow_partial support (pydantic#10748)

Co-authored-by: hyperlint-ai[bot] <154288675+hyperlint-ai[bot]@users.noreply.github.com>
Co-authored-by: sydney-runkle <[email protected]>
Co-authored-by: Sydney Runkle <[email protected]>
MuhammadNizamani · Nov 4, 2024 · 678ec30
1 parent b4308e0
commit 678ec30

Showing 8 changed files with 479 additions and 112 deletions.
diff --git a/docs/concepts/experimental.md b/docs/concepts/experimental.md
@@ -149,3 +149,244 @@ This example uses plain idiomatic Python code that may be easier to understand,
 The approach you choose should really depend on your use case.
 You will have to compare verbosity, performance, ease of returning meaningful errors to your users, etc. to choose the right pattern.
 Just be mindful of abusing advanced patterns like the pipeline API just because you can.
+
+## Partial Validation
+
+Pydantic v2.10.0 introduces experimental support for "partial validation".
+
+This allows you to validate an incomplete JSON string, or a Python object representing incomplete input data.
+
+Partial validation is particularly helpful when processing the output of an LLM, where the model streams structured responses, and you may wish to begin validating the stream while you're still receiving data (e.g. to show partial data to users).
+
+!!! warning
+    Partial validation is an experimental feature and may change in future versions of Pydantic. The current implementation should be considered a proof of concept at this time and has a number of [limitations](#limitations-of-partial-validation).
+
+Partial validation can be enabled when using the three validation methods on `TypeAdapter`: [`TypeAdapter.validate_json()`][pydantic.TypeAdapter.validate_json], [`TypeAdapter.validate_python()`][pydantic.TypeAdapter.validate_python], and [`TypeAdapter.validate_strings()`][pydantic.TypeAdapter.validate_strings]. This allows you to parse and validate incomplete JSON, and also to validate Python objects created by parsing incomplete data of any format.
+
+`experimental_allow_partial` in action:
+
+```python
+from typing import List
+
+from annotated_types import MinLen
+from typing_extensions import Annotated, NotRequired, TypedDict
+
+from pydantic import TypeAdapter
+
+
+class Foobar(TypedDict):  # (1)!
+    a: int
+    b: NotRequired[float]
+    c: NotRequired[Annotated[str, MinLen(5)]]
+
+
+ta = TypeAdapter(List[Foobar])
+
+v = ta.validate_json('[{"a": 1, "b"', experimental_allow_partial=True)  # (2)!
+print(v)
+#> [{'a': 1}]
+
+v = ta.validate_json(
+    '[{"a": 1, "b": 1.0, "c": "abcd', experimental_allow_partial=True  # (3)!
+)
+print(v)
+#> [{'a': 1, 'b': 1.0}]
+
+v = ta.validate_json(
+    '[{"b": 1.0, "c": "abcde"', experimental_allow_partial=True  # (4)!
+)
+print(v)
+#> []
+
+v = ta.validate_json(
+    '[{"a": 1, "b": 1.0, "c": "abcde"},{"a": ', experimental_allow_partial=True
+)
+print(v)
+#> [{'a': 1, 'b': 1.0, 'c': 'abcde'}]
+
+v = ta.validate_python([{'a': 1}], experimental_allow_partial=True)  # (5)!
+print(v)
+#> [{'a': 1}]
+
+v = ta.validate_python(
+    [{'a': 1, 'b': 1.0, 'c': 'abcd'}], experimental_allow_partial=True  # (6)!
+)
+print(v)
+#> [{'a': 1, 'b': 1.0}]
+```
+
+1. The TypedDict `Foobar` has three fields, but only `a` is required, so a valid instance of `Foobar` can be created even if the `b` and `c` fields are missing.
+2. When parsing JSON, the input is valid JSON up to the point where the string is truncated.
+3. In this case, truncating the input means the value of `c` (`abcd`) is too short to satisfy `MinLen(5)`, so `c` is omitted.
+4. The `a` field is required, so validation of the only item in the list fails and the item is dropped.
+5. Partial validation also works with Python objects. It should have the same semantics as with JSON, except that you can't have a genuinely "incomplete" Python object.
+6. The same as above but with a Python object: `c` is dropped because it's not required and failed validation.
+
+### How Partial Validation Works
+
+Partial validation follows the zen of Pydantic — it makes no guarantees about what the input data might have been, but it does guarantee to return a valid instance of the type you required, or raise a validation error.
+
+To do this, the `experimental_allow_partial` flag enables two pieces of behavior:
+
+#### 1. Partial JSON parsing
+
+The [jiter](https://github.com/pydantic/jiter) JSON parser used by Pydantic already supports parsing partial JSON;
+`experimental_allow_partial` is simply passed to jiter via the `allow_partial` argument.
+
+!!! note
+    If you just want pure JSON parsing with support for partial JSON, you can use the [`jiter`](https://pypi.org/project/jiter/) Python library directly, or pass the `allow_partial` argument when calling [`pydantic_core.from_json`][pydantic_core.from_json].
+
+#### 2. Ignore errors in the last element of the input {#2-ignore-errors-in-last}
+
+Only having access to part of the input data means errors can commonly occur in the last element of the input data.
+
+For example:
+
+* if a string has a constraint `MinLen(5)`, when you only see part of the input, validation might fail because part of the string is missing (e.g. `{"name": "Sam` instead of `{"name": "Samuel"}`)
+* if an `int` field has a constraint `Ge(10)`, when you only see part of the input, validation might fail because the number is too small (e.g. `1` instead of `10`)
+* if a `TypedDict` has 3 required fields, but the partial input only has two of the fields, validation would fail because some fields are missing
+* etc. There are lots more cases like this.
+
+The point is that if you only see part of some valid input data, validation errors can often occur in the last element of a sequence or the last value of a mapping.
+
+To avoid these errors breaking partial validation, Pydantic will ignore ALL errors in the last element of the input data.
+
+```py title="Errors in last element ignored"
+from typing import List
+
+from annotated_types import MinLen
+from typing_extensions import Annotated
+
+from pydantic import BaseModel, TypeAdapter
+
+
+class MyModel(BaseModel):
+    a: int
+    b: Annotated[str, MinLen(5)]
+
+
+ta = TypeAdapter(List[MyModel])
+v = ta.validate_json(
+    '[{"a": 1, "b": "12345"}, {"a": 1,',
+    experimental_allow_partial=True,
+)
+print(v)
+#> [MyModel(a=1, b='12345')]
+```
+
+### Limitations of Partial Validation
+
+#### TypeAdapter only
+
+You can only pass `experimental_allow_partial` to [`TypeAdapter`][pydantic.TypeAdapter] methods. It's not yet supported via other Pydantic entry points like [`BaseModel`][pydantic.BaseModel].
+
+#### Types supported
+
+Right now only a subset of collection validators know how to handle partial validation:
+
+- `list`
+- `set`
+- `frozenset`
+- `dict` (as in `dict[X, Y]`)
+- `TypedDict`: only non-required fields may be missing, e.g. via [`NotRequired`][typing.NotRequired] or [`total=False`][typing.TypedDict.__total__]
+
+While you can use `experimental_allow_partial` while validating against types that include other collection validators, those types will be validated "all or nothing", and partial validation will not work on more nested types.
+
+E.g. in the [above](#2-ignore-errors-in-last) example, partial validation works, although the second item in the list is dropped completely since `BaseModel` doesn't (yet) support partial validation.
+
+But partial validation won't work at all in the following example, because `BaseModel` doesn't support partial validation and so doesn't forward the `allow_partial` instruction down to the list validator in `b`:
+
+```py
+from typing import List
+
+from annotated_types import MinLen
+from typing_extensions import Annotated
+
+from pydantic import BaseModel, TypeAdapter, ValidationError
+
+
+class MyModel(BaseModel):
+    a: int = 1
+    b: List[Annotated[str, MinLen(5)]] = []  # (1)!
+
+
+ta = TypeAdapter(MyModel)
+try:
+    v = ta.validate_json(
+        '{"a": 1, "b": ["12345", "12', experimental_allow_partial=True
+    )
+except ValidationError as e:
+    print(e)
+    """
+    1 validation error for MyModel
+    b.1
+      String should have at least 5 characters [type=string_too_short, input_value='12', input_type=str]
+    """
+```
+
+1. The list validator for `b` doesn't get the `allow_partial` instruction passed down to it by the model validator so it doesn't know to ignore errors in the last element of the input.
+
+#### Some invalid but complete JSON will be accepted
+
+The way [jiter](https://github.com/pydantic/jiter) (the JSON parser used by Pydantic) works means it's currently not possible to differentiate between complete JSON like `{"a": 1, "b": "12"}` and incomplete JSON like `{"a": 1, "b": "12`.
+
+This means that some invalid JSON will be accepted by Pydantic when using `experimental_allow_partial`, e.g.:
+
+```py
+from annotated_types import MinLen
+from typing_extensions import Annotated, TypedDict
+
+from pydantic import TypeAdapter
+
+
+class Foobar(TypedDict, total=False):
+    a: int
+    b: Annotated[str, MinLen(5)]
+
+
+ta = TypeAdapter(Foobar)
+
+v = ta.validate_json(
+    '{"a": 1, "b": "12', experimental_allow_partial=True  # (1)!
+)
+print(v)
+#> {'a': 1}
+
+v = ta.validate_json(
+    '{"a": 1, "b": "12"}', experimental_allow_partial=True  # (2)!
+)
+print(v)
+#> {'a': 1}
+```
+
+1. This will pass validation as expected although the last field will be omitted as it failed validation.
+2. This will also pass validation since the binary representation of the JSON data passed to pydantic-core is indistinguishable from the previous case.
+
+#### Any error in the last field of the input will be ignored
+
+As described [above](#2-ignore-errors-in-last), many errors can result from truncating the input. Rather than trying to specifically ignore errors that could result from truncation, Pydantic ignores all errors in the last element of the input in partial validation mode.
+
+This means clearly invalid data will pass validation if the error is in the last field of the input:
+
+```py
+from typing import List
+
+from annotated_types import Ge
+from typing_extensions import Annotated
+
+from pydantic import TypeAdapter
+
+ta = TypeAdapter(List[Annotated[int, Ge(10)]])
+v = ta.validate_python([20, 30, 4], experimental_allow_partial=True)  # (1)!
+print(v)
+#> [20, 30]
+
+ta = TypeAdapter(List[int])
+
+v = ta.validate_python([1, 2, 'wrong'], experimental_allow_partial=True)  # (2)!
+print(v)
+#> [1, 2]
+```
+
+1. As you would expect, this will pass validation since Pydantic correctly ignores the error in the (truncated) last item.
+2. This will also pass validation since the error in the last item is ignored.
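
Note on the docs change above: the "ignore all errors in the last element" rule can be illustrated with a small pure-Python sketch. This is not how pydantic-core implements it; `validate_partial_list` and `at_least_ten` are hypothetical names invented for illustration, and the sketch assumes item validators signal failure with `ValueError`. It reproduces the two list outcomes shown in the last docs example.

```python
from typing import Any, Callable, List


def validate_partial_list(
    items: List[Any], validate_item: Callable[[Any], Any]
) -> List[Any]:
    """Validate each item, ignoring a failure only when it is in the last item."""
    validated = []
    last_index = len(items) - 1
    for index, item in enumerate(items):
        try:
            validated.append(validate_item(item))
        except ValueError:
            if index == last_index:
                # The last item may be truncated input, so drop it silently.
                break
            # An error anywhere else cannot be explained by truncation.
            raise
    return validated


def at_least_ten(value: Any) -> int:
    """Hypothetical item validator: an int that is >= 10."""
    number = int(value)
    if number < 10:
        raise ValueError(f'{number} is less than 10')
    return number


print(validate_partial_list([20, 30, 4], at_least_ten))
#> [20, 30]
print(validate_partial_list([1, 2, 'wrong'], int))
#> [1, 2]
```

As in the documented behavior, the sketch cannot tell a truncated last item from a genuinely invalid one, which is why `'wrong'` is dropped rather than reported.
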
diff --git a/docs/plugins/main.py b/docs/plugins/main.py
@@ -94,7 +94,14 @@ def add_changelog() -> None:
 def add_mkdocs_run_deps() -> None:
     # set the pydantic, pydantic-core, pydantic-extra-types versions to configure for running examples in the browser
     pyproject_toml = (PROJECT_ROOT / 'pyproject.toml').read_text()
-    pydantic_core_version = re.search(r'pydantic-core==(.+?)["\']', pyproject_toml).group(1)
+    m = re.search(r'pydantic-core==(.+?)["\']', pyproject_toml)
+    if not m:
+        logger.info(
+            "Could not find pydantic-core version in pyproject.toml, this is expected if you're using a git ref"
+        )
+        return
+
+    pydantic_core_version = m.group(1)
 
     version_py = (PROJECT_ROOT / 'pydantic' / 'version.py').read_text()
     pydantic_version = re.search(r'^VERSION ?= (["\'])(.+)\1', version_py, flags=re.M).group(2)
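
Note on the `docs/plugins/main.py` change above: the new guard checks the result of `re.search` before calling `.group(1)`, since `re.search` returns `None` when nothing matches. A standalone sketch of the same pattern follows. The helper name `find_pinned_version`, the `package` parameter, and the sample TOML strings (including the version `1.2.3`) are assumptions made for illustration and are not part of the commit.

```python
import re
from typing import Optional


def find_pinned_version(
    pyproject_toml: str, package: str = 'pydantic-core'
) -> Optional[str]:
    """Return the `==` pinned version of `package`, or None if there is no pin."""
    m = re.search(re.escape(package) + r'==(.+?)["\']', pyproject_toml)
    if not m:
        # No `==` pin found, e.g. the dependency is given as a git ref.
        return None
    return m.group(1)


# Hypothetical pyproject.toml fragments, for illustration only.
print(find_pinned_version('dependencies = ["pydantic-core==1.2.3"]'))
#> 1.2.3
print(find_pinned_version('dependencies = ["pydantic-core @ git+https://github.com/pydantic/pydantic-core.git@main"]'))
#> None
```
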