Pydantic Puzzlers II

Pydantic is a library that validates JSON documents and converts them into Python objects, based on Python type annotations. It works really well, is easy to use, supports everything we could reasonably expect, and saves us a ton of effort.

However, there are a few less intuitive aspects to it, which is what this post is about.

Mystery Int

Puzzler: What is the output of the following code?

from typing import Union
import pydantic

class Project(pydantic.BaseModel):
    value: Union[pydantic.types.StrictBool, int]

def roundtrip(value_in: Union[bool, int]) -> None:
    """Convert to a dict (as it would be serialized) and back."""
    project_in = Project(value=value_in)
    as_json = project_in.dict(by_alias=True)
    project_out = Project(**as_json)
    print(f"input: {project_in.value}, output: {project_out.value}")

roundtrip(5)
roundtrip(True)

The answer on Python 3.7:

input: 5, output: 5
input: True, output: True

So, the round trip works exactly as expected.

On Python 3.6:

input: 5, output: 5
input: True, output: 1

The boolean also becomes an int?

Step 1: Stepping in

It took us a while to understand the problem. The first obstacle was that the debugger would no longer step into the Pydantic code, and breakpoint() calls and print statements placed in that code had no effect.

We eventually figured it out:

import pydantic

print(pydantic.__file__)
[...]/pydantic/__init__.cpython-36m-x86_64-linux-gnu.so

Pydantic ships with a cythonized (compiled) version of its modules, so the Python source files are not what actually runs. After removing the *.so files, we could step into the code again.
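The same check can be done for any module without reading the path by eye. The following is a minimal sketch, not part of Pydantic: the helper name is_compiled_extension is our own, and it only compares the module's __file__ against the extension suffixes the running interpreter knows about.

```python
import importlib.machinery
import json
from types import ModuleType


def is_compiled_extension(module: ModuleType) -> bool:
    """Return True if the module was loaded from a compiled extension file.

    Hypothetical helper for illustration; it only looks at the file suffix.
    """
    path = getattr(module, "__file__", None)
    if path is None:
        # Built-in and namespace modules have no file on disk to inspect.
        return False
    # EXTENSION_SUFFIXES lists suffixes such as ".so" or ".pyd" for this platform.
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# json is a pure-Python package in the standard library.
print(is_compiled_extension(json))
```

Calling is_compiled_extension(pydantic) in the affected environment should report True as long as the *.so files are in place, which is a hint that the debugger cannot step into that code.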

Step 2: It’s not your fault

We happily stepped through the Pydantic code, only to realize the problem was not serialization or deserialization, but the initial object construction! We simplified the test case to:

def roundtrip(value_in: Union[bool, int]) -> None:
    """Construct the model only; no serialization involved."""
    project_in = Project(value=value_in)
    print(f"input: {value_in}, output: {project_in.value}")

It turned out that Pydantic is not at fault. On Python 3.6, the type annotation itself is not what we wrote:

>>> Project.__annotations__["value"]
<class 'int'>

The reason is that in Python 3.6, typing.Union simplifies its arguments when it is created: all duplicates and strict subclasses of other members are removed. Both bool and StrictBool are subclasses of int, so Union[StrictBool, int] collapses to plain int. In Python 3.7, subclasses are no longer removed.

>>> issubclass(pydantic.types.StrictBool, int)
True
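The effect can be reproduced without Pydantic, since bool is itself a subclass of int. The sketch below uses a small helper of our own, union_members, which reads the __args__ attribute of a Union and falls back to a one-element tuple when the annotation is a plain class (which is what a collapsed Union becomes).

```python
import sys
from typing import Union


def union_members(annotation):
    """Return the member types of a Union, or a 1-tuple for a plain class."""
    # A real Union exposes its members as __args__; a plain class such as
    # int has no such attribute, so we wrap it in a tuple instead.
    return getattr(annotation, "__args__", (annotation,))


# Print the interpreter version next to what Union[bool, int] evaluated to.
print(sys.version_info[:2], union_members(Union[bool, int]))
```

On Python 3.7 and later both members survive. On Python 3.6 the behaviour described above means the Union has already collapsed to int by the time the helper sees it.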

The Solution

The solution is simple, but not pretty. We made a copy of StrictBool that is not a subclass of int:

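from typing import Any

from pydantic import errors, types  # needed for StrictBoolError and the annotation below


class StrictNonIntBool(object):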
    """
    StrictNonIntBool to allow for bools which are not type-coerced and that are not a subclass of int
    Based on StrictBool from pydantic
    """

    @classmethod
    def __get_validators__(cls) -> "types.CallableGenerator":
        yield cls.validate

    @classmethod
    def validate(cls, value: Any) -> bool:
        """
        Ensure that we only allow bools.
        """
        if isinstance(value, bool):
            return value

        raise errors.StrictBoolError()

class Project(pydantic.BaseModel):

    value: Union[StrictNonIntBool, int]

This produces the correct result on both Python 3.6 and Python 3.7.