Pydantic Puzzlers

Posted in technical

Pydantic is a library that validates JSON documents and converts them into Python objects, based on Python type annotations. It works really well, is easy to use, supports everything we could reasonably expect it to support, and saves us a ton of effort. I’ve only started using it recently, but I love it.
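As a rough illustration of that validate-and-convert step, here is a minimal sketch with a hypothetical Item model (not from this post). It assumes the JSON text is parsed with json.loads and the resulting dict is passed to the model constructor. Pydantic converts the string "3" into an int, and a missing required field is reported as a pydantic.ValidationError.

```python
import json

import pydantic


# Hypothetical model used only for this illustration
class Item(pydantic.BaseModel):
    name: str       # required: validation fails if it is missing
    count: int = 0  # optional: the default is used if it is missing


# Parse the JSON text into a dict, then let the model validate and convert it
item = Item(**json.loads('{"name": "widget", "count": "3"}'))
print(repr(item.count))  # 3 (the string "3" was converted to an int)

try:
    Item(**json.loads('{"count": 1}'))
except pydantic.ValidationError as exc:
    # Each error records the location of the field that failed
    print([err["loc"] for err in exc.errors()])  # [('name',)]
```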

However, there are a few less intuitive aspects to it, which is what this post is about.

Mystery Boolean

Puzzler: what is the output of the following code?

import pydantic
    
class Project(pydantic.BaseModel):
    source: str
    validate_: bool = False

    class Config:
        # validate is not a valid field name, because BaseModel has a validate method
        # use alias so we can use a validate field in the json and validate_ in python
        fields = {"validate_": {"alias": "validate"}}

def roundtrip(project_in: Project) -> None:
    """Convert to a dict and back."""
    project_out = Project(**project_in.dict())
    print(f"input: {project_in.validate_}, output: {project_out.validate_}")

projectt = Project(source="test", validate=True)
roundtrip(projectt)

projectf = Project(source="test", validate=False)
roundtrip(projectf)

The answer:

input: True, output: False
input: False, output: False

This is not what we expect! The validate field always comes out as False.

It took us a while to understand what we did wrong here. To make it clearer, we can print the intermediate dictionary.

from pprint import pprint

def roundtrip(project_in: Project) -> None:
    """Convert to a dict and back."""
    as_json = project_in.dict()
    pprint(as_json)
    project_out = Project(**as_json)
    print(f"input: {project_in.validate_}, output: {project_out.validate_}")

Running both roundtrip calls again now prints:

{'source': 'test', 'validate_': True}
input: True, output: False
{'source': 'test', 'validate_': False}
input: False, output: False

So, in an unexpected turn of events, the intermediate dictionary does not contain validate; it contains validate_ instead.

In other words, three things work against us here:

  1. The constructor expects aliases (which is the expected behavior), but dict() does not use aliases by default (which is unexpected).
  2. Unexpected keys in the input are ignored by default, so validate_ is silently dropped.
  3. If a field with a default is missing, the default value is used, so the field becomes False.
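The three points can be reproduced in isolation. This sketch assumes the same Project model, but declares the alias with pydantic.Field(alias=...) on the field itself instead of Config.fields; as far as I know, that spelling is accepted by both Pydantic v1 and v2 (in v2, dict() still works but is deprecated in favor of model_dump()).

```python
import pydantic


class Project(pydantic.BaseModel):
    source: str
    # Field-level alias: "validate" in the input, validate_ in Python
    validate_: bool = pydantic.Field(False, alias="validate")


project = Project(source="test", validate=True)

# 1. dict() uses the Python field name, not the alias
print(project.dict())  # {'source': 'test', 'validate_': True}

# 2. The constructor does not recognize the key "validate_" and ignores it ...
restored = Project(**project.dict())

# 3. ... so the field falls back to its default value
print(restored.validate_)  # False
```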

The correct code would be:

from pprint import pprint

def roundtrip(project_in: Project) -> None:
    """Convert to a dict (using aliases) and back."""
    as_json = project_in.dict(by_alias=True)
    pprint(as_json)
    project_out = Project(**as_json)
    print(f"input: {project_in.validate_}, output: {project_out.validate_}")

This works as expected:

{'source': 'test', 'validate': True}
input: True, output: True
{'source': 'test', 'validate': False}
input: False, output: False

The asymmetry between serializing (dict()) and deserializing (the constructor) is something you have to keep in mind when coding with Pydantic, as it can cause very subtle problems.
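One way to make this kind of mistake fail loudly, instead of silently resetting a field, is to forbid unknown keys. This is a suggested mitigation, not something from the original post; StrictProject is a hypothetical name. It uses a class Config like the code above; Pydantic v2 still accepts that form but prefers model_config = ConfigDict(extra="forbid").

```python
import pydantic


# Hypothetical variant of Project that rejects unknown keys
class StrictProject(pydantic.BaseModel):
    source: str
    validate_: bool = pydantic.Field(False, alias="validate")

    class Config:
        # Raise a ValidationError for extra keys instead of ignoring them
        extra = "forbid"


project = StrictProject(source="test", validate=True)

# Forgetting by_alias=True now raises instead of silently resetting validate_
try:
    StrictProject(**project.dict())
except pydantic.ValidationError as exc:
    print("round trip failed:", [err["loc"] for err in exc.errors()])

# The aliased round trip still works
print(StrictProject(**project.dict(by_alias=True)).validate_)  # True
```

Another option is to let the constructor accept field names as well as aliases (allow_population_by_field_name in Pydantic v1, populate_by_name in v2); that repairs the round trip, but then both spellings are accepted as input.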

My guess is that this asymmetry is a historical accident that has grown into the code and is now very hard to change, as changing it would break a lot of people’s code.

But even with this unexpected behavior, I still think Pydantic is awesome.