Introduction to Pydantic

Starting File: 03_manual_valid_molecule.py

This chapter will start from 03_manual_valid_molecule.py and end with 04_pydantic_molecule.py.

Validating data is hard and time-consuming to do by hand. The last chapter showed just how difficult even simple validation can be. Furthermore, the type hints we’ve written and the dataclass decorator have made the code more legible, but they have not actually helped us perform validation programmatically.

In this chapter you’ll be introduced to pydantic, a powerful non-native (third-party) library for data validation and settings management that leverages existing Python type hints to handle validation for you. pydantic is not the only possible solution for validating Python data and schemas, but it is a natural extension of the type hints and dataclass we’ve already discussed.

Check Out Pydantic

We will not be covering all the capabilities of pydantic here, and we highly encourage you to visit the pydantic docs to learn about all the powerful and easy-to-execute things pydantic can do.

This lesson uses Pydantic 2.0 or greater

This lesson works on Pydantic 2.0+. It is incompatible with Pydantic 1.X.

Compatibility with Python 3.8 and below

If you have Python 3.8 or below, you will need to import container types such as List, Tuple, Dict, etc. from the typing module instead of using the built-in types list, tuple, dict, etc. This chapter assumes Python 3.9 or greater; however, both approaches work in Python 3.9+, and the typing versions are 1:1 replacements with the same (capitalized) names.
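A minimal sketch of the note above, assuming Python 3.9+ so that both styles are available. The two hypothetical models differ only in where their list type comes from, and they should validate the same input identically.

```python
from typing import List

from pydantic import BaseModel


class TypingListModel(BaseModel):  # hypothetical model name
    # typing.List also works on Python 3.8 and older
    symbols: List[str]


class BuiltinListModel(BaseModel):  # hypothetical model name
    # built-in generics like list[str] need Python 3.9 or newer
    symbols: list[str]


a = TypingListModel(symbols=["H", "H", "O"])
b = BuiltinListModel(symbols=["H", "H", "O"])
print(a.symbols == b.symbols)  # True
```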

Pydantic’s Main Object: BaseModel

pydantic works with a class that looks and behaves similarly to a dataclass. However, instead of using a decorator, the class subclasses the pydantic object called BaseModel. Let’s start with our final Molecule object from Manual Data Validation.

from dataclasses import dataclass
from typing import Union

# Type Helpers
fi = Union[float, int]
lfi = list[fi]
tfi = tuple[fi, ...]
inner = Union[lfi, tfi]
lo = list[inner]
tupo = tuple[inner, ...]


@dataclass
class Molecule:
    name: str
    charge: fi
    symbols: Union[list[str], tuple[str, ...]]
    coordinates: Union[lo, tupo]

    def __post_init__(self):
        # We'll validate the inputs here.
        if not isinstance(self.name, str):
            raise ValueError(f"'name' must be a str, was {self.name}")

        if not (isinstance(self.charge, float) or isinstance(self.charge, int)):
            raise ValueError(f"'charge' must be a float or int, was {self.charge}")

        try:
            if not (isinstance(self.symbols, list) or isinstance(self.symbols, tuple)):
                raise TypeError
            for content in self.symbols:  # Loop over elements
                if not isinstance(content, str):  # Check content
                    raise ValueError(content, type(content))
        except TypeError as exec:  # Trap not iterable item
            # This will throw if you can't iterate over self.symbols
            raise ValueError(f"'symbols' must be a list or tuple of string, was {type(self.symbols)}") from exec
        except ValueError as exec:  # Trap the content error
            raise ValueError(f"Each element of 'symbols' must be a string, was {exec.args[0]} of type {exec.args[1]}") from exec

        try:
            if not (isinstance(self.coordinates, list) or isinstance(self.coordinates, tuple)):
                raise TypeError
            for inner in self.coordinates:  # Loop over elements
                try:
                    if not (isinstance(inner, list) or isinstance(inner, tuple)):
                        raise TypeError
                    for content in inner:  # Loop over elements
                        if not (isinstance(content, int) or isinstance(content, float)):  # Check content
                            raise ValueError(content, type(content))
                except TypeError as exec:  # Trap not iterable item
                    # This will throw if an inner element of self.coordinates is not a list or tuple
                    raise ValueError(f"'coordinates' inner elements must be a list or tuple of float/int, was {type(inner)}") from exec
                except ValueError as exec:  # Trap the content error
                    raise ValueError(f"Each inner element of 'coordinates' must be a float or int, was {exec.args[0]} of type {exec.args[1]}") from exec
        except TypeError as exec:  # Trap not iterable item
            # This will throw if self.coordinates is not a list or tuple
            raise ValueError(f"'coordinates' must be a list or tuple of int/float, was {type(self.coordinates)}") from exec
        except ValueError as exec:  # Trap the content error
            raise ValueError(f"'coordinates' must be a list or tuple of int/float, however the following error was thrown: {exec}") from exec

    @property
    def num_atoms(self):
        return len(self.symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

mol_data = {  # Good data
    "coordinates": [[0, 0, 0]], 
    "symbols": ["H", "H", "O"], 
    "charge": 0.0, 
    "name": "water"
}

bad_name = {"name": 789}  # Name is not str
bad_charge = {"charge": [1, 0.0]}  # Charge is not int or float
noniter_symbols = {"symbols": 1234567890}  # Symbols is an int
nonlist_symbols = {"symbols": '["H", "H", "O"]'}  # Symbols is a string (notably is a string-ified list)
tuple_symbols = {"symbols": ("H", "H", "O")}  # Symbols as a tuple?
bad_coords = {"coordinates": ["1", "2", "3"]}  # Coords is a single list of string
bad_symbols_and_cords = {"symbols": ["H", "H", "O"],
                         "coordinates": [[1, 1, 1], [2.0, 2.0, 2.0]]
                        }  # Coordinates top-level list is not the same length as symbols

It took a fair amount of work to get from the original version to this point, and the result still has problems: lots of manually written validation code, type hints that don’t actually do anything, and code that bloats very quickly. Let’s fix these problems one at a time.

To start, let’s convert our Molecule from a dataclass to a pydantic BaseModel by importing BaseModel, making Molecule subclass BaseModel, and removing the dataclass decorator. At this point we don’t even need the dataclasses import, so let’s remove it as well.

from typing import Union

from pydantic import BaseModel

# Type Helpers
fi = Union[float,int]
lfi = list[fi]
tfi = tuple[fi, ...]
inner = Union[lfi, tfi]
lo = list[inner]
tupo = tuple[inner, ...]

class Molecule(BaseModel):
    name: str
    charge: fi
    symbols: Union[list[str], tuple[str, ...]]
    coordinates: Union[lo, tupo]
    
    def __post_init__(self):
        # We'll validate the inputs here.
        if not isinstance(self.name, str):
            raise ValueError(f"'name' must be a str, was {self.name}")
            
        if not (isinstance(self.charge, float) or isinstance(self.charge, int)):
            raise ValueError(f"'charge' must be a float or int, was {self.charge}")
            
        try:
            if not (isinstance(self.symbols, list) or isinstance(self.symbols, tuple)):
                raise TypeError
            for content in self.symbols:  # Loop over elements
                if not isinstance(content, str):  # Check content
                    raise ValueError(content, type(content))
        except TypeError as exec:  # Trap not iterable item
            # This will throw if you can't iterate over self.symbols
            raise ValueError(f"'symbols' must be a list or tuple of string, was {type(self.symbols)}") from exec
        except ValueError as exec:  # Trap the content error
            raise ValueError(f"Each element of 'symbols' must be a string, was {exec.args[0]} of type {exec.args[1]}") from exec
            
        try:
            if not (isinstance(self.coordinates, list) or isinstance(self.coordinates, tuple)):
                raise TypeError
            for inner in self.coordinates:  # Loop over elements
                try:
                    if not (isinstance(inner, list) or isinstance(inner, tuple)):
                        raise TypeError
                    for content in inner:  # Loop over elements
                        if not (isinstance(content, int) or isinstance(content, float)):  # Check content
                            raise ValueError(content, type(content))
                except TypeError as exec:  # Trap not iterable item
                    # This will throw if an inner element of self.coordinates is not a list or tuple
                    raise ValueError(f"'coordinates' inner elements must be a list or tuple of float/int, was {type(inner)}") from exec
                except ValueError as exec:  # Trap the content error
                    raise ValueError(f"Each inner element of 'coordinates' must be a float or int, was {exec.args[0]} of type {exec.args[1]}") from exec
        except TypeError as exec:  # Trap not iterable item
            # This will throw if self.coordinates is not a list or tuple
            raise ValueError(f"'coordinates' must be a list or tuple of int/float, was {type(self.coordinates)}") from exec
        except ValueError as exec:  # Trap the content error
            raise ValueError(f"'coordinates' must be a list or tuple of int/float, however the following error was thrown: {exec}") from exec
        
    @property
    def num_atoms(self):
        return len(self.symbols)
        
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

Even though we have removed the dataclass decorator from our code, pydantic structures its inputs in much the same way. __init__ is provided by the class itself: on instantiation it assigns the class attributes to instance attributes (and does many other things, but those are beyond the scope of this workshop). Also, as with dataclass, you should not implement your own __init__ method on a BaseModel.

One main difference between how BaseModel and dataclass behave on initialization is that BaseModel does not accept positional arguments matched 1-to-1 against the listed class attributes; it only accepts keyword arguments. Anticipating this change in behavior is one of the reasons we have been creating our Molecule with keyword arguments instead of positional arguments (that, and it’s good practice for the reasons discussed in Dataclasses In Python).
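A short sketch of that difference, using a trimmed-down Molecule with only two fields for brevity. Keyword arguments validate as usual, while positional arguments are rejected with an ordinary Python TypeError, because in Pydantic 2.x BaseModel.__init__ only accepts keyword data.

```python
from pydantic import BaseModel


class Molecule(BaseModel):
    name: str
    charge: float


# Keyword arguments are matched to fields by name
water = Molecule(name="water", charge=0.0)
print(water)  # name='water' charge=0.0

# Positional arguments are not accepted by BaseModel.__init__
positional_error = None
try:
    Molecule("water", 0.0)
except TypeError as exc:
    positional_error = exc
print(type(positional_error).__name__)  # TypeError
```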

Dataclasses can work with Pydantic if you really want to

Dataclasses and Pydantic are not mutually exclusive. Pydantic provides a dataclass decorator that nearly perfectly mimics the native dataclass, but with all the extra validation pydantic provides.
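As a hedged sketch, assuming Pydantic 2.x, that decorator is imported from pydantic.dataclasses. The Atom class here is hypothetical and only for illustration: positional arguments work as they would with the standard-library dataclass, while the annotations are still enforced.

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Atom:  # hypothetical class for illustration
    symbol: str
    charge: float


# Positional arguments work, just like a standard-library dataclass,
# and the int 0 is coerced to a float
hydrogen = Atom("H", 0)
print(hydrogen)  # Atom(symbol='H', charge=0.0)

# ...but the annotations are enforced: 123 is not a valid str
bad_atom_errors = 0
try:
    Atom(123, 0.0)
except ValidationError as exc:
    bad_atom_errors = exc.error_count()
print(bad_atom_errors)  # 1
```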

Let’s see what happens if we try to call this class with no other modifications.

water = Molecule(**mol_data)

Huzzah! It worked! But notice that it worked without running any of our validation code. BaseModel does not have a single specialized method for handling validation (we’ll cover custom validation in Validating Data Beyond Types), and __post_init__ is a special method of dataclass, so BaseModel never calls it. In fact, let’s delete the entire __post_init__ method, as we won’t be needing it anymore. Let’s also delete the __str__ method, since BaseModel provides its own representation.
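To check the claim that __post_init__ is ignored, here is a small experiment with a hypothetical model whose __post_init__ always raises. If BaseModel called the hook, construction would fail; instead it succeeds.

```python
from pydantic import BaseModel


class PostInitModel(BaseModel):  # hypothetical model name
    name: str

    def __post_init__(self):
        # If BaseModel ever called this hook, construction would fail here
        raise RuntimeError("__post_init__ ran")


obj = PostInitModel(name="water")  # no RuntimeError is raised
print(obj)  # name='water'
```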

from typing import Union

from pydantic import BaseModel

# Type Helpers
fi = Union[float,int]
lfi = list[fi]
tfi = tuple[fi, ...]
inner = Union[lfi, tfi]
lo = list[inner]
tupo = tuple[inner, ...]

class Molecule(BaseModel):
    name: str
    charge: fi
    symbols: Union[list[str], tuple[str, ...]]
    coordinates: Union[lo, tupo]
    
        
    @property
    def num_atoms(self):
        return len(self.symbols)

That looks simpler, so let’s run our model and take a look at the output from print.

water = Molecule(**mol_data)
print(water)

name='water' charge=0.0 symbols=['H', 'H', 'O'] coordinates=[[0, 0, 0]]

You can see that pydantic provides its own complete representation of the data structure, including all of its attributes. You can also access the model’s attributes just like any other instance attribute.

print(water.name)
print(water.coordinates)

water
[[0, 0, 0]]

pydantic also provides a few built-in methods for quick exporting of models to other data structures like dictionaries and JSON strings.

water.model_dump()

{'name': 'water',
 'charge': 0.0,
 'symbols': ['H', 'H', 'O'],
 'coordinates': [[0, 0, 0]]}

water.model_dump_json()

'{"name":"water","charge":0.0,"symbols":["H","H","O"],"coordinates":[[0,0,0]]}'
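The export methods have counterparts for going the other way. The sketch below uses a simplified stand-in Molecule (the annotations this chapter ends up with) and assumes Pydantic 2.x, where model_validate and model_validate_json rebuild a model from a dictionary or a JSON string, re-running validation along the way.

```python
from pydantic import BaseModel


class Molecule(BaseModel):  # simplified stand-in for this chapter's model
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]


water = Molecule(name="water", charge=0.0, symbols=["H", "H", "O"],
                 coordinates=[[0, 0, 0]])

as_dict = water.model_dump()        # plain Python dict
as_json = water.model_dump_json()   # JSON string

# Rebuild (and re-validate) models from the exported data
from_dict = Molecule.model_validate(as_dict)
from_json = Molecule.model_validate_json(as_json)
print(from_dict == water, from_json == water)  # True True
```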

Here is where pydantic helps us. Because this is a BaseModel, our type hints are no longer hints; they are mandates. Let’s show that by feeding in invalid data.

mangle = {**mol_data, **bad_name, **bad_charge, **bad_coords}
water = Molecule(**mangle)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Input In [19], in <cell line: 2>()
      1 mangle = {**mol_data, **bad_name, **bad_charge, **bad_coords}
----> 2 water = Molecule(**mangle)

File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
    148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)

ValidationError: 15 validation errors for Molecule
name
  Input should be a valid string [type=string_type, input_value=789, input_type=int]
    For further information visit https://errors.pydantic.dev/2.0.3/v/string_type
charge.float
  Input should be a valid number [type=float_type, input_value=[1, 0.0], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.3/v/float_type
charge.int
  Input should be a valid integer [type=int_type, input_value=[1, 0.0], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.3/v/int_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.0.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.0.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.1.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='2', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.1.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='2', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.2.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='3', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.2.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='3', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.0.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.0.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.1.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='2', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.1.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='2', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.2.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='3', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`tuple[union[list[union[float,int]],tuple[union[float,int], ...]], ...]`.2.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='3', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type

A new type of error has been raised. ValidationError is a custom exception that pydantic raises when you try to insert data that does not adhere to the type assigned to it via the type annotation.

Type Hints No More

We will refer to pydantic’s use of types as “type annotations” because, although they are technically still “type hints,” they are no longer hints.

pydantic reads the type annotations assigned to the attributes and then validates the incoming arguments against those types. Because we were thorough with our type hints in Manual Data Validation, our type annotations correctly describe the expected data.

We also now get simultaneous validation of multiple entries. In Manual Data Validation, our validation code raised the first error it found without validating anything else. Here, pydantic validates every field and then raises a single error at the end that reports all of the failures.
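A sketch of that behavior, using a hypothetical two-field model: the single ValidationError carries one entry per failure, which you can inspect with error_count() and errors(). Each entry is a dictionary whose "loc" key locates the failing field and whose "type" key names the kind of error.

```python
from pydantic import BaseModel, ValidationError


class Molecule(BaseModel):  # trimmed to two fields for brevity
    name: str
    charge: float


failures = []
try:
    Molecule(name=789, charge=[1, 0.0])  # both fields are invalid
except ValidationError as exc:
    print(exc.error_count())  # 2
    failures = [(err["loc"], err["type"]) for err in exc.errors()]

for loc, err_type in failures:
    print(loc, err_type)
# ('name',) string_type
# ('charge',) float_type
```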

Reading the ValidationError output takes some getting used to, but can be understood with practice.

Reading the Validation Error

charge.float
  Input should be a valid number [type=float_type, input_value=[1, 0.0], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.3/v/float_type
charge.int
  Input should be a valid integer [type=int_type, input_value=[1, 0.0], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.3/v/int_type

Here charge is the attribute, and pydantic is checking it against its type float. Below that is what the input should have been and what it actually was. Below that is a link to documentation for this particular error type.

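On the next top-level line, charge is checked against its second option, int, which is also not satisfied because the input is not an int. pydantic treats a Union as an either/or: it validates the input against each member separately, and the field fails only if every member fails, in which case each member’s failure is reported.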
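The same per-member reporting shows up when you inspect the errors programmatically. In this sketch (a hypothetical one-field model mirroring our charge annotation), each failed Union member appears as its own entry, with the member’s type tagged onto the end of "loc".

```python
from typing import Union

from pydantic import BaseModel, ValidationError


class ChargeOnly(BaseModel):  # hypothetical one-field model
    charge: Union[float, int]


union_failures = []
try:
    ChargeOnly(charge=[1, 0.0])  # neither a float nor an int
except ValidationError as exc:
    # One entry per Union member, tagged with the member's type in "loc"
    union_failures = [(err["loc"], err["type"]) for err in exc.errors()]

for loc, err_type in union_failures:
    print(loc, err_type)
# ('charge', 'float') float_type
# ('charge', 'int') int_type
```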

coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.0.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.0.`tuple[union[float,int], ...]`
  Input should be a valid tuple [type=tuple_type, input_value='1', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/tuple_type
coordinates.`list[union[list[union[float,int]],tuple[union[float,int], ...]]]`.1.list[union[float,int]]
  Input should be a valid list [type=list_type, input_value='2', input_type=str]
    For further information visit https://errors.pydantic.dev/2.0.3/v/list_type

coordinates is the attribute that did not receive valid data. coordinates.{A Bunch of Type Info}.0 indicates that at index 0 of coordinates, the validator expected a list but did not get one. The second entry shows the same element checked against the tuple option. The third entry, coordinates.{A Bunch of Type Info}.1, shows that index 1 of coordinates was not a list either.
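A smaller sketch makes those index paths easier to see. It assumes the simplified annotation list[list[float]] on a hypothetical one-field model (so the union type info drops out of "loc") and feeds in input shaped like bad_coords; each failing element is reported with its integer index.

```python
from pydantic import BaseModel, ValidationError


class CoordsOnly(BaseModel):  # hypothetical one-field model
    coordinates: list[list[float]]


index_locs = []
try:
    CoordsOnly(coordinates=["1", "2", "3"])  # same shape as bad_coords
except ValidationError as exc:
    # The integer in each "loc" is the index of the offending element
    index_locs = [err["loc"] for err in exc.errors()]

print(index_locs)
# [('coordinates', 0), ('coordinates', 1), ('coordinates', 2)]
```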

Data Coercion

pydantic has already helped us simplify our code by providing type checking, but let’s simplify further by reducing our type annotation complexity and seeing what pydantic does to some invalid types.

For starters, let’s assume charge can only be a float. Right now charge accepts a float or an int, and because it does, pydantic produces a different output type depending on what type we give it.

int_charge = {**mol_data, **{"charge": 0}}
float_charge = {**mol_data, **{"charge": -1.5}}  # Value that can't be cast to int.
print(type(int_charge["charge"]))
print(type(float_charge["charge"]))

<class 'int'>
<class 'float'>

int_water = Molecule(**int_charge)
float_water = Molecule(**float_charge)
print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

Integer water has value 0 and type <class 'int'>
Float water has value -1.5 and type <class 'float'>

You can see that pydantic preserved the type of the input whenever that type was valid. Version 1.X of pydantic would instead cast data to the first type in the Union that it could, which could result in unexpected behavior.

However, there may be a case where we want data to be cast to specific types. This process is called Data Coercion: the process of molding and shaping data to adhere to certain rules. To see what pydantic is doing under the hood, take a look at our first type helper.

# Type Helpers
fi = Union[float,int]

We specified that float was first and int was second. From a pure set theory standpoint, order should not matter, and pydantic respects that rule. At the code level, pydantic does a few things. Here is a simplified list:

1. Handle pre-validators (covered later in Validating Data Beyond Types)
2. Attempt to coerce data through the first type annotation encountered, without loss
3. Keep checking against all other types to see if there is a more exact type match
4. If a more correct match is found, accept it and move to the final step
5. If no more-correct match is found, accept the earlier match with coercion
6. Repeat 2-4 until resolved, or throw an error if there is no resolution
7. Handle user validators (covered later in Validating Data Beyond Types)

We could reverse the order and try again to prove the order does not matter:

class IntThenFloatMolecule(Molecule):  # Subclass our defined model to inherit attributes
    charge: Union[int, float]

int_water = IntThenFloatMolecule(**int_charge)
float_water = IntThenFloatMolecule(**float_charge)
print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

Integer water has value 0 and type <class 'int'>
Float water has value -1.5 and type <class 'float'>
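For contrast, assuming a Pydantic 2.x release recent enough to support the union_mode argument of Field, you can opt a single field out of this "smart" behavior with Field(union_mode="left_to_right"), which accepts the first member that validates, coercion included. The sketch below (hypothetical model names) shows that order then does matter: a whole-number float like 2.0 stays a float in smart mode but becomes an int when int is tried first.

```python
from typing import Union

from pydantic import BaseModel, Field


class SmartCharge(BaseModel):  # hypothetical model name
    # Default "smart" mode: an exact type match wins regardless of order
    charge: Union[int, float]


class LeftToRightCharge(BaseModel):  # hypothetical model name
    # Opt-in mode: the first member that validates (with coercion) wins
    charge: Union[int, float] = Field(union_mode="left_to_right")


smart = SmartCharge(charge=2.0)
left = LeftToRightCharge(charge=2.0)
print(type(smart.charge))  # <class 'float'>
print(type(left.charge))   # <class 'int'>
```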

So far we have not lost any information, but data coercion still matters. There may be times when we want to coerce data: for example, someone gives us a count of something as a float but we need it as an int, or a value we expect to support fractional values, like a partial charge, arrives as an int. Pydantic can help us with that.

class FloatMolecule(Molecule):  # Subclass our defined model to inherit attributes
    charge: Union[float]        # Maybe you only want float-based charges

int_water = FloatMolecule(**int_charge)
float_water = FloatMolecule(**float_charge)
print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

Integer water has value 0.0 and type <class 'float'>
Float water has value -1.5 and type <class 'float'>

Here pydantic has taken the int from int_charge and coerced it into a float. What happens if we do the reverse? Let’s also define an additional input like float_charge, but with a whole-number float value, to illustrate data coercion.

class IntMolecule(Molecule):  # Subclass our defined model to inherit attributes
    charge: Union[int]        # Maybe you only want whole charges?
        
coercable_float_charge = {**mol_data, **{"charge": 2.0}}

int_water = IntMolecule(**int_charge)
float_water = IntMolecule(**coercable_float_charge)
print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

Integer water has value 0 and type <class 'int'>
Float water has value 2 and type <class 'int'>

We can see here that 2.0 was cast to an integer because there was no loss of data. You might be able to guess what will happen if we use a number that cannot be cast to int without loss of data, such as the -1.5 of float_charge, but let’s see what happens anyway.

int_water = IntMolecule(**int_charge)
float_water = IntMolecule(**float_charge)
print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Input In [48], in <cell line: 2>()
      1 int_water = IntMolecule(**int_charge)
----> 2 float_water = IntMolecule(**float_charge)
      3 print(f"Integer water has value {int_water.charge} and type {type(int_water.charge)}")
      4 print(f"Float water has value {float_water.charge} and type {type(float_water.charge)}")

File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
    148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)

ValidationError: 1 validation error for IntMolecule
charge
  Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=-1.5, input_type=float]
    For further information visit https://errors.pydantic.dev/2.0.3/v/int_from_float

In order to prevent loss of data, pydantic didn’t let us convert the float of -1.5 to an int because the fractional value of the data would have been lost.

The lesson here is that it is better to be permissive where possible. Since float avoids both data loss and errors for a field that can accept either type, we will simplify our types to make them easier to read and their errors easier to parse.

Don’t like something? Config it

Pydantic’s BaseModels are highly configurable through a class attribute called model_config that you can define in any model. Some configurations will be shown later, but you can always check the pydantic model_config docs for more things you can do. Preventing coercion entirely, for example, is a setting called “strict.”
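A minimal sketch of the strict setting, assuming Pydantic 2.x, where model_config is built with ConfigDict. The two hypothetical models differ only in their configuration: the lax one coerces 2.0 to the int 2 (as IntMolecule did above), while the strict one rejects the float outright.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxCharge(BaseModel):  # hypothetical model name
    charge: int


class StrictCharge(BaseModel):  # hypothetical model name
    # strict=True turns off coercion for every field in this model
    model_config = ConfigDict(strict=True)

    charge: int


lax = LaxCharge(charge=2.0)
print(lax.charge)  # 2

strict_error_type = None
try:
    StrictCharge(charge=2.0)  # a float is not an int, so no coercion
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]
print(strict_error_type is not None)  # True
```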

from typing import Union

from pydantic import BaseModel

# Type Helpers
fi = Union[float,int]
lfi = list[fi]
tfi = tuple[fi, ...]
inner = Union[lfi, tfi]
lo = list[inner]
tupo = tuple[inner, ...]

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: Union[lo, tupo]
    
        
    @property
    def num_atoms(self):
        return len(self.symbols)

We’ve simplified some of our type hints now. One of the other changes we’ve made is setting symbols to a list of strings instead of accepting either list or tuple.

Practice simplifying type annotations with coercion

What would a simplified type annotation of coordinates be?

Solution:

Any one of the following. Note: We’ll prefer list so we can change the coordinates in place if we need to later.

coordinates: list[list[float]]
coordinates: list[list[Union[float, int]]]
coordinates: tuple[tuple[float, ...], ...]
coordinates: tuple[tuple[Union[float, int], ...], ...]

Incorrect Answer:

This option is problematic because it coerces floats to integers and throws an error on any value that can’t be cast to int without loss, which is only appropriate if your coordinates lie on a discrete grid.

coordinates: list[list[Union[int]]]
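To compare the correct and incorrect answers directly, here is a sketch with two hypothetical models: list[list[float]] quietly upgrades whole numbers to floats, while list[list[int]] (Union[int] is just int) rejects any coordinate with a fractional part.

```python
from pydantic import BaseModel, ValidationError


class FloatCoords(BaseModel):  # the preferred answer
    coordinates: list[list[float]]


class IntCoords(BaseModel):  # the incorrect answer
    coordinates: list[list[int]]


# Whole numbers are coerced to float without any loss
float_coords = FloatCoords(coordinates=[[0, 0, 0]])
print(float_coords.coordinates)  # [[0.0, 0.0, 0.0]]

# A fractional coordinate cannot become an int without loss
int_coord_error = None
try:
    IntCoords(coordinates=[[0.0, 0.5, 1.0]])
except ValidationError as exc:
    int_coord_error = exc.errors()[0]["type"]
print(int_coord_error)  # int_from_float
```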

Preparing for Custom Validation

from pydantic import BaseModel


class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]

    @property
    def num_atoms(self):
        return len(self.symbols)

Above is our final code for this chapter, and it is the code in 04_pydantic_molecule.py. We’ve converted our original code into a type-validated model that is easy to read. We did this by leveraging the power of the pydantic module, building on our understanding of type hints and of the dataclass structure native to Python.
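As a quick usage sketch of the final model: the coordinates below are placeholder values, not a real water geometry. Ints are coerced to floats, num_atoms still works as a regular property, and a bare string passed for symbols is rejected rather than silently iterated character by character.

```python
from pydantic import BaseModel, ValidationError


class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]

    @property
    def num_atoms(self):
        return len(self.symbols)


# Placeholder coordinates, not a real water geometry
water = Molecule(
    name="water",
    charge=0,
    symbols=["H", "H", "O"],
    coordinates=[[0, 0, 0], [0, 0, 1], [1, 0, 0]],
)
print(water.num_atoms)  # 3
print(water.charge)     # 0.0

# A bare string is not accepted as a list of symbols
bad_symbols_count = 0
try:
    Molecule(name="water", charge=0.0, symbols="HHO", coordinates=[[0, 0, 0]])
except ValidationError as exc:
    bad_symbols_count = exc.error_count()
print(bad_symbols_count)  # 1
```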

In the next chapter we’ll do much more with pydantic (while still covering only a little of what it can do), focusing on writing validators that go beyond simple type checks.