Validating Data Beyond Types#
Starting File: 04_pydantic_molecule.py
This chapter will start from 04_pydantic_molecule.py and end on 05_valid_pydantic_molecule.py.
Data validation goes far beyond just types. Pydantic provides the basic tools for validating data types, but it also provides the tools for writing custom validators that can check so much more. We'll be covering the pydantic field_validator decorator and applying it to our data to check structure and scientific rigor. We'll also cover how to validate types not native to Python, such as NumPy arrays.
Check Out Pydantic
We will not be covering all the capabilities of pydantic here, and we highly encourage you to visit the pydantic docs to learn about all the powerful and easy-to-execute things pydantic can do.
Compatibility with Python 3.8 and below
If you have Python 3.8 or below, you will need to import container type objects such as List, Tuple, Dict, etc. from the typing library instead of using their native counterparts list, tuple, dict, etc. This chapter will assume Python 3.9 or greater; however, both approaches work on Python >= 3.9 and have 1:1 replacements of the same name.
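For example, here are the two equivalent spellings of the same annotation (a quick sketch; the variable names are ours, not part of the lesson files):
from typing import List  # required spelling on Python 3.8 and below

symbols_typing: List[str] = ["H", "H", "O"]   # works on Python 3.7+
symbols_builtin: list[str] = ["H", "H", "O"]  # built-in generic, Python 3.9+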
Pydantic’s Validator Decorator#
Let’s start by looking at the state of our code prior to extending the validators. As usual, let’s also define our test data.
from pydantic import BaseModel
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: list[list[float]]
@property
def num_atoms(self):
return len(self.symbols)
mol_data = { # Good data
"coordinates": [[0, 0, 0], [1, 1, 1], [2, 2, 2]],
"symbols": ["H", "H", "O"],
"charge": 0.0,
"name": "water"
}
bad_name = {"name": 789} # Name is not str
bad_charge = {"charge": [1, 0.0]} # Charge is not int or float
noniter_symbols = {"symbols": 1234567890} # Symbols is an int
nonlist_symbols = {"symbols": '["H", "H", "O"]'} # Symbols is a string (notably is a string-ified list)
tuple_symbols = {"symbols": ("H", "H", "O")} # Symbols as a tuple?
bad_coords = {"coordinates": ["1", "2", "3"]} # Coords is a single list of string
inner_coords_not3d = {"coordinates": [[1, 2, 3], [4, 5]]}
bad_symbols_and_cords = {"symbols": ["H", "H", "O"],
"coordinates": [[1, 1, 1], [2.0, 2.0, 2.0]]
} # Coordinates top-level list is not the same length as symbols
You may notice we have extended our "Good Data" here so that coordinates actually defines the Nx3 structure, where N = len(symbols). This is important for what we plan to validate.
pydantic allows you to write custom validators in addition to the type validators which run automatically for a type annotation. This field_validator is pulled from the pydantic module just like BaseModel, and is used to decorate a class function you write. Let's look at the most basic field_validator we can write and assign it to coordinates.
Field vs Annotated Validators
pydantic allows validators to be defined functionally for reuse, ordering, and much more powerful utilization through the Annotated class. We will be showing field_validator for this example to keep the validator much more local for learning purposes. Please see [the pydantic docs on validators for more info](https://docs.pydantic.dev/latest/usage/validators/).
from pydantic import BaseModel, field_validator
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: list[list[float]]
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
return coords
@property
def num_atoms(self):
return len(self.symbols)
Here we have defined an additional validator which does nothing, but it has the basic structure we can look at. For convenience and reference, I've broken the aspects of the field_validator into a list.
- The field_validator decorator takes as arguments the exact names of the attributes you are validating against, as strings; in this case, coordinates. You can provide multiple string arguments, one per attribute, if you want to reuse the validator for several fields (see the short sketch after this list).
- The function name can be whatever you want it to be. We've called it ensure_coordinates_is_3D to be meaningful if anyone ever wants to come back and see what it should be doing.
- The function itself is a class function. This is why we have included the @classmethod decorator from native Python: the validator is intended to be called on the non-instanced class. The formal nomenclature for the first variable is therefore cls and not self. You can define validators without the @classmethod decorator, but your IDE may complain about this, so we add the @classmethod decorator so we can use cls without IDE issues, at least on that point.
- The first (non-cls) argument of the function can be named whatever you want. The optional second argument will be given a pydantic metadata object of type FieldValidationInfo and can also be named whatever we want. We'll use this metadata object later in the chapter.
- The return MUST be the validated data to be fed into the attribute. We've done nothing to our variable coords, so we simply return it. If you fail to return something, the validator will return None and that will be considered valid.
- If the data do not validate correctly, the function must raise either a ValueError or an AssertionError for pydantic to correctly trap the error; anything else will propagate up the Python error stack as normal.
- field_validator runs after type validation, unless specified otherwise (see later in this chapter).
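As a quick illustration of the first point, here is a hedged sketch (the Person model and its fields are ours, not part of the lesson files) of one validator being reused by passing two field names to field_validator:
from pydantic import BaseModel, field_validator

class Person(BaseModel):
    first_name: str
    last_name: str

    @field_validator("first_name", "last_name")  # the same check runs for both fields
    @classmethod
    def must_not_be_blank(cls, value):
        if not value.strip():
            raise ValueError("Field cannot be blank or whitespace only")
        return value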
That may seem like lots of rules, but most of them are boilerplate and intuitive. Let's apply these items to our validator. We want to make sure the inner lists of coordinates are 3D, i.e. of length 3. We don't have to worry about type checking (that was done before any custom field_validator was run), so we can just iterate over the top-level list and check. Let's apply that now.
from pydantic import BaseModel, field_validator
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: list[list[float]]
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
if any(len(failure := inner) != 3 for inner in coords): # Walrus operator (:=) for Python 3.8+
raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
good_water = Molecule(**mol_data)
mangled = {**mol_data, **inner_coords_not3d}
water = Molecule(**mangled)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [90], in <cell line: 3>()
1 good_water = Molecule(**mol_data)
2 mangled = {**mol_data, **inner_coords_not3d}
----> 3 water = Molecule(**mangled)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 1 validation error for Molecule
coordinates
Value error, Inner coordinates must be 3D, got [4.0, 5.0] of length 2 [type=value_error, input_value=[[1, 2, 3], [4, 5]], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/value_error
Here we have checked that the good data still works, and that the mangled data raised an error. It's important to note that the error raised by the function was a ValueError (or AssertionError), so the error report was a ValidationError. We can also see that the error message is the string we provided and the type of error is the type we raised. This is why it's very important to have meaningful error strings when your custom validator fails.
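If you want to trap these failures programmatically rather than read a traceback, you can catch the ValidationError yourself; a small sketch, assuming the Molecule class and test data defined above:
from pydantic import ValidationError

try:
    Molecule(**{**mol_data, **inner_coords_not3d})
except ValidationError as e:
    print(e.errors())  # structured list of failures, including our custom message string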
With all that said, our validator function really does look like any other function we might call to do a quick check of data, plus some special add-ons to make it work with pydantic. There is no practical limit to the number of field_validators you can have in a given class, so validate to your heart's content.
Python Assignment Expressions “The Walrus Operator” :=
Since Python 3.8, there is a new operator for “assignment expressions” called “The Walrus Operator” which allows variables to be assigned inside other expressions. We’ve used it here to trap the value at time of error and save space. Do not feel compelled to use this yourself, especially if it’s not clear what is happening.
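For reference, here is the same 3D check written without the walrus operator, as a plain-function sketch rather than lesson code:
def ensure_coordinates_is_3D_no_walrus(coords):
    for inner in coords:
        if len(inner) != 3:
            raise ValueError(f"Inner coordinates must be 3D, got {inner} of length {len(inner)}")
    return coords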
Check your knowledge: Validator Basics
How would you validate that symbols entries are at most 2 characters? There is more than one correct solution beyond what we show here.
Possible Solution:
@field_validator("symbols")
@classmethod
def symbols_are_possible_element_length(cls, symbs):
if not all(1 <= len(failure := symb) <= 2 for symb in symbs):
raise ValueError(f"Symbols be 1 or 2 characters, got {failure}")
return symbs
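For instance, one equally valid alternative (a sketch of ours, not from the lesson files) uses an explicit loop instead of all() and the walrus operator:
@field_validator("symbols")
@classmethod
def symbols_are_possible_element_length(cls, symbs):
    for symb in symbs:
        if not 1 <= len(symb) <= 2:
            raise ValueError(f"Symbols must be 1 or 2 characters, got {symb}")
    return symbs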
Validating against other fields#
pydantic's validators can check fields beyond their own. This is helpful for cross-referencing dependent data. In our example, we want to make sure there are exactly as many coordinates as there are symbols in our Molecule. To check against other fields in a field_validator, we extend the arguments to include the optional second one for metadata, which we're going to call info. We are going to leave our initial validator alone for now to show a feature of field_validators, but we could combine them (and will) later.
from pydantic import BaseModel, field_validator
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: list[list[float]]
@field_validator("coordinates")
@classmethod
def ensure_coordinates_match_symbols(cls, coords, info):
n_symbols = len(info.data["symbols"])
if (n_coords := len(coords)) != n_symbols: # Walrus operator (:=) for Python 3.8+
raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols."
f" There are {n_coords} coordinates and {n_symbols} symbols.")
return coords
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
if any(len(failure := inner) != 3 for inner in coords): # Walrus operator (:=) for Python 3.8+
raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
We've added a second validator to our code called ensure_coordinates_match_symbols, and this function will also validate coordinates. There are two main things we can see from adding this function:
- Multiple functions can be declared to validate against the same field.
- We've added the additional optional metadata argument to our new validator: info.
The second argument, if it appears in a field_validator, provides metadata for the validation currently happening and the validation that has already happened. The addition of info as an argument tells the field_validator to also retrieve all previously validated fields for the model. In our case, that would be name, charge, and symbols, as those entries appear before coordinates in the list of attributes. Any and all validators which apply to those three entries have already been run, and what we have access to are their validated values, stored in the dictionary at info.data. See the pydantic docs for more details about this special argument and the metadata for field_validator.
Let's see this in action.
good_water = Molecule(**mol_data)
mangled = {**mol_data, **bad_symbols_and_cords}
water = Molecule(**mangled)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [92], in <cell line: 3>()
1 good_water = Molecule(**mol_data)
2 mangled = {**mol_data, **bad_symbols_and_cords}
----> 3 water = Molecule(**mangled)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 1 validation error for Molecule
coordinates
Value error, There must be an equal number of XYZ coordinates as there are symbols. There are 2 coordinates and 3 symbols. [type=value_error, input_value=[[1, 1, 1], [2.0, 2.0, 2.0]], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/value_error
Non-native Types in Pydantic#
Scientific data does not, and often should not, be confined to native Python types. One of the most common data types, especially in the sciences, is the NumPy array (the ndarray class). The most natural place for this is coordinates, where we want to simplify the list-of-lists construct. Let's see what happens when we just make the type annotation an ndarray, and see how pydantic handles coercion, or how it does not.
import numpy as np
from pydantic import BaseModel, field_validator
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: np.ndarray
@field_validator("coordinates")
@classmethod
def ensure_coordinates_match_symbols(cls, coords, info):
n_symbols = len(info.data["symbols"])
if (n_coords := len(coords)) != n_symbols: # Walrus operator (:=) for Python 3.8+
raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols."
f" There are {n_coords} coordinates and {n_symbols} symbols.")
return coords
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
if any(len(failure := inner) != 3 for inner in coords): # Walrus operator (:=) for Python 3.8+
raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
---------------------------------------------------------------------------
PydanticSchemaGenerationError Traceback (most recent call last)
Input In [93], in <cell line: 4>()
1 import numpy as np
2 from pydantic import BaseModel, field_validator
----> 4 class Molecule(BaseModel):
5 name: str
6 charge: float
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:174, in ModelMetaclass.__new__(mcs, cls_name, bases, namespace, __pydantic_generic_metadata__, __pydantic_reset_parent_namespace__, **kwargs)
172 types_namespace = get_cls_types_namespace(cls, parent_namespace)
173 set_model_fields(cls, bases, config_wrapper, types_namespace)
--> 174 complete_model_class(
175 cls,
176 cls_name,
177 config_wrapper,
178 raise_errors=False,
179 types_namespace=types_namespace,
180 )
181 # using super(cls, cls) on the next line ensures we only call the parent class's __pydantic_init_subclass__
182 # I believe the `type: ignore` is only necessary because mypy doesn't realize that this code branch is
183 # only hit for _proper_ subclasses of BaseModel
184 super(cls, cls).__pydantic_init_subclass__(**kwargs) # type: ignore[misc]
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:431, in complete_model_class(cls, cls_name, config_wrapper, raise_errors, types_namespace)
424 handler = CallbackGetCoreSchemaHandler(
425 partial(gen_schema.generate_schema, from_dunder_get_core_schema=False),
426 gen_schema,
427 ref_mode='unpack',
428 )
430 try:
--> 431 schema = cls.__get_pydantic_core_schema__(cls, handler)
432 except PydanticUndefinedAnnotation as e:
433 if raise_errors:
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:533, in BaseModel.__get_pydantic_core_schema__(cls, _BaseModel__source, _BaseModel__handler)
530 if not cls.__pydantic_generic_metadata__['origin']:
531 return cls.__pydantic_core_schema__
--> 533 return __handler(__source)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_schema_generation_shared.py:82, in CallbackGetCoreSchemaHandler.__call__(self, _CallbackGetCoreSchemaHandler__source_type)
81 def __call__(self, __source_type: Any) -> core_schema.CoreSchema:
---> 82 schema = self._handler(__source_type)
83 ref = schema.get('ref')
84 if self._ref_mode == 'to-def':
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:280, in GenerateSchema.generate_schema(self, obj, from_dunder_get_core_schema, from_prepare_args)
278 if isinstance(obj, type(Annotated[int, 123])):
279 return self._annotated_schema(obj)
--> 280 return self._generate_schema_for_type(
281 obj, from_dunder_get_core_schema=from_dunder_get_core_schema, from_prepare_args=from_prepare_args
282 )
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:301, in GenerateSchema._generate_schema_for_type(self, obj, from_dunder_get_core_schema, from_prepare_args)
298 schema = from_property
300 if schema is None:
--> 301 schema = self._generate_schema(obj)
303 metadata_js_function = _extract_get_pydantic_json_schema(obj, schema)
304 if metadata_js_function is not None:
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:519, in GenerateSchema._generate_schema(self, obj)
516 from ..main import BaseModel
518 if lenient_issubclass(obj, BaseModel):
--> 519 return self._model_schema(obj)
521 if isinstance(obj, PydanticRecursiveRef):
522 return core_schema.definition_reference_schema(schema_ref=obj.type_ref)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:370, in GenerateSchema._model_schema(self, cls)
367 self._config_wrapper_stack.append(config_wrapper)
368 try:
369 fields_schema: core_schema.CoreSchema = core_schema.model_fields_schema(
--> 370 {k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
371 computed_fields=[self._computed_field_schema(d) for d in decorators.computed_fields.values()],
372 extra_validator=extra_validator,
373 model_name=cls.__name__,
374 )
375 finally:
376 self._config_wrapper_stack.pop()
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:370, in <dictcomp>(.0)
367 self._config_wrapper_stack.append(config_wrapper)
368 try:
369 fields_schema: core_schema.CoreSchema = core_schema.model_fields_schema(
--> 370 {k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
371 computed_fields=[self._computed_field_schema(d) for d in decorators.computed_fields.values()],
372 extra_validator=extra_validator,
373 model_name=cls.__name__,
374 )
375 finally:
376 self._config_wrapper_stack.pop()
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:674, in GenerateSchema._generate_md_field_schema(self, name, field_info, decorators)
667 def _generate_md_field_schema(
668 self,
669 name: str,
670 field_info: FieldInfo,
671 decorators: DecoratorInfos,
672 ) -> core_schema.ModelField:
673 """Prepare a ModelField to represent a model field."""
--> 674 common_field = self._common_field_schema(name, field_info, decorators)
675 return core_schema.model_field(
676 common_field['schema'],
677 serialization_exclude=common_field['serialization_exclude'],
(...)
681 metadata=common_field['metadata'],
682 )
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:714, in GenerateSchema._common_field_schema(self, name, field_info, decorators)
712 schema = self._apply_annotations(source_type, annotations, transform_inner_schema=set_discriminator)
713 else:
--> 714 schema = self._apply_annotations(
715 source_type,
716 annotations,
717 )
719 # This V1 compatibility shim should eventually be removed
720 # push down any `each_item=True` validators
721 # note that this won't work for any Annotated types that get wrapped by a function validator
722 # but that's okay because that didn't exist in V1
723 this_field_validators = filter_field_decorator_info_by_field(decorators.validators.values(), name)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:1405, in GenerateSchema._apply_annotations(self, source_type, annotations, transform_inner_schema)
1400 annotation = annotations[idx]
1401 get_inner_schema = self._get_wrapped_inner_schema(
1402 get_inner_schema, annotation, pydantic_js_annotation_functions
1403 )
-> 1405 schema = get_inner_schema(source_type)
1406 if pydantic_js_annotation_functions:
1407 metadata = CoreMetadataHandler(schema).metadata
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_schema_generation_shared.py:82, in CallbackGetCoreSchemaHandler.__call__(self, _CallbackGetCoreSchemaHandler__source_type)
81 def __call__(self, __source_type: Any) -> core_schema.CoreSchema:
---> 82 schema = self._handler(__source_type)
83 ref = schema.get('ref')
84 if self._ref_mode == 'to-def':
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:1366, in GenerateSchema._apply_annotations.<locals>.inner_handler(obj)
1364 from_property = self._generate_schema_from_property(obj, obj)
1365 if from_property is None:
-> 1366 schema = self._generate_schema(obj)
1367 else:
1368 schema = from_property
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:586, in GenerateSchema._generate_schema(self, obj)
583 return self._type_alias_type_schema(obj)
585 if origin is None:
--> 586 return self._arbitrary_type_schema(obj, obj)
588 # Need to handle generic dataclasses before looking for the schema properties because attribute accesses
589 # on _GenericAlias delegate to the origin type, so lose the information about the concrete parametrization
590 # As a result, currently, there is no way to cache the schema for generic dataclasses. This may be possible
591 # to resolve by modifying the value returned by `Generic.__class_getitem__`, but that is a dangerous game.
592 if _typing_extra.is_dataclass(origin):
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py:638, in GenerateSchema._arbitrary_type_schema(self, obj, type_)
636 return core_schema.is_instance_schema(type_)
637 else:
--> 638 raise PydanticSchemaGenerationError(
639 f'Unable to generate pydantic-core schema for {obj!r}. '
640 'Set `arbitrary_types_allowed=True` in the model_config to ignore this error'
641 ' or implement `__get_pydantic_core_schema__` on your type to fully support it.'
642 '\n\nIf you got this error by calling handler(<some type>) within'
643 ' `__get_pydantic_core_schema__` then you likely need to call'
644 ' `handler.generate_schema(<some type>)` since we do not call'
645 ' `__get_pydantic_core_schema__` on `<some type>` otherwise to avoid infinite recursion.'
646 )
PydanticSchemaGenerationError: Unable to generate pydantic-core schema for <class 'numpy.ndarray'>. Set `arbitrary_types_allowed=True` in the model_config to ignore this error or implement `__get_pydantic_core_schema__` on your type to fully support it.
If you got this error by calling handler(<some type>) within `__get_pydantic_core_schema__` then you likely need to call `handler.generate_schema(<some type>)` since we do not call `__get_pydantic_core_schema__` on `<some type>` otherwise to avoid infinite recursion.
For further information visit https://errors.pydantic.dev/2.0.3/u/schema-for-unknown-type
This error was thrown because pydantic is coded to handle certain types of data, and it cannot handle types it was not programmed to understand. However, pydantic does provide a useful error message telling us how to fix this.
You can configure your pydantic models to modify their behavior by adding a class attribute within the BaseModel subclass explicitly called model_config, which is an instance of the ConfigDict class provided by pydantic. Within that ConfigDict, you set keywords which serve as settings for the model they are attached to.
More model_config settings
You can see all of the config settings in the pydantic docs
Our particular error says many things, but we are going to focus on the simplest part, where it says we need to configure our model and set arbitrary_types_allowed, in this case to True. This tells this particular BaseModel to permit types that it does not naturally understand how to handle, and to assume the user/programmer will handle them. Let's see what Molecule looks like with this set. Note: the location of the model_config attribute does not matter, and model_config applies on a per-model basis; it is not a global pydantic configuration.
Better and more powerful ways to do this with pydantic
Pydantic has much more powerful and precise ways to establish custom types than what we show here! Treat this lesson as a rudimentary introduction to custom types and only some of the ways to validate them. Please see the pydantic docs on custom validation, which include examples of how to handle third-party types such as NumPy or Pandas.
import numpy as np
from pydantic import BaseModel, field_validator, ConfigDict
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: np.ndarray
model_config = ConfigDict(arbitrary_types_allowed = True)
@field_validator("coordinates")
@classmethod
def ensure_coordinates_match_symbols(cls, coords, info):
n_symbols = len(info.data["symbols"])
if (n_coords := len(coords)) != n_symbols: # Walrus operator (:=) for Python 3.8+
raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols."
f" There are {n_coords} coordinates and {n_symbols} symbols.")
return coords
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
if any(len(failure := inner) != 3 for inner in coords): # Walrus operator (:=) for Python 3.8+
raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
Our model is now configured to allow arbitrary types; no more error. Let’s see what happens when we pass in our data.
water = Molecule(**mol_data)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [95], in <cell line: 1>()
----> 1 water = Molecule(**mol_data)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 1 validation error for Molecule
coordinates
Input should be an instance of ndarray [type=is_instance_of, input_value=[[0, 0, 0], [1, 1, 1], [2, 2, 2]], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/is_instance_of
We're still getting a validation error, but it's different. pydantic is now telling us that the data given to coordinates must be of type ndarray. Remember there are two default levels of validation in pydantic: type validation, then manually written validators. When we have arbitrary_types_allowed configured, any type unknown to pydantic is not type-checked or coerced beyond confirming it is an instance of the declared type; effectively, a glorified isinstance check.
So to fix this, either the user has to have already cast the data to the expected type, or the developer has to preempt the type validation somehow.
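The first option, the user casting the data ahead of time, would look something like the sketch below (assuming the arbitrary-types Molecule defined just above); the rest of this chapter covers the developer-side fix.
import numpy as np  # already imported above

pre_cast = {**mol_data, "coordinates": np.asarray(mol_data["coordinates"])}
water = Molecule(**pre_cast)  # passes the isinstance-style check on coordinates
print(water.coordinates.shape)  # (3, 3)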
Before-Validators in Pydantic#
Good news! You can make pydantic validators that run before the type validation, effectively adding a third layer to the validation stack. These are called "before validators" and will run before any other level of validator. The primary use case for these validators is data coercion, which includes casting incoming data to specific types, e.g. casting a list of lists to a NumPy array because we have arbitrary_types_allowed set.
A before-validator is defined exactly like any other field_validator; it just has the keyword argument mode='before'. We're going to use this validator to take the incoming coordinates data and cast it to a NumPy array.
import numpy as np
from pydantic import BaseModel, field_validator, ConfigDict
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: np.ndarray
model_config = ConfigDict(arbitrary_types_allowed = True)
@field_validator("coordinates", mode='before')
@classmethod
def coord_to_numpy(cls, coords):
try:
coords = np.asarray(coords)
except ValueError:
raise ValueError(f"Could not cast {coords} to numpy array")
return coords
@field_validator("coordinates")
@classmethod
def ensure_coordinates_match_symbols(cls, coords, info):
n_symbols = len(info.data["symbols"])
if (n_coords := len(coords)) != n_symbols: # Walrus operator (:=) for Python 3.8+
raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols."
f" There are {n_coords} coordinates and {n_symbols} symbols.")
return coords
@field_validator("coordinates")
@classmethod
def ensure_coordinates_is_3D(cls, coords):
if any(len(failure := inner) != 3 for inner in coords): # Walrus operator (:=) for Python 3.8+
raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
Now we can see what happens when we run our model.
water = Molecule(**mol_data)
water.coordinates
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
We now have a NumPy array for our coordinates. Since we do, we can refine the original validators and condense our two coordinates validators down to a single one.
import numpy as np
from pydantic import BaseModel, field_validator, ConfigDict
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: np.ndarray
model_config = ConfigDict(arbitrary_types_allowed = True)
@field_validator("coordinates", mode='before')
@classmethod
def coord_to_numpy(cls, coords):
try:
coords = np.asarray(coords)
except ValueError:
raise ValueError(f"Could not cast {coords} to numpy array")
return coords
@field_validator("coordinates")
@classmethod
def coords_length_of_symbols(cls, coords, info):
symbols = info.data["symbols"]
if (len(coords.shape) != 2) or (len(symbols) != coords.shape[0]) or (coords.shape[1] != 3):
raise ValueError(f"Coordinates must be of shape [Number Symbols, 3], was {coords.shape}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
water = Molecule(**mol_data)
mangle = {**mol_data, **bad_charge, **bad_coords}
water = Molecule(**mangle)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [100], in <cell line: 2>()
1 mangle = {**mol_data, **bad_charge, **bad_coords}
----> 2 water = Molecule(**mangle)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 2 validation errors for Molecule
charge
Input should be a valid number [type=float_type, input_value=[1, 0.0], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/float_type
coordinates
Value error, Coordinates must be of shape [Number Symbols, 3], was (3,) [type=value_error, input_value=['1', '2', '3'], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/value_error
We've now upgraded our Molecule with more advanced data validation leaning into scientific validity, added custom types which increase our model's usability, and configured our model to further expand our capabilities. The code is now at the Lesson Materials labeled 05_valid_pydantic_molecule.py.
Next chapter we’ll look at nesting models to allow more complicated data structures.
Below is a supplementary section on how you can define custom, non-native types without arbitrary_types_allowed, giving you greater control over defining custom or even shorthand types.
Supplemental: Defining Custom Types with Built-In Validators#
In the example of this chapter, we showed how to combine arbitrary_types_allowed in model_config with field_validator(..., mode='before') to convert incoming data to types not understood by pydantic. There are obvious limitations to this, such as having to write a different set of validators for each model, being limited (or at least confined) in how you can permit types through, and then having to accept arbitrary types.
pydantic provides a separate way to write your custom class validator by extending the class in question. This can even be done to extend existing known types, augmenting them with special conditions.
Let's extend a NumPy array type to be something pydantic can validate without needing arbitrary_types_allowed. There are two ways to do this: either as an Annotated type where we overload pydantic's type logic, or as a custom class schema generator. We'll look at the Annotated method, which the pydantic docs indicate is more stable than the custom class schema generator from an API standpoint.
import numpy as np
from typing_extensions import Annotated
from pydantic.functional_validators import PlainValidator
def cast_to_np(v):
try:
v = np.asarray(v)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
ValidatableArray = Annotated[np.ndarray, PlainValidator(cast_to_np)]
That's it. We've first taken the Annotated object from the back-ported typing_extensions module, which works with Python 3.7+ (from typing import Annotated works with Python 3.9+ for identical behavior). This object allows you to augment types with additional metadata for IDEs and other tools such as pydantic without disrupting normal code behavior.
Next we've augmented the np.ndarray type with the pydantic PlainValidator method and passed it a function which will overwrite any of pydantic's normal logic when validating the np.ndarray. Otherwise pydantic would have attempted to validate against np.ndarray and we'd be back where we started with the error asking about arbitrary_types_allowed. Instead, we've usurped the normal pydantic logic and effectively said "The validator for this type is the function cast_to_np; send the data there, and if it doesn't error, we're good."
There is FAR more you can do with the Annotated object and pydantic, including defining multiple Before, After, and Wrap validators for any and all class attributes. For instance, there is a BeforeValidator which also takes a functional argument and can be annotated onto any data field to do the same thing as @field_validator(..., mode='before'). However, advanced usage is best left to the pydantic docs.
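As a small taste, here is a hedged sketch of BeforeValidator on an ordinary field; the StrippedStr alias and Labeled model are our own illustrative names, not part of the lesson files:
from typing_extensions import Annotated
from pydantic import BaseModel
from pydantic.functional_validators import BeforeValidator

# Strip whitespace before the normal str type validation runs
StrippedStr = Annotated[str, BeforeValidator(lambda v: v.strip() if isinstance(v, str) else v)]

class Labeled(BaseModel):
    name: StrippedStr

print(Labeled(name="  water  ").name)  # prints "water"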
Let's apply this to our Molecule.
This won’t appear in the next chapter
The main Lesson Materials will not have this modification since this is all supplemental. The next chapter will start with the 05_valid_pydantic_molecule.py Lesson Materials.
import numpy as np
from pydantic import BaseModel, field_validator, ConfigDict
class Molecule(BaseModel):
name: str
charge: float
symbols: list[str]
coordinates: ValidatableArray
@field_validator("coordinates")
@classmethod
def coords_length_of_symbols(cls, coords, info):
symbols = info.data["symbols"]
if (len(coords.shape) != 2) or (len(symbols) != coords.shape[0]) or (coords.shape[1] != 3):
raise ValueError(f"Coordinates must be of shape [Number Symbols, 3], was {coords.shape}")
return coords
@property
def num_atoms(self):
return len(self.symbols)
water = Molecule(**mol_data)
mangle = {**mol_data, **bad_charge, **bad_coords}
water = Molecule(**mangle)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [104], in <cell line: 2>()
1 mangle = {**mol_data, **bad_charge, **bad_coords}
----> 2 water = Molecule(**mangle)
File ~/miniconda3/envs/pyd-tut/lib/python3.10/site-packages/pydantic/main.py:150, in BaseModel.__init__(__pydantic_self__, **data)
148 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
149 __tracebackhide__ = True
--> 150 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 2 validation errors for Molecule
charge
Input should be a valid number [type=float_type, input_value=[1, 0.0], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/float_type
coordinates
Value error, Coordinates must be of shape [Number Symbols, 3], was (3,) [type=value_error, input_value=['1', '2', '3'], input_type=list]
For further information visit https://errors.pydantic.dev/2.0.3/v/value_error
We removed the model_config since we are no longer handling arbitrary types: we're handling the explicit type we defined. We also removed the mode='before' validator on coordinates because that work got pushed into the ValidatableArray. That new Annotated type we wrote already runs ahead of our custom coords_length_of_symbols field_validator because it operates as part of the type annotation check, which comes before custom validators in the order of operations.
If we wanted to make a custom schema output for our new type, we would need to add another class method called __get_pydantic_core_schema__; for that, please refer to the pydantic docs for more details.
Supplemental: Defining Custom NumPy Type AND Setting Data Type (dtype)#
It is possible to set the NumPy array dtype as part of the type checking without having to define multiple custom types. This approach is not related to pydantic per se, but it is a showcase of chaining several very advanced Python topics together.
In the previous supplemental, we showed how to write an augmented type with Annotated to define a NumPy ndarray type in pydantic. We cast the input data to a NumPy array with np.asarray. That function can also accept a dtype=... argument where you can specify the type of data the array will hold. How would you support arbitrarily setting the dtype?
There are several, equally acceptable and perfectly valid, approaches to this.
Multiple Validators#
One option would be to make multiple validators and call the one you need, and there are several ways to do this. The first is to just make multiple annotated types.
def cast_to_np_int(v):
try:
v = np.asarray(v, dtype=int)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
def cast_to_np_float(v):
try:
v = np.asarray(v, dtype=float)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
IntArray = Annotated[np.ndarray, PlainValidator(cast_to_np_int)]
FloatArray = Annotated[np.ndarray, PlainValidator(cast_to_np_float)]
class IntMolecule(Molecule):
coordinates: IntArray
class FloatMolecule(Molecule):
coordinates: FloatArray
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
This is a valid approach and can be dropped in when needed; however, it involves code duplication. We can cut down on the work by defining a function which accepts a keyword argument and using functools.partial to lock the keyword in.
from functools import partial
def cast_to_np(v, dtype=None):
try:
v = np.asarray(v, dtype=dtype)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
IntArray = Annotated[np.ndarray, PlainValidator(partial(cast_to_np, dtype=int))]
FloatArray = Annotated[np.ndarray, PlainValidator(partial(cast_to_np, dtype=float))]
class IntMolecule(Molecule):
coordinates: IntArray
class FloatMolecule(Molecule):
coordinates: FloatArray
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
Make an On-Demand Typer Function#
One option is to just make a function that creates types on demand.
def array_typer(dtype):
def cast_to_np(v):
try:
v = np.asarray(v, dtype=dtype)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
return Annotated[np.ndarray, PlainValidator(cast_to_np)]
class IntMolecule(Molecule):
coordinates: array_typer(int)
class FloatMolecule(Molecule):
coordinates: array_typer(float)
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
But this has the problem of regenerating a new Annotated type each time, and its type schema will always have the same signature. This isn't a problem most of the time, but it can be a little confusing to suddenly see function calls in the type annotations instead of the normal types and square brackets.
Custom Core Schema#
We're going to take a look at the other way pydantic has for defining a custom type, the one they specifically suggest for NumPy in their own documentation: a custom core schema. We avoided this in the previous blocks of the lesson because the pydantic docs say this functionality touches the underlying pydantic-core functionality, and while it does have an API (and follows semantic versioning), it's also the section most likely to change, according to them.
We do want to look at this approach because we are abusing the PlainValidator a bit to overload pydantic's internal type checking.
We're going to build this piece by piece, with the understanding that it won't work fully until we've constructed it. Effectively, we are writing the instructions for pydantic to handle this type with mostly native pydantic-core functions.
class ValidatableArray:
@classmethod
def __get_pydantic_core_schema__(cls, source, handler):
pass
This is the primary work method for the core schema. __get_pydantic_core_schema__ is the method that pydantic will look for when validating this type of data.
- source is the class we are generating a schema for; this will generally be the same as the cls argument if this is a classmethod.
- handler is the call into pydantic's internal JSON schema generation logic.
Since we're writing our own schema generator for something that pydantic does not natively understand, we likely won't need source or handler at all for most uses. However, we will be taking advantage of source and some other Python typing tools as well.
We’ll fill in everything we want to do in the function itself.
from pydantic_core import core_schema
def cast_to_np(v):
try:
v = np.asarray(v)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
class ValidatableArray:
@classmethod
def __get_pydantic_core_schema__(cls, source, handler):
"""
We return a pydantic_core.CoreSchema that behaves in the following ways:
* Data will be cast to ndarrays with the correct dtype
* `ndarrays` instances will be parsed as `ndarrays` and cast to the correct dtype
* Serialization will cast the ndarray to list
"""
schema = core_schema.no_info_plain_validator_function(cast_to_np)
return schema
We've added back in our actual "cast to NumPy" function we've used previously, and then we have added a function from the core_schema object of pydantic_core: no_info_plain_validator_function, which is what generates a schema for a PlainValidator as we have seen before. We finally return the schema generated from that call, although we could further manipulate it first if needed.
There are also other calls, such as general_plain_validator_function, which supports additional info being fed into the function as a secondary argument, or no_info_after_validator_function, which would make an AfterValidator, but we're not going to cover those topics in depth here.
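To give a flavor of one of those, here is a hedged sketch chaining an after-check onto our cast with no_info_after_validator_function; the must_not_be_empty helper is ours, not part of the lesson files:
def must_not_be_empty(arr):
    if arr.size == 0:
        raise ValueError("Array cannot be empty")
    return arr

# Runs cast_to_np first (the inner schema), then must_not_be_empty on the result
chained_schema = core_schema.no_info_after_validator_function(
    must_not_be_empty,
    core_schema.no_info_plain_validator_function(cast_to_np),
)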
Thus far, in our actual class, all we have done is cast to a NumPy array, which is good! We have done that before with the Annotated method, but this sets us up for much more powerful manipulation later if we want. We also want to make sure it works:
class ArrMolecule(Molecule):
coordinates: ValidatableArray
print(ArrMolecule(**mol_data).coordinates)
print(ArrMolecule(**mol_data).coordinates)
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 0 0]
[1 1 1]
[2 2 2]]
Great! We've made a validatable array. Now we're going to extend this approach to handle passing types in as part of the array construction; so far, no dtype option is used anywhere. To fix that, we're going to expand on this with some of Python's native typing tools.
from typing import Sequence, TypeVar
from pydantic_core import core_schema
dtype = TypeVar("dtype")
def cast_to_np(v):
try:
v = np.asarray(v)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
class ValidatableArray(Sequence[dtype]):
@classmethod
def __get_pydantic_core_schema__(cls, source, handler):
"""
We return a pydantic_core.CoreSchema that behaves in the following ways:
* Data will be cast to ndarrays with the correct dtype
* `ndarrays` instances will be parsed as `ndarrays` and cast to the correct dtype
* Serialization will cast the ndarray to list
"""
schema = core_schema.no_info_plain_validator_function(cast_to_np)
return schema
We've now established a custom Python type we are calling dtype, which is a common term in NumPy space, but we're going to focus on the more general case for now and specialize later.
The new object dtype is now recognized as a valid Python type. Even though nothing in the Python space or any of our modules uses it, that's okay! We're going to use it as a placeholder for accepting an index/argument to the ValidatableArray class.
Speaking of, ValidatableArray is now a subclass of Sequence from Python's typing library, parameterized by our placeholder dtype type as an index/argument. Although these are square brackets, [], we'll refer to them as "arguments", as they effectively are for types. We chose Sequence instead of Generic from typing because, at their core, NumPy arrays are sequences, just very formatted and specialized ones. This approach would have worked with Generic too, but we're opting to be more verbose.
So far, nothing has changed and everything will continue to run exactly as we have designed it previously; however, we can now specify an argument to the ValidatableArray. Observe:
class ArrMolecule(Molecule):
coordinates: ValidatableArray[float]
print(ArrMolecule(**mol_data).coordinates)
print(ArrMolecule(**mol_data).coordinates)
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 0 0]
[1 1 1]
[2 2 2]]
So now let's change our code to actually do something with that new argument, using it in our function to specify what the dtype should be for the arrays.
from typing import Sequence, TypeVar
from typing_extensions import get_args
from pydantic_core import core_schema
dtype = TypeVar("dtype")
def generate_caster(dtype_input):
def cast_to_np(v):
try:
v = np.asarray(v, dtype=dtype_input)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
return cast_to_np
class ValidatableArray(Sequence[dtype]):
@classmethod
def __get_pydantic_core_schema__(cls, source, handler):
"""
We return a pydantic_core.CoreSchema that behaves in the following ways:
* Data will be cast to ndarrays with the correct dtype
* `ndarrays` instances will be parsed as `ndarrays` and cast to the correct dtype
* Serialization will cast the ndarray to list
"""
dtype_arg = get_args(source)[0]
validator = generate_caster(dtype_arg)
schema = core_schema.no_info_plain_validator_function(validator)
return schema
class FloatArrMolecule(Molecule):
coordinates: ValidatableArray[float]
class IntArrMolecule(Molecule):
coordinates: ValidatableArray[int]
print(FloatArrMolecule(**mol_data).coordinates)
print(FloatArrMolecule(**mol_data).coordinates)
print("")
print(IntArrMolecule(**mol_data).coordinates)
print(IntArrMolecule(**mol_data).coordinates)
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 0 0]
[1 1 1]
[2 2 2]]
Ta-da! We've now used the get_args function from typing_extensions (native in typing since Python 3.8) to get the argument we fed into ValidatableArray, established a generator function for dtype casters in generate_caster, and then used all of that information to make our NumPy arrays of a specific type. All of this shows the power of the customization we can do with pydantic. There are some less-boilerplate ways to do this in pydantic, but we leave it up to you to read the docs to find out more.
Before we move past this, there are a couple of notes to make about this approach:
- This is a relatively slow process in that the generator will be made for every validation, and that could be faster.
- As written, you MUST pass an argument to ValidatableArray, but it could be rewritten to avoid that (one possible rewrite is sketched below).
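For example, one way to tolerate a bare ValidatableArray with no argument is to fall back to dtype=None when get_args comes back empty; a hedged sketch against the definitions above, with OptionalDtypeArray being our own name:
class OptionalDtypeArray(Sequence[dtype]):
    @classmethod
    def __get_pydantic_core_schema__(cls, source, handler):
        args = get_args(source)
        dtype_arg = args[0] if args else None  # None lets NumPy infer the dtype
        return core_schema.no_info_plain_validator_function(generate_caster(dtype_arg))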
We've specifically written this example to use generic Python type objects and methods. However, NumPy has its own native typing support as of versions 1.20 and 1.21, which we can use instead of Generic and Sequence, or instead of defining our own arbitrary type with TypeVar, to make IDEs happy. Below is an example of this.
from typing_extensions import get_args, Annotated
from pydantic_core import core_schema
from numpy.typing import NDArray
def generate_caster(dtype):
def cast_to_np(v):
try:
v = np.asarray(v, dtype=dtype)
except ValueError:
raise ValueError(f"Could not cast {v} to NumPy Array!")
return v
return cast_to_np
class ValidatableArrayAnnotation:
@classmethod
def __get_pydantic_core_schema__(cls, source, handler):
"""
We return a pydantic_core.CoreSchema that behaves in the following ways:
* Data will be cast to ndarrays with the correct dtype
* `ndarrays` instances will be parsed as `ndarrays` and cast to the correct dtype
* Serialization will cast the ndarray to list
"""
shape, dtype_alias = get_args(source)
dtype = get_args(dtype_alias)[0]
validator = generate_caster(dtype)
schema = core_schema.no_info_plain_validator_function(validator)
return schema
ValidatableArray = Annotated[NDArray, ValidatableArrayAnnotation]
class FloatArrMolecule(Molecule):
coordinates: ValidatableArray[float]
class IntArrMolecule(Molecule):
coordinates: ValidatableArray[int]
print(FloatArrMolecule(**mol_data).coordinates)
print(FloatArrMolecule(**mol_data).coordinates)
print("")
print(IntArrMolecule(**mol_data).coordinates)
print(IntArrMolecule(**mol_data).coordinates)
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
[[0. 0. 0.]
[1. 1. 1.]
[2. 2. 2.]]
[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 0 0]
[1 1 1]
[2 2 2]]
This approach now annotates the NDArray with additional information as per the pydantic docs, which is then passed to the ValidatableArrayAnnotation, and makes use of the NumPy type hint format and behavior for NDArray. This also has problems: in the end we are trying to reverse engineer a type hint into a formal dtype for NumPy, which isn't exactly clear-cut. E.g.:
print(NDArray)
print(NDArray[int])
default_second = get_args(get_args(NDArray)[1])[0]
print(default_second)
print(type(default_second))
numpy.ndarray[typing.Any, numpy.dtype[+ScalarType]]
numpy.ndarray[typing.Any, numpy.dtype[int]]
+ScalarType
<class 'typing.TypeVar'>
It's very unclear how, if you provide no arguments, to convert the TypeVar "+ScalarType" (which is what you get back) into None, which would be the default behavior of dtype=... style arguments. Sure, you could hard-code it, but will that always be the case? That's up to you and beyond what this example hopes to show you.
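One possible hard-coding, sketched here by us rather than taken from the lesson files, is to detect the leftover TypeVar at runtime and substitute None:
from typing import TypeVar

maybe_dtype = get_args(get_args(NDArray)[1])[0]
resolved_dtype = None if isinstance(maybe_dtype, TypeVar) else maybe_dtype
print(resolved_dtype)  # None, so np.asarray would infer the dtype on its own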
Metaclasses of the Past#
At one point in this lesson, back in pydantic v1, we talked about Python metaclasses as a way to define a class generator whose properties are set dynamically and are then usable by the class. BUT…
Metaclasses be Forbidden Magics
“Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).”
— Tim Peters, Author of Zen of Python, PEP 20
Metaclasses are usually not something you want to touch, because you don't need to. The above methods provide a fine way to generate type hints dynamically. However, if you want to be fancy, you can use a metaclass. The best primer I, Levi Naden, have found on metaclasses at the time of writing this section (Fall 2022) was through this Stack Overflow answer.
To be honest: you're probably better off writing a custom core schema as pydantic suggests above than messing with a metaclass.
Do what makes sense, and only if you need to#
All of these methods are equally valid, with upsides and downsides alike. Your use case may not even need dtype specification, and you can just accept the normal NumPy handling of casting to an array plus your own custom validator functions to make sure the data look correct. Hopefully, though, this supplemental section has given you ideas to inspire your own code design and shown you interesting and helpful things you can do.