Type Hints in Python#
Starting File: 00_base_molecule.py
This chapter starts from 00_base_molecule.py and ends at 01_typed_molecule.py.
Python lets you annotate function arguments and return values with “type hints.” A hint attaches to an argument (and, optionally, to the function’s return value) and gives readers and tools additional information about what type is expected there.
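As a minimal sketch of both kinds of hint (the function total_charge is hypothetical and not part of the workshop files), an argument hint follows the argument name after a colon, and a return hint follows ->:

```python
def total_charge(charges: list, scale: float = 1.0) -> float:
    """Sum per-atom charges and apply an optional scale factor."""
    return sum(charges) * scale


# Python stores the hints on the function object without acting on them.
print(total_charge.__annotations__)
print(total_charge([0.5, 0.5, -1.0]))
```

The hints appear in the function's __annotations__ dictionary, with the return hint stored under the key "return".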
Additional Reading
Learn more about Type Hints and Typing from Python’s docs themselves
class Molecule:
    def __init__(self, name, charge, symbols, coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
Inserting Your First Type Hints#
Let’s start from our base Molecule again, posted above. We know what types we want each of these arguments to accept, so let’s provide type hints to say so.
A type hint is written by adding : type after a variable in the argument list, replacing type with whatever Python type you expect. So if we wanted to annotate name as a string type, we would write it like this (truncated example):
def __init__(self, name: str, charge, symbols, coordinates):
    self.name = name
Now we have augmented the name argument to say that it should be a string type. Note that we only provided the type hint on the argument, not on the assignment self.name = name, because that code is internal and not interacted with by a user or by other code. Similarly, when we call this code, we don’t provide the type hint on the call, only on the definition.
class Molecule:
    def __init__(self, name: str, charge, symbols, coordinates):
        self.name = name  # Note no Type Hint here
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
water = Molecule("water", 0.0, ["H", "H", "O"], [0, 0, 0])  # Note no Type Hint Here
If you try to annotate an argument in the call itself, you get an error.
water = Molecule("water": str, 0.0, ["H", "H", "O"], [0, 0, 0])
Input In [5]
water = Molecule("water": str, 0.0, ["H", "H", "O"], [0, 0, 0])
^
SyntaxError: invalid syntax
Now let’s assign a type hint to charge. Just like with name, we expect charge to only be certain types. We won’t worry about units in this workshop and will instead stick to dimensionless or implicit units. We didn’t specify whether this charge is per-atom or net, so we don’t know if it needs to be of type int or type float. To be safe, let’s assign float as the type hint for charge. Try it now.
Check Your Understanding
We assigned the type hint str to name. What would the def __init__ constructor look like if we also assigned the type hint float to charge?
Solution
def __init__(self, name: str, charge: float, symbols, coordinates):
Compound types, what NOT to do, and the typing library#
We showed how to assign a single accepted type to charge by giving it a float type hint, but we also discussed how the type might be an integer, depending on the use case. So somebody could pass an integer, and that should also be accepted. How do we represent multiple valid types through type hints?
Naive compound type hints and why they don’t work.#
The naive approach would be to simply extend the type hint with a comma, the way we would extend a normal list of arguments. We can’t do this: the comma starts a new argument rather than extending the hint, so a second type such as int becomes an argument name that shadows the int type native to Python.
def __init__(self, name: str, charge: float, int, symbols, coordinates):
    pass
This doesn’t throw an error, but it isn’t correct. float is still the type hint for charge, but int is now the name of a new argument and shadows the int type that is native to Python. We can check that with the inspect library.
from inspect import getfullargspec
getfullargspec(__init__).args
['self', 'name', 'charge', 'int', 'symbols', 'coordinates']
Here int is now an argument name, and trying to use the native int function/type inside that function would not work as expected.
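To make the shadowing concrete, here is a small sketch (the function naive is hypothetical) in which the body tries to use int as the built-in type but actually gets whatever was passed as the second argument:

```python
def naive(charge: float, int):
    # Inside this function, `int` is the second argument, not the built-in type.
    return int(charge)


try:
    naive(1.5, 2)
except TypeError as err:
    # The integer 2 was "called" as if it were the int() built-in.
    print(f"TypeError: {err}")
```

The call fails because the value 2 is not callable, even though the function body looks like an ordinary conversion to an integer.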
Another idea for grouping arguments is to put them in something like a list or tuple. This does not technically fail, but as the name “type hint” implies, you’re only writing hints, not functional code (we’ll revisit this in Introduction to Pydantic). So long as you don’t write anything syntactically wrong, the hint will be accepted. Here are a few examples of syntactically correct but incomprehensible type hints.
def __init__(self, name: str, charge: (float, int), symbols, coordinates):
    pass
def __init__(self, name: str, charge: 123456789, symbols, coordinates):
    pass
def __init__(self, name: str, charge: [1 if "wood" in i else 0 for i in "How much wood could a woodchuck chuck if a woodchuck could chuck wood?".split()], symbols, coordinates):
    pass
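One way to see that Python accepts these hints without interpreting them is to look at what it stores. In this sketch (the function names tuple_hint and number_hint are hypothetical), each hint expression is evaluated and kept verbatim in __annotations__:

```python
def tuple_hint(charge: (float, int)):
    pass


def number_hint(charge: 123456789):
    pass


# Each hint is stored exactly as written; nothing checks that it is a type.
print(tuple_hint.__annotations__)
print(number_hint.__annotations__)
```

The tuple and the integer are both stored as ordinary Python objects, which is why tools that expect real types cannot make sense of them.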
Compound type hints, the right way#
The correct way to define multiple valid types is to define a Union of them, just like the concept of “union” from set theory. In this context, “union” simply means “any of the items in this collection are valid.”
We’ll need the typing module from Python’s standard library to access the Union object. This module contains many different objects that you can bring into your code to represent and create type hints. If we want to say that charge could be multiple types, we need to import the Union class.
from typing import Union
Union (like other classes from typing) is a special class for defining generic types, which we do NOT call with parentheses, (). Instead, these generic containers take their arguments in square brackets, [], with multiple arguments separated by commas. Let’s apply that now to charge so it is the union of float and int.
class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols, coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
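If you want to inspect what a Union holds, the typing module provides get_args (available from Python 3.8). The sketch below uses a hypothetical alias named Number; it is only an illustration of how the members can be read back, not something the Molecule class needs:

```python
from typing import Union, get_args

Number = Union[float, int]  # hypothetical alias, used only in this sketch

# get_args returns the member types of the Union as a tuple.
print(get_args(Number))

# The order of the members does not matter when comparing Unions.
print(Union[float, int] == Union[int, float])

# Hints are not enforced, but the members can drive a manual isinstance check.
print(isinstance(0, get_args(Number)))
print(isinstance("zero", get_args(Number)))
```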
Great! Let’s move on to symbols, which is more complicated. symbols is not a simple type like a str, float, or int, where the value is a single immutable object; it is instead a sequence, specifically a list.
Differences between >=Python 3.9 and before.
Before Python 3.9, you could not use the native types list, tuple, dict, set, etc. to specify type hints of those types; you had to import equivalently named, but capitalized, classes from the typing module.
So symbols: list[str] didn’t work before Python 3.9 and you instead had to do the following:
from typing import List
symbols: List[str]
This chapter will explicitly mention BOTH ways to do it, as both are still valid as of Python 3.10. Later chapters will have a note at their start to alert the user about this, but otherwise will assume the >=Python 3.9 approach.
from typing import Union
class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols: list[str], coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
OR
# Python <=3.8 compatible
from typing import Union, List
class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols: List[str], coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
Just like with Union, when we use list as a type hint, we provide the arguments in square brackets []. We have now specified that symbols is a “list of strings”, and that list can be of any length.
There is an important point to make about the difference between list and tuple as type hints. Lists are mutable objects in Python, so in our example the list[str] hint indicates that the “list is of arbitrary length and every object is a string.” Because of this, list only accepts one argument in its [], and adding more is an error: typing.List raises a TypeError, and static type checkers reject the same mistake on the built-in list.
Heads Up Question
How would you specify multiple valid types for a list using compound types and what we've already covered?
tuple is immutable in Python, so its length is fixed. That also means the argument positions in a tuple type hint are exact 1-to-1 matches to the elements of the tuple. Therefore, the number of arguments given to tuple[] should match the tuple’s length.
def some_function(my_tuple: tuple[str, int, float]):
    pass
# Python <=3.8 version
from typing import Tuple
def some_function(my_tuple: Tuple[str, int, float]):
    pass
This type hint indicates that my_tuple has length 3 and accepts a string, an integer, and a float, in that order.
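The positional matching can be checked by reading the hint back with typing.get_args. The sketch below assumes Python 3.9 or later and uses hypothetical function names. As an aside that goes slightly beyond the text above, the typing system also has a form for a tuple of any length with a single element type, written with an ellipsis:

```python
from typing import get_args


def fixed(my_tuple: tuple[str, int, float]):
    pass


def variable(my_tuple: tuple[float, ...]):
    pass


# One entry per tuple position, in order.
print(get_args(fixed.__annotations__["my_tuple"]))

# The ellipsis form means "any number of floats".
print(get_args(variable.__annotations__["my_tuple"]))
```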
Finally, let’s apply a type hint to coordinates and actually define some dimensionality here. We noted back in Type Hints in Python that the coordinates themselves do not have the right dimensions to define multiple points in three-dimensional space (without some reference). Instead, we want to specify three coordinates, X, Y, and Z, for each atom in the symbols list. The more correct way would be to use a proper two-dimensional object like a numpy array, but for now let’s think only in pure Python types. What we want is a list of X, Y, Z coordinates for each atom, wrapped inside another list.
Python’s type hint containers like list, Union, tuple, etc. can all contain other container types, so it is perfectly valid to have something like a “list of lists”. Let’s assume that our nested lists follow this logic:
The outer list loops over atoms/symbols
The inner list contains the X, Y, Z coordinates of that specific atom/symbol
The coordinates themselves are floats.
Given this logic, we can create a “list of lists of floats” type hint for coordinates. Let’s also take this moment to put each argument on its own line for visual clarity.
from typing import Union
class Molecule:
    def __init__(self,
                 name: str,
                 charge: Union[float, int],
                 symbols: list[str],
                 coordinates: list[list[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
OR
# Python <=3.8
from typing import Union, List
class Molecule:
    def __init__(self,
                 name: str,
                 charge: Union[float, int],
                 symbols: List[str],
                 coordinates: List[List[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
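A call that matches the fully hinted signature might look like the sketch below (Python 3.9 or later). The coordinate values are placeholders chosen only to show the nested shape; they are not a real water geometry.

```python
from typing import Union


class Molecule:
    def __init__(self,
                 name: str,
                 charge: Union[float, int],
                 symbols: list[str],
                 coordinates: list[list[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)


# One inner [X, Y, Z] list per entry in symbols (placeholder values).
water = Molecule(
    "water",
    0.0,
    ["O", "H", "H"],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
)

# The outer list lines up with symbols; each inner list has three floats.
print(len(water.coordinates) == water.num_atom)
print(all(len(xyz) == 3 for xyz in water.coordinates))
```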
Check Your Understanding
The Heads Up Question asked "How would you specify multiple valid types for a list using compound types?" Given that we now know container types can be nested in other types, what is the answer to this question?
Solution
You will need to make a list of the union of float and int:
symbols: list[Union[float, int]] # >=Python 3.9
symbols: List[Union[float, int]] # <=Python 3.8
Type hints don’t get in the way of default arguments either. You just assign the default after the hint like normal:
def __init__(self,
             name: str = "Some Name"
             ):
    pass
Non-enforcement and Duplicated Work#
We’ve seen the basics of type hints but there are some problems with them. Namely, they are just that: “hints.” There is no native enforcement of the hints, even if we were to feed in gibberish.
# All the arguments are mangled
water = Molecule(Union, set("ABCDefg"), "SOOOOOUUUPPPP!!!", [0, 0, 0])
print(water)
name: typing.Union
charge: {'C', 'B', 'f', 'g', 'e', 'D', 'A'}
symbols: SOOOOOUUUPPPP!!!
The IDE you are using might complain about some of the arguments because they contradict the type hints, but nothing stops you or anyone else from feeding junk into the function. Sure, in more complex code the bad values would probably trigger a runtime error eventually, but on their face the hints are just that: hints, for now.
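For contrast, enforcing the hints by hand means writing the checks yourself. The sketch below uses a hypothetical class named CheckedMolecule and only checks two of the four arguments; it illustrates the effort involved and is not necessarily how later chapters handle validation.

```python
from typing import Union


class CheckedMolecule:
    """Hypothetical variant of Molecule that checks two arguments by hand."""

    def __init__(self, name: str, charge: Union[float, int], symbols, coordinates):
        # The hints above do nothing at runtime; these checks do the enforcing.
        if not isinstance(name, str):
            raise TypeError(f"name must be a str, got {type(name).__name__}")
        if not isinstance(charge, (float, int)):
            raise TypeError(f"charge must be a float or int, got {type(charge).__name__}")
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)


try:
    CheckedMolecule(Union, set("ABCDefg"), "SOOOOOUUUPPPP!!!", [0, 0, 0])
except TypeError as err:
    print(f"Rejected: {err}")
```

Every check repeats information that the hint already states, which is one more form of duplicated work.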
We’ve also been using this Molecule class to assign attributes of the same name as the arguments so we can use them later in code. This looks like a fair amount of duplicated work. Real molecules, and their computational representations, could have dozens or hundreds of arguments or attributes to assign, and rewriting self.attribute = attribute-like lines over and over is tedious.
In the next chapter, we will look at a native Python class decorator we can use to help reduce the duplicated effort and make our code more legible. This will also help our understanding of the structure before we move on to validation and eventually pydantic.