Type Hints in Python#

Starting File: 00_base_molecule.py

This chapter starts from 00_base_molecule.py and ends with 01_typed_molecule.py.

Python lets you annotate variables, function arguments, and return values with “type hints.” These augment the arguments, and optionally the function’s return value, with additional information about what types are expected.

Additional Reading

Learn more about Type Hints and Typing from Python’s docs themselves

class Molecule:
    def __init__(self, name, charge, symbols, coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

Inserting Your First Type Hints#

Let’s start from our base Molecule again, posted above. We know what types of variables we want each of these arguments to accept, and so let’s provide type hints to do this.

A type hint is written by adding : type after an argument name in the function definition, replacing type with whatever Python type you expect. So if we wanted to annotate name to be a string type, we would write it like this (truncated example):

def __init__(self, name: str, charge, symbols, coordinates):
    self.name = name

Now we have augmented the name argument to say that it should be a string type. Note that we only provided the type hint to the argument, not to the assignment of self.name = name, because that code is internal and not interacted with by a user or by other code. Similarly, if we were to call this code, we don’t provide the type hint on the call, only on the definition.

class Molecule:
    def __init__(self, name: str, charge, symbols, coordinates):
        self.name = name  # Note no Type Hint here
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
water = Molecule("water", 0.0, ["H", "H", "O"], [0, 0, 0])  # Note no Type Hint Here
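To confirm that the hint lives on the definition and not on the call, we can look at the function's __annotations__ attribute, which is where Python keeps the hints. The sketch below uses a trimmed-down stand-in class (TinyMolecule is a name made up for this example, not part of the workshop files) so that it runs on its own.

```python
class TinyMolecule:
    """Hypothetical stand-in for Molecule with a single hinted argument."""

    def __init__(self, name: str, charge):
        self.name = name  # plain assignment, no hint needed
        self.charge = charge


# Hints are kept on the function object as a plain dictionary.
hints = TinyMolecule.__init__.__annotations__
print(hints)  # {'name': <class 'str'>}

# The call site carries no hint at all.
water = TinyMolecule("water", 0.0)
print(water.name)  # water
```

Only name shows up in the dictionary because it is the only argument we have hinted so far.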

If you try to augment an argument on call, you get an error.

water = Molecule("water": str, 0.0, ["H", "H", "O"], [0, 0, 0])
  Input In [5]
    water = Molecule("water": str, 0.0, ["H", "H", "O"], [0, 0, 0])
                            ^
SyntaxError: invalid syntax

Now let’s assign a type hint to charge. Just like with name, we expect charge to only be certain types. We won’t worry about units in this workshop and instead stick to dimensionless or implicit units. We didn’t specify if this charge is per-atom or net charge, so we don’t know if this needs to be of type int or type float. In the name of being safe, let’s assume we want to assign float as a type hint to charge. Try it now.

Check Your Understanding

Just as we assigned the type hint str to name, what would the def __init__ constructor look like if we assigned the type hint float to charge?

Compound types, what NOT to do, and the typing library#

We showed how to assign a single accepted type to charge by giving it a float type hint, but we also discussed how the type might also be an integer, depending on use case. So, somebody could pass an integer that would also be accepted. How do we represent multiple valid types through type hints?

Naive compound type hints and why they don’t work.#

The naive approach would be to extend the type hint the way we extend a normal set of arguments, by adding the second type after a comma.

We can’t simply do that. The comma starts a new argument, so the second type becomes the name of an argument that shadows the type native to Python.

def __init__(self, name: str, charge: float, int, symbols, coordinates):
    pass

This doesn’t throw an error, but it isn’t correct. float is still the type hint for charge, but int is now the name of a separate argument, and inside the function it shadows the int type that is native to Python. We can check that with the inspect library.

from inspect import getfullargspec
getfullargspec(__init__).args
['self', 'name', 'charge', 'int', 'symbols', 'coordinates']

Here int is now an argument variable, so trying to use the native int function/type inside that function would not work as expected.
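To see the shadowing in action, here is a small self-contained sketch. The function name naive and the values passed to it are made up for illustration; it uses inspect.signature, which reports the same argument names as getfullargspec.

```python
from inspect import signature


def naive(name: str, charge: float, int, symbols, coordinates):
    # Inside this body, `int` is whatever the caller passed as the third
    # argument, not the built-in type.
    return int


params = list(signature(naive).parameters)
print(params)  # ['name', 'charge', 'int', 'symbols', 'coordinates']

# Only name and charge carry hints; `int` is an unannotated argument.
hints = naive.__annotations__
print(hints)  # {'name': <class 'str'>, 'charge': <class 'float'>}

shadowed = naive("water", 0.0, "not a type", ["H"], [0])
print(shadowed)  # not a type
```

The function now takes five arguments instead of four, and the value returned as int is just the string we passed in.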

Another idea for grouping arguments is to group them in something like a list or tuple. This does not technically fail, but as the name “type hint” implies, you’re only writing hints, not functional code (we’ll revisit this in Introduction to Pydantic). So long as you don’t write anything syntactically wrong, the hint will be accepted. Here’s a couple examples of syntactically correct but incomprehensible type hints.

def __init__(self, name: str, charge: (float, int), symbols, coordinates):
    pass
def __init__(self, name: str, charge: 123456789, symbols, coordinates):
    pass
def __init__(self, name: str, charge: [1 if "wood" in i else 0 for i in "How much wood could a woodchuck chuck if a woodchuck could chuck wood?".split()], symbols, coordinates):
    pass
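We can verify that Python really does accept these hints by reading them back. This is a minimal sketch with made-up function names (tuple_hint and number_hint); it only shows that the hint is kept as data and never checked.

```python
def tuple_hint(charge: (float, int)):
    pass


def number_hint(charge: 123456789):
    pass


# Python keeps whatever value the hint expression evaluates to.
tuple_annotation = tuple_hint.__annotations__["charge"]
number_annotation = number_hint.__annotations__["charge"]
print(tuple_annotation)   # (<class 'float'>, <class 'int'>)
print(number_annotation)  # 123456789

# Neither hint stops a call with any value whatsoever.
tuple_result = tuple_hint("not a number")
number_result = number_hint(None)
```

Both calls run without complaint, which is why a hint that a human or a tool cannot interpret is of no use.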

Compound type hints, the right way#

The correct way to define multiple valid types is to define a Union of them, just like the concept of “union” from set theory. In this context, “union” simply means “any of the items in this collection are valid.”

We’ll need the typing module from Python’s standard library to access the Union object. This module contains many different objects that you can bring into your code to represent and create these type hints. If we want to say that charge could be multiple types, we need to import the Union class.

from typing import Union

Union (like other classes from typing) is a special kind of class for defining generic types, and we do NOT call it with parentheses, (). Instead, these generic containers take their arguments in square brackets, [], with multiple arguments separated by commas. Let’s apply that now to charge so that its hint is the union of float and int.

class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols, coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
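It can help to poke at a Union on its own to see how it behaves. The sketch below uses typing.get_args (available from Python 3.8) to read the member types back; the alias name ChargeType is made up for this example.

```python
from typing import Union, get_args

ChargeType = Union[float, int]  # hypothetical alias, for illustration only

# get_args returns the member types in the order they were written.
members = get_args(ChargeType)
print(members)  # (<class 'float'>, <class 'int'>)

# A Union of a single type collapses to that type.
single = Union[int]
print(single is int)  # True

# Duplicate members are removed.
deduplicated = Union[int, float, int]
print(get_args(deduplicated))  # (<class 'int'>, <class 'float'>)
```

On Python 3.10 and later the same hint can also be written as float | int, but Union works on every version this chapter covers.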

Great! Let’s move on to symbols, which is more complicated. symbols is not a simple type made of a single immutable object like a str, float, or int; it is instead a sequence, specifically a list.

Differences between >=Python 3.9 and before.

Before Python 3.9, you could not use the native types list, tuple, dict, set, etc. to specify type hints of those types. Instead, you had to import equivalently named, but capitalized, classes from the typing module.

So symbols: list[str] didn’t work before Python 3.9, and you instead had to do the following:

from typing import List

symbols: List[str]

This chapter will explicitly mention BOTH ways to do it, as both are still valid as of Python 3.10. Later chapters will have a note at their start to alert the user about this, but otherwise will assume the >=Python 3.9 approach.

from typing import Union

class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols: list[str], coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

OR

# Python <=3.8 compatible
from typing import Union, List

class Molecule:
    def __init__(self, name: str, charge: Union[float, int], symbols: List[str], coordinates):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

Just like with Union, when we use list as a type hint, we provide the arguments in square brackets []. We have now specified that symbols is a “list of strings”, and that list can be of any length.
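The two spellings are interchangeable as far as the hint's meaning goes. Here is a minimal sketch, assuming typing.get_origin and typing.get_args (both available from Python 3.8), that reads the pieces of each hint back; the version guard is only there so the snippet also runs on Python 3.8.

```python
import sys
from typing import List, get_args, get_origin

old_style = List[str]

# The capitalized class points back at the built-in list, holding str items.
print(get_origin(old_style))  # <class 'list'>
print(get_args(old_style))    # (<class 'str'>,)

if sys.version_info >= (3, 9):
    new_style = list[str]  # only valid on Python 3.9 and later
    same_origin = get_origin(new_style) is get_origin(old_style)
    same_args = get_args(new_style) == get_args(old_style)
    print(same_origin, same_args)  # True True
```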

There is an important point to make about the difference between list and tuple as type hints. A list is a mutable object in Python, so in our example the list[str] hint indicates that the “list is of arbitrary length and every object is a string.” Because of this, list takes only one argument in its []. Adding more to typing.List throws a TypeError; the built-in list[...] does not complain at runtime, but static type checkers will reject the extra arguments.
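Here is what that error looks like in practice, as a short sketch using the typing.List spelling. The exact wording of the message differs between Python versions, so the example only records that a TypeError was raised.

```python
from typing import List

try:
    List[str, int]  # two arguments where List expects exactly one
    raised = False
except TypeError as error:
    raised = True
    print(f"TypeError: {error}")

print(raised)  # True
```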

Heads Up Question

How would you specify multiple valid types for a list using compound types and what we've already covered?

tuple is immutable in Python, so its length is fixed. That also means that the argument positions of tuple’s type hints are exact 1-to-1 matches to the elements of a tuple. Therefore: the number of arguments given to tuple[] should match the tuple’s length.

def some_function(my_tuple: tuple[str, int, float]):
    pass

# Python <=3.8 version
from typing import Tuple
def some_function(my_tuple: Tuple[str, int, float]):
    pass

This type hint indicates that my_tuple has length 3 and holds a string, an integer, and a float, in that order.
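The positional matching can be sketched directly. The example tuple below is made up for illustration, and the isinstance loop is only a demonstration of what the hint means, not something Python runs for you. The last lines show the ellipsis form that typing provides for a tuple of unknown length with a single item type.

```python
from typing import Tuple, get_args

fixed = Tuple[str, int, float]

# One hint per position, so the number of hints is the tuple's length.
positions = get_args(fixed)
print(len(positions))  # 3

example = ("H", 1, 1.008)  # matches str, int, float in that order
matches = all(isinstance(value, kind) for value, kind in zip(example, positions))
print(matches)  # True

# For a tuple of unknown length holding one type, use an ellipsis.
variable = Tuple[str, ...]
print(get_args(variable))  # (<class 'str'>, Ellipsis)
```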

Finally, let’s apply a type hint to coordinates and actually define some dimensionality here. We noted back in Type Hints in Python that the coordinates themselves are not the correct dimension to define multiple points in three-dimensional space (without some reference). Instead, we want to specify 3 coordinates for an X, Y, Z system for each atom in the symbols list. The more correct way would be to use a proper two-dimensional object like a NumPy array, but for now let’s think only in pure Python types. What we want is a list holding the X, Y, Z coordinates of each atom, wrapped inside another list.

Python’s type hint containers like list, Union, tuple, etc. can all contain other container types, so it is perfectly valid to have something like a “list of lists”. Let’s assume that our nested lists follow this logic:

  • The outer list loops over atoms/symbols

  • The inner list contains the X, Y, Z coordinates of that specific atom/symbol

  • The coordinates themselves are floats.

Given this logic, we can create a “list of lists of floats” type hint for coordinates. Let’s also take this moment to wrap each argument onto its own line for visual clarity.

from typing import Union


class Molecule:
    def __init__(self, 
                 name: str, 
                 charge: Union[float, int], 
                 symbols: list[str], 
                 coordinates: list[list[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

OR

# Python <=3.8
from typing import Union, List


class Molecule:
    def __init__(self, 
                 name: str, 
                 charge: Union[float, int], 
                 symbols: List[str], 
                 coordinates: List[List[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)

    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"
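To make the nesting concrete, here is a sketch that builds a water molecule whose coordinates match the “list of lists of floats” hint. It repeats the class (without __str__) in the Python <=3.8 spelling so that it runs on its own, and the coordinate values are placeholders chosen for illustration, not a real geometry.

```python
from typing import List, Union


class Molecule:
    def __init__(self,
                 name: str,
                 charge: Union[float, int],
                 symbols: List[str],
                 coordinates: List[List[float]]):
        self.name = name
        self.charge = charge
        self.symbols = symbols
        self.coordinates = coordinates
        self.num_atom = len(symbols)


# Placeholder geometry: the numbers are illustrative, not a real structure.
water = Molecule(
    name="water",
    charge=0,
    symbols=["H", "H", "O"],
    coordinates=[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
)

# One inner [x, y, z] list per atom.
print(len(water.coordinates) == water.num_atom)  # True
print(all(len(xyz) == 3 for xyz in water.coordinates))  # True
```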

Check Your Understanding

The Heads Up Question asked "How would you specify multiple valid types for a list using compound types?" Given that we now know container types can be in other types, what is the answer to this question?

Type hints don’t get in the way of default arguments either. You just assign the default after the hint like normal:

def __init__(self, 
             name: str = "Some Name"
            ):
    pass
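Here is a slightly fuller sketch of hints and defaults working together. The function describe and its charge default are made up for this example; it also shows a return hint, written with -> after the argument list.

```python
from inspect import signature


def describe(name: str = "Some Name", charge: float = 0.0) -> str:
    # The hint comes first, then the default value.
    return f"{name} (charge {charge})"


default_text = describe()
print(default_text)  # Some Name (charge 0.0)

custom_text = describe("water", charge=-1.0)
print(custom_text)  # water (charge -1.0)

# The default and the hint are stored separately on the signature.
name_parameter = signature(describe).parameters["name"]
print(name_parameter.default)     # Some Name
print(name_parameter.annotation)  # <class 'str'>
```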

Non-enforcement and Duplicated Work#

We’ve seen the basics of type hints but there are some problems with them. Namely, they are just that: “hints.” There is no native enforcement of the hints, even if we were to feed in gibberish.

# All the arguments are mangled
water = Molecule(Union, set("ABCDefg"), "SOOOOOUUUPPPP!!!", [0, 0, 0])
print(water)
name: typing.Union
charge: {'C', 'B', 'f', 'g', 'e', 'D', 'A'}
symbols: SOOOOOUUUPPPP!!!

The IDE you are using might complain about some of the arguments because the type hints suggest otherwise, but nothing stops you or anyone else from feeding junk into the function. Sure, in more complex code there would probably be a runtime error eventually, but on their face the hints are just that, hints, for now.
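The gap between hint and behavior is easy to show in isolation. In this sketch, set_charge is a made-up function; typing.get_type_hints reads the hint back as data, and the isinstance comparison is a check we write by hand, since Python does not perform it on its own.

```python
from typing import get_type_hints


def set_charge(charge: float):
    return charge


# The hint says float, but a string passes straight through.
result = set_charge("definitely not a float")
print(result)  # definitely not a float

# The hint is available as data, so any check has to be written by hand.
expected = get_type_hints(set_charge)["charge"]
print(isinstance(result, expected))  # False
print(isinstance(set_charge(1.5), expected))  # True
```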

We’ve also been using this Molecule class to assign attributes of the same name as the arguments so we can use them later in code. This looks like a fair amount of duplicated work. Real molecules, and their computational representations, could have dozens or hundreds of arguments or attributes to assign, and rewriting self.attribute = attribute-like lines over and over is tedious.

In the next chapter, we will look at a native Python class decorator we can use to reduce the duplicated effort and make our code more legible. This will also help our understanding of the structure before we move on to validation and eventually pydantic.