Python Package Set-Up#
Overview
Questions:
What is the layout of a Python package?
How can I quickly create the structure of a Python package?
What license should I choose for my project?
Objectives:
Explain Python package structure.
Use the CMS CookieCutter to build a Python package.
For this workshop, we are going to create a Python package that performs analysis and creates visualizations for molecules. We will start with a Jupyter Notebook that has some functions and analysis. (You should have downloaded the Jupyter Notebook during setup; see Set Up.)
The idea is that we would like to take this Jupyter Notebook and convert the functions we have created into a Python package. That way, if anyone (a lab-mate, for example) would like to use our functions, they can do so by installing the package and importing it into their own scripts.
To start, we will first use a tool called CookieCutter, which will set up a Python package structure and several tools we will use during the workshop.
Examples of Python package structure#
If you look at the GitHub repositories for several large Python packages such as numpy, scipy, or scikit-learn, you will notice a lot of similarities between the directory layouts of these projects.
Having a similar way to lay out Python packages allows people to more easily understand and contribute to your code.
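To make the idea of a shared layout concrete, here is a minimal sketch that builds a skeleton in a temporary directory and prints it as an indented tree. The list of files is an assumption assembled only from names mentioned in this lesson; the real CookieCutter template generates more files than this, and the helper names build_layout and format_tree are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical layout, assembled from file names mentioned in this lesson.
# The real template creates additional files not listed here.
LAYOUT = [
    "molecool/pyproject.toml",
    "molecool/setup.cfg",
    "molecool/molecool/__init__.py",
    "molecool/molecool/functions.py",
    "molecool/molecool/_version.py",
]


def build_layout(root, relative_paths):
    """Create an empty file for each relative path under root."""
    for rel in relative_paths:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def format_tree(root):
    """Return one indented line per file or directory found under root."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts) - 1
        suffix = "/" if path.is_dir() else ""
        lines.append("    " * depth + path.name + suffix)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    build_layout(tmp, LAYOUT)
    tree_lines = format_tree(tmp)

print("\n".join(tree_lines))
```

The outer molecool directory is the project (configuration files live here), and the inner molecool directory is the importable package.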
The molecool directory#
We will first examine some files in the molecool directory.
This is the directory that contains our package code.
The __init__.py file#
The __init__.py file is a special file recognized by the Python interpreter which makes a directory into a package.
This file can be blank in some cases; however, we will use it to define how the user interacts with the functions in our package.
Contents of molecool/molecool/__init__.py:
"""A Python package for analyzing and visualizing xyz files."""
# Add imports here
from .functions import *
from ._version import __version__
The very first section of this file contains a string opened and closed with three quotation marks. This is a docstring, and it gives a short description of the file.
The section we will be concerned with is under # Add imports here.
This is how we define the way functions from modules are used.
In particular, the line
from .functions import *
goes to the functions.py file and brings every public name defined there into the package’s namespace.
When we use the canvas function defined in functions.py, that means we will be able to just say molecool.canvas() instead of giving the full path molecool.functions.canvas().
If that’s confusing, don’t worry too much for now.
We will be returning to __init__.py in a few minutes.
For now, just note that it exists and makes our directory into a package.
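The effect of the star import can be tried out in isolation. The sketch below writes a throwaway package into a temporary directory and imports it; the package name demo_pkg and the body of its canvas function are hypothetical stand-ins, not the code CookieCutter generates.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package with one module and an __init__.py that re-exports it.
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "functions.py").write_text(
        "def canvas():\n    return 'placeholder'\n"
    )
    (pkg_dir / "__init__.py").write_text("from .functions import *\n")

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        short_result = demo_pkg.canvas()            # short path via __init__.py
        full_result = demo_pkg.functions.canvas()   # full path to the module
        same_object = demo_pkg.canvas is demo_pkg.functions.canvas
    finally:
        sys.path.remove(tmp)

print(short_result, full_result, same_object)
```

Both spellings reach the same function object; the import in __init__.py only adds a shorter name for it.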
Our first module#
Once inside the molecool folder (molecool/molecool), examine the files that are there.
View the module (functions.py) in a text editor.
We see a few things about this file.
The top begins with a description of this module surrounded by three quotation marks (""").
Right now, that is the sentence “Provides the primary functions”.
We will change this to be more descriptive later.
CookieCutter has also created a placeholder function called canvas.
At the start of the canvas function, we have a docstring (more about this in [documentation]), which describes the function.
We will be moving all of the functions we defined in the Jupyter notebook into Python modules (.py files) like these.
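As a rough guide to what you should see, a module of this shape might look like the sketch below. The module docstring and the returned quote are taken from this lesson; the with_attribution parameter and the exact docstring wording are assumptions, so your generated file may differ.

```python
"""Provides the primary functions."""


def canvas(with_attribution=True):
    """
    Return a placeholder quote (illustrative docstring in NumPy style).

    Parameters
    ----------
    with_attribution : bool, optional
        Assumed parameter: when True, append the attribution line.

    Returns
    -------
    quote : str
        The quote, with or without its attribution.
    """
    quote = "The code is but a canvas to our imagination."
    if with_attribution:
        quote += "\n\t- Adapted from Henry David Thoreau"
    return quote


print(canvas())
```

The first string in the file documents the module, and the first string inside the function documents the function; both are what help() displays.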
Before proceeding, make sure your pip and setuptools packages are up-to-date:
conda update pip setuptools
Installing from local source.#
You may be accustomed to pip automatically retrieving packages from the internet.
pip can also install a package from source code on your own computer.
To develop this package, we will want to use what is called “development mode”, or an “editable install”, so that we can try out our functions and package as we develop it.
We access development mode using the -e option to pip.
Reviewing the generated config files#
Return to the top directory (molecool).
Two of the files CookieCutter generated are pyproject.toml and setup.cfg.
These are the configuration files for our packaging and testing tools.
pyproject.toml tells setuptools about your package (such as the name and version) as well as which code files to include.
We’ll be using this file in the next section.
Installing your package#
A development install, also known as an “editable” install, will allow you to import your package and use it from anywhere on your computer.
You will then be able to import your package into scripts in the same way you import matplotlib or numpy.
This setup is particularly useful during development, as it ensures that any changes you make to your package’s code are immediately reflected in your Python environment, without the need for reinstallation.
To perform a development installation, you use pip with the -e option, which stands for “editable”.
This tells pip to install the package in such a way that it links directly to your project’s source code.
pip install -e .
Here, the -e indicates that we are installing this project in editable mode (i.e. setuptools development mode), while the . tells pip to install from the current directory (you could also specify a path here).
When you install a Python package using either pip or conda, that package is installed in your Python environment’s site-packages folder.
You can see where this is by checking your Python system path.
To do this, open Python (type python into your terminal window), and type
>>> import sys
>>> sys.path
This will output a list of locations that Python searches for packages.
One of these will typically end with site-packages, indicating the directory where Python looks for installed packages.
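If the list is long, a small filter can pick out the relevant entries. This is a minimal sketch; the helper name site_packages_dirs is hypothetical, and on some systems the directory is named differently (for example dist-packages), in which case nothing is printed.

```python
import sys


def site_packages_dirs(paths):
    """Return the entries of a path list that end with site-packages."""
    return [p for p in paths if p.rstrip("/\\").endswith("site-packages")]


for entry in site_packages_dirs(sys.path):
    print(entry)
```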
If you examine site-packages, you are likely to see a folder named after your package and its version, ending in .dist-info.
Inside, the direct_url.json file signifies an editable installation by pointing back to your project’s directory.
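Rather than browsing the folder by hand, the same information can be read through the standard library. The sketch below assumes the package was installed by a pip version that writes direct_url.json with a dir_info entry marked editable; the helper names editable_source and find_editable_source are hypothetical.

```python
import json
from importlib import metadata


def editable_source(direct_url_text):
    """Return the source URL if the JSON text describes an editable install, else None."""
    if not direct_url_text:
        return None
    info = json.loads(direct_url_text)
    # Assumption: editable installs carry {"dir_info": {"editable": true}}.
    if info.get("dir_info", {}).get("editable", False):
        return info.get("url")
    return None


def find_editable_source(dist_name):
    """Look up an installed distribution and report its editable source, if any."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the file is absent from the .dist-info folder.
    return editable_source(dist.read_text("direct_url.json"))


print(find_editable_source("molecool"))
```

If molecool is installed in editable mode in the active environment, this should print a file URL for your project directory; otherwise it prints None.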
Python Packaging’s Rapidly Evolving Landscape
In recent years, the Python packaging ecosystem has seen the development of numerous tools designed to streamline the process.
While the MolSSI CookieCutter primarily utilizes setuptools and pyproject.toml for packaging, alternatives like poetry and flit offer different features and workflows.
Depending on your tool of choice or even your Python version, you may encounter various files or configurations within site-packages following an editable installation.
After our install, we can use our package from any directory, similar to how we can use other installed packages like numpy.
Open Python, and type
>>> import molecool
>>> molecool.canvas()
This should display the returned quote.
'The code is but a canvas to our imagination.\n\t- Adapted from Henry David Thoreau'
A development installation inserts a link to your project into your Python site-packages folder so that updates are immediately available the next time you launch Python, without having to reinstall your package.
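One way to see that link in action is to ask Python where it would load the package from. This is a minimal sketch using the standard library; the helper name package_location is hypothetical.

```python
import importlib.util


def package_location(name):
    """Return the file Python would load for a top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(package_location("molecool"))
```

With an editable install, the printed path should sit inside your project directory rather than inside site-packages, which is why edits to the source are picked up the next time Python starts.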
Check Your Understanding
What happens if we use conda deactivate and attempt to execute the code above?
What if we switch directories?
Solution
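After conda deactivate, the package is no longer installed in the active environment. If you are in the project directory, the code will still work, because Python also searches the current directory when importing. However, it will not work in any other location.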
Key Points#
There is a common way to structure Python packages.
You can use the CMS CookieCutter to quickly create the layout for a Python package.
An editable installation allows you to use your package from anywhere on your computer, and with updates immediately available.