{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Working with NumPy Arrays\n", "=========================\n", "\n", "``````{admonition} Overview\n", ":class: overview\n", "\n", "Questions:\n", "\n", "* When should I use NumPy arrays instead of Pandas dataframes?\n", "\n", "* What does the shape of a NumPy array mean?\n", "\n", "* How can I reshape arrays?\n", "\n", "Objectives:\n", "\n", "* Convert data from a Pandas dataframe to a NumPy array.\n", "\n", "* Use the `reshape` function to reshape a NumPy array.\n", "\n", "``````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas dataframes are built on top of a data structure known as the NumPy array. If you completed the first MolSSI Python scripting workshop, you are already familiar with some properties of NumPy arrays.\n", "\n", "In general, you should use a pandas dataframe when working with data that is:\n", " - Two-dimensional (rows and columns).\n", " - Labeled.\n", " - Of mixed types.\n", " - Something for which you would like to easily compute statistics.\n", " \n", "You should work with NumPy arrays when:\n", " - You have higher-dimensional data (for example, a collection of two-dimensional arrays).\n", " - You need to perform advanced mathematics, such as linear algebra.\n", " - You are using a library that requires NumPy arrays (such as scikit-learn)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For our discussion of NumPy, we will first load our data into a pandas dataframe and then convert it into a NumPy array." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Compound ID | \n", "ESOL predicted log solubility in mols per litre | \n", "Minimum Degree | \n", "Molecular Weight | \n", "Number of H-Bond Donors | \n", "Number of Rings | \n", "Number of Rotatable Bonds | \n", "Polar Surface Area | \n", "measured log solubility in mols per litre | \n", "smiles | \n", "
---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "Amigdalin | \n", "-0.974 | \n", "1 | \n", "457.432 | \n", "7 | \n", "3 | \n", "7 | \n", "202.32 | \n", "-0.77 | \n", "OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)... | \n", "
1 | \n", "Fenfuram | \n", "-2.885 | \n", "1 | \n", "201.225 | \n", "1 | \n", "2 | \n", "2 | \n", "42.24 | \n", "-3.30 | \n", "Cc1occc1C(=O)Nc2ccccc2 | \n", "
2 | \n", "citral | \n", "-2.579 | \n", "1 | \n", "152.237 | \n", "0 | \n", "0 | \n", "4 | \n", "17.07 | \n", "-2.06 | \n", "CC(C)=CCCC(C)=CC(=O) | \n", "
3 | \n", "Picene | \n", "-6.618 | \n", "2 | \n", "278.354 | \n", "0 | \n", "5 | \n", "0 | \n", "0.00 | \n", "-7.87 | \n", "c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43 | \n", "
4 | \n", "Thiophene | \n", "-2.232 | \n", "2 | \n", "84.143 | \n", "0 | \n", "1 | \n", "0 | \n", "0.00 | \n", "-1.33 | \n", "c1ccsc1 | \n", "