{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "hollywood-packaging",
   "metadata": {},
   "source": [
    "Preparing to Plot\n",
    "==============\n",
    "\n",
    "\n",
    "In this workshop, we will use `matplotlib` to visualize Hydrogen Atom orbitals. If you've taken an introductory quantum chemistry class, you will have seen visualizations of these before, most likely in your textbook. \n",
    "\n",
    "You can see some visualizations [here](https://en.wikipedia.org/wiki/Atomic_orbital#/media/File:Hydrogen_Density_Plots.png). Even if you haven't yet taken quantum chemistry, the shapes of the s and p orbitals will probably be familar to you from your introductory chemistry classes.\n",
    "\n",
    "We will be working with pre-calculated data that is in text files. As part of the set-up you should have downloaded these files, and we will explain what is in them as we visualize them. For the purpose of this workshop, it's not important that you have a deep understanding of the data, or that you understand how it was calculated."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "finished-profession",
   "metadata": {},
   "source": [
    "Reading Data using Pandas\n",
    "---------------------------------\n",
    "\n",
    "In order to plot and create visualizations with our data, we first have to get it into a form that python can recongize and work with. We will be using the python library [pandas](https://pandas.pydata.org/). To read in our text files. Pandas is a library that is very widely used in data science.\n",
    "\n",
    "Using `pandas`, you can read data into python and work with your data. The data we want to work with are in `.csv` files, or `comma separated value` files. The first file we will work with is called `s_orbitals_1D.csv`. This text file contains the value of the 1s, 2s and 3s orbitals in the xy plane for different values of x. If you examine this in a text editor, you will see there is a header, rows, and that values are separated by commas.\n",
    "\n",
    "Pandas can easily read this type of file using a function called `read_csv`. You can also open files like this in Excel, or even save csv files from Excel.\n",
    "\n",
    "First, we will need to import pandas. The pandas library is usually shortened to `pd`. We will use the function `pd.read_csv` to read the csv file. We give the `read_csv` function the path to the file we want to read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "supported-vacuum",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "s_orbitals = pd.read_csv(\"s_orbitals_1D.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "blessed-elder",
   "metadata": {},
   "source": [
    "Our data is now in the variable called `s_orbitals`. This variable is something called a pandas dataframe. It resembles a spreadsheet - it has rows and columns. We can see a preview of what is in the variable by using `s_orbitals.head()`. It will show us the first five rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "genetic-title",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>r</th>\n",
       "      <th>1s</th>\n",
       "      <th>2s</th>\n",
       "      <th>3s</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.564190</td>\n",
       "      <td>0.199471</td>\n",
       "      <td>0.108578</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.517241</td>\n",
       "      <td>0.336349</td>\n",
       "      <td>0.088163</td>\n",
       "      <td>0.061683</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.034483</td>\n",
       "      <td>0.200519</td>\n",
       "      <td>0.034225</td>\n",
       "      <td>0.029966</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.551724</td>\n",
       "      <td>0.119542</td>\n",
       "      <td>0.009473</td>\n",
       "      <td>0.009313</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.068966</td>\n",
       "      <td>0.071266</td>\n",
       "      <td>-0.000869</td>\n",
       "      <td>-0.003390</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          r        1s        2s        3s\n",
       "0  0.000000  0.564190  0.199471  0.108578\n",
       "1  0.517241  0.336349  0.088163  0.061683\n",
       "2  1.034483  0.200519  0.034225  0.029966\n",
       "3  1.551724  0.119542  0.009473  0.009313\n",
       "4  2.068966  0.071266 -0.000869 -0.003390"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_orbitals.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ambient-hunter",
   "metadata": {},
   "source": [
    "You can see from the above preview that we have something that resembles a spreadsheet. Our first column contains x values, while the following columns contain values for the `1s`, `2s`, and `3s` orbitals. We are going to plot these eventualy, but first we have to understand a little more about how we can use the data we have read in."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adaptive-punishment",
   "metadata": {},
   "source": [
    "A Brief Introduction to Dataframes\n",
    "-----------------------------------------\n",
    "As stated previously, pandas stores data in rows and columns. You will see above that the rows are numbered and the columns have names. There are a few ways we can access information in a dataframe.\n",
    "\n",
    "To get a column, we use the syntax\n",
    "\n",
    "`dataframe[\"column_name\"]`\n",
    "\n",
    "For example, to get the `r` column of the dataframe we have read in, we would put the column name (\"x\"). It is **very important** that this column name be in quotes and that capitalization match what is in the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "simple-texas",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0.000000\n",
       "1      0.517241\n",
       "2      1.034483\n",
       "3      1.551724\n",
       "4      2.068966\n",
       "5      2.586207\n",
       "6      3.103448\n",
       "7      3.620690\n",
       "8      4.137931\n",
       "9      4.655172\n",
       "10     5.172414\n",
       "11     5.689655\n",
       "12     6.206897\n",
       "13     6.724138\n",
       "14     7.241379\n",
       "15     7.758621\n",
       "16     8.275862\n",
       "17     8.793103\n",
       "18     9.310345\n",
       "19     9.827586\n",
       "20    10.344828\n",
       "21    10.862069\n",
       "22    11.379310\n",
       "23    11.896552\n",
       "24    12.413793\n",
       "25    12.931034\n",
       "26    13.448276\n",
       "27    13.965517\n",
       "28    14.482759\n",
       "29    15.000000\n",
       "Name: r, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_orbitals[\"r\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "documentary-embassy",
   "metadata": {},
   "source": [
    "We can also \"slice\" the dataframe to get only a portion of it. We would do this if we wanted only some of the rows, for example. The following syntax shows how to get information if we want to use *row and column **numbers***.\n",
    "\n",
    "```\n",
    "dataframe.iloc[row_slice, column_slice]\n",
    "```\n",
    "\n",
    "We use the same slicing syntax that we use for python lists and numpy arrays. For example, to get the first 10 rows of the first two columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "handy-hardwood",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>r</th>\n",
       "      <th>1s</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.564190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.517241</td>\n",
       "      <td>0.336349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.034483</td>\n",
       "      <td>0.200519</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.551724</td>\n",
       "      <td>0.119542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.068966</td>\n",
       "      <td>0.071266</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2.586207</td>\n",
       "      <td>0.042486</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3.103448</td>\n",
       "      <td>0.025329</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3.620690</td>\n",
       "      <td>0.015100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>4.137931</td>\n",
       "      <td>0.009002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>4.655172</td>\n",
       "      <td>0.005367</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          r        1s\n",
       "0  0.000000  0.564190\n",
       "1  0.517241  0.336349\n",
       "2  1.034483  0.200519\n",
       "3  1.551724  0.119542\n",
       "4  2.068966  0.071266\n",
       "5  2.586207  0.042486\n",
       "6  3.103448  0.025329\n",
       "7  3.620690  0.015100\n",
       "8  4.137931  0.009002\n",
       "9  4.655172  0.005367"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_orbitals.iloc[:10, :2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fourth-stereo",
   "metadata": {},
   "source": [
    "Pandas is a very useful library for manipulating data in python. We won't be using it extensively for this workshop, but we will need to use it a little to select the data which we would like to plot. If you would like to learn more about the capabilities of pandas, see [this lesson](https://education.molssi.org/python-data-analysis/02-pandas/index.html) from The Molecular Sciences Software Institute."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}