Numerical Data#

Overview

This notebook focuses on the use of libraries in Julia to manipulate numerical data, such as matrices and vectors. It covers the basics of creating and manipulating arrays, as well as performing mathematical operations on them.

Besides that, we also explore the use of DataFrames for handling tabular data, and Plots for visualizing data.

Key Libraries

  • LinearAlgebra: For linear algebra operations.

  • DataFrames: For handling tabular data.

  • Plots: For data visualization.

Linear Algebra#

Here we’ll demonstrate some basic linear algebraic functionality using the Base (i.e. built into Julia and doesn’t need to be installed, only imported) LinearAlgebra package.

using LinearAlgebra

Let’s initialize a nice matrix:

A = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

What’s its determinant? (Oops, maybe it’s not so nice)

det(A)
0.0

Okay, let’s make a new matrix and take the inverse (NB: this should of course be avoided in numerical computing generally, but the function does exist so we’re showing it to you):

B = [2 1 1; 1 2 1; 1 1 2]
inv(B)
3×3 Matrix{Float64}:
  0.75  -0.25  -0.25
 -0.25   0.75  -0.25
 -0.25  -0.25   0.75

How about eigendecomposition?

eig_B = eigen(B)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
 0.9999999999999998
 1.0
 3.9999999999999987
vectors:
3×3 Matrix{Float64}:
 -0.408248   0.707107  -0.57735
 -0.408248  -0.707107  -0.57735
  0.816497   0.0       -0.57735

By default an Eigen object is returned, which we can pull out the fields of as eig_B.values or eig_B.vectors, but we can also pre-assign them like this:

vals, vecs = eigen(B)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
 0.9999999999999998
 1.0
 3.9999999999999987
vectors:
3×3 Matrix{Float64}:
 -0.408248   0.707107  -0.57735
 -0.408248  -0.707107  -0.57735
  0.816497   0.0       -0.57735

Note that the eigenvectors are the columns…

vecs
3×3 Matrix{Float64}:
 -0.408248   0.707107  -0.57735
 -0.408248  -0.707107  -0.57735
  0.816497   0.0       -0.57735

Let’s say we want to solve the following system of equations (and maybe find out some other things about the matrix \(A\) while we’re at it…):

\[\begin{split} x_1 + 2x_2 - x_3 + x_4 = 1 \\ 2x_1 - x_2 + 3x_3 - 2x_4 = 5 \\ -3x_1 + 4x_2 + 2x_3 + x_4 = 7 \\ x_1 - 3x_2 + 2x_3 - 4x_4 = -2 \\ \end{split}\]

This can be written in matrix form Ax = b as:

\[\begin{split}A = \begin{bmatrix} 1 & 2 & -1 & 1 \\ 2 & -1 & 3 & -2 \\ -3 & 4 & 2 & 1 \\ 1 & -3 & 2 & -4 \\ \end{bmatrix} \end{split}\]
\[\begin{split}x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \end{bmatrix}\end{split}\]
\[\begin{split}b = \begin{bmatrix} 1 \\ 5 \\ 7 \\ -2 \\ \end{bmatrix}\end{split}\]

We can solve this with a backslash, just like in MATLAB.

# Define matrix A and vector b
A = [1 2 -1 1; 2 -1 3 -2; -3 4 2 1; 1 -3 2 -4]
b = [1, 5, 7, -2]

# Solve for x using the backslash operator
x = A \ b

# Compute the rank of A
rank_A = rank(A)

# Compute the null space of A
null_space_A = nullspace(A)

# Compute the condition number of A
cond_A = cond(A)

# Display results
println("Solution x: ", x)
println("Rank of A: ", rank_A)
println("Null space of A: ", null_space_A)
println("Condition number of A: ", cond_A)
Solution x:
[0.6235294117647058, 0.7176470588235293, 2.3529411764705883, 1.2941176470588236]
Rank of A: 4
Null space of A: Matrix
{Float64}(undef, 4, 0)
Condition number of A: 7.463516139995443

Special Matrices#

Julia provides for a bunch of special matrix types on which standard operations are further optimized; you can read more about them here. We explore a few examples below.

Symmetric Matrices: Julia provides the Symmetric type for creating symmetric matrices, where the matrix is equal to its transpose. This saves storage space by storing only the upper triangular part of the matrix.

# Create a symmetric matrix
A = [1 2 3; 2 4 5; 3 5 6]
S = Symmetric(A)

println(S)
[1 2 3; 2 4 5; 3 5 6]

Sparse Matrices: Sparse matrices are useful when you have a large matrix with mostly zero elements. Julia provides efficient storage and operations for sparse matrices via the SparseArrays package.

using SparseArrays

# Create a sparse matrix by specifying rows, columns, and values of nonzero entries as well as dimensions
S = sparse([1, 3, 4], [2, 1, 3], [10, 20, 30], 5, 5)
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
  ⋅  10   ⋅  ⋅  ⋅
  ⋅   ⋅   ⋅  ⋅  ⋅
 20   ⋅   ⋅  ⋅  ⋅
  ⋅   ⋅  30  ⋅  ⋅
  ⋅   ⋅   ⋅  ⋅  ⋅

Diagonal Matrices: Julia has a Diagonal type that stores only the diagonal elements of the matrix, making it memory-efficient and fast for certain operations.

# Create a diagonal matrix
D = Diagonal([1, 2, 3])
3×3 Diagonal{Int64, Vector{Int64}}:
 1  ⋅  ⋅
 ⋅  2  ⋅
 ⋅  ⋅  3

Block Diagonal Matrices: Using the BlockDiagonals package (or other similar libraries), you can create block diagonal matrices that store each diagonal block separately.

using BlockDiagonals

# Create a block diagonal matrix
D1 = Diagonal([1, 2, 3])
D2 = Diagonal([3, 4, 5])
BD = BlockDiagonal([D1, D2])
6×6 BlockDiagonal{Int64, Diagonal{Int64, Vector{Int64}}}:
 1  0  0  0  0  0
 0  2  0  0  0  0
 0  0  3  0  0  0
 0  0  0  3  0  0
 0  0  0  0  4  0
 0  0  0  0  0  5

Hermitian Matrices: (for the quantum mechanicians among you) The Hermitian type in Julia is used for Hermitian matrices, which are complex square matrices that are equal to their own conjugate transpose.

# Create a Hermitian matrix
B = [1+im 2 3; 2 4 5; 3 5-im 6] # not Hermitian as-written
H = Hermitian(B) # but this constructor will enforce it...
3×3 Hermitian{Complex{Int64}, Matrix{Complex{Int64}}}:
 1+0im  2+0im  3+0im
 2+0im  4+0im  5+0im
 3+0im  5+0im  6+0im

DataFrames#

The DataFrames package provides a similar set of functionality to pandas in Python (albeit with fairly different and arguably more principled syntax). We’ll import the package and start by creating a simple DataFrame to experiment with.

using DataFrames, Statistics
df = DataFrame(Name=["John", "Jane", "Jim"], Age=[28, 34, 45], Salary=[50000, 62000, 72000])
3×3 DataFrame
RowNameAgeSalary
StringInt64Int64
1John2850000
2Jane3462000
3Jim4572000

Add a new column:

df.Status = ["Single", "Married", "Single"]
3-element Vector{String}:
 "Single"
 "Married"
 "Single"

Let’s filter for people over 30…

filtered_df = filter(row -> row.Age > 30, df)
2×4 DataFrame
RowNameAgeSalaryStatus
StringInt64Int64String
1Jane3462000Married
2Jim4572000Single

The describe function gives us some summary statistics…

describe(df)
4×7 DataFrame
Rowvariablemeanminmedianmaxnmissingeltype
SymbolUnion…AnyUnion…AnyInt64DataType
1NameJaneJohn0String
2Age35.66672834.0450Int64
3Salary61333.35000062000.0720000Int64
4StatusMarriedSingle0String

There’s also grouping and aggregate calculation functionality…

grouped = groupby(df, :Status)
agg_df = combine(grouped, :Salary => mean => :AvgSalary)
2×2 DataFrame
RowStatusAvgSalary
StringFloat64
1Single61000.0
2Married62000.0

Random Number (and more!) generation#

As with any self-respecting scientific programming language, Julia has extensive functionality for randomness. The core base function is rand, which we demonstrate in a few (of many) variations below…

rand() # without arguments, will draw a single float from U(0,1)
0.39818004938839724
rand(3,2) # now make it a matrix
3×2 Matrix{Float64}:
 0.284743  0.983289
 0.51749   0.276224
 0.998884  0.8045
rand(Int, 2, 2) # we can also specify a type; now it will draw from the full range of values for that type
2×2 Matrix{Int64}:
 -5891096881748942509  7515340938221196441
  1735387024512748016  6734517492616934484
rand(['a', 'b', 'c']) # can also draw from a provided collection of objects
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
rand(0:2:100) # anything iterable counts
28
randn(1,4) # draw from standard normal
1×4 Matrix{Float64}:
 -0.677751  0.627806  -1.14872  -0.308282
using Plots

Let us use rand and randn to generate and plot distributions…

# Generate sample data for plotting
normal_samples = randn(1000) * 2 .+ 5    # Normal distribution N(5, 2)
uniform_samples = rand(1000) * 10       # Uniform distribution U(0, 10)

# Plot histograms for comparison
histogram(normal_samples, bins=30, alpha=0.5, label="Normal(5, 2)", xlabel="Value", ylabel="Frequency")
histogram!(uniform_samples, bins=30, alpha=0.5, label="Uniform(0, 10)")

In this case, we see the use of ! in a slightly different way than was explored in the syntax section, though conceptually the same. The histogram! function modifies the existing plot, but we don’t actually have to pass it as an argument (though we can), so the thing being modified “in-place” is not explicitly being passed in this case.

More plot examples#

(there are many more available, these are just a few examples of commonly-desired plot types!)

Line Plot: A basic line plot with labels and title.

# Generate some data
x = 1:10
y = x .^ 2

# Line plot
plot(x, y, label="y = x^2", xlabel="x", ylabel="y", title="Line Plot")

Scatter Plot: A scatter plot to show individual points.

# Scatter plot
scatter(x, y, label="Scatter", xlabel="x", ylabel="y", title="Scatter Plot")

Bar Plot: A bar plot to visualize categorical data.

# Bar plot
categories = ["A", "B", "C", "D"]
values = [5, 9, 3, 7]

bar(categories, values, label="Values", title="Bar Plot", xlabel="Category", ylabel="Value")

Heatmap: A heatmap to show a matrix of values.

# Heatmap data
z = rand(10, 10)

# Heatmap plot
heatmap(z, title="Heatmap", xlabel="X-axis", ylabel="Y-axis")

3D Plot: A 3D plot to visualize functions or data in three dimensions.

# 3D plot data
x = -5:0.1:5
y = -5:0.1:5
z = [sin(sqrt(xi^2 + yi^2)) for xi in x, yi in y]

# 3D surface plot
plot(x, y, z, st=:surface, title="3D Plot", xlabel="X", ylabel="Y", zlabel="Z")

Note: The Plots package is a powerful and flexible plotting library in Julia, and it supports many different plot types and customization options. You can refer to the Plots.jl documentation for more information on how to create different types of plots and customize them to suit your needs.

Unitful#

Unitful.jl is a Julia package for handling units and dimensions. It can be very useful for doing unit conversions and catching dimensional errors, but is also sometimes more trouble than it’s worth to actually store every quantity in your code with units…

using Unitful
1.0u"m/s"
1.0 m s^-1
1.0u"N*m"
1.0 m N
u"m,kg,s"
(m, kg, s)
typeof(1.0u"m/s")
Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(m, s^-1), 𝐋 𝐓^-1, nothing}}
u"ħ"
1.0545718176461565e-34 J s

Converting between units#

Convert a Unitful.Quantity to different units. The conversion will fail if the target units a have a different dimension than the dimension of the quantity x. You can use this method to switch between equivalent representations of the same unit, like N m and J.

uconvert(u"hr",3602u"s")
1801//1800 hr
uconvert(u"m/s", 60u"km/hr")
50//3 m s^-1

Another useful function is: upreferred, which converts a quantity to the preferred unit for its dimension. In this case, units are converted to the preferred SI representation.

upreferred(u"J")
kg m^2 s^-2

We can also make Unitful.Units callable with a Number as an argument, for a unit conversion shorthand:

u"cm"(1u"m")
100 cm

For syntax simplicity, we can use |+> (|>) to chain unit conversions:

1u"m" |> u"cm"
100 cm

Dimensionless quantities#

For dimensionless quantities, we can use NoUnits as an argumert to uconvert:

uconvert(NoUnits, 1.0u"μm/m")
1.0e-6

You can also check if an object represents “no units” by doing:

unit(1.0) == NoUnits
true

It is also possible to convert a quantity to a subtype of Real or Complex to obtain the numerical value without units:

convert(Float64, 1.0u"μm/m")
1.0e-6

It is instructive to compare the output of the cell above with the ustrip function, which merely strips the unit annotation but keeps the numerical value the same:

ustrip(1.0u"μm/m")
1.0

Dispatching on units#

Of course (you get the idea by now), we can also dispatch on units!

whatsit(x::Unitful.Voltage) = "voltage!"
whatsit (generic function with 1 method)
whatsit(x::Unitful.Length) = "length!"
whatsit (generic function with 2 methods)
whatsit(1u"mm")
"length!"
whatsit(1u"kV")
"voltage!"
Creating your own units#

We will not cover this in detail here, but you can also do things like creating your own units and specifying how to convert them to others…(see docs)

Delimited Files#

Still in the context of manipulating numerical data, the DelimitedFiles (also in Julia Base) package provides functionality for reading and writing files with delimited values.

Let’s read the file you created in the previous notebook:

using DelimitedFiles

The second argument specifies the delimiter, the third the type to parse, and the comments=true kwargs says to ignore comment lines.

readdlm("test.txt", ' ', Float64, comments=true)
1×3 Matrix{Float64}:
 1.0  2.0  3.0