Numerical Data#
Overview
This notebook focuses on the use of libraries in Julia to manipulate numerical data, such as matrices and vectors. It covers the basics of creating and manipulating arrays, as well as performing mathematical operations on them. We also explore the use of DataFrames for handling tabular data, and Plots for visualizing data.
Key Libraries
LinearAlgebra: For linear algebra operations.
DataFrames: For handling tabular data.
Plots: For data visualization.
Linear Algebra#
Here we’ll demonstrate some basic linear algebraic functionality using the LinearAlgebra package, which is part of Julia’s standard library (i.e. it ships with Julia and doesn’t need to be installed, only imported).
using LinearAlgebra
Let’s initialize a nice matrix:
A = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
What’s its determinant? (Oops, maybe it’s not so nice)
det(A)
0.0
Okay, let’s make a new matrix and take the inverse (NB: this should of course be avoided in numerical computing generally, but the function does exist so we’re showing it to you):
B = [2 1 1; 1 2 1; 1 1 2]
inv(B)
3×3 Matrix{Float64}:
0.75 -0.25 -0.25
-0.25 0.75 -0.25
-0.25 -0.25 0.75
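The caveat about explicit inverses can be made concrete. The following is a minimal sketch, assuming a hypothetical right-hand side c that is not part of the original notebook: it solves B x = c with the backslash operator and with a reusable LU factorization, and compares both against inv(B) * c.

```julia
using LinearAlgebra

B = [2 1 1; 1 2 1; 1 1 2]
c = [4, 4, 4]          # hypothetical right-hand side, chosen for illustration

x_backslash = B \ c    # solves B * x = c without forming inv(B)
F = lu(B)              # factorize once ...
x_lu = F \ c           # ... and reuse the factorization for further right-hand sides
x_inv = inv(B) * c     # also works, but forms the full inverse first

println(x_backslash)
println(x_lu ≈ x_inv)
```

When many right-hand sides share the same matrix, keeping the factorization F around is usually the more economical choice.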
How about eigendecomposition?
eig_B = eigen(B)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
0.9999999999999998
1.0
3.9999999999999987
vectors:
3×3 Matrix{Float64}:
-0.408248 0.707107 -0.57735
-0.408248 -0.707107 -0.57735
0.816497 0.0 -0.57735
By default an Eigen object is returned, whose fields we can access as eig_B.values or eig_B.vectors, but we can also destructure it directly into two variables like this:
vals, vecs = eigen(B)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
0.9999999999999998
1.0
3.9999999999999987
vectors:
3×3 Matrix{Float64}:
-0.408248 0.707107 -0.57735
-0.408248 -0.707107 -0.57735
0.816497 0.0 -0.57735
Note that the eigenvectors are the columns…
vecs
3×3 Matrix{Float64}:
-0.408248 0.707107 -0.57735
-0.408248 -0.707107 -0.57735
0.816497 0.0 -0.57735
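To see what "the eigenvectors are the columns" means in practice, here is a small sketch that checks the defining relation B v = λ v column by column. It only uses functions from LinearAlgebra; the loop and variable names are illustrative.

```julia
using LinearAlgebra

B = [2 1 1; 1 2 1; 1 1 2]
vals, vecs = eigen(B)

# Each column vecs[:, k] should satisfy B * v = λ * v
for k in eachindex(vals)
    v = vecs[:, k]
    println("λ = ", vals[k], "  residual = ", norm(B * v - vals[k] * v))
end

# The same statement for all columns at once
println(B * vecs ≈ vecs * Diagonal(vals))
```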
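We have a backslash operator like in MATLAB, too. Note that the matrix A defined above is singular (its determinant is zero), so this particular solve throws a SingularException: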
b = [1, 2, 3]
x = A \ b
SingularException(3)
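The failure is a property of the matrix rather than of the backslash operator. As a hedged sketch of one way to proceed with a singular system, the code below checks the rank and then computes a minimum-norm least-squares solution with the pseudoinverse. The tolerance rtol = 1e-10 is an assumption made for illustration, and the matrix is written with Float64 entries for simplicity.

```julia
using LinearAlgebra

A = Float64[1 2 3; 4 5 6; 7 8 9]
b = [1.0, 2.0, 3.0]

# A 3×3 matrix with rank below 3 is singular
println("rank(A) = ", rank(A))

# Minimum-norm least-squares solution via the pseudoinverse.
# rtol = 1e-10 is an illustrative cutoff for discarding tiny singular values.
x_min = pinv(A; rtol = 1e-10) * b

println("x_min = ", x_min)
println("residual norm = ", norm(A * x_min - b))
```

For this particular b the system happens to be consistent, so the residual is close to zero even though A has no inverse.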
Let’s say we want to solve the following system of equations (and maybe find out some other things about the matrix \(A\) while we’re at it…):
\[
\begin{aligned}
x_1 + 2x_2 - x_3 + x_4 &= 1 \\
2x_1 - x_2 + 3x_3 - 2x_4 &= 5 \\
-3x_1 + 4x_2 + 2x_3 + x_4 &= 7 \\
x_1 - 3x_2 + 2x_3 - 4x_4 &= -2
\end{aligned}
\]
This can be written in matrix form \(Ax = b\) as:
\[
A =
\begin{bmatrix}
1 & 2 & -1 & 1 \\
2 & -1 & 3 & -2 \\
-3 & 4 & 2 & 1 \\
1 & -3 & 2 & -4
\end{bmatrix},
\quad
x =
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4
\end{bmatrix},
\quad
b =
\begin{bmatrix}
1 \\
5 \\
7 \\
-2
\end{bmatrix}
\]
# Define matrix A and vector b
A = [1 2 -1 1; 2 -1 3 -2; -3 4 2 1; 1 -3 2 -4]
b = [1, 5, 7, -2]
# Solve for x using the backslash operator
x = A \ b
# Compute the rank of A
rank_A = rank(A)
# Compute the null space of A
null_space_A = nullspace(A)
# Compute the condition number of A
cond_A = cond(A)
# Display results
println("Solution x: ", x)
println("Rank of A: ", rank_A)
println("Null space of A: ", null_space_A)
println("Condition number of A: ", cond_A)
Solution x: [0.6235294117647058, 0.7176470588235293, 2.3529411764705883, 1.2941176470588236]
Rank of A: 4
Null space of A: Matrix{Float64}(undef, 4, 0)
Condition number of A: 7.463516139995443
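The printed quantities are related, and a short sketch can make that explicit: the residual confirms the solution, full rank corresponds to an empty null space, and the default (2-norm) condition number is the ratio of the largest to the smallest singular value. The variable names residual and σ are introduced here for illustration.

```julia
using LinearAlgebra

A = [1 2 -1 1; 2 -1 3 -2; -3 4 2 1; 1 -3 2 -4]
b = [1, 5, 7, -2]
x = A \ b

# Residual of the computed solution: should be near machine precision
residual = norm(A * x - b)
println("Residual norm: ", residual)

# Full rank means the only vector with A * v = 0 is v = 0,
# so nullspace(A) has zero columns
println("Null space dimension: ", size(nullspace(A), 2))

# Singular values are returned in decreasing order
σ = svdvals(A)
println("σ_max / σ_min: ", σ[1] / σ[end])
println("cond(A):       ", cond(A))
```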
Special Matrices#
Julia provides a number of special matrix types on which standard operations are further optimized; you can read more about them in the LinearAlgebra documentation. We explore a few examples below.
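Symmetric Matrices: Julia provides the Symmetric type for symmetric matrices, where the matrix is equal to its transpose. Symmetric wraps an existing matrix and reads only one triangle (the upper one by default), mirroring it to the other. It does not reduce storage on its own, but it lets Julia dispatch to algorithms specialized for symmetric matrices.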
# Create a symmetric matrix
A = [1 2 3; 2 4 5; 3 5 6]
S = Symmetric(A)
println(S)
[1 2 3; 2 4 5; 3 5 6]
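As a small sketch of how the Symmetric wrapper behaves, the example below wraps a deliberately non-symmetric matrix M (a hypothetical input chosen for illustration) and shows which triangle is used. It also checks that the wrapper refers to the original array rather than a copy.

```julia
using LinearAlgebra

# Deliberately non-symmetric data: the strict lower triangle is all zeros
M = [1 2 3; 0 4 5; 0 0 6]

S_upper = Symmetric(M)        # default: mirror the upper triangle
S_lower = Symmetric(M, :L)    # alternative: mirror the lower triangle

println(S_upper == [1 2 3; 2 4 5; 3 5 6])
println(S_lower == [1 0 0; 0 4 0; 0 0 6])
println(parent(S_upper) === M)   # the wrapper refers to M itself, no copy is made

# Eigenvalues of a Symmetric matrix come back as real numbers
println(eigvals(S_upper))
```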
Sparse Matrices: Sparse matrices are useful when you have a large matrix with mostly zero elements. Julia provides efficient storage and operations for sparse matrices via the SparseArrays package.
using SparseArrays
# Create a sparse matrix
# Named M rather than I, because LinearAlgebra exports I as the identity
M = sparse([1, 3, 4], [2, 1, 3], [10, 20, 30], 5, 5)
println(M)
sparse([3, 1, 4], [1, 2, 3], [20, 10, 30], 5, 5)
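To illustrate what "efficient storage" means here, the following sketch builds the same 5×5 matrix (under a name other than I, since LinearAlgebra exports I as the identity) and inspects the stored entries. Only three of the 25 entries are kept, in column-major order, which is also why the printed row and column indices above appear reordered.

```julia
using SparseArrays

M = sparse([1, 3, 4], [2, 1, 3], [10, 20, 30], 5, 5)

println("stored entries: ", nnz(M), " of ", length(M))

# findnz returns row indices, column indices and values in column-major order
rows, cols, vals = findnz(M)
println(rows, " ", cols, " ", vals)

# Matrix-vector products only need the stored entries
y = M * ones(5)
println(y)
```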
Diagonal Matrices: Julia has a Diagonal type that stores only the diagonal elements of the matrix, making it memory-efficient and fast for certain operations.
# Create a diagonal matrix
d = Diagonal([1, 2, 3])
println(d)
[1 0 0; 0 2 0; 0 0 3]
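A brief sketch of the operations that benefit from the Diagonal type: multiplication, solving, and inversion all reduce to elementwise work on the stored diagonal. The vector v is a hypothetical input chosen for illustration.

```julia
using LinearAlgebra

d = Diagonal([1, 2, 3])
v = [1.0, 4.0, 9.0]

println(d * v)      # scales entry i of v by d[i, i]
println(d \ v)      # divides entry i of v by d[i, i]
println(inv(d))     # the inverse is again a Diagonal
println(d.diag)     # the only data actually stored
```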
Block Diagonal Matrices: Using the BlockDiagonals package (or other similar libraries), you can create block diagonal matrices that store each diagonal block separately.
using BlockDiagonals
# Create a block diagonal matrix
D1 = Diagonal([1, 2, 3])
D2 = Diagonal([3, 4, 5])
BD = BlockDiagonal([D1, D2])
println(BD)
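Hermitian Matrices: (for the quantum mechanicians among you) The Hermitian type in Julia represents Hermitian matrices, i.e. complex square matrices that are equal to their own conjugate transpose. Like Symmetric, it wraps an existing matrix and reads only one triangle (the upper one by default), and it treats the diagonal as real; this is why the imaginary parts on the diagonal of B below do not appear in the printed result.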
# Create a Hermitian matrix
B = [1+im 2 3; 2 4 5; 3 5 6-im]
H = Hermitian(B)
println(H)
Complex{Int64}[1 + 0im 2 + 0im 3 + 0im; 2 + 0im 4 + 0im 5 + 0im; 3 + 0im 5 + 0im 6 + 0im]
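One reason the Hermitian type is useful is that its eigenvalues are guaranteed to be real. The sketch below uses a small 2×2 matrix M with genuinely complex off-diagonal entries (a hypothetical example, not taken from the notebook) to show this.

```julia
using LinearAlgebra

# M equals its own conjugate transpose, so it is Hermitian
M = [2 1-im; 1+im 3]
H = Hermitian(M)

println(ishermitian(M))
println(H == H')

# Eigenvalues are returned as real numbers in ascending order
λ = eigvals(H)
println(λ)
```

For this matrix the trace is 5 and the determinant is 4, so the eigenvalues are 1 and 4.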
DataFrames#
The DataFrames package provides a similar set of functionality to pandas in Python (albeit with fairly different and arguably more principled syntax). We’ll import the package and start by creating a simple DataFrame to experiment with.
using DataFrames, Statistics
df = DataFrame(Name=["John", "Jane", "Jim"], Age=[28, 34, 45], Salary=[50000, 62000, 72000])
Add a new column:
df.Status = ["Single", "Married", "Single"]
Let’s filter for people over 30…
filtered_df = filter(row -> row.Age > 30, df)
The describe function gives us some summary statistics…
describe(df)
There’s also grouping and aggregate calculation functionality…
grouped = groupby(df, :Status)
agg_df = combine(grouped, :Salary => mean => :AvgSalary)
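To make clear what the grouping and aggregation step computes, here is a minimal sketch in plain Julia, without DataFrames. It is not how DataFrames implements groupby or combine; it only reproduces the arithmetic on the same toy data, and the names groups and avg_salary are introduced for illustration.

```julia
using Statistics

# Same toy data as the DataFrame above, held in plain vectors
salary = [50000, 62000, 72000]
status = ["Single", "Married", "Single"]

# Collect the salaries belonging to each status
groups = Dict{String, Vector{Int}}()
for (s, pay) in zip(status, salary)
    push!(get!(groups, s, Int[]), pay)
end

# One mean per group, which is what the :AvgSalary column is meant to hold
avg_salary = Dict(s => mean(pays) for (s, pays) in groups)
println(avg_salary)
```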
Random Number (and more!) generation#
As with any self-respecting scientific programming language, Julia has extensive functionality for randomness. The core base function is rand, which we demonstrate in a few (of many) variations below…
rand() # without arguments, will draw a single float from U(0,1)
0.25006141143873706
rand(3,2) # now make it a matrix
3×2 Matrix{Float64}:
0.832316 0.48948
0.522031 0.897479
0.609625 0.819179
rand(Int, 2, 2) # we can also specify a type; now it will draw from the full range of values for that type
2×2 Matrix{Int64}:
-8522877022131237848 -6871358748179709772
1567719678029836224 4157292215403458722
rand(['a', 'b', 'c']) # can also draw from a provided collection of objects
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
rand(0:2:100) # ranges (and other indexable collections) work too
98
randn(1,4) # draw from standard normal
1×4 Matrix{Float64}:
-0.306526 -0.0201221 -1.14722 -1.51437
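The outputs above change on every run. When repeatable results are needed, a seeded generator can be used. The sketch below relies on the Random standard library; the seed 1234 is an arbitrary choice made for illustration.

```julia
using Random

# Two generators created with the same seed produce the same stream
rng1 = MersenneTwister(1234)
rng2 = MersenneTwister(1234)

a = rand(rng1, 3)
b = rand(rng2, 3)
println(a == b)

# Seeding the default global generator works the same way
Random.seed!(1234)
c = rand(3)
Random.seed!(1234)
d = rand(3)
println(c == d)
```

Passing an explicit generator, as in rand(rng1, 3), keeps the randomness local to the code that uses it, which tends to be easier to reason about than reseeding the global generator.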
using Plots
Let us use rand and randn to generate and plot distributions…
# Generate sample data for plotting
normal_samples = randn(1000) * 2 .+ 5 # Normal distribution with mean 5 and standard deviation 2
uniform_samples = rand(1000) * 10 # Uniform distribution U(0, 10)
# Plot histograms for comparison
histogram(normal_samples, bins=30, alpha=0.5, label="Normal(5, 2)", xlabel="Value", ylabel="Frequency")
histogram!(uniform_samples, bins=30, alpha=0.5, label="Uniform(0, 10)")
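Independently of the plot, the scaling and shifting used to generate the samples can be checked numerically. This is a small sketch using the Statistics standard library; the seed and the larger sample size n are assumptions chosen so that the estimates are stable.

```julia
using Random, Statistics

rng = MersenneTwister(0)   # fixed, arbitrary seed so the run is repeatable
n = 100_000                # larger than in the plot, for tighter estimates

normal_samples = randn(rng, n) .* 2 .+ 5    # standard normal scaled by 2, shifted by 5
uniform_samples = rand(rng, n) .* 10        # U(0, 1) stretched to U(0, 10)

println("normal:  mean ≈ ", mean(normal_samples), ", std ≈ ", std(normal_samples))
println("uniform: min = ", minimum(uniform_samples), ", max = ", maximum(uniform_samples))
```

The sample mean and standard deviation of the normal draws should land close to 5 and 2, and all uniform draws should fall between 0 and 10.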
In the histogram example above, we see ! used in a slightly different way than was explored in the syntax section, though the idea is the same. The histogram! function modifies the existing plot, but we don’t have to pass that plot as an argument (though we can), so the object being modified in place is implicit here.
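For comparison, this is the more common form of the ! convention, where the object being modified is passed explicitly. The sketch uses only Base functions and a hypothetical vector v.

```julia
v = [3, 1, 2]

w = sort(v)     # returns a sorted copy, v is untouched
println(v, " ", w)

sort!(v)        # sorts v itself
println(v)

push!(v, 10)    # appends to v in place
println(v)
```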
More plot examples#
(there are many more available, these are just a few examples of commonly-desired plot types!)
Line Plot: A basic line plot with labels and title.
# Generate some data
x = 1:10
y = x .^ 2
# Line plot
plot(x, y, label="y = x^2", xlabel="x", ylabel="y", title="Line Plot")
Scatter Plot: A scatter plot to show individual points.
# Scatter plot
scatter(x, y, label="Scatter", xlabel="x", ylabel="y", title="Scatter Plot")
Bar Plot: A bar plot to visualize categorical data.
# Bar plot
categories = ["A", "B", "C", "D"]
heights = [5, 9, 3, 7] # named heights to avoid shadowing Base.values
bar(categories, heights, label="Values", title="Bar Plot", xlabel="Category", ylabel="Value")
Heatmap: A heatmap to show a matrix of values.
# Heatmap data
z = rand(10, 10)
# Heatmap plot
heatmap(z, title="Heatmap", xlabel="X-axis", ylabel="Y-axis")
3D Plot: A 3D plot to visualize functions or data in three dimensions.
# 3D plot data
x = -5:0.1:5
y = -5:0.1:5
z = [sin(sqrt(xi^2 + yi^2)) for xi in x, yi in y]
# 3D surface plot
plot(x, y, z, st=:surface, title="3D Plot", xlabel="X", ylabel="Y", zlabel="Z")
Note: The Plots package is a powerful and flexible plotting library in Julia, and it supports many different plot types and customization options. You can refer to the Plots.jl documentation for more information on how to create different types of plots and customize them to suit your needs.
Unitful#
Unitful.jl is a Julia package for handling units and dimensions. It can be very useful for doing unit conversions and catching dimensional errors, but is also sometimes more trouble than it’s worth to actually store every quantity in your code with units…
using Unitful
1.0u"m/s"
1.0u"N*m"
u"m,kg,s"
typeof(1.0u"m/s")
u"ħ"
Converting between units#
The uconvert function converts a Unitful.Quantity to different units. The conversion fails if the target units have a different dimension than the quantity. You can also use it to switch between equivalent representations of the same unit, like N m and J.
uconvert(u"hr",3602u"s")
uconvert(u"m/s", 60u"km/hr")
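The arithmetic behind these two conversions can be written out in plain Julia, without Unitful. This sketch only reproduces the numbers; it makes no claim about how Unitful stores or displays its results, and the variable names are illustrative.

```julia
# Seconds to hours: 1 hr = 3600 s
hours = 3602 // 3600          # exact rational arithmetic, reduced automatically
println(hours)

# km/hr to m/s: 1 km = 1000 m, 1 hr = 3600 s
speed_ms = 60 * 1000 / 3600
println(speed_ms)
```

Keeping the first result as a rational avoids the rounding that a floating-point division would introduce.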
Another useful function is upreferred, which converts a quantity to the preferred unit for its dimension. In this case, units are converted to the preferred SI representation.
upreferred(u"J")
Unitful.Units objects are also callable with a number as an argument, which gives a shorthand for unit conversion:
u"cm"(1u"m")
For simpler syntax, we can use the pipe operator |> to chain unit conversions:
1u"m" |> u"cm"
Dimensionless quantities#
For dimensionless quantities, we can use NoUnits as an argument to uconvert:
uconvert(NoUnits, 1.0u"μm/m")
You can also check if an object represents “no units” by doing:
unit(1.0) == NoUnits
It is also possible to convert a quantity to a subtype of Real or Complex to obtain the numerical value without units:
convert(Float64, 1.0u"μm/m")
It is instructive to compare the output of the cell above with that of the ustrip function, which merely strips the unit annotation and leaves the numerical value as it is, without applying the μm/m scale factor:
ustrip(1.0u"μm/m")
Dispatching on units#
Of course (you get the idea by now), we can also dispatch on units!
whatsit(x::Unitful.Voltage) = "voltage!"
whatsit(x::Unitful.Length) = "length!"
whatsit(1u"mm")
whatsit(1u"kV")
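The mechanism at work is ordinary multiple dispatch on the type of the argument. The sketch below illustrates the same idea without Unitful, using hypothetical stand-in types (PhysicalQuantity, Millimeters, Kilovolts) and a hypothetical function whatsit_sketch. None of these names belong to Unitful, whose actual types carry dimension information in their type parameters.

```julia
# Hypothetical stand-in types, NOT part of Unitful
abstract type PhysicalQuantity end

struct Millimeters <: PhysicalQuantity
    value::Float64
end

struct Kilovolts <: PhysicalQuantity
    value::Float64
end

# One method per type; Julia selects the method from the argument's type
whatsit_sketch(x::Millimeters) = "length!"
whatsit_sketch(x::Kilovolts) = "voltage!"
whatsit_sketch(x::PhysicalQuantity) = "something else"   # fallback for other subtypes

println(whatsit_sketch(Millimeters(1.0)))
println(whatsit_sketch(Kilovolts(1.0)))
```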
Creating your own units#
We will not cover this in detail here, but you can also do things like creating your own units and specifying how to convert them to others…(see docs)
Delimited Files#
Still in the context of manipulating numerical data, the DelimitedFiles package (also part of Julia’s standard library) provides functionality for reading and writing files with delimited values.
Let’s read the file you just created:
using DelimitedFiles
readdlm("data/read_test.txt")
We can also remove the header and specify the delimiter:
readdlm("data/read_test.txt", ' ', Float64, comments=true)
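Since the calls above depend on a file that has to exist beforehand, here is a self-contained sketch of a write-then-read round trip. It writes a small matrix to a temporary file with a comment-style header line and reads it back with the same readdlm arguments as above. The file name read_test_sketch.txt and the header text are assumptions made for illustration.

```julia
using DelimitedFiles

data = [1.0 2.0; 3.0 4.0]

# Work in a temporary directory so nothing is left behind
result = mktempdir() do dir
    path = joinpath(dir, "read_test_sketch.txt")   # hypothetical file name

    open(path, "w") do io
        println(io, "# x y")        # header written as a comment line
        writedlm(io, data, ' ')     # space-delimited rows
    end

    # comments=true ignores text after the comment character (# by default)
    readdlm(path, ' ', Float64; comments = true)
end

println(result == data)
```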