Why Julia?#

Overview

This notebook provides an introduction to Julia, highlighting the features that make it unique, such as:
  • High-level syntax

  • Performance comparable to C

  • Multiple dispatch

  • Dynamic typing

  • Method overloading


Julia is known for its speed, ease of use, and strong support for scientific computing. Some particularly notable features include:

  • Capacity for high speed

  • Multiple dispatch (more soon!)

  • Interoperability

  • Dynamic type system

  • Metaprogramming functionality (“Julia all the way down”)

  • Built-in features for:

    • Package/environment management

    • Unit testing

    • Documentation

    • Parallelism, distributed computing, multithreading, GPU support

Julia’s Speed and Performance#

Julia compiles code just-in-time (“JIT”) using LLVM, which allows it to achieve speeds similar to low-level languages like C. Below are the runtimes of various algorithmic benchmarks across languages, normalized against C (from this page).

[Figure: runtimes of algorithmic benchmarks across languages, normalized to C]

Julia uses just-in-time (JIT) compilation: code is compiled dynamically during execution of the program (i.e., at run time), so there is no separate ahead-of-time step of compiling the code into an executable.

The idea behind JIT compilation is to bring the benefits of both (static) compilation and interpretation.
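As a small illustration of the JIT model (the function name `square_sum` is a hypothetical example of ours), the effect of compilation is visible in call timings:

```julia
# Hypothetical sketch: the first call to a function triggers JIT compilation
# for the given argument types; later calls reuse the compiled machine code.
square_sum(x) = sum(x .^ 2)

@time square_sum(rand(1000));   # first call: timing includes compilation
@time square_sum(rand(1000));   # second call: already compiled, much faster
```

Calling the same function with a new argument type (e.g. a vector of `Int`) triggers compilation again, specialized for that type.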

Julia is (or at least can be) fast because of how it is designed. Many choices went into this design, but the core paradigm is multiple dispatch as a way to enable type stability. Most of Julia’s high-performance features rely on this: it makes the code very easy for a compiler to turn into efficient machine code, while allowing the source to stay concise, readable, and “look like a scripting language”.

Type-stability and Code Introspection (at a glance…)#

Type stability: the reasonable type to output from *(::Int64,::Int64) is an Int64:

a₁ = 3
b₁ = 2
c₁ = a₁ * b₁

typeof(a₁), typeof(b₁), typeof(c₁)
(Int64, Int64, Int64)

In Julia, the @code_llvm and @code_native macros both inspect the generated code, but at different stages of compilation and levels of abstraction:

  • @code_llvm provides a high-level view of the LLVM Intermediate Representation (IR) code generated by the Julia compiler.

@code_llvm a₁ * b₁
; Function Signature: *(Int64, Int64)
;  @ int.jl:88 within `*`
define i64 @"julia_*_4599"(i64 signext %"x::Int64", i64 signext %"y::Int64") #0 {
top:
  %0 = mul i64 %"y::Int64", %"x::Int64"
  ret i64 %0
}
  • @code_native shows the native assembly code generated for the target machine.

    NOTE: The output of @code_native can vary between different machines, depending on several factors related to the hardware and the Julia installation.

@code_native a₁ * b₁
	
.text
	.file	"*"
	.globl	"julia_*_4800"                  # -- Begin function julia_*_4800
	.p2align	4, 0x90
	.type	"julia_*_4800",@function
"julia_*_4800":                         # @"julia_*_4800"
; Function Signature: *(Int64, Int64)
; ┌ @ int.jl:88 within `*`
# %bb.0:                                # %top
; │ @ int.jl within `*`
	#DEBUG_VALUE: *:x <- $rdi
	#DEBUG_VALUE: *:y <- $rsi
	push	rbp
	mov	rax, rdi
	mov	rbp, rsp
; │ @ int.jl:88 within `*`
	imul	rax, rsi
	pop	rbp
	ret
.Lfunc_end0:
	.size	"julia_*_4800", .Lfunc_end0-"julia_*_4800"
; └
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

The reasonable type to output from *(::Float64,::Float64) is a Float64:

a₂ = 3.0
b₂ = 2.0
c₂ = a₂ * b₂

typeof(a₂), typeof(b₂), typeof(c₂)
(Float64, Float64, Float64)
@code_llvm a₂ * b₂
; Function Signature: *(Float64, Float64)
;  @ float.jl:493 within `*`
define double @"julia_*_4867"(double %"x::Float64", double %"y::Float64") #0 {
top:
  %0 = fmul double %"x::Float64", %"y::Float64"
  ret double %0
}
@code_native a₂ * b₂
	.text
	.file	"*"
	.globl	"julia_*_4871"                  # -- Begin function julia_*_4871
	.p2align	4, 0x90
	.type	"julia_*_4871",@function
"julia_*_4871":                         # @"julia_*_4871"
; Function Signature: *(Float64, Float64)
; ┌ @ float.jl:493 within `*`
# %bb.0:                                # %top
; │ @ float.jl within `*`
	#DEBUG_VALUE: *:x <- $xmm0
	#DEBUG_VALUE: *:y <- $xmm1
	push	rbp
	mov	rbp, rsp
; │ @ float.jl:493 within `*`
	vmulsd	xmm0, xmm0, xmm1
	pop	rbp
	ret
.Lfunc_end0:
	.size	"julia_*_4871", .Lfunc_end0-"julia_*_4871"
; └
                                        # -- End function
	.type	".L+Core.Float64#4873",@object  # @"+Core.Float64#4873"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+Core.Float64#4873":
	.quad	".L+Core.Float64#4873.jit"
	.size	".L+Core.Float64#4873", 8

.set ".L+Core.Float64#4873.jit", 140013584623760
	.size	".L+Core.Float64#4873.jit", 8
	.section	".note.GNU-stack","",@progbits

What happens if the code is not type-stable? The compiler has to generate code that can handle any type, which is usually slower, both to generate and to run, than code specialized to a specific type. (More on this in the optimization section!)

a₃ = 3
b₃ = 2.0
c₃ = a₃ * b₃

typeof(a₃), typeof(b₃), typeof(c₃)
(Int64, Float64, Float64)
@code_llvm a₃ * b₃
; Function Signature: *(Int64, Float64)
;  @ promotion.jl:430 within `*`
define double @"julia_*_4880"(i64 signext %"x::Int64", double %"y::Float64") #0 {
top:
; ┌ @ promotion.jl:400 within `promote`
; │┌ @ promotion.jl:375 within `_promote`
; ││┌ @ number.jl:7 within `convert`
; │││┌ @ float.jl:239 within `Float64`
      %0 = sitofp i64 %"x::Int64" to double
; └└└└
;  @ promotion.jl:430 within `*` @ float.jl:493
  %1 = fmul double %0, %"y::Float64"
  ret double %1
}
@code_native a₃ * b₃
	.text
	.file	"*"
	.globl	"julia_*_4885"                  # -- Begin function julia_*_4885
	.p2align	4, 0x90
	.type	"julia_*_4885",@function
"julia_*_4885":                         # @"julia_*_4885"
; Function Signature: *(Int64, Float64)
; ┌ @ promotion.jl:430 within `*`
# %bb.0:                                # %top
; │ @ promotion.jl within `*`
	#DEBUG_VALUE: *:x <- $rdi
	#DEBUG_VALUE: *:y <- $xmm0
	push	rbp
; │ @ promotion.jl:430 within `*`
; │┌ @ promotion.jl:400 within `promote`
; ││┌ @ promotion.jl:375 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:239 within `Float64`
	vcvtsi2sd	xmm1, xmm1, rdi
	mov	rbp, rsp
; │└└└└
; │ @ promotion.jl:430 within `*` @ float.jl:493
	vmulsd	xmm0, xmm1, xmm0
	pop	rbp
	ret
.Lfunc_end0:
	.size	"julia_*_4885", .Lfunc_end0-"julia_*_4885"
; └
                                        # -- End function
	.type	".L+Core.Float64#4887",@object  # @"+Core.Float64#4887"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+Core.Float64#4887":
	.quad	".L+Core.Float64#4887.jit"
	.size	".L+Core.Float64#4887", 8

.set ".L+Core.Float64#4887.jit", 140013584623760
	.size	".L+Core.Float64#4887.jit", 8
	.section	".note.GNU-stack","",@progbits
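Note that the mixed-type example above is still type-stable: the `Int64` is promoted to `Float64`, costing only a single conversion instruction. For contrast, here is a minimal sketch of code that is genuinely type-unstable (the function `unstable` is a hypothetical example of ours):

```julia
# A genuinely type-unstable function: the return type depends on a runtime
# value, so the compiler must emit code that handles either possibility.
unstable(x) = x > 0 ? 1 : 1.0   # Int64 on one branch, Float64 on the other

unstable(2), unstable(-2)   # (1, 1.0)

# @code_warntype unstable(2) highlights the inferred Union{Float64, Int64}
# return type in the REPL.
```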

[Figure: Julia’s built-in numeric type hierarchy]

Julia’s type system is dynamic, but gains some of the advantages of static type systems by making it possible to indicate that certain values are of specific types. This can be of great assistance in generating efficient code, but even more significantly, it allows method dispatch on the types of function arguments to be deeply integrated with the language.

Type hierarchy: Abstract types cannot be instantiated; they serve only as nodes in the type graph (see the figure above for the built-in number types in Julia), describing sets of related concrete types: the concrete types that are their descendants. We begin with abstract types even though they have no instances because they are the backbone of the type system: they form the conceptual hierarchy that makes Julia’s type system more than just a collection of object implementations.

abstract type Number end
abstract type Real          <: Number end
abstract type AbstractFloat <: Real end
abstract type Integer       <: Real end
abstract type Signed        <: Integer end
abstract type Unsigned      <: Integer end
subtypes(Number)
1-element Vector{Any}:
 Real
subtypes(Real)
2-element Vector{Any}:
 AbstractFloat
 Integer
supertype(Real)
Number

Composite Types: A composite type is a collection of named fields, an instance of which can be treated as a single value. In many languages, composite types are the only kind of user-definable type, and they are by far the most commonly used user-defined type in Julia as well. (See discussion of structs in the previous notebook covering syntax)

subtypes(Any)
651-element Vector{Any}:
 AbstractArray
 AbstractChannel
 AbstractChar
 AbstractDict
 AbstractDisplay
 AbstractMatch
 AbstractPattern
 AbstractSet
 AbstractString
 Any
 Base.ANSIDelimiter
 Base.ANSIIterator
 Base.AbstractBroadcasted
 ⋮
 Tuple
 Type
 TypeVar
 UndefInitializer
 Val
 VecElement
 VersionNumber
 WeakRef
 ZMQ.Context
 ZMQ.Socket
 ZMQ.lib.zmq_msg_t
 ZMQ.lib.zmq_pollitem_t
length(subtypes(Any))
651

All types in Julia are subtypes of Any, which is the top of the type hierarchy. Any is the default supertype of any type that is not explicitly declared to be a subtype of some other type.

struct Foo
    bar
    baz::Int
    qux::Float64
end
Foo <: Any
true

In Julia, all values are objects, but functions are not bundled with the objects they operate on. This is necessary since Julia chooses which method of a function to use by multiple dispatch, meaning that the types of all of a function’s arguments are considered when selecting a method, rather than just the first one. Let’s dive more into the idea of multiple dispatch next…

More information on Julia’s Type System

Multiple Dispatch in Julia#

One of the most powerful features of Julia is multiple dispatch: the method of a function is chosen based on the types of all of its arguments, making Julia highly flexible and extensible. Multiple dispatch is also what makes type-stable functions practical, so it is a key paradigm worth digging into. If a function is type-stable inside (meaning every function call within it is also type-stable), then the compiler knows the types of the variables at every step, and can therefore compile highly optimized code that runs just as fast as C or Fortran. Multiple dispatch fits into this story because it lets * be a type-stable function: * simply means different things for different input types. If the compiler knows the types of a and b before calling *, it knows which * method will be used, and therefore it knows the output type of c = a * b.

The dispatch system will generally choose the most specific applicable method:

[Figure: dispatch selects the most specific applicable method]

x = 2 + 3
typeof(x)
Int64

Now let’s use the + operator as an example to illustrate multiple dispatch in Julia.

x = 2.0 + 3
typeof(x)
Float64
length(methods(+))
198
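To see which of those many methods a particular call selects, the `@which` macro can be used (a quick sketch):

```julia
# @which reports exactly which method dispatch selects for the argument types
@which 2 + 3      # the specialized integer + method (defined in Base's int.jl)
@which 2.0 + 3    # the generic promoting +(::Number, ::Number) method
```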

Example of Multiple Dispatch#

Let’s define a simple function that behaves differently based on the types of its arguments.

# Define functions using multiple dispatch
function add(a::Int, b::Int)
    return a + b
end
add (generic function with 1 method)

Now we define a function with the same name, but to concatenate strings.

function add(a::String, b::String)
    return a * b # Concatenates strings
end
add (generic function with 2 methods)
# Test multiple dispatch
println(add(3, 4))       # Int addition
println(add("Hello, ", "World!"))  # String concatenation
7
Hello, World!

Here, Julia selects the appropriate function to run based on the type of the inputs, whether it’s integers or strings.

methods(add)
# 2 methods for generic function add from Main:
  • add(a::String, b::String) in Main at In[22]:1
  • add(a::Int64, b::Int64) in Main at In[21]:2
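If no method matches the argument types (e.g. `add(3, "four")`), Julia raises a `MethodError`. A catch-all fallback can be added without touching the existing methods; here is a sketch (the string-conversion behavior is an arbitrary choice of ours, shown as a standalone fallback):

```julia
# A hypothetical catch-all fallback: matches any argument types not covered
# by a more specific method.
add(a, b) = string(a) * string(b)

add(3, "four")   # returns "3four": no more specific method applies
```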

Performance Benefits of Multiple Dispatch#

In Julia, multiple dispatch allows highly optimized code paths to be selected at runtime, providing both flexibility and performance.

# Example of more complex dispatch based on argument types
function process_data(x::Array{Int})
    println("Processing integer array")
end
process_data (generic function with 1 method)
function process_data(x::Array{Float64})
    println("Processing float array")
end
process_data (generic function with 2 methods)
# Test with different types
process_data([1, 2, 3])         # Dispatches to integer array method
process_data([1.1, 2.2, 3.3])   # Dispatches to float array method
Processing integer array
Processing float array

Create an example of multiple dispatch (i.e. a function with at least two call signatures) below.
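One possible sketch of such an example (the `describe` function and its behavior are hypothetical):

```julia
# A function with two call signatures, dispatching on the argument type
describe(x::Number) = "a number with value $x"
describe(x::AbstractString) = "a string of length $(length(x))"

describe(42)        # returns "a number with value 42"
describe("hello")   # returns "a string of length 5"
```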

Custom Interfaces with Multiple Dispatch#

We can also use multiple dispatch to define custom interfaces by implementing functions for specific types. Let’s see how this works by defining an abstract type called Shape:

# Define an abstract type and a concrete subtype
abstract type Shape end

We will now define composite types for Circle and Square:

struct Circle <: Shape
    radius::Float64
end
struct Square <: Shape
    side::Float64
end

Let’s define a function that calculates the area of a shape using multiple dispatch:

# Define a generic area function using dispatch
area(s::Circle) = π * s.radius^2
area(s::Square) = s.side^2
area (generic function with 2 methods)

Check that the function works as we expect:

circle = Circle(5.0)
square = Square(4.0)

# Test the area function
println(area(circle))  # Circle with radius 5
println(area(square))  # Square with side 4
78.53981633974483
16.0

Now here are a few examples of the power, expressiveness, and conciseness afforded by the multiple dispatch paradigm:

area_squared(s::Shape) = area(s)^2
area_squared(circle)
# area_squared(square)
6168.502750680849
area.([circle, square])
2-element Vector{Float64}:
 78.53981633974483
 16.0
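Generic code written against the `area` interface works for any mix of shapes; here is a self-contained sketch (repeating the definitions from above so the snippet runs on its own):

```julia
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Square <: Shape
    side::Float64
end

area(s::Circle) = π * s.radius^2
area(s::Square) = s.side^2

# sum dispatches area on each element's concrete type
shapes = Shape[Circle(5.0), Square(4.0)]
total_area = sum(area, shapes)   # ≈ 94.54
```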

How would this work in Python?#

In Python, we typically need isinstance checks to determine the type of an object and then call the appropriate code path.

import math

# Define the base class Shape
class Shape:
    pass

# Define Circle as a subclass of Shape
class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

# Define Square as a subclass of Shape
class Square(Shape):
    def __init__(self, side):
        self.side = side

# Define an area function that attempts to mimic Julia's multiple dispatch
def area(shape):
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    elif isinstance(shape, Square):
        return shape.side ** 2
    else:
        raise TypeError("Unknown shape!")

# Create instances of Circle and Square
circle = Circle(5.0)
square = Square(4.0)

# Test the area function
print(f"Area of Circle with radius 5: {area(circle)}")
print(f"Area of Square with side 4: {area(square)}")

# Attempt to handle multiple shapes in a list (mimic dispatch over arrays of shapes)
shapes = [circle, square]

# This part is manual in Python unlike Julia
for shape in shapes:
    print(f"Area: {area(shape)}")

In Julia, this combination – the idea of a shape (and potentially, though not crucially, a corresponding abstract type) and functions like area that dispatch on it – is a very simple example of a powerful idea enabled by multiple dispatch: that of an interface. The core idea of an interface is an informal contract that says, effectively, “if you implement these functions for your type, then a bunch of other functionality that depends only on those functions will just work!”

Other interfaces in Julia include:

  • Indexing and iteration, upon which operations like sorting and slicing are built in a generalized way

  • Graphs (the mathematical objects with nodes and edges), which allows for functionality like graph traversal, computation of centrality measures, etc. on a wide variety of graph types

  • AtomsBase, for specifying atomistic system geometries
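As a small, hypothetical illustration of the iteration interface (the `Countdown` type is ours): implementing `Base.iterate` for a custom type makes generic functionality like `collect`, `sum`, and `for` loops just work:

```julia
# A custom iterable type: counts down from `start` to 1
struct Countdown
    start::Int
end

# The iteration interface: return (item, next_state), or nothing when done
Base.iterate(c::Countdown, state=c.start) =
    state < 1 ? nothing : (state, state - 1)

# Optional, but lets collect preallocate and infer the element type
Base.length(c::Countdown) = c.start
Base.eltype(::Type{Countdown}) = Int

collect(Countdown(3))   # returns [3, 2, 1]
sum(Countdown(3))       # returns 6
```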

Extra: Advantages of Julia Dispatch#

Improved extensibility: With multiple dispatch, it becomes easier to extend functionality by adding new methods that handle specific argument types. This makes it straightforward to accommodate new types without modifying existing code, resulting in better code organization and modularity.

Avoidance of complex branching and conditionals: When dealing with different argument types, traditional approaches often involve long chains of if-else or switch-case statements to determine the appropriate action. Multiple dispatch eliminates the need for such complex branching, leading to cleaner, more readable code.

Caution: Method Ambiguity#

A “gotcha” that can happen is when there is no single “most specific” applicable method to dispatch on (see the example below). This situation should be avoided, as such calls raise a MethodError reporting the ambiguity.

[Image: example of method ambiguity]
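A minimal sketch of how such an ambiguity arises and how to resolve it (the function `f` is a hypothetical example):

```julia
# Neither method is more specific than the other for the call f(1, 2):
f(x::Int, y::Any) = "int first"
f(x::Any, y::Int) = "int second"

# f(1, 2)   # would raise MethodError: f(::Int64, ::Int64) is ambiguous

# The fix: add a method that is more specific than both candidates
f(x::Int, y::Int) = "both ints"
f(1, 2)   # returns "both ints"
```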