This lesson is still being designed and assembled (Pre-Alpha version)

CUDA C/C++ Programming and GPU Architecture: A Closer Look

This tutorial by the Molecular Sciences Software Institute (MolSSI) adopts a profile-driven approach to intermediate-level CUDA C/C++ programming and blends it with deeper insights into GPU architecture in order to improve the performance of heterogeneous parallel applications.

The MolSSI’s full education mission statement can be found here.

Prerequisites

Software/Hardware Specifications

The following NVIDIA CUDA-enabled GPU devices have been used throughout this tutorial (a short device-query sketch follows the list):

  • Device 0: GeForce GTX 1650 with Turing architecture (Compute Capability = 7.5)
  • Device 1: GeForce GT 740M with Kepler architecture (Compute Capability = 3.5)
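
The device names and compute capabilities listed above can be reproduced on any CUDA-capable machine with a short device-query program. The following is a minimal sketch (not part of the lesson materials) built on the CUDA runtime API calls cudaGetDeviceCount() and cudaGetDeviceProperties(); the device numbering and properties reported will naturally differ from system to system.

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    // Count the CUDA-enabled devices visible to the runtime
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print each device's name and compute capability (major.minor)
    for (int device = 0; device < deviceCount; ++device) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        printf("Device %d: %s (Compute Capability = %d.%d)\n",
               device, prop.name, prop.major, prop.minor);
    }
    return 0;
}

Compiling this file with, e.g., nvcc device_query.cu -o device_query (the filename is only illustrative) and running the executable prints one line per device in the same form as the list above.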

Ubuntu 18.04 LTS (Bionic Beaver) is the target operating system for CUDA Toolkit v11.2.0 on the two host machines equipped with devices 0 and 1, respectively.
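
To double-check which CUDA toolkit a build links against and which CUDA version the installed driver supports, the runtime API provides cudaRuntimeGetVersion() and cudaDriverGetVersion(). Below is a minimal sketch, assuming the CUDA 11.2 setup described above; both calls encode version 11.2 as the integer 11020.

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime version the binary was built with
    cudaDriverGetVersion(&driverVersion);    // latest CUDA version supported by the installed driver
    printf("CUDA runtime version : %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("CUDA driver supports : %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    return 0;
}

The driver may report a newer version than the runtime; what matters here is that it supports at least the toolkit version used to build the lesson examples.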

Schedule

Setup   Download files required for the lesson
00:00   1. NVIDIA Nsight Profilers
        • What is profiling? Why and how is it useful for parallelization?
        • What are NVIDIA Nsight Systems and Nsight Compute? What do they do, and how can I use them?
        • What is the difference between the command-line interface (CLI) and graphical user interface (GUI) profilers of Nsight Systems and Nsight Compute?
01:00   2. CUDA Memory Model: Basics
        • What is the CUDA memory model?
        • What is the principle of locality, and how does it reduce memory access latency?
        • Why is there a memory hierarchy, and how is it defined?
02:00   3. Performance Guidelines and Optimization Strategies
03:00   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.