This tutorial by the Molecular Sciences Software Institute (MolSSI) adopts a profile-driven approach to CUDA C/C++ programming at the intermediate level and blends it with deeper insights into GPU architecture in order to improve the performance of heterogeneous parallel applications.
The MolSSI’s full education mission statement can be found here.
Prerequisites
- Previous knowledge of basic High-Performance Computing (HPC) concepts is helpful but not required for starting this course. Nevertheless, we encourage students to glance at our Parallel Programming tutorial, specifically Chapters 1, 2 and 5, for a brief overview of some of the fundamental concepts in HPC.
- Basic familiarity with Bash, C and C++ programming languages is required.
- MolSSI’s Fundamentals of Heterogeneous Parallel Programming with CUDA C/C++ tutorial at the beginner level is a prerequisite for the present tutorial.
Software/Hardware Specifications
The following NVIDIA CUDA-enabled GPU devices have been used throughout this tutorial:
- Device 0: GeForce GTX 1650 with Turing architecture (Compute Capability = 7.5)
- Device 1: GeForce GT 740M with Kepler architecture (Compute Capability = 3.5)
Ubuntu Linux 18.04 (Bionic Beaver) is the target platform for CUDA Toolkit v11.2.0 on the two host machines equipped with devices 0 and 1.
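To compare your own hardware with the specifications above, a short device query along the following lines may help. This is a minimal sketch rather than part of the tutorial's own code: the helper name `atLeastCapability` and the file name `device_query.cu` are hypothetical, and the 3.5 threshold is simply the lower of the two compute capabilities listed above, not an official requirement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the tutorial): returns true when the
// capability (major, minor) is at least (reqMajor, reqMinor).
static bool atLeastCapability(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

int main() {
    // Report the CUDA runtime version, encoded as 1000 * major + 10 * minor.
    int runtimeVersion = 0;
    if (cudaRuntimeGetVersion(&runtimeVersion) == cudaSuccess) {
        std::printf("CUDA runtime version: %d.%d\n",
                    runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
    }

    int deviceCount = 0;
    cudaError_t status = cudaGetDeviceCount(&deviceCount);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(status));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        status = cudaGetDeviceProperties(&prop, dev);
        if (status != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(status));
            continue;
        }
        std::printf("Device %d: %s (Compute Capability = %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);

        // Assumption: 3.5 is used only because it is the lowest capability
        // among the devices listed in this tutorial.
        if (!atLeastCapability(prop.major, prop.minor, 3, 5)) {
            std::printf("  Note: below the lowest capability used in this tutorial (3.5).\n");
        }
    }
    return 0;
}
```

Assuming the CUDA Toolkit is installed and `nvcc` is on your `PATH`, the sketch can be built and run with `nvcc device_query.cu -o device_query && ./device_query`. Devices with a different compute capability than those listed may show different profiling results from the ones discussed in this tutorial.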