This lesson is still being designed and assembled (Pre-Alpha version)

CUDA C/C++ Programming and GPU Architecture: A Closer Look

This tutorial by the Molecular Sciences Software Institute (MolSSI) adopts a profile-driven approach to intermediate-level CUDA C/C++ programming and blends it with deeper insights into GPU architecture in order to improve the performance of heterogeneous parallel applications.

The MolSSI’s full education mission statement can be found here.

Prerequisites

Software/Hardware Specifications

The following NVIDIA CUDA-enabled GPU devices have been used throughout this tutorial (a short device-query sketch follows the list):

  • Device 0: GeForce GTX 1650 with Turing architecture (Compute Capability = 7.5)
  • Device 1: GeForce GT 740M with Kepler architecture (Compute Capability = 3.5)
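
The device names and compute capabilities listed above can be reproduced on any CUDA-capable machine with a short device-query program. The following is a minimal sketch (not part of the lesson materials) built on the CUDA runtime API calls cudaGetDeviceCount() and cudaGetDeviceProperties(); the device numbering and properties reported will naturally differ from system to system.

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    // Count the CUDA-enabled devices visible to the runtime
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print each device's name and compute capability (major.minor)
    for (int device = 0; device < deviceCount; ++device) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
        printf("Device %d: %s (Compute Capability = %d.%d)\n",
               device, prop.name, prop.major, prop.minor);
    }
    return 0;
}

Compiling this file with, e.g., nvcc device_query.cu -o device_query (the filename is only illustrative) and running the executable prints one line per device in the same form as the list above.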

Ubuntu 18.04 LTS (Bionic Beaver) is the target operating system for CUDA Toolkit v11.2.0 on the two host machines equipped with devices 0 and 1, respectively.
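
To double-check which CUDA toolkit a build links against and which CUDA version the installed driver supports, the runtime API provides cudaRuntimeGetVersion() and cudaDriverGetVersion(). Below is a minimal sketch, assuming the CUDA 11.2 setup described above; both calls encode version 11.2 as the integer 11020.

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime version the binary was built with
    cudaDriverGetVersion(&driverVersion);    // latest CUDA version supported by the installed driver
    printf("CUDA runtime version : %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("CUDA driver supports : %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    return 0;
}

The driver may report a newer version than the runtime; what matters here is that it supports at least the toolkit version used to build the lesson examples.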

Schedule

Setup   Download files required for the lesson
00:00   1. NVIDIA Nsight Profilers
        • What is profiling? Why and how is it useful for parallelization?
        • What are NVIDIA Nsight Systems and Nsight Compute? What do they do, and how can I use them?
        • What is the difference between the command-line interface (CLI) and graphical user interface (GUI) profilers of Nsight Systems and Nsight Compute?
01:00   2. CUDA Memory Model: Basics
        • What is the CUDA memory model?
        • What is the principle of locality, and how does it reduce memory access latency?
        • Why is there a memory hierarchy, and how is it defined?
02:00   3. Performance Guidelines and Optimization Strategies
03:00   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.