Fundamentals of Accelerated Computing with CUDA C/C++

Instructor:   Assoc. Prof. Dr. sc. ing. Arnis Lektauers (NVIDIA Ambassador in Latvia).


Prerequisites:

  • Basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulation;
  • No previous knowledge of CUDA programming is assumed.

Duration:   8h

Format:      Training course

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.

Aims and tasks of the course

At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerating C/C++ applications with CUDA and be able to:

  • Write code to be executed by a GPU accelerator;
  • Expose and express data and instruction-level parallelism in C/C++ applications using CUDA;
  • Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching;
  • Leverage command-line and visual profilers to guide your work;
  • Utilize concurrent streams for instruction-level parallelism;
  • Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach.

Topics

    1. Accelerating Applications with CUDA C/C++. Learn the essential syntax and concepts to be able to write GPU-enabled C/C++ applications with CUDA:

  • Write, compile, and run GPU code;
  • Control parallel thread hierarchy;
  • Allocate and free memory for the GPU.
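
As a rough illustration of what the three points above look like in practice, here is a minimal sketch of a complete CUDA program. The kernel name `addVectors`, the helper `runVectorAdd`, the block size of 256, and the file name are illustrative assumptions, not material taken from the course itself.

```cuda
// vector_add.cu (hypothetical file name); compile with: nvcc -o vector_add vector_add.cu
#include <cstdio>
#include <cuda_runtime.h>

// GPU code: each thread starts at its global index and advances by the
// total number of threads in the grid (a grid-stride loop).
__global__ void addVectors(const float *a, const float *b, float *out, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        out[i] = a[i] + b[i];
}

// Returns the number of wrong results, or -1 if a CUDA call failed.
int runVectorAdd(int n)
{
    float *a = nullptr, *b = nullptr, *out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate memory that both the CPU and the GPU can access.
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&out, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(out);
        return -1;
    }
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; out[i] = 0.0f; }

    // Thread hierarchy: enough blocks of 256 threads to cover n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addVectors<<<blocks, threadsPerBlock>>>(a, b, out, n);

    // Catch launch errors, then wait for the GPU to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = 0;
    if (err == cudaSuccess)
        for (int i = 0; i < n; ++i)
            if (out[i] != 3.0f) ++mismatches;

    // Free the managed allocations.
    cudaFree(a); cudaFree(b); cudaFree(out);
    return err == cudaSuccess ? mismatches : -1;
}

int main()
{
    int result = runVectorAdd(1 << 20);
    printf("mismatches (or -1 on CUDA error): %d\n", result);
    return result == 0 ? 0 : 1;
}
```

The grid-stride loop is one common choice here: it keeps the kernel correct even when the grid has fewer threads than there are elements.
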
    2. Managing Accelerated Application Memory with CUDA C/C++. Learn the command-line profiler and CUDA-managed memory, focusing on observation-driven application improvements and a deep understanding of managed memory behavior:

  • Profile CUDA code with the command-line profiler;
  • Go deep on unified memory;
  • Optimize unified memory management.
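
The kind of unified memory optimization meant here can be sketched as follows: managed memory is prefetched to the GPU before a kernel runs and back to the CPU before the host reads it, so that pages do not migrate one fault at a time. The names `prefetchTo`, `doubleElements`, and `runPrefetchDemo` are hypothetical. The version check in `prefetchTo` is also an assumption: it uses the four-argument `cudaMemPrefetchAsync` of toolkits up to 12.x and, on newer toolkits, the form that takes a `cudaMemLocation`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper hiding the toolkit-dependent prefetch signature
// (assumption: the cudaMemLocation form is the default from CUDA 13 on).
static cudaError_t prefetchTo(const void *ptr, size_t bytes, int deviceId)
{
#if CUDART_VERSION >= 13000
    cudaMemLocation loc;
    loc.type = (deviceId == cudaCpuDeviceId) ? cudaMemLocationTypeHost
                                             : cudaMemLocationTypeDevice;
    loc.id   = (deviceId == cudaCpuDeviceId) ? 0 : deviceId;
    return cudaMemPrefetchAsync(ptr, bytes, loc, 0, 0);
#else
    return cudaMemPrefetchAsync(ptr, bytes, deviceId, 0);
#endif
}

__global__ void doubleElements(int *data, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        data[i] *= 2;
}

// Returns the number of wrong results, or -1 if a CUDA call failed.
int runPrefetchDemo(int n)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;

    // Prefetching needs concurrent managed access; skip it where unsupported.
    int canPrefetch = 0;
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, device);

    int *data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) data[i] = i;   // pages are first touched on the CPU

    if (canPrefetch) prefetchTo(data, bytes, device);           // move to the GPU up front
    doubleElements<<<(n + 255) / 256, 256>>>(data, n);
    cudaError_t err = cudaGetLastError();
    if (canPrefetch) prefetchTo(data, bytes, cudaCpuDeviceId);  // move back for the CPU check
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = 0;
    if (err == cudaSuccess)
        for (int i = 0; i < n; ++i)
            if (data[i] != 2 * i) ++mismatches;

    cudaFree(data);
    return err == cudaSuccess ? mismatches : -1;
}

int main()
{
    int result = runPrefetchDemo(1 << 20);
    printf("mismatches (or -1 on CUDA error): %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Running such a program under a command-line profiler with and without the prefetch calls, for example with `nsys profile --stats=true ./prefetch` if Nsight Systems is installed, is one way to observe the difference in memory migration behavior.
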
    3. Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++. Identify opportunities for improved memory management and instruction-level parallelism:

  • Profile CUDA code with NVIDIA Nsight Systems;
  • Use concurrent CUDA streams.
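
A minimal sketch of concurrent streams, assuming a workload that splits cleanly into independent chunks: each chunk is processed by a kernel launched in its own non-default stream, which allows the GPU to overlap the launches where resources permit. The names `fillChunk` and `runStreamsDemo` and the chunk sizes are illustrative assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void fillChunk(int *chunk, int n, int value)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        chunk[i] = value;
}

// Returns the number of wrong results, or -1 if a CUDA call failed.
int runStreamsDemo(int numStreams, int chunkSize)
{
    int *data = nullptr;
    size_t total = static_cast<size_t>(numStreams) * chunkSize;
    if (cudaMallocManaged(&data, total * sizeof(int)) != cudaSuccess) return -1;

    // Create one non-default stream per chunk.
    std::vector<cudaStream_t> streams(numStreams);
    int created = 0;
    while (created < numStreams &&
           cudaStreamCreate(&streams[created]) == cudaSuccess)
        ++created;
    bool ok = (created == numStreams);

    if (ok) {
        int threads = 256;
        int blocks  = (chunkSize + threads - 1) / threads;
        for (int s = 0; s < numStreams; ++s) {
            // The fourth launch parameter selects the stream; the third is
            // the dynamic shared memory size (none needed here).
            fillChunk<<<blocks, threads, 0, streams[s]>>>(
                data + static_cast<size_t>(s) * chunkSize, chunkSize, s);
        }
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;   // wait for all streams
    }

    for (int s = 0; s < created; ++s) cudaStreamDestroy(streams[s]);

    int mismatches = 0;
    if (ok)
        for (int s = 0; s < numStreams; ++s)
            for (int i = 0; i < chunkSize; ++i)
                if (data[static_cast<size_t>(s) * chunkSize + i] != s) ++mismatches;

    cudaFree(data);
    return ok ? mismatches : -1;
}

int main()
{
    int result = runStreamsDemo(4, 1 << 18);
    printf("mismatches (or -1 on CUDA error): %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Whether the kernels actually overlap depends on the GPU and on how much of it each launch occupies; a timeline view in NVIDIA Nsight Systems is a practical way to check.
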
    4. Final review:

  • Review key learnings and wrap up questions;
  • Complete the assessment to earn a certificate;
  • Take the workshop survey.