Fundamentals of Accelerated Computing with CUDA C/C++

Instructor: Assoc. Prof. Dr. sc. ing. Arnis Lektauers (NVIDIA Ambassador in Latvia).

Prerequisites:

Basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations;
No previous knowledge of CUDA programming is assumed.

Duration: 8 hours

Format: Training course

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.

Aims and tasks of the course

At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerating C/C++ applications with CUDA and be able to:

Write code to be executed by a GPU accelerator;

Expose and express data and instruction-level parallelism in C/C++ applications using CUDA;

Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching;

Leverage command-line and visual profilers to guide your work;

Utilize concurrent streams for instruction-level parallelism;

Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach.

Topics

1. Accelerating Applications with CUDA C/C++. Learn the essential syntax and concepts needed to write GPU-enabled C/C++ applications with CUDA:

Write, compile, and run GPU code;

Control parallel thread hierarchy;

Allocate and free memory for the GPU.
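As a rough illustration of what this first module covers (a sketch only, not taken from the course materials), the example below launches a kernel over a grid of thread blocks and manages memory with `cudaMallocManaged` and `cudaFree`. The kernel name `addVectors`, the helper `runVectorAdd`, and the choice of 256 threads per block are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: a grid-stride loop lets any grid size cover all n elements.
__global__ void addVectors(const float *a, const float *b, float *out, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    int stride = gridDim.x * blockDim.x;                // total threads in the grid
    for (int i = index; i < n; i += stride)
        out[i] = a[i] + b[i];
}

// Hypothetical helper: runs the kernel and checks every result on the host.
bool runVectorAdd(int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *out = nullptr;

    // Managed allocations are accessible from both host and device code.
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&out, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(out); // freeing nullptr is a no-op
        return false;
    }

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; out[i] = 0.0f; }

    // Thread hierarchy: enough blocks of 256 threads to cover n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addVectors<<<blocks, threadsPerBlock>>>(a, b, out, n);

    // Kernel launches are asynchronous; wait before reading results on the host.
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i)
        ok = (out[i] == 3.0f);

    cudaFree(a); cudaFree(b); cudaFree(out);
    return ok;
}

int main()
{
    bool ok = runVectorAdd(1 << 20);
    printf("%s\n", ok ? "success" : "failure");
    return ok ? 0 : 1;
}
```

Assuming the file is saved as `vector_add.cu` (a hypothetical name), it can be compiled and run with `nvcc -o vector_add vector_add.cu && ./vector_add`.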

2. Managing Accelerated Application Memory with CUDA C/C++. Learn the command-line profiler and CUDA-managed memory, focusing on observation-driven application improvements and a deep understanding of managed memory behavior:

Profile CUDA code with the command-line profiler;

Go deep on unified memory;

Optimize unified memory management.
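To give a sense of the kind of optimization this module refers to, the sketch below prefetches a managed allocation to the GPU before a kernel runs and back to the host afterwards, which can reduce on-demand page migration. It is an illustration under stated assumptions, not course material: the names `doubleElements`, `prefetchTo`, and `runPrefetchDemo` are hypothetical, and the `cudaMemPrefetchAsync` call is written for the signature used up to CUDA 12.x, with a version branch for what is assumed to be the `cudaMemLocation`-based signature of CUDA 13.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void doubleElements(int *data, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        data[i] *= 2;
}

// Hypothetical wrapper hiding the toolkit-dependent prefetch signature.
static cudaError_t prefetchTo(const void *ptr, size_t bytes, int device)
{
#if CUDART_VERSION >= 13000
    // Assumption: CUDA 13 takes a cudaMemLocation plus a flags argument.
    cudaMemLocation loc;
    bool toHost = (device == cudaCpuDeviceId);
    loc.type = toHost ? cudaMemLocationTypeHost : cudaMemLocationTypeDevice;
    loc.id   = toHost ? 0 : device;
    return cudaMemPrefetchAsync(ptr, bytes, loc, 0, 0);
#else
    return cudaMemPrefetchAsync(ptr, bytes, device, 0);
#endif
}

// Hypothetical helper: returns true if data[i] == 2 * i for all i.
bool runPrefetchDemo(int n)
{
    int device = 0, canPrefetch = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    // Prefetching requires concurrent managed access on the device.
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, device);

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) data[i] = i; // pages first touched on the host

    // Migrate pages to the GPU ahead of the kernel instead of faulting them in.
    if (canPrefetch) prefetchTo(data, bytes, device);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    doubleElements<<<blocks, threads>>>(data, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Migrate pages back before the host verification loop.
    if (canPrefetch) prefetchTo(data, bytes, cudaCpuDeviceId);
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

    for (int i = 0; ok && i < n; ++i)
        ok = (data[i] == 2 * i);

    cudaFree(data);
    return ok;
}

int main()
{
    bool ok = runPrefetchDemo(1 << 20);
    printf("%s\n", ok ? "success" : "failure");
    return ok ? 0 : 1;
}
```

One way to observe the effect, assuming the binary is named `prefetch`, is to compare the unified memory statistics reported by `nsys profile --stats=true ./prefetch` with and without the prefetch calls.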

3. Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++. Identify opportunities for improved memory management and instruction-level parallelism:

Profile CUDA code with NVIDIA Nsight Systems;

Use concurrent CUDA streams.
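As an illustration of the streams topic, the sketch below launches the same kernel in three non-default streams so that the launches are allowed to overlap on the GPU. Whether they actually run concurrently depends on the device and on resource usage. The names `fillValue` and `runStreamDemo`, and the number of streams, are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fillValue(int *data, int n, int value)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        data[i] = value;
}

// Hypothetical helper: fills one buffer per stream and verifies the contents.
bool runStreamDemo(int n)
{
    const int kStreams = 3;
    cudaStream_t streams[kStreams];
    int *buffers[kStreams] = {nullptr, nullptr, nullptr};
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    bool ok = true;
    int created = 0; // number of streams that must be destroyed later

    for (int s = 0; s < kStreams && ok; ++s) {
        ok = (cudaStreamCreate(&streams[s]) == cudaSuccess);
        if (ok) ++created;
        ok = ok && (cudaMallocManaged(&buffers[s], bytes) == cudaSuccess);
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < kStreams && ok; ++s) {
        // Third launch argument: dynamic shared memory bytes; fourth: the stream.
        fillValue<<<blocks, threads, 0, streams[s]>>>(buffers[s], n, s + 1);
        ok = (cudaGetLastError() == cudaSuccess);
    }

    // Wait for work in all streams before touching the buffers on the host.
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

    for (int s = 0; s < kStreams && ok; ++s)
        for (int i = 0; i < n && ok; ++i)
            ok = (buffers[s][i] == s + 1);

    for (int s = 0; s < created; ++s) cudaStreamDestroy(streams[s]);
    for (int s = 0; s < kStreams; ++s) cudaFree(buffers[s]);
    return ok;
}

int main()
{
    bool ok = runStreamDemo(1 << 16);
    printf("%s\n", ok ? "success" : "failure");
    return ok ? 0 : 1;
}
```

Opening a profile of such a program in the NVIDIA Nsight Systems timeline view is one way to check whether the kernels in different streams overlap.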

4. Final review:

Review key learnings and wrap up questions;

Complete the assessment to earn a certificate;

Take the workshop survey.
