Fundamentals of Accelerated Computing with CUDA C/C++

A course prepared by the NVIDIA Deep Learning Institute and taught by the NVIDIA Ambassador in Latvia, Associate Professor Dr. sc. ing. Arnis Lektauers. The course materials are presented in English, but the language of instruction is Latvian.

The course content corresponds to an introductory CUDA course.


Prerequisites:

  • Basic knowledge of C/C++, including familiarity with variable types, loops, conditional statements, functions, and array manipulation;
  • No prior knowledge of CUDA programming is required.

Duration:   8 hours.

The participation fee is covered by the EuroCC 2 project.

Format:      In person or remote.

Certificate: an NVIDIA DLI certificate is issued to participants who pass the final assessment at the end of the course.

Computer requirements: each participant will be provided with a computer workstation with access to a GPU server.

Course goal and objectives

At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerating C/C++ applications with CUDA and be able to:
  • Write code to be executed by a GPU accelerator;
  • Expose and express data and instruction-level parallelism in C/C++ applications using CUDA;
  • Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching;
  • Leverage command-line and visual profilers to guide your work;
  • Utilize concurrent streams for instruction-level parallelism;
  • Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach.


1. Accelerating Applications with CUDA C/C++.

Learn the essential syntax and concepts to be able to write GPU-enabled C/C++ applications with CUDA:

  • Write, compile, and run GPU code;
  • Control parallel thread hierarchy;
  • Allocate and free memory for the GPU.
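
As a rough illustration of the three topics above, the following minimal sketch writes a kernel, launches it across a grid of thread blocks, and allocates and frees memory the GPU can use. It is not taken from the course materials; the names addVectors and runVectorAdd, the file name vector_add.cu, and the launch configuration of 256 threads per block are assumptions chosen for the example.

```cuda
// Build (assuming the file is saved as vector_add.cu, a hypothetical name):
//   nvcc -o vector_add vector_add.cu
#include <cstdio>
#include <cuda_runtime.h>

// GPU kernel: a grid-stride loop lets any grid size cover all n elements.
__global__ void addVectors(const float *a, const float *b, float *out, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid
    for (int i = index; i < n; i += stride)
        out[i] = a[i] + b[i];
}

// Hypothetical helper: runs the kernel and returns true if every sum is correct.
bool runVectorAdd(int n)
{
    float *a = nullptr, *b = nullptr, *out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Managed memory is accessible from both CPU and GPU code.
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&out, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(out);
        return false;
    }

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; out[i] = 0.0f; }

    // Assumed launch configuration: 256 threads per block, enough blocks for n.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addVectors<<<blocks, threadsPerBlock>>>(a, b, out, n);

    // Kernel launches are asynchronous: check for errors and wait for completion.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = (out[i] == 3.0f);

    cudaFree(a); cudaFree(b); cudaFree(out);
    return ok;
}

int main()
{
    bool ok = runVectorAdd(1 << 20);
    printf("vector add %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

The grid-stride loop is one common way to decouple the problem size from the launch configuration; other indexing schemes would work equally well for this size.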

2. Managing Accelerated Application Memory with CUDA C/C++.

Learn the command-line profiler and CUDA-managed memory, focusing on observation-driven application improvements and a deep understanding of managed memory behavior:

  • Profile CUDA code with the command-line profiler;
  • Go deep on unified memory;
  • Optimize unified memory management.
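
The idea of optimizing unified memory migration can be sketched as follows: prefetch managed memory to the GPU before a kernel uses it, and back to the CPU before the host reads the result. This is only an illustration under stated assumptions. The names doubleElements and runPrefetchDemo and the file name prefetch.cu are hypothetical, and the call form of cudaMemPrefetchAsync shown here (taking an integer device id) is the one from CUDA 12.x and earlier toolkits; newer toolkits may require a different signature.

```cuda
// Profile (assuming the file is saved as prefetch.cu, a hypothetical name):
//   nvcc -o prefetch prefetch.cu
//   nsys profile --stats=true ./prefetch
#include <cstdio>
#include <cuda_runtime.h>

// Doubles every element using a grid-stride loop.
__global__ void doubleElements(float *data, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride)
        data[i] *= 2.0f;
}

// Hypothetical helper: returns true if the GPU result is correct.
bool runPrefetchDemo(int n)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;

    // Prefetching is only attempted where the device reports concurrent managed access.
    int canPrefetch = 0;
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, device);

    float *data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // pages are first touched on the CPU

    // Migrate the data to the GPU up front rather than through on-demand page faults.
    // Assumption: CUDA 12.x-style signature with an int destination device.
    if (canPrefetch) cudaMemPrefetchAsync(data, bytes, device);

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements<<<blocks, threadsPerBlock>>>(data, n);

    // Migrate the result back to the host before the CPU verifies it.
    if (canPrefetch) cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId);

    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = (data[i] == 2.0f);

    cudaFree(data);
    return ok;
}

int main()
{
    bool ok = runPrefetchDemo(1 << 20);
    printf("prefetch demo %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

Comparing profiler output with and without the two prefetch calls is one way to observe their effect on memory migration; the actual difference depends on the GPU and driver.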

3. Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++.

Identify opportunities for improved memory management and instruction-level parallelism:

  • Profile CUDA code with NVIDIA Nsight Systems;
  • Use concurrent CUDA streams.
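
A minimal sketch of concurrent streams, assuming a workload that splits into independent chunks: each chunk is processed by a kernel launched into its own non-default stream. The names doubleChunk and runStreamsDemo and the choice of four streams are hypothetical. Whether the kernels actually overlap depends on the GPU and on how many resources each kernel uses, which is what a timeline view in NVIDIA Nsight Systems can reveal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Doubles `count` elements starting at `chunk`, using a grid-stride loop.
__global__ void doubleChunk(float *chunk, int count)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < count; i += stride)
        chunk[i] *= 2.0f;
}

// Hypothetical helper: processes n elements in four streams; true if all results are correct.
bool runStreamsDemo(int n)
{
    const int kNumStreams = 4;  // assumed number of streams for this example
    float *data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // Create the streams, remembering how many were created for cleanup.
    cudaStream_t streams[kNumStreams];
    bool ok = true;
    int created = 0;
    for (; created < kNumStreams; ++created) {
        if (cudaStreamCreate(&streams[created]) != cudaSuccess) { ok = false; break; }
    }

    // Launch one kernel per chunk; kernels in different streams may run concurrently.
    int threadsPerBlock = 256;
    int chunk = (n + kNumStreams - 1) / kNumStreams;
    for (int s = 0; ok && s < kNumStreams; ++s) {
        int offset = s * chunk;
        int count = (n - offset < chunk) ? (n - offset) : chunk;
        if (count <= 0) break;  // fewer elements than streams
        int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
        // Launch arguments: grid size, block size, dynamic shared memory bytes, stream.
        doubleChunk<<<blocks, threadsPerBlock, 0, streams[s]>>>(data + offset, count);
    }

    // Wait for the work in every stream before the CPU reads the data.
    ok = ok && cudaGetLastError() == cudaSuccess &&
         cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = (data[i] == 2.0f);

    for (int s = 0; s < created; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(data);
    return ok;
}

int main()
{
    bool ok = runStreamsDemo(1 << 20);
    printf("streams demo %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```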

4. Final review:

  • Review key learnings and wrap up questions;
  • Complete the assessment to earn a certificate;
  • Take the workshop survey.