Fundamentals of Accelerated Computing with CUDA C/C++
A course developed by the NVIDIA Deep Learning Institute and taught by the NVIDIA Ambassador in Latvia, Associate Professor Dr. sc. ing. Arnis Lektauers. Course materials are presented in English; the language of instruction is Latvian.
The course content corresponds to an introductory CUDA course.
Prerequisites:
- Basic knowledge of C/C++, including familiarity with variable types, loops, conditional statements, functions, and array manipulations;
- No prior knowledge of CUDA programming is required.
Duration: 8 hours.
The participation fee will be covered by the EuroCC 2 project.
Format: in person or remote.
Certificate: an NVIDIA DLI certificate is issued to participants who pass the final assessment at the end of the course.
Computer requirements: each participant will be provided with a workstation with access to a GPU server.
Course aims and objectives
- Write code to be executed by a GPU accelerator;
- Expose and express data and instruction-level parallelism in C/C++ applications using CUDA;
- Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching;
- Leverage command-line and visual profilers to guide your work;
- Utilize concurrent streams for instruction-level parallelism;
- Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach.
Topics
1. Accelerating Applications with CUDA C/C++.
Learn the essential syntax and concepts to be able to write GPU-enabled C/C++ applications with CUDA:
- Write, compile, and run GPU code;
- Control parallel thread hierarchy;
- Allocate and free memory for the GPU.
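The three items above can be sketched in one short program. This is a minimal illustration rather than course material: the names `doubleElements` and `runDoubleElements` are hypothetical, and the block size of 256 threads is an arbitrary assumption. It shows a kernel launch, a grid-stride loop over the thread hierarchy, and allocation and release of memory with `cudaMallocManaged` and `cudaFree`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread starts at its global index and strides by the total
// number of threads in the grid, so any grid size covers all n elements.
__global__ void doubleElements(int *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < n; i += stride)
        a[i] *= 2;
}

// Hypothetical helper: returns true if every element was doubled on the GPU.
bool runDoubleElements(int n)
{
    int *a = nullptr;
    // Managed memory is accessible from both host and device code.
    if (cudaMallocManaged(&a, n * sizeof(int)) != cudaSuccess)
        return false;

    for (int i = 0; i < n; ++i)
        a[i] = i;

    int threadsPerBlock = 256;  // assumed value, not a tuned choice
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements<<<blocks, threadsPerBlock>>>(a, n);

    // Launches are asynchronous: check the launch, then wait for completion.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (a[i] == 2 * i);

    cudaFree(a);
    return ok;
}

int main()
{
    bool ok = runDoubleElements(1000);
    printf("%s\n", ok ? "all elements doubled" : "failed");
    return ok ? 0 : 1;
}
```

Assuming the file is saved as `double.cu`, it can be built with `nvcc -o double double.cu` and run on a machine with a CUDA-capable GPU.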
2. Managing Accelerated Application Memory with CUDA C/C++.
Learn the command-line profiler and CUDA-managed memory, focusing on observation-driven application improvements and a deep understanding of managed memory behavior:
- Profile CUDA code with the command-line profiler;
- Go deep on unified memory;
- Optimize unified memory management.
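As a rough sketch of what optimizing unified memory can look like, the program below prefetches managed buffers to the GPU before a kernel runs and back to the CPU before the host reads the result. The names `addVectors` and `runPrefetchedAdd` are hypothetical. The code assumes the four-argument form of `cudaMemPrefetchAsync` (pointer, size, destination device, stream) found in CUDA toolkits before version 13; if your toolkit declares a different signature, the prefetch calls need adapting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise vector addition with a grid-stride loop.
__global__ void addVectors(float *result, const float *a, const float *b, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < n; i += stride)
        result[i] = a[i] + b[i];
}

// Hypothetical helper: returns true if the GPU result is correct.
bool runPrefetchedAdd(int n)
{
    int device = 0;
    cudaGetDevice(&device);

    // Prefetching needs concurrent managed access; skip it where unsupported.
    int canPrefetch = 0;
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, device);

    size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c);
        return false;
    }

    // First touch on the host, so the pages start out in CPU memory.
    for (int i = 0; i < n; ++i) { a[i] = 3.0f; b[i] = 4.0f; }

    if (canPrefetch) {
        // Migrate pages in bulk up front instead of via on-demand page faults.
        cudaMemPrefetchAsync(a, bytes, device);
        cudaMemPrefetchAsync(b, bytes, device);
        cudaMemPrefetchAsync(c, bytes, device);
    }

    int threadsPerBlock = 256;  // assumed value
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addVectors<<<blocks, threadsPerBlock>>>(c, a, b, n);

    if (canPrefetch)
        cudaMemPrefetchAsync(c, bytes, cudaCpuDeviceId);  // bring result back

    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (c[i] == 7.0f);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return ok;
}

int main()
{
    bool ok = runPrefetchedAdd(1 << 20);
    printf("%s\n", ok ? "vector add correct" : "failed");
    return ok ? 0 : 1;
}
```

One way to observe the effect is to profile the program with and without the prefetch calls, for example with `nsys profile --stats=true ./prefetch` (the executable name is an assumption), and compare the reported unified memory transfers.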
3. Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++.
Identify opportunities for improved memory management and instruction-level parallelism:
- Profile CUDA code with NVIDIA Nsight Systems;
- Use concurrent CUDA streams.
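A minimal sketch of concurrent streams, under stated assumptions: the names `initWith` and `runConcurrentInit` are hypothetical, and three streams is an arbitrary choice. Each kernel is launched into its own non-default stream, which allows the GPU to overlap them; whether they actually overlap depends on the hardware and on how many resources each kernel occupies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Fill an array with a constant value using a grid-stride loop.
__global__ void initWith(float value, float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < n; i += stride)
        a[i] = value;
}

// Hypothetical helper: initializes three buffers, one kernel per stream,
// and returns true if every buffer holds its expected value.
bool runConcurrentInit(int n)
{
    const int numStreams = 3;  // assumed count, chosen for illustration
    cudaStream_t streams[numStreams];
    float *buffers[numStreams] = {nullptr, nullptr, nullptr};
    size_t bytes = n * sizeof(float);
    bool ok = true;

    for (int s = 0; s < numStreams; ++s) {
        ok = ok && (cudaStreamCreate(&streams[s]) == cudaSuccess);
        ok = ok && (cudaMallocManaged(&buffers[s], bytes) == cudaSuccess);
    }

    if (ok) {
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        // The fourth launch parameter selects the stream; the third is the
        // dynamic shared memory size, which this kernel does not use.
        for (int s = 0; s < numStreams; ++s)
            initWith<<<blocks, threadsPerBlock, 0, streams[s]>>>(
                static_cast<float>(s + 1), buffers[s], n);

        ok = (cudaGetLastError() == cudaSuccess) &&
             (cudaDeviceSynchronize() == cudaSuccess);  // wait for all streams
    }

    for (int s = 0; ok && s < numStreams; ++s)
        for (int i = 0; ok && i < n; ++i)
            ok = (buffers[s][i] == static_cast<float>(s + 1));

    for (int s = 0; s < numStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(buffers[s]);
    }
    return ok;
}

int main()
{
    bool ok = runConcurrentInit(1 << 20);
    printf("%s\n", ok ? "all buffers initialized" : "failed");
    return ok ? 0 : 1;
}
```

Viewing a trace of this program in the NVIDIA Nsight Systems timeline is one way to check whether the three kernels ran concurrently or back to back.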
4. Final review:
- Review key learnings and wrap up questions;
- Complete the assessment to earn a certificate;
- Take the workshop survey.