Introduction to Massively Parallel GPU Computing with CUDA

The course covers the theoretical and practical principles of massively parallel GPU computing with CUDA technology through hands-on exercises of increasing complexity that highlight the capabilities of parallel computing. We will look into the CUDA hardware and software architecture, memory management, parallel programming with C/C++, common application libraries, and tools. Particular attention will be paid to the possibilities and advantages of using CUDA in machine-learning solutions.

This course will take place online. The link to the streaming platform will be provided to the registered participants only.

Remote access to Riga Technical University’s training machines will be provided (RDP and SSH client software is needed on participants’ computers). Alternatively, attendees can use their own computers with CUDA-compatible GPUs (CUDA >= 11.0) for the course.
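Participants who plan to use their own machines can verify their setup in advance; a minimal check, assuming the CUDA Toolkit is already installed, might look like this (compile with `nvcc check.cu -o check`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Report the installed CUDA runtime version (e.g. 11000 = CUDA 11.0).
    int runtimeVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("CUDA runtime version: %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // Make sure at least one CUDA-capable GPU is visible.
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("GPU 0: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```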

Prerequisites: some programming experience in C/C++ and knowledge of parallel/threaded programming models would be useful.

Timetable:

Tuesday, March 25, 2025, 15:00-18:00 CET

Tuesday, April 1, 2025, 15:00-18:00 CEST

Course Content

Day I

  1. Overview of CUDA architecture and programming model:
    • GPU evolution
    • CUDA GPU architecture
  2. Basic CUDA programming:
    • Brief review of the CUDA programming model
    • Key principles
    • Introduction to the concept of threads & blocks
    • Host-device data transfer
  3. Hands-on exercises on writing simple CUDA programs:
    • Using CUDA on HPC cluster
    • Simple programs with C/C++
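The Day I topics — threads and blocks, host-device data transfer, and a first simple CUDA program — come together in the classic vector-addition example. A minimal sketch (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device allocations and host-to-device transfer.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch: enough blocks of 256 threads each to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // Device-to-host transfer of the result.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);  // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```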

Day II

  1. Overview of CUDA memory hierarchy:
    • An overview of memory levels
    • Global memory
    • Registers, constant memory, texture memory
    • Shared memory and synchronization
  2. Introduction to the CUDA Deep Neural Network library (cuDNN):
    • Using cuDNN for deep neural networks
    • Convolutional neural networks in cuDNN
    • Integration with other CUDA libraries (cuBLAS, cuSOLVER, cuRAND, cuTENSOR, TensorRT)
  3. Exercises on CUDA techniques: image processing and neural networks:
    • Image convolution filtering
    • Implementation of a neural network from scratch in C/C++
    • Implementation of a neural network using cuDNN
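As a taste of how the Day II topics combine, the sketch below stages data in shared memory for a 1D convolution, with the small filter kept in constant memory. This is a hedged illustration only: the course exercises use 2D image filters, and the kernel name, tile size, and radius here are illustrative (the filter weights would be uploaded with `cudaMemcpyToSymbol` before launch).

```cuda
#include <cuda_runtime.h>

#define RADIUS 2      // filter half-width (illustrative)
#define TILE   256    // threads per block (illustrative)

__constant__ float d_filter[2 * RADIUS + 1];  // small filter in constant memory

__global__ void conv1d(const float *in, float *out, int n) {
    // Shared-memory tile with halo cells on both sides.
    __shared__ float tile[TILE + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // local index in tile

    // Each thread loads its own element; the first RADIUS threads
    // also load the left and right halo cells (zero-padded at edges).
    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS, right = g + TILE;
        tile[l - RADIUS] = (left >= 0) ? in[left]  : 0.0f;
        tile[l + TILE]   = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // all loads must finish before any thread reads the tile

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[l + k] * d_filter[k + RADIUS];
        out[g] = sum;
    }
}
```

The shared-memory tile lets each input element be read from global memory once per block instead of once per thread that needs it, which is exactly the access pattern the memory-hierarchy session motivates.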

Learning outcomes: At the end of the workshop, attendees should be able to make an informed decision on how to approach GPU parallelization in their applications in an efficient and portable manner.

Target audience: scientists and programmers who want to use CUDA for scientific application development.

Instructor: Arnis Lektauers, Dr.sc.ing.

Venue: Online via Zoom platform

Organizers: The course is organized by EuroCC-Latvia and EuroCC-Poland in collaboration with Riga Technical University’s HPC Center.