CUDA Training Courses for NVIDIA GPUs


Since 2007, AccelerEyes has developed computational software for NVIDIA GPUs. We have mastered all the essentials for developing great CUDA code and have developed engaging courses to share our knowledge with you.

Our CUDA training courses are the fastest way for developers to become proficient at programming CUDA for NVIDIA GPUs. AccelerEyes is uniquely equipped to provide training for NVIDIA CUDA GPUs due to our extensive experience programming ArrayFire. In our training courses, we are able to share important first-hand experience that will greatly benefit your development efforts.

NVIDIA GPUs lead the market in performance for accelerated computation. Contact us today to schedule a training course for your organization.

Attendees will receive the latest industry knowledge and techniques for computing with CUDA GPUs. We have helped thousands of organizations speed up their code, and our primary objective is to help you increase productivity while maximizing the return on your hardware. AccelerEyes training will give your organization the knowledge it needs to succeed in accelerated computing.


Upcoming Training Courses and Locations




CUDA Training Course Syllabus

  • Day 1: Introduction to CUDA
    • Lectures:
    • GPU Computing Overview
    • The CUDA Programming Model
    • Basic Dataset Mapping Techniques
    • CUDA Libraries, ArrayFire
    • Asynchronous Operation
    • Profiling Tools

    • Practice:
    • A Simple CUDA Kernel
    • Equivalent ArrayFire Example
    • Using CUDA Libraries
    • Monte Carlo Pi Estimation
    • Timing CUDA and ArrayFire
    • Debugging CUDA Code

  • Day 2: CUDA Optimization
    • Lectures:
    • CUDA Architecture: Grids, Blocks, and Threads
    • CUDA Memory Model: Global, Shared, and Constant Memory
    • Advanced Mapping Techniques
    • CUDA Streams: Asynchronous Launches and Concurrent Execution
    • ArrayFire: Lazy Evaluation and Code Vectorization

    • Practice:
    • Matrix Transpose
    • Optimization Using Shared Memory
    • Median Filter
    • Optimization Using Constant Memory
    • CUDA Stream Example
    • ArrayFire Example: Nearest Neighbor Algorithm

  • Day 3: Multi-GPU
    • Lectures:
    • Multi-GPU Use Cases
    • CUDA on Multi-GPUs: CUDA Contexts
    • Existing Libraries
    • Scaling Across Multiple GPUs

    • Practice:
    • Out-of-Core Problems: Matrix Multiply
    • Task-Level Parallelism: Optimization
    • ArrayFire Multi-GPU

  • Day 4: CUDA Algorithm Problems
    • Lectures and Practice:
    • Reductions
    • Scan Algorithms
    • Sort
    • Convolution
    • Customer-Specific Problem

Interested in OpenCL Training for AMD Devices?
