-
Nov 26, 2024
xGeMM Chapter 6: Vectorized Memory Accesses
Explaining vectorized memory accesses in GPUs.
-
Nov 25, 2024
xGeMM Chapter 5: 2D Thread Coarsening using GPU Registers
Explaining 2D thread coarsening using GPU registers.
-
Nov 24, 2024
xGeMM Chapter 4: 1D Thread Coarsening using GPU Registers
Explaining 1D thread coarsening using GPU registers.
-
Nov 23, 2024
xGeMM Chapter 3: GPU Shared Memory
Explaining shared memory in GPUs.
-
Nov 22, 2024
xGeMM Chapter 2: GPU Global Memory Coalescing
Explaining memory coalescing in GPUs.
-
Nov 21, 2024
xGeMM Chapter 1: Getting Started with CUDA Programming
A simple introduction explaining how to get started with CUDA.
-
Nov 20, 2024
xGeMM: GPU Accelerated Matrix Multiplication (almost) like cuBLAS
Programming general matrix multiplication from scratch in CUDA.
-
Oct 30, 2024
Programming NVIDIA Tensor Cores
A simple introduction explaining how to program NVIDIA's tensor cores.
-
Dec 23, 2022
Creating custom Julia Packages
A simple introduction explaining how to create a custom Julia package.
-
Nov 13, 2022
GitHub Pages for the repository
Setting up an individual website for the repository.
-
Feb 26, 2022
Pointers for GPGPU
Refresher on Pointers.
-
Nov 20, 2021
Automatic differentiation (forward mode)
Forward-mode automatic differentiation from scratch.
-
Nov 16, 2021
Finite Differences to evaluate derivatives
Finite differences are a stepping stone to understanding automatic differentiation.
-
Oct 31, 2021
A high level introduction to Cache Memories
Why understanding caches matters for any programmer who wants to write efficient code.