Compute Shading - DirectX Documentation

Compute Shading in DirectX

Compute shaders let applications use the massively parallel hardware of modern GPUs for general-purpose computation, not just graphics rendering. DirectX applications use them for tasks such as physics simulations, image processing, AI computations, and data analysis.

What is Compute Shading?

Unlike traditional pixel or vertex shaders that operate within the graphics rendering pipeline, compute shaders are dispatched independently. This allows them to perform arbitrary computations on data stored in GPU memory. They operate on groups of threads, known as thread groups, which can synchronize their work and share data.

Key Concepts

Threads: The smallest unit of execution. Each thread runs one invocation of the shader's entry point on the GPU.
Thread Groups: A collection of threads that can synchronize with one another and share data through group shared memory. The group size is fixed in the shader by the numthreads attribute.
Dispatch: The process of launching a compute shader with a specified number of thread groups in each dimension (X, Y, Z).
Unordered Access Views (UAVs): Allow compute shaders to read from and write to arbitrary locations in textures and buffers.
Shader Resource Views (SRVs): Provide read-only access to resources.
Constant Buffers: Used to pass parameters to the compute shader.
Group Shared Memory: Fast, on-chip memory, declared with the groupshared keyword in HLSL, that is accessible to all threads within a thread group. It is crucial for inter-thread communication and optimization.
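
To make the relationship between threads, thread groups, and a dispatch concrete, the following is a minimal CPU-side sketch in C++ of the index arithmetic behind the HLSL system values SV_DispatchThreadID and SV_GroupIndex. The UInt3 type and both function names are hypothetical helpers introduced for illustration; they are not part of any DirectX header.

```cpp
#include <cstdint>

// Hypothetical helper type for this sketch; it mirrors HLSL's uint3.
struct UInt3 {
    std::uint32_t x, y, z;
};

// Global thread ID, as exposed by SV_DispatchThreadID:
// groupID * numthreads + groupThreadID, per component.
constexpr UInt3 dispatchThreadId(UInt3 groupId, UInt3 groupSize, UInt3 groupThreadId) {
    return { groupId.x * groupSize.x + groupThreadId.x,
             groupId.y * groupSize.y + groupThreadId.y,
             groupId.z * groupSize.z + groupThreadId.z };
}

// Flattened index of a thread within its group, as exposed by SV_GroupIndex:
// z * sizeX * sizeY + y * sizeX + x.
constexpr std::uint32_t groupIndex(UInt3 groupSize, UInt3 groupThreadId) {
    return groupThreadId.z * groupSize.x * groupSize.y
         + groupThreadId.y * groupSize.x
         + groupThreadId.x;
}
```

For example, thread (3, 5, 0) of group (2, 1, 0) in an 8 x 8 x 1 thread group has dispatch thread ID (19, 13, 0) and group index 43.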

When to Use Compute Shaders?

Compute shaders are ideal for scenarios where:

Massive parallelism is available and beneficial.
Data can be processed independently or with limited synchronization.
High memory bandwidth is required.
Tasks are not directly tied to rasterization.

Common applications include:

Particle systems and fluid dynamics simulations.
Image filtering and post-processing (e.g., depth of field, ambient occlusion).
AI inference (e.g., neural network computations).
Complex data sorting and manipulation.
Procedural content generation.

Compute Shader Pipeline Steps

Compute shader execution differs from the traditional graphics pipeline and proceeds in the following stages:

Dispatching: An application calls a dispatch function (e.g., ID3D11DeviceContext::Dispatch or ID3D12GraphicsCommandList::Dispatch) specifying the number of thread groups to launch.
Thread Group Execution: The GPU schedules and executes thread groups. Threads within a group can synchronize using barriers (e.g., GroupMemoryBarrierWithGroupSync() in HLSL).
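Resource Access: Threads read from SRVs and write to UAVs. Data can be temporarily stored and shared among the threads of a group in group shared memory.
Completion: The dispatch is complete once all dispatched thread groups have finished. Because the GPU executes dispatches asynchronously with respect to the CPU, the application must synchronize (for example, with a fence in Direct3D 12) before reading results back.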
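
The stages above can be modeled on the CPU. The following C++ sketch is only an emulation for reasoning about which thread IDs a dispatch produces; it is not how a GPU schedules work, and the emulateDispatch function and UInt3 type are hypothetical names introduced here.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper type for this sketch; it mirrors HLSL's uint3.
struct UInt3 {
    std::uint32_t x, y, z;
};

// CPU-side model of Dispatch(groups.x, groups.y, groups.z) for a shader
// declared with [numthreads(size.x, size.y, size.z)]. A real GPU runs groups
// and threads in parallel and in no guaranteed order; these loops are serial.
void emulateDispatch(UInt3 groups, UInt3 size,
                     const std::function<void(UInt3 dispatchThreadId)>& kernel) {
    for (std::uint32_t gz = 0; gz < groups.z; ++gz)
    for (std::uint32_t gy = 0; gy < groups.y; ++gy)
    for (std::uint32_t gx = 0; gx < groups.x; ++gx)
        // One thread group: every thread runs the same kernel.
        for (std::uint32_t tz = 0; tz < size.z; ++tz)
        for (std::uint32_t ty = 0; ty < size.y; ++ty)
        for (std::uint32_t tx = 0; tx < size.x; ++tx)
            kernel({gx * size.x + tx, gy * size.y + ty, gz * size.z + tz});
}
```

Dispatching 2 x 3 x 1 groups of 2 x 2 x 1 threads with this model invokes the kernel once for each position in a 4 x 6 grid, which is the coverage a compute shader relies on when it maps one thread to one pixel or one buffer element.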

Example HLSL Compute Shader

Here's a basic example of a compute shader that inverts the colors of an image:


// Define the thread group size
#define THREAD_GROUP_SIZE_X 8
#define THREAD_GROUP_SIZE_Y 8

// Input texture (read-only)
Texture2D<float4> InputTexture;
// Output texture (read-write); RW resources require an explicit element type
RWTexture2D<float4> OutputTexture;

// Sampler state
SamplerState Sampler;

[numthreads(THREAD_GROUP_SIZE_X, THREAD_GROUP_SIZE_Y, 1)]
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Calculate the texture coordinates for the current thread
    uint2 textureCoords = dispatchThreadID.xy;

    // Ensure we don't go out of bounds (important if dispatch size doesn't perfectly match texture)
    uint width, height;
    OutputTexture.GetDimensions(width, height);
    if (textureCoords.x >= width || textureCoords.y >= height)
    {
        return;
    }

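    // Read a color from the input texture at the texel center.
    // SampleLevel is used because Sample() relies on implicit derivatives,
    // which compute shaders generally do not provide.
    float2 uv = (float2(textureCoords) + 0.5f) / float2(width, height);
    float4 color = InputTexture.SampleLevel(Sampler, uv, 0);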

    // Apply a simple effect (e.g., invert the color, leaving alpha unchanged)
    float4 invertedColor = float4(1.0f - color.rgb, color.a);

    // Write the processed color to the output texture
    OutputTexture[textureCoords] = invertedColor;
}
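
The bounds check in the shader matters because the host normally rounds the group count up so that the whole texture is covered. As a minimal sketch of that host-side calculation in C++ (the groupCount helper and the constant names are hypothetical, and the group size is assumed to match the 8 x 8 declared in the shader above):

```cpp
#include <cstdint>

// Group size assumed to match the shader's [numthreads(8, 8, 1)].
constexpr std::uint32_t kGroupSizeX = 8;
constexpr std::uint32_t kGroupSizeY = 8;

// Number of thread groups needed to cover `extent` items, rounding up.
constexpr std::uint32_t groupCount(std::uint32_t extent, std::uint32_t groupSize) {
    return (extent + groupSize - 1) / groupSize;
}
```

For a 1000 x 500 texture this gives groupCount(1000, kGroupSizeX) = 125 and groupCount(500, kGroupSizeY) = 63, so the application would dispatch 125 x 63 x 1 groups. The last row of groups covers rows 496 through 503, and the threads for rows 500 through 503 exit early at the bounds check.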

Shader Resources and Views

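Compute shaders rely heavily on several types of shader resources. In Direct3D 11, each view is its own interface object. In Direct3D 12, a view is a descriptor that the application creates for an ID3D12Resource and stores in a descriptor heap:

Shader Resource View (ID3D11ShaderResourceView / ID3D12Device::CreateShaderResourceView): Read-only access to textures or buffers.

Unordered Access View (ID3D11UnorderedAccessView / ID3D12Device::CreateUnorderedAccessView): Read/write access to textures or buffers. Crucial for compute shaders to modify data.

Constant Buffer (ID3D11Buffer / ID3D12Device::CreateConstantBufferView): Pass parameters and constants to the shader.

ID3D11Buffer, ID3D11Texture1D, ID3D11Texture2D, ID3D11Texture3D / ID3D12Resource: The underlying GPU resources.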

Synchronization and Shared Memory

For algorithms requiring threads within a group to coordinate, HLSL provides:

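GroupMemoryBarrierWithGroupSync(): Blocks every thread in the group until all threads in the group have reached the call and all group shared memory accesses issued before it have completed. Writes made before the barrier are therefore visible to all threads within the group after the barrier.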
Shared memory arrays (e.g., groupshared float data[...];): A fast, local memory pool for a thread group.
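
A common pattern that combines both features is a tree reduction, where the threads of a group sum their values through group shared memory. The following C++ sketch emulates that pattern serially on the CPU to show where the barriers would fall; it is an illustration under stated assumptions, not shader code. The group size of 64 and the names reduceGroup and kGroupSize are assumptions introduced here.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kGroupSize = 64;  // assumed thread count, a power of two

// CPU model of a tree reduction inside one thread group. `shared` stands in
// for a `groupshared float` array. Each pass of the outer loop is one phase
// that, in HLSL, would end with GroupMemoryBarrierWithGroupSync().
float reduceGroup(const std::array<float, kGroupSize>& input) {
    std::array<float, kGroupSize> shared = input;  // each thread stores its value
    // (barrier here in HLSL: all stores must be visible before the first add)
    for (std::size_t stride = kGroupSize / 2; stride > 0; stride /= 2) {
        // Threads with index < stride add their partner's value.
        for (std::size_t thread = 0; thread < stride; ++thread) {
            shared[thread] += shared[thread + stride];
        }
        // (barrier here in HLSL: the next phase reads these partial sums)
    }
    return shared[0];  // thread 0 holds the group total
}
```

Without the barrier between phases, a thread on the GPU could read a partner's slot before that partner had finished writing its partial sum, which is the hazard the synchronization call exists to prevent.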

Mastering these concepts is key to efficient and correct compute shader programming.