Compute Shader Resources
This page describes the resources available to compute shaders in DirectX and how they are used. Compute shaders offer a powerful way to leverage the GPU for general-purpose parallel computation, extending beyond traditional graphics rendering.
Introduction to Compute Shaders
Compute shaders are a programmable shader stage in DirectX that lets developers execute arbitrary parallel computations on the GPU. Unlike vertex or pixel shaders, they are not tied to the stages of the traditional rendering pipeline and can be used for a wide range of tasks, such as:
- Physics simulations
- Image processing and filtering
- Data analysis and machine learning
- Complex geometric tessellation
- AI computations
Compute Shader Execution Model
Compute shaders execute as a grid of thread groups. Each thread group consists of multiple threads that can cooperate and share data through shared memory. Execution is launched by calling Dispatch or DispatchIndirect on the device context.
// Example Dispatch call: launches numGroupsX * numGroupsY * numGroupsZ thread groups
ID3D11DeviceContext* pDeviceContext; // Assume this is initialized
UINT numGroupsX = 16, numGroupsY = 16, numGroupsZ = 1;
pDeviceContext->Dispatch(numGroupsX, numGroupsY, numGroupsZ);
The dimensions of the thread group are defined in the compute shader itself using the numthreads attribute.
[numthreads(32, 32, 1)]
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Compute shader logic
}
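A Dispatch(numGroupsX, numGroupsY, numGroupsZ) call combined with [numthreads(32, 32, 1)] launches numGroupsX * 32 by numGroupsY * 32 by numGroupsZ threads in total. As a sketch, the example above can be expanded with the related system-value semantics to show how each thread locates itself within that grid:
[numthreads(32, 32, 1)]
void CSMain(uint3 groupID : SV_GroupID,                   // which thread group this thread belongs to
            uint3 groupThreadID : SV_GroupThreadID,       // position within the group (0..31 on x and y here)
            uint3 dispatchThreadID : SV_DispatchThreadID) // global position across the whole dispatch
{
    // dispatchThreadID == groupID * uint3(32, 32, 1) + groupThreadID,
    // so it gives every thread a unique global coordinate, e.g. a texel position.
}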
Resource Types
Compute shaders interact with the GPU's memory and hardware through various resource types. These are typically bound to specific shader stages or accessible directly.
Unordered Access Views (UAVs)
UAVs are crucial for compute shaders because they provide read and write access to resources. This allows threads to modify data in place, write results to textures or buffers, and, through atomic operations, coordinate work between threads within and across thread groups.
- Textures: 1D, 2D, 3D, and array textures can be bound as UAVs (cube maps are written through a 2D array view).
- Buffers: Typed buffers, structured buffers, and raw (byte address) buffers can be used.
In HLSL, UAVs are declared using types such as RWTexture2D, RWBuffer, RWStructuredBuffer, and RWByteAddressBuffer, bound to u# registers:
RWTexture2D<float4> g_OutputTexture : register(u0);
RWStructuredBuffer<float> g_DataBuffer : register(u1);
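For illustration, a minimal kernel that writes through both of the UAVs declared above might look like the following; the FillCS name and the 256-texel row width used for indexing are assumptions of the example:
RWTexture2D<float4> g_OutputTexture : register(u0);
RWStructuredBuffer<float> g_DataBuffer : register(u1);

[numthreads(8, 8, 1)]
void FillCS(uint3 id : SV_DispatchThreadID)
{
    // Each thread owns one texel; write a simple gradient into the texture.
    g_OutputTexture[id.xy] = float4(id.x / 256.0, id.y / 256.0, 0.0, 1.0);

    // Read-modify-write one element of the structured buffer in place.
    uint index = id.y * 256 + id.x;
    g_DataBuffer[index] = g_DataBuffer[index] * 0.5;
}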
Shader Resource Views (SRVs)
SRVs provide read-only access to resources. Compute shaders can read data from textures and buffers without modifying them.
- Textures
- Buffers
In HLSL, SRVs are declared using types such as Texture2D, Buffer, and StructuredBuffer, bound to t# registers:
Texture2D<float4> g_InputTexture : register(t0);
StructuredBuffer<float> g_ReadOnlyData : register(t1);
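A sketch of reading from these SRVs inside a kernel; the g_Results UAV and the 256-element row width exist only for the example. Load fetches a texel by integer coordinate and mip level, so no sampler is involved:
Texture2D<float4> g_InputTexture : register(t0);
StructuredBuffer<float> g_ReadOnlyData : register(t1);
RWStructuredBuffer<float> g_Results : register(u0); // assumed output buffer for the example

[numthreads(8, 8, 1)]
void ReadCS(uint3 id : SV_DispatchThreadID)
{
    // Load takes (x, y, mip) as integers and performs no filtering.
    float4 texel = g_InputTexture.Load(int3(id.xy, 0));

    // Structured buffers are indexed like arrays.
    float scale = g_ReadOnlyData[id.x];

    g_Results[id.y * 256 + id.x] = texel.r * scale;
}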
Constant Buffers
Constant buffers are used to pass small amounts of frequently accessed data to the shader. They are ideal for parameters that define the computation, such as simulation parameters, transformation matrices, or control values.
cbuffer ConstantBuffer : register(b0)
{
    float g_SimulationTime;
    float g_GridSize;
};
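As an illustration of how such constants drive a computation, the sketch below animates an assumed buffer of particle positions; the g_Positions UAV and the AnimateCS kernel are hypothetical:
cbuffer ConstantBuffer : register(b0)
{
    float g_SimulationTime;
    float g_GridSize;
};

RWStructuredBuffer<float3> g_Positions : register(u0); // assumed particle positions

[numthreads(64, 1, 1)]
void AnimateCS(uint3 id : SV_DispatchThreadID)
{
    // Constants are uniform: every thread in the dispatch reads the same values.
    float phase = g_SimulationTime + id.x / g_GridSize;
    float3 p = g_Positions[id.x];
    p.y = sin(phase);
    g_Positions[id.x] = p;
}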
Samplers
Samplers define how texture data is sampled, including filtering modes (point, linear, anisotropic) and address modes (clamp, wrap). They are less common in pure compute tasks than in graphics, but are available whenever filtered texture reads are required. Note that compute shaders cannot use Sample, which relies on implicit screen-space derivatives; use SampleLevel or SampleGrad and supply the mip level or gradients explicitly.
SamplerState g_LinearSampler : register(s0);
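A minimal sketch of filtered sampling from a compute shader, reusing the texture declarations from the earlier sections; the 256x256 output size is an assumption:
Texture2D<float4> g_InputTexture : register(t0);
RWTexture2D<float4> g_OutputTexture : register(u0);
SamplerState g_LinearSampler : register(s0);

[numthreads(8, 8, 1)]
void SampleCS(uint3 id : SV_DispatchThreadID)
{
    // Convert the thread's texel coordinate to normalized UVs (256x256 output assumed).
    float2 uv = (id.xy + 0.5) / 256.0;

    // SampleLevel takes an explicit mip level (0 here) because compute shaders have
    // no pixel-quad derivatives from which to select one automatically.
    g_OutputTexture[id.xy] = g_InputTexture.SampleLevel(g_LinearSampler, uv, 0);
}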
Shared Memory
Shared memory is a fast, on-chip memory accessible by all threads within a single thread group. It's essential for inter-thread communication and data sharing for cooperative computations.
groupshared float sharedData[256];
Synchronization using GroupMemoryBarrierWithGroupSync() is often required when using shared memory to ensure all threads have completed their writes before other threads read the data.
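A common pattern, sketched below, is to stage data in shared memory, synchronize, and then read values written by neighboring threads; the g_Input and g_Output buffers and the three-tap average are illustrative:
StructuredBuffer<float> g_Input : register(t0);     // assumed input data
RWStructuredBuffer<float> g_Output : register(u0);  // assumed output data

groupshared float sharedData[256];

[numthreads(256, 1, 1)]
void AverageCS(uint3 dtid : SV_DispatchThreadID, uint3 gtid : SV_GroupThreadID)
{
    // Each thread loads one element from global memory into shared memory.
    sharedData[gtid.x] = g_Input[dtid.x];

    // Wait until every thread in the group has finished writing.
    GroupMemoryBarrierWithGroupSync();

    // Now it is safe to read values that were loaded by neighboring threads.
    uint left  = (gtid.x == 0)   ? 0   : gtid.x - 1;
    uint right = (gtid.x == 255) ? 255 : gtid.x + 1;
    g_Output[dtid.x] = (sharedData[left] + sharedData[gtid.x] + sharedData[right]) / 3.0;
}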
Common Compute Shader Patterns
Parallel Reduction
A common pattern is to reduce a large dataset to a single value (e.g., sum, maximum). This is efficiently done using shared memory and multiple passes of computation.
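A sketch of one pass of such a reduction, producing one partial sum per thread group; a second, much smaller pass (or a CPU readback) would combine the per-group results. The buffer names are assumptions:
StructuredBuffer<float> g_Values : register(t0);         // assumed input values
RWStructuredBuffer<float> g_PartialSums : register(u0);  // one partial sum per thread group

groupshared float partial[256];

[numthreads(256, 1, 1)]
void ReduceCS(uint3 dtid : SV_DispatchThreadID,
              uint3 gtid : SV_GroupThreadID,
              uint3 gid  : SV_GroupID)
{
    // Stage one value per thread in shared memory.
    partial[gtid.x] = g_Values[dtid.x];
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction: halve the number of active threads each iteration.
    for (uint stride = 128; stride > 0; stride >>= 1)
    {
        if (gtid.x < stride)
            partial[gtid.x] += partial[gtid.x + stride];
        GroupMemoryBarrierWithGroupSync();
    }

    // Thread 0 writes this group's result.
    if (gtid.x == 0)
        g_PartialSums[gid.x] = partial[0];
}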
Parallel Sorting
Algorithms like bitonic sort or merge sort can be implemented on the GPU using compute shaders.
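As one sketch, the kernel below performs a single compare-and-exchange pass of a bitonic sort over a power-of-two-sized key buffer; the host dispatches it repeatedly, updating the constants for each pass. All names here are illustrative:
cbuffer SortParams : register(b0)
{
    uint g_k; // current subsequence size (2, 4, 8, ..., N)
    uint g_j; // current compare distance (k/2, k/4, ..., 1)
};

RWStructuredBuffer<uint> g_Keys : register(u0);

[numthreads(256, 1, 1)]
void BitonicStepCS(uint3 id : SV_DispatchThreadID)
{
    uint i = id.x;
    uint partner = i ^ g_j;            // index of this thread's partner element
    if (partner > i)                   // handle each pair exactly once
    {
        bool ascending = ((i & g_k) == 0);
        uint a = g_Keys[i];
        uint b = g_Keys[partner];
        if ((a > b) == ascending)      // swap if the pair is out of order
        {
            g_Keys[i] = b;
            g_Keys[partner] = a;
        }
    }
}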
Data Parallelism
Applying the same operation to many data elements simultaneously, such as element-wise operations on arrays or matrices.
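The simplest instance is an element-wise kernel such as a scaled vector addition; the parameter names and the bounds check against an assumed element count are illustrative:
cbuffer Params : register(b0)
{
    uint  g_NumElements; // assumed total element count
    float g_Scale;
};

StructuredBuffer<float> g_X : register(t0);
StructuredBuffer<float> g_Y : register(t1);
RWStructuredBuffer<float> g_Result : register(u0);

[numthreads(64, 1, 1)]
void SaxpyCS(uint3 id : SV_DispatchThreadID)
{
    // Guard against the last thread group running past the end of the data.
    if (id.x >= g_NumElements)
        return;

    g_Result[id.x] = g_Scale * g_X[id.x] + g_Y[id.x];
}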
Simulation Updates
Updating the state of a simulation based on previous states, often involving neighbor lookups and complex interactions.
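A typical structure is to read the previous state through an SRV and write the next state through a UAV, with the host swapping ("ping-ponging") the two buffers between dispatches. A minimal sketch of a particle integration step, with an assumed struct layout and fixed time step:
struct Particle
{
    float3 position;
    float3 velocity;
};

StructuredBuffer<Particle> g_PrevState : register(t0);    // state from the previous step
RWStructuredBuffer<Particle> g_NextState : register(u0);  // state written this step

[numthreads(64, 1, 1)]
void StepCS(uint3 id : SV_DispatchThreadID)
{
    Particle p = g_PrevState[id.x];

    // Simple Euler integration under gravity with an assumed fixed time step.
    const float dt = 1.0 / 60.0;
    p.velocity += float3(0.0, -9.8, 0.0) * dt;
    p.position += p.velocity * dt;

    g_NextState[id.x] = p;
}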
Performance Considerations
- Thread Group Size: Choose a thread group size that is a multiple of the hardware's wave size (a warp or wavefront, typically 32 or 64 threads) for optimal occupancy.
- Memory Access Patterns: Coalesced access to global memory is critical; for shared memory, avoid bank conflicts where many threads in a group hit the same bank (both points are illustrated in the sketch after this list).
- Synchronization: Minimize the use of expensive synchronization primitives like GroupMemoryBarrierWithGroupSync().
- Resource Binding: Ensure resources are bound to the slots the shader declares (the u#, t#, b#, and s# registers shown above).
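For instance, a 64-thread group is a multiple of both common wave sizes, and indexing global memory directly by SV_DispatchThreadID keeps neighboring threads touching neighboring elements; the buffer names below are illustrative:
StructuredBuffer<float> g_Data : register(t0);
RWStructuredBuffer<float> g_Out : register(u0);

[numthreads(64, 1, 1)] // multiple of both 32- and 64-wide waves
void CoalescedCS(uint3 id : SV_DispatchThreadID)
{
    // Adjacent threads read and write adjacent elements: a coalesced access pattern.
    g_Out[id.x] = g_Data[id.x] * 2.0;
}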