Dispatching Compute Shaders

This document explains how to dispatch compute shaders from your application using the DirectX API. Dispatching is the process of instructing the GPU to execute a compute shader.

Understanding Dispatching

Compute shaders execute in thread groups. When you dispatch a compute shader, you specify the number of thread groups to launch in three dimensions (X, Y, and Z). The GPU schedules the groups independently and in no guaranteed order; many of them may run concurrently, but your shader should not rely on that. The total number of threads launched is the number of thread groups dispatched multiplied by the number of threads in each group.

Dispatch Parameters

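The primary function for dispatching compute shaders is ID3D11DeviceContext::Dispatch in Direct3D 11 (Direct3D 12 offers ID3D12GraphicsCommandList::Dispatch, which takes the same three counts). This function takes the following parameters:

  1. ThreadGroupCountX (UINT): the number of thread groups dispatched in the X dimension (numGroupsX).
  2. ThreadGroupCountY (UINT): the number of thread groups dispatched in the Y dimension (numGroupsY).
  3. ThreadGroupCountZ (UINT): the number of thread groups dispatched in the Z dimension (numGroupsZ).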

The total number of threads launched is calculated as:

(numGroupsX * groupSizeX) * (numGroupsY * groupSizeY) * (numGroupsZ * groupSizeZ)

where groupSizeX, groupSizeY, and groupSizeZ are defined within the compute shader itself using the [numthreads(...)] attribute.
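The formula above can be checked on the CPU before dispatching. The following is a minimal sketch, not part of Direct3D: the Dim3 struct and the totalThreads function are hypothetical names introduced only for illustration, and the group size must be copied by hand from the shader's [numthreads(...)] attribute.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; Direct3D does not define it.
struct Dim3 {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Total threads =
//   (numGroupsX * groupSizeX) * (numGroupsY * groupSizeY) * (numGroupsZ * groupSizeZ).
// The products are widened to 64 bits so that dispatches within the
// Direct3D 11 limits cannot overflow.
inline std::uint64_t totalThreads(Dim3 numGroups, Dim3 groupSize) {
    // A [numthreads] dimension of zero is not a valid group size.
    assert(groupSize.x > 0 && groupSize.y > 0 && groupSize.z > 0);
    const std::uint64_t threadsX = std::uint64_t{numGroups.x} * groupSize.x;
    const std::uint64_t threadsY = std::uint64_t{numGroups.y} * groupSize.y;
    const std::uint64_t threadsZ = std::uint64_t{numGroups.z} * groupSize.z;
    return threadsX * threadsY * threadsZ;
}
```

For instance, assuming a shader declared with [numthreads(8, 8, 1)], a dispatch of 16 x 16 x 1 groups gives totalThreads(Dim3{16, 16, 1}, Dim3{8, 8, 1}) == 16384.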

Dispatching with Direct3D 11

To dispatch a compute shader in Direct3D 11, you first need to bind the compute shader to the pipeline and then call the Dispatch method on the device context.

Example: Dispatching a Compute Shader

Here's a simplified C++ example demonstrating how to dispatch a compute shader:

        // Assume 'pComputeShader' is a pointer to your compiled compute shader
        // Assume 'pDeviceContext' is your D3D11 device context

        // Bind the compute shader to the compute shader stage
        pDeviceContext->CSSetShader(pComputeShader, nullptr, 0);

        // Define the number of thread groups to dispatch
        UINT numGroupsX = 16;
        UINT numGroupsY = 16;
        UINT numGroupsZ = 1;

        // Dispatch the compute shader
        pDeviceContext->Dispatch(numGroupsX, numGroupsY, numGroupsZ);

        // Unbind the shader (optional, depending on subsequent operations)
        // pDeviceContext->CSSetShader(nullptr, nullptr, 0);
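The group counts in the example are hard-coded. In practice they are usually derived from the size of the data, rounding up so that the last partial group is still launched. A minimal sketch of that calculation follows; groupsFor is a hypothetical helper name, and the 8 x 8 group size used in the note after it is an assumption about the shader's [numthreads(...)] attribute.

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed along one axis to cover 'itemCount' work
// items when each group processes 'groupSize' items. Rounds up without
// risking overflow in (itemCount + groupSize - 1).
inline std::uint32_t groupsFor(std::uint32_t itemCount, std::uint32_t groupSize) {
    assert(groupSize > 0);
    return itemCount / groupSize + (itemCount % groupSize != 0 ? 1u : 0u);
}
```

For a 1920 x 1080 image and an assumed [numthreads(8, 8, 1)] shader, groupsFor(1920, 8) and groupsFor(1080, 8) give 240 and 135 groups. When the size is not an exact multiple of the group size, the rounded-up dispatch launches some threads past the end of the data, so the shader needs a bounds check on its thread ID.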
        

Important Note

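The number of thread groups you can dispatch is capped per dimension. In Direct3D 11, each of the three counts passed to Dispatch must not exceed D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION (65,535), and a cs_5_0 thread group can contain at most D3D11_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP (1,024) threads. These are fixed API constants rather than values reported through D3D11_FEATURE_DATA_D3D11_OPTIONS, so it's good practice to validate your dispatch counts against them directly.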
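A simple guard can catch oversized dispatches before they reach the API. The sketch below assumes the Direct3D 11 per-dimension limit of 65,535 thread groups; the constant is redefined locally so the code compiles without the Windows SDK, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION from d3d11.h.
// Redefined here (assumption: you keep it in sync) so the sketch is portable.
constexpr std::uint32_t kMaxGroupsPerDimension = 65535;

// Returns true when all three group counts are within the per-dimension limit.
inline bool fitsDispatchLimit(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x <= kMaxGroupsPerDimension &&
           y <= kMaxGroupsPerDimension &&
           z <= kMaxGroupsPerDimension;
}

// Debug-build guard intended to be called right before Dispatch.
inline void assertDispatchFits(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(fitsDispatchLimit(x, y, z));
}
```

If a one-dimensional workload needs more than 65,535 groups, one common option is to spread it across the Y (and Z) dimensions and rebuild a flat index inside the shader.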

Indirect Dispatching

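For more advanced scenarios, such as when the number of thread groups to dispatch is determined dynamically by the GPU, you can use indirect dispatch. The dispatch parameters (numGroupsX, numGroupsY, numGroupsZ) are stored as three consecutive 32-bit unsigned integers in a buffer created with the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag. An earlier GPU pass typically writes these values through an unordered access view (for example, a raw buffer view), and the GPU reads them from the buffer when the dispatch executes, with no CPU readback.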

The function for indirect dispatch is ID3D11DeviceContext::DispatchIndirect.

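DispatchIndirect Parameters

  1. pBufferForArgs (ID3D11Buffer*): the buffer that holds the three thread group counts; it must have been created with D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
  2. AlignedByteOffsetForArgs (UINT): the aligned byte offset from the start of the buffer to the first of the three counts.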
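The argument record that DispatchIndirect reads is three consecutive 32-bit unsigned integers. The following sketch mirrors that layout on the CPU and computes the byte offset of a record when several are packed back to back in one buffer. The struct and function names are hypothetical (Direct3D 11 does not define them), and the packing scheme is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the values read by DispatchIndirect.
// Hypothetical name; not a Direct3D type.
struct DispatchArgs {
    std::uint32_t numGroupsX;
    std::uint32_t numGroupsY;
    std::uint32_t numGroupsZ;
};

// The record must be exactly three tightly packed 32-bit values.
static_assert(sizeof(DispatchArgs) == 12, "expected 12-byte argument record");
static_assert(offsetof(DispatchArgs, numGroupsZ) == 8, "unexpected padding");

// Byte offset of the index-th record in a buffer that stores records back to
// back. The result is the value that would be passed as the byte-offset
// argument of DispatchIndirect; it is always a multiple of 4.
inline std::uint32_t argsByteOffset(std::uint32_t index) {
    const std::uint64_t offset = std::uint64_t{index} * sizeof(DispatchArgs);
    assert(offset <= UINT32_MAX);
    return static_cast<std::uint32_t>(offset);
}
```

With this layout, a call of the form pDeviceContext->DispatchIndirect(pArgsBuffer, argsByteOffset(i)) would launch the i-th record, where pArgsBuffer is an assumed ID3D11Buffer created with D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.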

Performance Tip

Indirect dispatch is particularly useful for scenarios like parallel-for loops on the GPU, or when a compute shader needs to determine how many work items should be processed in a subsequent dispatch.

Managing Dispatch Calls

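Dispatch calls should be managed carefully to ensure correct synchronization and resource binding. Direct3D 11 has no explicit UAV barrier call: on a single device context the runtime and driver order dependent work for you, but a resource cannot be bound for writing and for reading at the same time. If your compute shader writes to a resource through an unordered access view and a pixel shader then reads it, unbind the UAV from the compute stage (for example, by passing a null view to ID3D11DeviceContext::CSSetUnorderedAccessViews) before binding the shader resource view with PSSetShaderResources. If the UAV is still bound, the runtime forces the shader resource slot to null and the pixel shader reads nothing useful. Explicit UAV barriers (D3D12_RESOURCE_BARRIER_TYPE_UAV) belong to Direct3D 12.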

Synchronization Example

A typical workflow involving compute shaders and graphics shaders might look like this:

  1. Bind compute shader and resources.
  2. Dispatch compute shader.
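  3. Unbind the UAV from the compute shader stage (in Direct3D 12, insert a UAV or transition barrier instead).
  4. Bind pixel shader and the shader resource view of the written resource.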
  5. Draw geometry.

Caution

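Incorrect synchronization can lead to visual artifacts or data corruption. In Direct3D 11 the most common mistake is leaving a resource bound as a UAV while binding it as a shader resource for another stage. Creating the device with the D3D11_CREATE_DEVICE_DEBUG flag makes the debug layer report these binding conflicts.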

Conclusion

Dispatching compute shaders is a fundamental operation for leveraging GPU compute power. By understanding the concepts of thread groups, dispatch parameters, and synchronization mechanisms, you can effectively utilize compute shaders for a wide range of computational tasks in your DirectX applications.