DirectCompute API Documentation

This section provides comprehensive documentation for the DirectCompute API, a feature of Direct3D 11 and later that allows the GPU to be used for general-purpose computation.

Introduction to DirectCompute

DirectCompute enables developers to leverage the massively parallel processing power of modern GPUs for tasks beyond traditional graphics rendering. This includes scientific simulations, signal processing, physics calculations, machine learning, and more.

By exposing compute shaders and related structures through the Direct3D 11 API, DirectCompute provides a unified programming model for both graphics and general-purpose GPU (GPGPU) computing.

Core Concepts

Understanding these fundamental concepts is crucial for effective DirectCompute programming:

Compute Shaders

Compute shaders are programmable stages that execute on the GPU. Unlike vertex or pixel shaders, they are not tied to the graphics pipeline and can be used to perform arbitrary computations on arbitrary data.

Threads and Thread Groups

Compute shaders execute as a grid of threads organized into thread groups. Threads within a group can share data through group-shared memory (variables declared groupshared in HLSL) and synchronize with barriers such as GroupMemoryBarrierWithGroupSync(). Threads in different groups cannot synchronize with each other during a dispatch, so algorithms that need cooperation are structured around work done within a single group.
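
The relationship between a thread's position in its group and its position in the whole dispatch can be sketched on the CPU. The following is a minimal illustration in plain C++, not Direct3D code: the UInt3 type and both function names are hypothetical, and the arithmetic mirrors how the HLSL system values SV_DispatchThreadID and SV_GroupIndex are derived from SV_GroupID, SV_GroupThreadID, and the group size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical three-component unsigned index, mirroring HLSL's uint3.
struct UInt3 {
    std::uint32_t x, y, z;
};

// Index of a thread within the whole dispatch:
// groupId * groupSize + groupThreadId, computed per component.
inline UInt3 dispatchThreadId(UInt3 groupId, UInt3 groupSize, UInt3 groupThreadId) {
    // A thread's index inside its group must be smaller than the group size.
    assert(groupThreadId.x < groupSize.x);
    assert(groupThreadId.y < groupSize.y);
    assert(groupThreadId.z < groupSize.z);
    return UInt3{groupId.x * groupSize.x + groupThreadId.x,
                 groupId.y * groupSize.y + groupThreadId.y,
                 groupId.z * groupSize.z + groupThreadId.z};
}

// Flattened index of a thread inside its group (x varies fastest).
inline std::uint32_t groupIndex(UInt3 groupSize, UInt3 groupThreadId) {
    return groupThreadId.z * groupSize.x * groupSize.y +
           groupThreadId.y * groupSize.x +
           groupThreadId.x;
}
```

For example, with 8 x 8 x 1 threads per group, thread (3, 5, 0) of group (2, 1, 0) has the dispatch-wide index (19, 13, 0) and the flattened in-group index 43.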

Unordered Access Views (UAVs)

UAVs provide read and write access to resources such as textures and buffers from compute shaders. This is essential for algorithms that modify data iteratively or require read-modify-write operations.

Shader Resource Views (SRVs)

SRVs allow compute shaders to read from resources like textures and buffers. They define how the GPU interprets the data within a resource.

Constant Buffers

Constant buffers are used to pass small, frequently accessed data (like parameters, matrices, or configuration settings) to the compute shader. They are read-only for the shader.
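
On the CPU side, constant buffer data is usually kept in a struct whose layout mirrors the HLSL cbuffer. Direct3D 11 requires the ByteWidth of a constant buffer to be a multiple of 16, so small structs need explicit padding. The sketch below is plain C++ under stated assumptions: ComputeParams, alignTo16, and constantBufferByteWidth are hypothetical names chosen for illustration, not part of the Direct3D API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of an HLSL cbuffer holding a float and a uint.
struct ComputeParams {
    float         scale;         // byte offset 0
    std::uint32_t elementCount;  // byte offset 4
    std::uint32_t padding[2];    // pads the struct to 16 bytes
};

static_assert(sizeof(ComputeParams) % 16 == 0,
              "constant buffer size must be a multiple of 16 bytes");
static_assert(offsetof(ComputeParams, elementCount) == 4,
              "elementCount must directly follow scale");

// Rounds a byte size up to the next multiple of 16.
constexpr std::size_t alignTo16(std::size_t bytes) {
    return (bytes + 15) & ~static_cast<std::size_t>(15);
}

// Value suitable for the ByteWidth field of a constant buffer description.
inline std::uint32_t constantBufferByteWidth(std::size_t structSize) {
    const std::size_t aligned = alignTo16(structSize);
    assert(aligned % 16 == 0);
    return static_cast<std::uint32_t>(aligned);
}
```

Padding the struct itself, rather than only rounding the size up, keeps a later memcpy of sizeof(ComputeParams) bytes within the bounds of the CPU-side object.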

API Reference

DirectCompute functionality is exposed through various Direct3D 11 functions and objects. Here are some key elements:

ID3D11Device::CreateComputeShader

Creates a compute shader from compiled shader code.

HRESULT CreateComputeShader(
  [in]            const void *pShaderBytecode,
  [in]            SIZE_T BytecodeLength,
  [in, optional]  ID3D11ClassLinkage *pClassLinkage,
  [out, optional] ID3D11ComputeShader **ppComputeShader
);
pShaderBytecode
Pointer to the compiled shader code.
BytecodeLength
Size of the compiled shader code in bytes.
pClassLinkage
Optional pointer to a class linkage interface (ID3D11ClassLinkage) used for dynamic shader linkage. May be NULL.
ppComputeShader
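Address of a pointer that receives the created compute shader interface. If this is NULL, the other parameters are only validated, and the method returns S_FALSE instead of S_OK when validation passes.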
Return Value
Returns S_OK on success, or one of the following error codes:
  • E_INVALIDARG: If any of the arguments are invalid.
  • E_OUTOFMEMORY: If the system cannot allocate enough memory.

ID3D11DeviceContext::Dispatch

Dispatches one or more thread groups to execute a compute shader.

void Dispatch(
  [in] UINT ThreadGroupCountX,
  [in] UINT ThreadGroupCountY,
  [in] UINT ThreadGroupCountZ
);
ThreadGroupCountX
Number of thread groups to launch in the X dimension.
ThreadGroupCountY
Number of thread groups to launch in the Y dimension.
ThreadGroupCountZ
Number of thread groups to launch in the Z dimension.

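Note: The total number of threads launched is the number of thread groups (ThreadGroupCountX * ThreadGroupCountY * ThreadGroupCountZ) multiplied by the number of threads per group declared in the shader's [numthreads(x, y, z)] attribute (x * y * z). In Direct3D 11, each thread group count must not exceed 65535 (D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION).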
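
Because the group count is given in whole groups, the number of groups for a given workload is usually computed by rounding up. The following is a small sketch in plain C++ of that arithmetic. The function names are hypothetical; the 65535 limit is the Direct3D 11 per-dimension maximum (D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION).

```cpp
#include <cassert>
#include <cstdint>

// Direct3D 11 limit on thread groups per dispatch dimension.
constexpr std::uint32_t kMaxGroupsPerDimension = 65535;

// Number of thread groups needed to cover elementCount items when each
// group processes threadsPerGroup items along this dimension (rounds up).
inline std::uint32_t groupCountFor(std::uint32_t elementCount,
                                   std::uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    // 64-bit arithmetic avoids overflow when elementCount is near UINT32_MAX.
    const std::uint64_t groups =
        (static_cast<std::uint64_t>(elementCount) + threadsPerGroup - 1) /
        threadsPerGroup;
    assert(groups <= kMaxGroupsPerDimension);
    return static_cast<std::uint32_t>(groups);
}

// Total threads launched: group counts times threads per group.
inline std::uint64_t totalThreads(std::uint32_t groupsX, std::uint32_t groupsY,
                                  std::uint32_t groupsZ, std::uint32_t threadsX,
                                  std::uint32_t threadsY, std::uint32_t threadsZ) {
    return static_cast<std::uint64_t>(groupsX) * groupsY * groupsZ *
           threadsX * threadsY * threadsZ;
}
```

Rounding up means the last group may contain threads with no work to do, which is why compute shaders normally begin with a bounds check against the real element count.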

ID3D11DeviceContext::CSSetShaderResources

Sets shader resource views (SRVs) to be used by the compute shader.

void CSSetShaderResources(
  [in] UINT StartSlot,
  [in] UINT NumViews,
  [in, optional]  ID3D11ShaderResourceView *const *ppShaderResourceViews
);
StartSlot
The starting shader resource view slot.
NumViews
The number of SRVs to set.
ppShaderResourceViews
Array of SRV pointers.

ID3D11DeviceContext::CSSetUnorderedAccessViews

Sets unordered access views (UAVs) to be used by the compute shader.

void CSSetUnorderedAccessViews(
  [in] UINT StartSlot,
  [in] UINT NumUAVs,
  [in, optional]  ID3D11UnorderedAccessView *const *ppUnorderedAccessViews,
  [in, optional]  const UINT *pUAVInitialCounts
);
StartSlot
The starting UAV slot.
NumUAVs
The number of UAVs to set.
ppUnorderedAccessViews
Array of UAV pointers.
pUAVInitialCounts
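Optional array of initial counter values, one per UAV. It is relevant only for UAVs created with the append or counter flag (append/consume and counter buffers). A value of -1 keeps the buffer's current counter.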

ID3D11DeviceContext::CSSetConstantBuffers

Sets constant buffers to be used by the compute shader.

void CSSetConstantBuffers(
  [in] UINT StartSlot,
  [in] UINT NumBuffers,
  [in, optional]  ID3D11Buffer *const *ppConstantBuffers
);
StartSlot
The starting constant buffer slot.
NumBuffers
The number of constant buffers to set.
ppConstantBuffers
Array of constant buffer pointers.

Compute Shaders

Compute shaders are written using HLSL (High-Level Shading Language). They define the kernel logic that runs on the GPU.

Basic Compute Shader Structure

A typical compute shader defines a kernel function that takes thread group and thread indices as input and operates on resources.


// Define structures for input/output data if needed
struct DataItem {
    float4 value;
};

// Define resources
RWTexture2D<float4> outputTexture : register(u0); // UAV for output texture
RWByteAddressBuffer outputBuffer : register(u1);  // UAV for raw buffer output
StructuredBuffer<DataItem> inputBuffer : register(t0); // SRV for input buffer
cbuffer Params : register(b0) {
    float scale;
    uint bufferSize; // Number of elements in inputBuffer and outputBuffer
};

// Define the kernel function
[numthreads(8, 8, 1)] // Threads per thread group (8 x 8 x 1 = 64)
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Skip threads that fall outside the output texture
    uint width, height;
    outputTexture.GetDimensions(width, height);
    if (dispatchThreadID.x >= width || dispatchThreadID.y >= height) {
        return;
    }

    // Flatten the 2D thread coordinate into a row-major element index
    uint index = dispatchThreadID.x + dispatchThreadID.y * width;

    if (index < bufferSize) {
        // Example: Read from input buffer, modify, and write to output texture
        DataItem item = inputBuffer[index];
        outputTexture[dispatchThreadID.xy] = item.value * scale;

        // Example: Write to byte address buffer. Offsets are in bytes, so each
        // float4 occupies 16 bytes, and Store4 takes the values as a uint4.
        float4 coords = float4(dispatchThreadID.x, dispatchThreadID.y, 0.0f, 1.0f);
        outputBuffer.Store4(index * 16, asuint(coords));
    }
}
            

The [numthreads(x, y, z)] attribute defines the dimensions of a single thread group. The SV_DispatchThreadID semantic provides the three-dimensional index of the current thread within the entire dispatch, computed per component as SV_GroupID * numthreads + SV_GroupThreadID.

Code Examples

Here are some common use cases and simplified code snippets.

Example: Image Processing (Color Inversion)

This example demonstrates how to invert the colors of an image using a compute shader.

HLSL Compute Shader (invert.hlsl)


RWTexture2D<float4> outputTexture : register(u0);
Texture2D<float4> inputTexture : register(t0);

[numthreads(16, 16, 1)]
void CSMain(uint2 texCoord : SV_DispatchThreadID)
{
    uint2 textureSize;
    inputTexture.GetDimensions(textureSize.x, textureSize.y);

    if (texCoord.x < textureSize.x && texCoord.y < textureSize.y) {
        float4 color = inputTexture[texCoord];
        outputTexture[texCoord] = 1.0f - color;
    }
}
                    

C++ (Simplified)


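// Assuming you have D3D11 device and device context initialized
// Requires <d3d11.h> and <d3dcompiler.h>; link with d3d11.lib and d3dcompiler.lib

// 1. Compile the HLSL shader
ID3DBlob* shaderBlob = nullptr;
ID3DBlob* errorBlob = nullptr;
HRESULT hr = D3DCompileFromFile(L"invert.hlsl", nullptr, nullptr, "CSMain", "cs_5_0", 0, 0, &shaderBlob, &errorBlob);
if (FAILED(hr)) {
    // When non-null, errorBlob holds the compiler's error text
    if (errorBlob) {
        OutputDebugStringA(static_cast<const char*>(errorBlob->GetBufferPointer()));
        errorBlob->Release();
    }
    return hr; // Assumes the enclosing function returns HRESULT
}

ID3D11ComputeShader* computeShader = nullptr;
hr = device->CreateComputeShader(shaderBlob->GetBufferPointer(), shaderBlob->GetBufferSize(), nullptr, &computeShader);
shaderBlob->Release();
if (FAILED(hr)) {
    return hr;
}

// 2. Create input and output textures with appropriate SRV and UAV
// ... (texture creation code) ...
ID3D11ShaderResourceView* inputTextureSRV = ...;
ID3D11UnorderedAccessView* outputTextureUAV = ...;

// 3. Set shader resources and UAVs
deviceContext->CSSetShaderResources(0, 1, &inputTextureSRV);
deviceContext->CSSetUnorderedAccessViews(0, 1, &outputTextureUAV, nullptr);
deviceContext->CSSetShader(computeShader, nullptr, 0);

// 4. Dispatch the compute shader
UINT width = 0, height = 0;
// Get texture dimensions
// Round up so partially covered 16x16 tiles are still processed
deviceContext->Dispatch((width + 15) / 16, (height + 15) / 16, 1);

// 5. Unset and clean up
// Unbind the views so the textures can be bound elsewhere in the pipeline
ID3D11ShaderResourceView* nullSRV = nullptr;
ID3D11UnorderedAccessView* nullUAV = nullptr;
deviceContext->CSSetShaderResources(0, 1, &nullSRV);
deviceContext->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);
deviceContext->CSSetShader(nullptr, nullptr, 0);
// ... (release resources) ...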
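
When bringing up a shader like this, it can help to compare the GPU result against a CPU reference. The following is a minimal sketch in plain C++, assuming the image has been read back into a row-major array of four-channel float pixels. The Float4 type and invertImage function are hypothetical helpers, not part of Direct3D. Like the shader's 1.0f - color, the reference inverts all four channels, including alpha.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical four-channel pixel, mirroring HLSL's float4.
struct Float4 {
    float r, g, b, a;
};

// CPU reference for the inversion shader: every channel becomes 1 - value.
// Pixels are stored row by row, so pixel (x, y) lives at index y * width + x.
inline std::vector<Float4> invertImage(const std::vector<Float4>& pixels,
                                       std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    std::vector<Float4> result(pixels.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const Float4& in = pixels[y * width + x];
            result[y * width + x] =
                Float4{1.0f - in.r, 1.0f - in.g, 1.0f - in.b, 1.0f - in.a};
        }
    }
    return result;
}
```

If the alpha channel should be preserved instead, both the shader and this reference would need to copy it through unchanged.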
                    

Example: Data Processing (Buffer Transformation)

This example shows a basic buffer transformation where each element is scaled.

HLSL Compute Shader (transform.hlsl)


RWStructuredBuffer<float4> outputBuffer : register(u0);
StructuredBuffer<float4> inputBuffer : register(t0);

cbuffer Params : register(b0) {
    float scaleFactor;
    uint numElements;
};

[numthreads(256, 1, 1)]
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint index = dispatchThreadID.x;
    if (index < numElements) {
        float4 data = inputBuffer[index];
        outputBuffer[index] = data * scaleFactor;
    }
}
                    

C++ (Simplified)


// Assuming D3D11 device and context, and compiled shader
// Requires <d3d11.h> and <cstring> (for memcpy)

// 1. Create input and output structured buffers with SRV and UAV
// ... (buffer creation code) ...
ID3D11Buffer* inputBufferCPU = ...;
ID3D11Buffer* outputBufferGPU = ...;
ID3D11ShaderResourceView* inputBufferSRV = ...;
ID3D11UnorderedAccessView* outputBufferUAV = ...;

// 2. Create and set constant buffer for parameters
// Layout mirrors cbuffer Params in transform.hlsl; padded because
// ByteWidth must be a multiple of 16
struct Params {
    float scaleFactor;
    UINT numElements;
    UINT padding[2];
};
Params params = {};
params.scaleFactor = 2.0f;        // Example value
params.numElements = numElements; // Element count of the input buffer

D3D11_BUFFER_DESC cbd = {};
cbd.Usage = D3D11_USAGE_DYNAMIC;
cbd.ByteWidth = sizeof(Params);
cbd.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
cbd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
cbd.MiscFlags = 0;
cbd.StructureByteStride = 0;
ID3D11Buffer* constantBuffer = nullptr;
HRESULT hr = device->CreateBuffer(&cbd, nullptr, &constantBuffer);
// ... (check hr) ...

// Fill and update constant buffer
D3D11_MAPPED_SUBRESOURCE ms;
hr = deviceContext->Map(constantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &ms);
if (SUCCEEDED(hr)) {
    memcpy(ms.pData, &params, sizeof(params));
    deviceContext->Unmap(constantBuffer, 0);
}

// 3. Bind resources
deviceContext->CSSetShaderResources(0, 1, &inputBufferSRV);
deviceContext->CSSetUnorderedAccessViews(0, 1, &outputBufferUAV, nullptr);
deviceContext->CSSetConstantBuffers(0, 1, &constantBuffer);
deviceContext->CSSetShader(computeShader, nullptr, 0);

// 4. Dispatch
// Round up so a final, partially filled group of 256 threads is included
UINT numGroups = (numElements + 255) / 256;
deviceContext->Dispatch(numGroups, 1, 1);

// 5. Unbind and clean up
deviceContext->CSSetShader(nullptr, nullptr, 0);
// ... (resource cleanup) ...
                    