Introduction to DirectML
DirectML is a high-performance, hardware-accelerated machine learning inference and training API for Windows. It provides a unified way to access the capabilities of various AI accelerators, including GPUs, NPUs, and other specialized hardware.
By abstracting away the underlying hardware differences, DirectML allows developers to build and deploy machine learning models efficiently across a wide range of Windows devices.
Core Concepts
Understanding the fundamental building blocks of DirectML is crucial for developing effective machine learning applications.
Operators
Operators are the basic computational units in DirectML. They represent mathematical operations performed on tensors, such as convolution, matrix multiplication, activation functions, and pooling.
- Element-wise Operations: Operations applied to individual elements of tensors (e.g., ReLU, Sigmoid).
- Reduction Operations: Operations that reduce a tensor along one or more dimensions (e.g., Sum, Max).
- Spatial Operations: Operations that process data based on spatial relationships (e.g., Convolution, Pooling).
- Tensor Transformations: Operations that modify the shape or structure of tensors (e.g., Reshape, Transpose).
DirectML provides a rich set of predefined operators; where an operation is missing, custom Direct3D 12 compute shaders can be interleaved with DirectML dispatches on the same command list.
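To make the operator categories above concrete, here is a minimal, framework-free C++ sketch of what an element-wise operator (ReLU) and a reduction operator (Sum) compute on a flat tensor. The function names are illustrative only, not DirectML API.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Element-wise: ReLU is applied independently to each element.
std::vector<float> Relu(const std::vector<float>& t) {
    std::vector<float> out(t.size());
    std::transform(t.begin(), t.end(), out.begin(),
                   [](float v) { return std::max(v, 0.0f); });
    return out;
}

// Reduction: collapse all elements down to a single scalar.
float ReduceSum(const std::vector<float>& t) {
    return std::accumulate(t.begin(), t.end(), 0.0f);
}
```

In DirectML the same shapes of computation are expressed declaratively as operator descriptions rather than as loops you write yourself.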
Tensors
Tensors are the fundamental data structures used in DirectML. They are multi-dimensional arrays that hold the input data, intermediate results, and model parameters.
- Dimensions: The size of a tensor along each axis.
- Data Types: Supported types include FP32, FP16, and INT8, among others; lower-precision types trade accuracy for faster computation and a smaller memory footprint.
- Layout: The order of dimensions in memory, affecting performance. Common layouts include NCHW and NHWC.
// Example of a tensor description (matching the DirectML headers)
UINT sizes[4] = { 1, 3, 224, 224 }; // Batch, Channels, Height, Width (NCHW)

DML_BUFFER_TENSOR_DESC bufferDesc = {};
bufferDesc.DataType = DML_TENSOR_DATA_TYPE_FLOAT32;
bufferDesc.Flags = DML_TENSOR_FLAG_NONE;
bufferDesc.DimensionCount = 4;
bufferDesc.Sizes = sizes;
bufferDesc.Strides = nullptr; // packed (contiguous) layout
bufferDesc.TotalTensorSizeInBytes = 1ull * 3 * 224 * 224 * sizeof(float);

DML_TENSOR_DESC tensorDesc = { DML_TENSOR_TYPE_BUFFER, &bufferDesc };
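The layout point above can be illustrated with the index arithmetic itself: the same logical element (n, c, h, w) lands at a different linear offset depending on whether the buffer is packed NCHW or NHWC. This is plain C++, independent of the DirectML headers, with illustrative function names.

```cpp
#include <cstddef>

// Linear offset of element (n, c, h, w) in a packed NCHW buffer
// with C channels, height H, and width W.
std::size_t OffsetNCHW(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                       std::size_t C, std::size_t H, std::size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

// Linear offset of the same logical element in a packed NHWC buffer.
std::size_t OffsetNHWC(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                       std::size_t C, std::size_t H, std::size_t W) {
    return ((n * H + h) * W + w) * C + c;
}
```

For the 1x3x224x224 tensor above, stepping through channels touches memory 224*224 floats apart in NCHW but adjacent floats in NHWC, which is why the choice of layout affects cache behavior and performance.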
Graphs and Resources
A DirectML graph represents a complete neural network or a part of it, defined as a directed acyclic graph (DAG) of operators. Resources are used to manage the memory for tensors and intermediate computations.
- Graph Compilation: Graphs are compiled for specific hardware and execution contexts to optimize performance.
- Persistent Resources: Tensors that maintain their data between graph executions.
- Temporary Resources: Tensors used for intermediate calculations within a graph execution.
The DirectML device manages these resources and provides an execution context for running compiled graphs.
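As a sketch of the compile step, the snippet below creates and compiles a single element-wise identity operator with the DirectML C++ API. It assumes an existing `dmlDevice` (IDMLDevice) and already-initialized `inputDesc`/`outputDesc` of type DML_TENSOR_DESC, and it omits all error handling; being Windows-only, it is an outline under those assumptions rather than a drop-in implementation.

```cpp
// Assumes: Microsoft::WRL::ComPtr, an IDMLDevice* in dmlDevice,
// and inputDesc/outputDesc (DML_TENSOR_DESC) already set up.
DML_ELEMENT_WISE_IDENTITY_OPERATOR_DESC identityDesc = {};
identityDesc.InputTensor  = &inputDesc;
identityDesc.OutputTensor = &outputDesc;

DML_OPERATOR_DESC opDesc = { DML_OPERATOR_ELEMENT_WISE_IDENTITY, &identityDesc };

Microsoft::WRL::ComPtr<IDMLOperator> op;
dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(&op));

// Compilation specializes the operator for this device and the chosen flags,
// and reports the persistent/temporary resource sizes it will need.
Microsoft::WRL::ComPtr<IDMLCompiledOperator> compiledOp;
dmlDevice->CompileOperator(op.Get(), DML_EXECUTION_FLAG_NONE, IID_PPV_ARGS(&compiledOp));
```

After compilation, the compiled operator's binding properties tell you how large the persistent and temporary resources described above must be.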
Execution
Executing a DirectML graph involves binding input and output tensors to the compiled graph and dispatching the computation to the hardware accelerator.
This process is typically managed through a command list and a Direct3D 12 queue, ensuring asynchronous execution and efficient resource utilization.
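The dispatch path described above looks roughly like this with the DirectML command recorder. A `dmlDevice`, a D3D12 `commandList` and `commandQueue`, a `compiledOp`, and a populated `bindingTable` are assumed to already exist, and error handling and fence-based synchronization are omitted; treat this as a Windows-only outline.

```cpp
Microsoft::WRL::ComPtr<IDMLCommandRecorder> recorder;
dmlDevice->CreateCommandRecorder(IID_PPV_ARGS(&recorder));

// Record the compiled operator's dispatch into a D3D12 command list.
recorder->RecordDispatch(commandList.Get(), compiledOp.Get(), bindingTable.Get());
commandList->Close();

// Submit asynchronously on the Direct3D 12 queue; completion is then
// tracked with a fence, as with any other D3D12 work.
ID3D12CommandList* lists[] = { commandList.Get() };
commandQueue->ExecuteCommandLists(1, lists);
```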
Benefits of DirectML
- Hardware Acceleration: Leverages GPUs and other accelerators for significantly faster ML inference and training.
- Broad Hardware Compatibility: Works across a wide range of Windows devices and hardware vendors, from integrated GPUs to discrete accelerators.
- Unified API: Provides a single, consistent API for interacting with diverse ML hardware.
- Low Latency: Optimized for real-time AI applications.
- Integration with Windows: Seamlessly integrates with existing DirectX and Windows development ecosystems.