Windows Video API Reference
Overview
The Windows operating system provides a rich set of APIs for developers to interact with video hardware and software capabilities. These APIs enable applications to capture, process, render, encode, decode, and stream video content efficiently.
Whether you're building a video conferencing application, a media player, a video editing suite, or a system that interacts with cameras, the Windows Video APIs offer the tools you need. The primary frameworks evolved from DirectShow to the more modern and versatile Media Foundation.
Core APIs
Direct3D
While primarily a graphics API, Direct3D plays a crucial role in video rendering and manipulation. It's used for displaying video frames on the screen, applying visual effects, and even hardware-accelerated video processing.
- Key Use Cases: High-performance video playback, real-time video effects, 2D/3D rendering of video overlays.
- Related APIs: Direct3D 11/12, DXGI.
DirectShow (Legacy)
DirectShow is an older but still widely used multimedia framework. It's a Component Object Model (COM) based API that allows for flexible construction of media processing graphs. Though superseded by Media Foundation for new development, it remains important for existing applications and certain hardware integrations.
- Key Use Cases: Media playback, simple video capture, multimedia streaming for older applications.
- Note: Microsoft recommends using Media Foundation for new development due to its modern architecture and improved performance.
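The filter-graph model can be illustrated with a minimal playback sketch. This is a hedged example, not production code; "clip.avi" is a placeholder path, and error handling is reduced to HRESULT checks.

```cpp
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

int main() {
    CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);

    IGraphBuilder *pGraph = NULL;
    IMediaControl *pControl = NULL;
    IMediaEvent   *pEvent = NULL;

    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void **)&pGraph);
    if (SUCCEEDED(hr)) {
        pGraph->QueryInterface(IID_IMediaControl, (void **)&pControl);
        pGraph->QueryInterface(IID_IMediaEvent, (void **)&pEvent);

        // RenderFile builds the entire graph (source, decoder, renderer)
        // automatically from the file's format.
        hr = pGraph->RenderFile(L"clip.avi", NULL); // Placeholder path.
        if (SUCCEEDED(hr)) {
            pControl->Run();
            long evCode = 0;
            pEvent->WaitForCompletion(INFINITE, &evCode); // Block until playback ends.
        }
        if (pEvent)   pEvent->Release();
        if (pControl) pControl->Release();
        pGraph->Release();
    }
    CoUninitialize();
    return SUCCEEDED(hr) ? 0 : 1;
}
```

Note how the application never connects individual filters; Intelligent Connect chooses and wires them, which is what made DirectShow graphs flexible.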
Media Foundation
Media Foundation (MF) is Microsoft's current, preferred framework for multimedia development on Windows. It offers a unified pipeline for audio and video processing, supporting hardware acceleration and a more efficient, asynchronous model.
MF is designed to be flexible and extensible, supporting a wide range of media formats and hardware capabilities.
- Key Use Cases: High-quality video playback, advanced video capture, encoding/decoding, streaming, camera access, modern media pipeline.
- Components: Media Pipeline, Media Sources, Media Sinks, Media Transforms (MFTs).
- Related APIs: Windows Runtime (WinRT) Media APIs (for UWP apps), COM interfaces such as IMFMediaSource, IMFMediaStream, and IMFTransform.
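Before any of these interfaces can be used, the application must start the Media Foundation platform. A minimal initialization sketch (every successful MFStartup must be paired with MFShutdown):

```cpp
#include <windows.h>
#include <mfapi.h>
#pragma comment(lib, "mfplat.lib")

int main() {
    // Most Media Foundation code paths also require COM.
    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    if (SUCCEEDED(hr)) {
        // MF_VERSION requests the platform version the headers were built for.
        hr = MFStartup(MF_VERSION);
        if (SUCCEEDED(hr)) {
            // ... create sources, transforms, and sinks here ...
            MFShutdown(); // Pair with the successful MFStartup.
        }
        CoUninitialize();
    }
    return SUCCEEDED(hr) ? 0 : 1;
}
```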
Key Concepts
Video Capture
Capturing video involves acquiring frames from a video source, such as a webcam or a video file. Windows provides APIs to enumerate available capture devices and to set up capture sessions.
- Media Foundation: Uses MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID to find cameras and IMFVideoCaptureControl for device-specific settings.
- DirectShow: Utilizes Filter Graphs with Video Capture Filters.
Video Rendering
Rendering video means displaying the video frames on a screen or window. This typically involves passing video samples to a graphics subsystem.
- Media Foundation: Outputs video samples to renderers, which can be Direct3D surfaces or Windows GDI surfaces. IMFVideoRenderer is a key interface.
- Direct3D: Used extensively for high-performance rendering.
Encoding & Decoding
Video files are often compressed using codecs to reduce file size. Encoding converts raw video frames into a compressed format, while decoding does the reverse.
- Media Foundation: Offers Media Foundation Transforms (MFTs) for hardware-accelerated encoding and decoding of various codecs (H.264, HEVC, VP9, etc.).
- DirectShow: Relies on COM-based codecs.
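To see which encoder MFTs are registered on a machine, MFTEnumEx can enumerate them by category and format. The sketch below is illustrative and assumes H.264 output; with only MFT_ENUM_FLAG_HARDWARE set, software encoders are excluded.

```cpp
#include <windows.h>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")

// Enumerate hardware H.264 video encoder MFTs.
MFT_REGISTER_TYPE_INFO outputType = { MFMediaType_Video, MFVideoFormat_H264 };
IMFActivate **ppActivate = NULL;
UINT32 count = 0;

HRESULT hr = MFTEnumEx(
    MFT_CATEGORY_VIDEO_ENCODER,
    MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
    NULL,          // Accept any input type.
    &outputType,   // Must be able to produce H.264.
    &ppActivate,
    &count);

if (SUCCEEDED(hr) && count > 0) {
    // ppActivate[0] is the preferred match; activate it to get IMFTransform:
    // IMFTransform *pEncoder = NULL;
    // hr = ppActivate[0]->ActivateObject(IID_PPV_ARGS(&pEncoder));
    for (UINT32 i = 0; i < count; ++i) {
        ppActivate[i]->Release();
    }
}
CoTaskMemFree(ppActivate);
```

The same pattern applies to decoders via MFT_CATEGORY_VIDEO_DECODER, with the desired compressed format supplied as the input type instead.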
Streaming
Streaming involves sending or receiving video data over a network in real-time or near real-time.
- Media Foundation: Supports network sinks for streaming output and network sources for input. Can integrate with protocols like RTSP.
Common Tasks
- Enumerating available video capture devices.
- Starting and stopping video capture.
- Setting video resolution and frame rate.
- Playing video files or streams.
- Rendering video to a specific window or element.
- Encoding video to different formats.
- Decoding compressed video.
- Applying real-time video effects.
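Several of these tasks reduce to selecting one of a source's native media types. As a hedged sketch, the following inspects the first format of a capture source's first stream and selects it; pSource is assumed to be an already-activated IMFMediaSource (for example, from MFEnumDeviceSources).

```cpp
// Inspect the first native media type of the first stream, then select it.
IMFPresentationDescriptor *pPD = NULL;
IMFStreamDescriptor *pSD = NULL;
IMFMediaTypeHandler *pHandler = NULL;
IMFMediaType *pType = NULL;
BOOL selected = FALSE;

HRESULT hr = pSource->CreatePresentationDescriptor(&pPD);
if (SUCCEEDED(hr)) hr = pPD->GetStreamDescriptorByIndex(0, &selected, &pSD);
if (SUCCEEDED(hr)) hr = pSD->GetMediaTypeHandler(&pHandler);
if (SUCCEEDED(hr)) hr = pHandler->GetMediaTypeByIndex(0, &pType);
if (SUCCEEDED(hr)) {
    UINT32 width = 0, height = 0, num = 0, den = 0;
    MFGetAttributeSize(pType, MF_MT_FRAME_SIZE, &width, &height);   // Resolution.
    MFGetAttributeRatio(pType, MF_MT_FRAME_RATE, &num, &den);       // Frame rate.
    // If this format matches the desired resolution/frame rate, select it:
    hr = pHandler->SetCurrentMediaType(pType);
}
if (pType)    pType->Release();
if (pHandler) pHandler->Release();
if (pSD)      pSD->Release();
if (pPD)      pPD->Release();
```

In practice an application would loop over GetMediaTypeCount/GetMediaTypeByIndex and pick the entry whose MF_MT_FRAME_SIZE and MF_MT_FRAME_RATE attributes match its requirements.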
Sample Code Snippets
Below are illustrative code snippets. Refer to the full documentation for complete examples.
Media Foundation: Basic Capture Initialization
This snippet shows how to find a video capture device using Media Foundation.
#include <mfapi.h>
#include <mfidl.h>
#include <mferror.h>
// ... (Initialize COM with CoInitializeEx and the platform with MFStartup(MF_VERSION))
IMFAttributes *pAttributes = NULL;
IMFActivate **ppDevices = NULL;
UINT32 count = 0;
// MFEnumDeviceSources requires an attribute store specifying the device
// source type; passing NULL for the attributes fails with E_INVALIDARG.
HRESULT hr = MFCreateAttributes(&pAttributes, 1);
if (SUCCEEDED(hr)) {
    hr = pAttributes->SetGUID(
        MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
        MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
}
if (SUCCEEDED(hr)) {
    hr = MFEnumDeviceSources(pAttributes, &ppDevices, &count);
}
if (SUCCEEDED(hr) && count > 0) {
    // Found devices. ppDevices[0] is the first one.
    // Call ppDevices[i]->ActivateObject(IID_PPV_ARGS(&pSource))
    // and then use the resulting IMFMediaSource.
    for (UINT32 i = 0; i < count; ++i) {
        ppDevices[i]->Release();
    }
}
CoTaskMemFree(ppDevices);
if (pAttributes) pAttributes->Release();
// ... (MFShutdown and uninitialize COM)
Direct3D: Rendering a Texture
Illustrates conceptual steps for rendering a video frame as a Direct3D texture.
// Assuming you have a D3D11 device, immediate context, and an
// ID3D11Texture2D obtained from a decoded video frame.
ID3D11Device* pDevice = GetD3D11Device();
ID3D11DeviceContext* pContext = GetD3D11DeviceContext();
ID3D11Texture2D* pVideoTexture = GetDecodedVideoTexture(); // From Media Foundation or similar
D3D11_TEXTURE2D_DESC desc;
pVideoTexture->GetDesc(&desc);
// Create a shader resource view so the pixel shader can sample the frame
ID3D11ShaderResourceView* pShaderResourceView = NULL;
HRESULT hr = pDevice->CreateShaderResourceView(pVideoTexture, NULL, &pShaderResourceView);
if (SUCCEEDED(hr)) {
    // ... (Set render target, vertex/pixel shaders, vertex buffers)
    pContext->PSSetShaderResources(0, 1, &pShaderResourceView); // Bind texture to pixel shader
    pContext->Draw(4, 0); // Draw a quad (e.g., a 4-vertex triangle strip) textured with the frame
    pShaderResourceView->Release(); // Clean up the view when done
}