Microsoft Developer Network Community

Windows Video API Reference

Overview

The Windows operating system provides a rich set of APIs for developers to interact with video hardware and software capabilities. These APIs enable applications to capture, process, render, encode, decode, and stream video content efficiently.

Whether you're building a video conferencing application, a media player, a video editing suite, or a system that interacts with cameras, the Windows Video APIs offer the tools you need. The primary frameworks evolved from DirectShow to the more modern and versatile Media Foundation.

Core APIs

Direct3D

While primarily a graphics API, Direct3D plays a crucial role in video rendering and manipulation. It's used for displaying video frames on the screen, applying visual effects, and even hardware-accelerated video processing.

  • Key Use Cases: High-performance video playback, real-time video effects, 2D/3D rendering of video overlays.
  • Related APIs: Direct3D 11/12, DXGI.

DirectShow (Legacy)

DirectShow is an older but still widely used multimedia framework. It is a Component Object Model (COM)-based API that allows flexible construction of media processing graphs. Though superseded by Media Foundation for new development, it remains important for existing applications and certain hardware integrations.

  • Key Use Cases: Media playback, simple video capture, multimedia streaming for older applications.
  • Note: Microsoft recommends using Media Foundation for new development due to its modern architecture and improved performance.
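The filter-graph model can be sketched in a few calls. This is a minimal, illustrative playback example: it assumes a hypothetical file path ("clip.avi") and omits event handling that a real player would need.

```cpp
// Minimal DirectShow playback sketch. "clip.avi" is a hypothetical path.
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

int main() {
    CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);

    IGraphBuilder* pGraph = NULL;
    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void**)&pGraph);
    if (SUCCEEDED(hr)) {
        IMediaControl* pControl = NULL;
        pGraph->QueryInterface(IID_IMediaControl, (void**)&pControl);

        // Build a playback graph for the file; DirectShow inserts the
        // needed source, decoder, and renderer filters automatically.
        hr = pGraph->RenderFile(L"clip.avi", NULL);
        if (SUCCEEDED(hr)) {
            pControl->Run(); // Start playback
            // ... wait for completion (e.g., IMediaEvent::WaitForCompletion)
        }

        pControl->Release();
        pGraph->Release();
    }
    CoUninitialize();
    return 0;
}
```

RenderFile hides the graph construction; applications that need custom filters build the graph manually with IGraphBuilder::AddFilter and ConnectDirect.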

Media Foundation

Media Foundation (MF) is Microsoft's current, preferred framework for multimedia development on Windows. It offers a unified pipeline for audio and video processing, supporting hardware acceleration and a more efficient, asynchronous model.

MF is designed to be flexible and extensible, supporting a wide range of media formats and hardware capabilities.

  • Key Use Cases: High-quality video playback, advanced video capture, encoding/decoding, streaming, camera access, modern media pipeline.
  • Components: Media Pipeline, Media Sources, Media Sinks, Media Transforms (MFTs).
  • Related APIs: Windows Runtime (WinRT) Media APIs (for UWP apps), COM interfaces like IMFMediaSource, IMFMediaStream, IMFTransform.
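Every Media Foundation application brackets its work between MFStartup and MFShutdown. A minimal lifecycle sketch:

```cpp
// Media Foundation initialization/shutdown pattern.
#include <mfapi.h>
#pragma comment(lib, "mfplat.lib")

int main() {
    // MFStartup must succeed before any other MF call is made.
    HRESULT hr = MFStartup(MF_VERSION);
    if (SUCCEEDED(hr)) {
        // ... create media sources, transforms, and sinks here ...

        // Every successful MFStartup must be paired with MFShutdown.
        MFShutdown();
    }
    return 0;
}
```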

Key Concepts

Video Capture

Capturing video involves acquiring frames from a video source, such as a webcam or a video file. Windows provides APIs to enumerate available capture devices and to set up capture sessions.

  • Media Foundation: Enumerates cameras with MFEnumDeviceSources, filtering on the MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID device category; device-specific settings such as exposure and focus are exposed through interfaces like IAMCameraControl on the device source.
  • DirectShow: Utilizes Filter Graphs with Video Capture Filters.
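Once a capture device has been activated as an IMFMediaSource (via MFEnumDeviceSources and IMFActivate::ActivateObject), frames can be pulled with the source reader. A sketch, assuming pSource is such a source and MFStartup has already been called:

```cpp
// Sketch: synchronously read one video frame from a capture source.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfreadwrite.lib")

void ReadOneFrame(IMFMediaSource* pSource) {
    IMFSourceReader* pReader = NULL;
    HRESULT hr = MFCreateSourceReaderFromMediaSource(pSource, NULL, &pReader);
    if (SUCCEEDED(hr)) {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample* pSample = NULL;

        // Blocks until one sample arrives from the first video stream.
        hr = pReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                 0, &streamIndex, &flags, &timestamp, &pSample);
        if (SUCCEEDED(hr) && pSample) {
            // ... access pixel data via IMFSample / IMFMediaBuffer ...
            pSample->Release();
        }
        pReader->Release();
    }
}
```

For continuous capture, applications typically pass an IMFSourceReaderCallback in the reader's attributes instead of reading synchronously.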

Video Rendering

Rendering video means displaying the video frames on a screen or window. This typically involves passing video samples to a graphics subsystem.

  • Media Foundation: Sends video samples to a renderer such as the Enhanced Video Renderer (EVR), which draws onto Direct3D surfaces. IMFVideoRenderer is a key interface exposed by the EVR.
  • Direct3D: Used extensively for high-performance rendering.

Encoding & Decoding

Video files are often compressed using codecs to reduce file size. Encoding converts raw video frames into a compressed format, while decoding does the reverse.

  • Media Foundation: Offers Media Foundation Transforms (MFTs) for hardware-accelerated encoding and decoding of various codecs (H.264, HEVC, VP9, etc.).
  • DirectShow: Relies on COM-based codec components (transform filters and DMOs) inserted into the filter graph.
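Decoder MFTs are discovered by category and media type. This sketch enumerates hardware-capable H.264 decoders; activating one yields an IMFTransform that accepts compressed samples:

```cpp
// Sketch: enumerate hardware H.264 decoder MFTs.
#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")

void ListH264Decoders() {
    // Input type: H.264 elementary video. Output type left open (NULL).
    MFT_REGISTER_TYPE_INFO input = { MFMediaType_Video, MFVideoFormat_H264 };

    IMFActivate** ppActivate = NULL;
    UINT32 count = 0;
    HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER,
                           MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
                           &input, NULL, &ppActivate, &count);
    if (SUCCEEDED(hr)) {
        // ppActivate[i]->ActivateObject(...) would produce an IMFTransform.
        for (UINT32 i = 0; i < count; ++i) {
            ppActivate[i]->Release();
        }
        CoTaskMemFree(ppActivate);
    }
}
```

Dropping MFT_ENUM_FLAG_HARDWARE (or adding MFT_ENUM_FLAG_SYNCMFT) widens the search to software decoders.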

Streaming

Streaming involves sending or receiving video data over a network in real-time or near real-time.

  • Media Foundation: Supports network sources for input (for example, HTTP and RTSP streams) and network sinks for streaming output.

Common Tasks

  • Enumerating available video capture devices.
  • Starting and stopping video capture.
  • Setting video resolution and frame rate.
  • Playing video files or streams.
  • Rendering video to a specific window or element.
  • Encoding video to different formats.
  • Decoding compressed video.
  • Applying real-time video effects.
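Several of these tasks reduce to negotiating a media type. As one example, setting capture resolution and frame rate with the source reader might look like the following sketch; it assumes pReader wraps a capture device that supports 1280x720 at 30 fps in NV12 (the values are illustrative):

```cpp
// Sketch: request 1280x720 @ 30 fps NV12 on a source reader's video stream.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")

HRESULT Set720p30(IMFSourceReader* pReader) {
    IMFMediaType* pType = NULL;
    HRESULT hr = MFCreateMediaType(&pType);
    if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (SUCCEEDED(hr)) hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
    if (SUCCEEDED(hr)) hr = MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 1280, 720);
    if (SUCCEEDED(hr)) hr = MFSetAttributeRatio(pType, MF_MT_FRAME_RATE, 30, 1);
    if (SUCCEEDED(hr)) {
        // Fails if the device cannot deliver (or convert to) this format.
        hr = pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                          NULL, pType);
    }
    if (pType) pType->Release();
    return hr;
}
```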

Sample Code Snippets

Below are illustrative code snippets. Refer to the full documentation for complete examples.

Media Foundation: Basic Capture Initialization

This snippet shows how to find a video capture device using Media Foundation.


#include <mfapi.h>
#include <mfidl.h>
#include <mferror.h>

// ... (Initialize COM and call MFStartup(MF_VERSION))

// MFEnumDeviceSources requires an attribute store that names the
// device category; passing NULL for the attributes fails.
IMFAttributes *pAttributes = NULL;
HRESULT hr = MFCreateAttributes(&pAttributes, 1);
if (SUCCEEDED(hr)) {
    hr = pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
                              MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
}

IMFActivate **ppDevices = NULL;
UINT32 count = 0;
if (SUCCEEDED(hr)) {
    hr = MFEnumDeviceSources(pAttributes, &ppDevices, &count);
}

if (SUCCEEDED(hr) && count > 0) {
    // Found devices. ppDevices[0] is the first one. Call
    // ppDevices[0]->ActivateObject(IID_PPV_ARGS(&pSource))
    // to obtain its IMFMediaSource.
}

// Release the enumeration results and the attribute store.
for (UINT32 i = 0; i < count; ++i) {
    ppDevices[i]->Release();
}
CoTaskMemFree(ppDevices);
if (pAttributes) pAttributes->Release();

// ... (MFShutdown and uninitialize COM)

Direct3D: Rendering a Texture

Illustrates conceptual steps for rendering a video frame as a Direct3D texture.


// Assuming you have a D3D11 device, immediate context, and a decoded
// video frame as an ID3D11Texture2D created with the
// D3D11_BIND_SHADER_RESOURCE flag. GetD3D11Device() and the other
// Get* calls are placeholder helpers.

ID3D11Device* pDevice = GetD3D11Device();
ID3D11DeviceContext* pContext = GetD3D11DeviceContext();
ID3D11Texture2D* pVideoTexture = GetDecodedVideoTexture(); // From Media Foundation or similar

D3D11_TEXTURE2D_DESC desc;
pVideoTexture->GetDesc(&desc);

// Create a shader resource view so the pixel shader can sample the frame.
ID3D11ShaderResourceView* pShaderResourceView = NULL;
HRESULT hr = pDevice->CreateShaderResourceView(pVideoTexture, NULL, &pShaderResourceView);

if (SUCCEEDED(hr)) {
    // ... (Set render target, vertex/pixel shaders, vertex buffers)

    pContext->PSSetShaderResources(0, 1, &pShaderResourceView); // Bind texture to pixel shader
    pContext->Draw(4, 0); // Draw a 4-vertex triangle-strip quad textured with the frame

    pShaderResourceView->Release();
}