DRM is not a black box, part 2: encryption and content

Encryption is the tool that enables DRM

The internet is the birthplace of modern DRM. Even if you deploy an offline solution with local playback, DRM technologies operate with the least hassle when your solution uses modern adaptive streaming technologies. A content processing workflow designed for adaptive streaming is a content processing workflow ready for DRM.

The two media delivery technologies of relevance here are DASH [1] and HLS [2]. The Apple ecosystem uses HLS, whereas everything else uses DASH. To understand encryption, you need to understand how content is structured when using these technologies.

Other lesser-known delivery technologies are occasionally used by some vendors. However, these alternative technologies are insignificant in the big picture and their use only leads to vendor lock-in and excessive cost. Use DASH and HLS.

Structure of DASH and HLS

Most solutions wish to target viewers on both Apple and non-Apple platforms and therefore need to present all content using both the DASH and HLS delivery technologies. Thankfully this is not as burdensome as it sounds because the two are in fact different flavors of the same thing.

The link between the two technologies is CMAF [3] (a cousin of the well-known MP4 file format), which defines a common format for media data storage compatible with both DASH and HLS. DASH and HLS themselves define the format of the manifest files that contain instructions for players on how to access the content stored in the shared CMAF media files.

Relationships between DASH, HLS and CMAF

DASH manifest files are served to DASH players, which will use the information within to load data from CMAF media files. HLS manifest files are served to HLS players, which will use the information within to load data from CMAF media files. As the bulk of the data is in the shared CMAF files, there is no duplication of storage costs or excessive CDN throughput despite using two different media delivery technologies.

The contents of the HLS and DASH manifest files are quite similar and it is often easy to convert from one to the other. This means that a content processing workflow only capable of working with either DASH or HLS can be easily enhanced by post-processing to produce the other manifest format.

The audio and video data exists in the form of samples, where one sample is one video frame (e.g. 24 samples per second) or a similar duration of audio data. Samples are grouped into self-contained segments of 2-10 second duration, where each segment is independently presentable by players without having to reference data in any other segment. A CMAF header contains the technical parameters required for a decoder to process the data contained in the segments.

Structure of CMAF media data storage

Segments may either be stored in separate files or in one giant CMAF track file (effectively just a concatenated series of segments prefixed with the CMAF header). The storage format largely depends on workflow-specific configuration and content processing service compatibility - different services expect the data to exist in different forms.
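To make the "header followed by segments" layout more concrete, here is a minimal sketch that lists the top-level boxes of a CMAF track file. It relies on the ISO base media file format convention that every box starts with a 32-bit big-endian size and a four-character type. The function names `iter_boxes` and `make_box` are made up for this example, and the sample data is a synthetic placeholder with dummy payloads, not real media.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in the data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_size = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_size = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header_size:
            raise ValueError(f"invalid box size at offset {offset}")
        yield box_type.decode("ascii"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a dummy payload (for this illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic track file: CMAF header (ftyp + moov) followed by two segments,
# each made of a segment header (moof) and the sample data (mdat).
header = make_box(b"ftyp") + make_box(b"moov")
segment = make_box(b"moof") + make_box(b"mdat", b"\x00" * 32)
track_file = header + segment + segment

for box_type, offset, size in iter_boxes(track_file):
    print(f"{box_type} at offset {offset}, {size} bytes")
```

Running the same loop over a real track file should show the same general shape, possibly with extra boxes (such as an index) that this sketch does not model.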

Applying DRM to movies

The media samples stored inside the CMAF files are encrypted. DRM technologies rely on encryption to control when and where the media data can be used for playback, creating situations where the decryption key is only provided to authorized DRM clients and only when all the necessary conditions to ensure adequate content protection are met.

Encryption is performed individually for each sample - it is not the files themselves that are encrypted but only the actual audio/video data within them. Even there, encryption is only partially applied - for video data only 10% of each sample is encrypted, leaving 90% in the clear. As the encryption is applied in a repeating pattern of 1:9 blocks, it is sometimes called pattern encryption. One block is 16 bytes.

10% pattern encryption applied to data in video samples

The rationale for using pattern encryption is that encrypting 10% of the data is sufficient to make the samples unusable without the key while minimizing the impact of decryption on processor capacity and battery life. Audio samples are fully encrypted due to their relatively small size.

A small prefix of up to 32 bytes in each sample is typically left in the clear to enable sample headers to be processed. This technique of dividing a sample into clear and encrypted regions is called subsample encryption. For encryption algorithm purposes (e.g. where does the 10% pattern start) the clear prefix is considered not to be a part of the sample.

Subsample encryption leaves an optional clear prefix to expose the sample headers

Subsample encryption is typically only used with video tracks and does not need to be configured by the operator - content processing tools that encrypt data already know when a clear prefix is required and will insert it automatically.
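The pattern and clear prefix rules described above can be sketched in a few lines. This is a simplified illustration, not a packager implementation: it assumes a single clear prefix per sample (real video samples may contain several clear/protected regions) and that a trailing partial block is left in the clear. The function name `encrypted_ranges` and its parameters are invented for this example.

```python
from typing import List, Tuple

BLOCK_SIZE = 16  # bytes per block, as stated above


def encrypted_ranges(
    sample_size: int,
    clear_prefix: int = 0,
    crypt_blocks: int = 1,
    skip_blocks: int = 9,
) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges of a sample that would be encrypted.

    The pattern starts right after the clear prefix, because the prefix is
    not considered part of the sample for pattern purposes.
    """
    pattern_length = (crypt_blocks + skip_blocks) * BLOCK_SIZE
    crypt_length = crypt_blocks * BLOCK_SIZE
    ranges = []
    position = clear_prefix
    # Assumption: only complete encrypted runs are emitted; leftover bytes
    # at the end of the sample stay in the clear.
    while sample_size - position >= crypt_length:
        ranges.append((position, position + crypt_length))
        position += pattern_length
    return ranges


ranges = encrypted_ranges(sample_size=1000, clear_prefix=32)
encrypted_bytes = sum(end - start for start, end in ranges)
print(ranges)
print(f"{encrypted_bytes} of 1000 bytes encrypted")
```

For a 1000-byte sample with a 32-byte clear prefix, six 16-byte blocks end up encrypted, which is roughly the 10% described above.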

The technical details of encryption are described by the cbcs protection scheme defined in the Common Encryption [4] standard. You may see material online referencing the cenc protection scheme but this scheme is effectively becoming obsolete as Apple devices do not support it. Modern devices use cbcs. In addition to cryptographic differences, the cenc protection scheme does not use pattern encryption but encrypts the entire sample.

If your target device set contains devices that do not support the cbcs protection scheme then you will need to create two copies of every encrypted track - one encrypted with cbcs (for Apple and newer non-Apple devices) and one with cenc (for older non-Apple devices).

After applying encryption the movie is safe for publishing over unprotected channels. You do not need to guard access to the files as the content cannot be accessed without the key. In some high-security scenarios, you might still care about protecting access to the files as part of a defense-in-depth strategy (e.g. by using URL signing to control who can download the files) but on a cryptographic level the sample data is now unusable for an attacker.

All modern DRM technologies use the same, equally strong encryption algorithms - there is no difference in the strength of encryption between DRM technologies. However, different DRM client implementations do have different strengths and weaknesses. This will be explored in depth by the next article in this series.

To apply encryption, you need to know the key and the key ID (or KID). The key is a 128-bit symmetric key (i.e. the same key is used for encryption and decryption). The KID is a separate (non-secret) 128-bit value used elsewhere in DRM workflows to reference a specific key. Key management is a large topic of its own, to be covered in a separate article.

Different data formats are used for representing keys and KIDs, depending on the situation and the specific piece of software. In practice, the KID is typically formatted as a GUID [5] (a0a8db17-20c7-4068-bfbd-f8a68e34f1a5) and the key as a base64 or hexadecimal string (FSu0UIH8Z06IOyXVlnmc8Q== or 0x152bb45081fc674e883b25d596799cf1).
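As a small illustration, the two key strings above decode to the same 16 bytes. The helper `parse_key` below is a hypothetical convenience function written for this article, not part of any DRM toolkit. It tries hexadecimal first and falls back to base64.

```python
import base64

KEY_BASE64 = "FSu0UIH8Z06IOyXVlnmc8Q=="
KEY_HEX = "0x152bb45081fc674e883b25d596799cf1"


def parse_key(text: str) -> bytes:
    """Decode a 128-bit key given as a hex string (optional 0x) or base64."""
    candidate = text[2:] if text[:2].lower() == "0x" else text
    try:
        raw = bytes.fromhex(candidate)
    except ValueError:
        # Not valid hex, so assume standard base64.
        raw = base64.b64decode(text, validate=True)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw


key_from_base64 = parse_key(KEY_BASE64)
key_from_hex = parse_key(KEY_HEX)
print(key_from_base64.hex())
print(key_from_base64 == key_from_hex)
```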

GUIDs have multiple different binary representations! The “Linux style” binary format is achieved by removing the dashes and treating a GUID as a hexadecimal string. The “Windows style” binary format is achieved in a more complex fashion, with a different order of bytes.

Media streaming systems typically use the Linux style binary format. This has caused much pain when software platforms assume Windows style and developers only discover that two years into production when integrating another system that uses Linux style. In hindsight, using GUIDs for KIDs was a mistake but it’s too late to go back now.
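The difference between the two binary formats is easy to see with Python's standard `uuid` module, where `bytes` gives the "Linux style" order and `bytes_le` gives the "Windows style" order (the first three fields are byte-swapped). The helper `windows_to_linux` is a hypothetical name used only in this sketch.

```python
import uuid

kid = uuid.UUID("a0a8db17-20c7-4068-bfbd-f8a68e34f1a5")

linux_style = kid.bytes       # dashes removed, read as a hex string
windows_style = kid.bytes_le  # first three fields stored little-endian

print(linux_style.hex())
print(windows_style.hex())


def windows_to_linux(raw: bytes) -> bytes:
    """Convert a 16-byte Windows style KID to the Linux style byte order."""
    return uuid.UUID(bytes_le=raw).bytes
```

Only the first 8 bytes differ between the two forms, which is why a mix-up tends to produce a KID that looks plausible but matches nothing.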

A different key is used for tracks that carry content with different value (e.g. 4K quality is more valuable than SD quality) or that use different decoder paths (audio versus video). For example, a typical movie might use the following set of keys:

  1. SD video key
  2. HD video key
  3. UHD video key
  4. audio key
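A key set like the one listed above could be modeled as follows. This is only a sketch under simple assumptions: the keys are generated locally at random, whereas a real workflow would obtain them from whatever key management system is in use, and all names (`ContentKey`, `generate_content_keys`, the track and label names) are invented for illustration.

```python
import secrets
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ContentKey:
    kid: uuid.UUID  # non-secret 128-bit identifier
    key: bytes      # secret 128-bit symmetric key


def generate_content_keys(labels: Iterable[str]) -> Dict[str, ContentKey]:
    """Create one random key and KID per label (illustrative only)."""
    return {
        label: ContentKey(kid=uuid.uuid4(), key=secrets.token_bytes(16))
        for label in labels
    }


content_keys = generate_content_keys(["sd-video", "hd-video", "uhd-video", "audio"])

# Hypothetical track names mapped to the key that protects each of them.
# Both audio tracks share the audio key.
track_to_key_label = {
    "video-480p": "sd-video",
    "video-1080p": "hd-video",
    "video-2160p": "uhd-video",
    "audio-en": "audio",
    "audio-fr": "audio",
}

for track, label in track_to_key_label.items():
    print(f"{track}: KID {content_keys[label].kid}")
```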

After encrypting the samples, the CMAF header and segment header are extended with encryption-relevant information [6]. The DASH and HLS manifests are annotated, at minimum, with the KID of each track. Often, DRM client initialization data is also added to the manifests.

Embedded initialization data

A player needs to provide initialization data to a DRM client in order to activate it. The format of this data is specific to the DRM technology but is typically based on the pssh format defined by Common Encryption [4].
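To give a feel for the pssh format, here is a minimal sketch that builds a version 1 pssh box containing only a list of KIDs and no DRM-specific payload. The layout follows the box definition in Common Encryption (size, type, version and flags, system ID, KID list, data). The function name `build_pssh_v1` is invented for this example, and the system ID used is the "common" system ID from the W3C initialization data format registry, chosen here as an assumption to avoid implying any vendor-specific payload format.

```python
import base64
import struct
import uuid
from typing import Sequence

# Assumption: the W3C "common" system ID, used purely for illustration.
COMMON_SYSTEM_ID = uuid.UUID("1077efec-c0b2-4d02-ace3-3c1e52e2fb4b")


def build_pssh_v1(
    system_id: uuid.UUID, kids: Sequence[uuid.UUID], data: bytes = b""
) -> bytes:
    """Build a version 1 pssh box listing the given KIDs."""
    body = struct.pack(">B3s", 1, b"\x00\x00\x00")  # version 1, flags 0
    body += system_id.bytes                          # 16-byte system ID
    body += struct.pack(">I", len(kids))             # KID count
    body += b"".join(kid.bytes for kid in kids)      # KIDs, "Linux style"
    body += struct.pack(">I", len(data)) + data      # DRM-specific data
    return struct.pack(">I4s", 8 + len(body), b"pssh") + body


kid = uuid.UUID("a0a8db17-20c7-4068-bfbd-f8a68e34f1a5")
pssh_box = build_pssh_v1(COMMON_SYSTEM_ID, [kid])

# Manifests typically carry binary initialization data as base64 text.
print(base64.b64encode(pssh_box).decode("ascii"))
```

Real DRM technologies usually place their own payload in the data field, so treat this as a picture of the container rather than something a specific DRM client is guaranteed to accept.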

DRM client initialization data is embedded into the DASH and HLS manifests for every DRM technology that the solution integrates with at the time of content processing. The manifest files can be easily extended with more initialization data when additional DRM technologies are integrated into a solution.

Historically, initialization data was embedded into the CMAF header instead of manifest files but this approach is deprecated due to maintenance problems it caused - it is much harder to change the CMAF header if you need to replace or extend the DRM initialization data.

Here is an example of PlayReady and FairPlay initialization data for the same KID, in DASH format for PlayReady and HLS format for FairPlay. PlayReady uses the pssh format, whereas FairPlay uses a URL-like initialization data format:

<ContentProtection value="MSPR 2.0" schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95">

Initialization data is obviously unique for each DRM technology but also for each key, because it typically contains the KID. Because of this, initialization data can make up a surprisingly large portion of a manifest file.

It is most common to provide initialization data as part of the configuration for the content packager that performs the encryption. Different packagers accept configuration in different ways (some even use a hybrid push-pull model to retrieve different parts of the configuration in different ways). No matter how the configuration is defined, the end result is the same: DRM initialization data is given to the packager by an external system.

Unique initialization data is associated with each key and each DRM technology

Some content packagers, however, are capable of generating DRM initialization data on their own. This can lead to significant simplification in the backend workflows. You should use this capability whenever possible.

In many cases, the packager can simply generate DRM initialization data if it knows what DRM technologies are to be used

Initialization data is legacy

Other than to provide initialization data, there is no reason the content processing workflow even needs to know what DRM technologies will be used with the content! The requirement to embed this data into the manifests complicates data flows and often does so needlessly. Reducing unnecessary interactions in DRM workflows is a critical part of making it simple to integrate DRM into a solution, so this requirement is best eliminated.

DRM initialization data is a mechanism for configuring DRM client behavior. It exists because there used to be no APIs that players could use to communicate with a DRM client - any configuration of DRM client behavior had to be done by embedding a DRM technology specific blob into the video itself. Later, this blob moved into the manifest files for easier manipulation. Today, the times have changed and modern DRM clients do offer APIs (e.g. Encrypted Media Extensions [7] and equivalents). As a result, DRM initialization data often contains no information of value. Yet it persists in the APIs and workflows as an artifact of history.

In principle, providing DRM initialization data in the manifest files is optional - the initialization data could be acquired through other means by the player or even generated on the fly. Unfortunately, few players currently implement DRM client activation without embedded initialization data.

In short, the initialization data is typically just a DRM technology specific way to represent the KID. If possible, use players that can generate DRM client initialization data on the fly - by separating DRM technologies from your content pipeline you will achieve a more robust and maintainable solution architecture.

The player can generate DRM initialization data and greatly reduce content processing complexity and simplify interactions in DRM workflows

DRM client initialization data is specific to each DRM technology and to each key. As such, a unique instance of the initialization data is embedded into the DASH and HLS manifests for each key used to encrypt the movie and for each DRM technology the movie is to be used with. A movie with a typical set of 4 keys, used with 3 DRM technologies, would have 4x3=12 instances of DRM initialization data embedded into it. By generating initialization data at runtime, you also benefit from minimizing the size of the manifest files, leading to greater efficiency, especially in live scenarios where the manifest must be refreshed periodically.
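The 4x3=12 arithmetic is simply every combination of key and DRM technology. A tiny sketch, where the key labels and the choice of three DRM technologies are illustrative assumptions (PlayReady and FairPlay are mentioned above, Widevine is added here only to reach three):

```python
from itertools import product

key_labels = ["sd-video", "hd-video", "uhd-video", "audio"]
drm_technologies = ["PlayReady", "FairPlay", "Widevine"]

# One instance of initialization data per (key, DRM technology) pair.
init_data_instances = list(product(key_labels, drm_technologies))

print(len(init_data_instances))
for key_label, drm in init_data_instances:
    print(f"{drm} initialization data for the {key_label} key")
```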

What is encrypted?

Audio and video tracks are encrypted. Text tracks (subtitles) are not.

Text tracks are technically equivalent to audio and video tracks but are entirely ignored in this series of articles because in practice they are neither encrypted nor protected by DRM.

What requires configuration?

It can be confusing to tell the difference between what part of the encryption process is handled automatically by media processing products or services and what must be explicitly configured by an operator.

There is no straightforward answer as the details depend on the exact products and services used to perform the encryption - if you use very low-level media processing tools you may need to provide low level details, whereas if you use a fully automated publishing pipeline then you might not need to do anything at all.

Consult an expert to determine the appropriate setup for your specific situation.

Coming up next

Having encrypted the samples and embedded DRM client initialization data, the content is ready to be served. The next article will take a look at the duties of the DRM client.

Key management and other DRM-relevant data flows in the content pipeline will be covered in a separate article.


  1. ISO/IEC 23009-1:2019 (MPEG-DASH)

  2. RFC 8216 and HTTP Live Streaming (Apple)

  3. ISO/IEC 23000-19:2020 (MPEG-CMAF)

  4. ISO/IEC 23001-7:2016 (Common Encryption)

  5. How many ways are there to sort GUIDs? How much time do you have?

  6. DASH-IF Implementation Guidelines: Content Protection and Security (8.1 Content protection data in CMAF containers)

  7. W3C Encrypted Media Extensions

Sander Saares

Expert in media streaming and content security