http://mpeg.chiariglione.org/standards/exploration/high-dynamic-range-and-wide-colour-gamut-content-distribution
Good place to start.:-)
INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 MPEG2014/N15029
October 2014, Strasbourg, France
Source: Requirements
Status: Approved
Title: Draft Requirements and Explorations for HDR and WCG Content Distribution
Editor(s): Ajay Luthra, Edouard François, Walt Husak
Abstract
Current television systems provide Standard Dynamic Range (SDR), supporting a range of brightness that is significantly smaller than the range that the human eye is capable of discerning. Similarly, current video systems do not support the wide range of colours that the human eye can perceive. Future television and other video distribution environments are expected to give a viewing experience that is closer to a real life experience, to provide the user with a stronger sense of “being there”. This document provides requirements and use cases for higher dynamic ranges and wider colour gamuts than are typically supported today. MPEG has initiated an effort to determine if any changes to the current MPEG standards are needed to meet these requirements.
1. Introduction
Current video distribution environments provide Standard Dynamic Range (SDR), typically supporting a range of brightness of around 0.1 to 100 cd/m² (often referred to as “nits”). This range is significantly smaller than the range encountered in real life. For example, a light bulb can have more than 10,000 cd/m², surfaces lit in the sunlight can have brightness upwards of hundreds of thousands of cd/m², while the night sky can be 0.005 cd/m² or lower.
One of the key goals of Ultra High Definition Television (UHDTV) is to provide a user a sense of “being there” and “reality” [1]. Increasing resolution alone may not be sufficient to fully attain this goal, without also creating, capturing and displaying content that has much higher peak brightness and much larger contrast values than today’s TV. In addition, a greater sense of reality requires rendering colours that are richer than those provided by the colour gamuts commonly used today, e.g. BT.709 [2]. Thus, new content will not only have orders of magnitude greater brightness and contrast, but also significantly wider colour gamut (e.g. BT.2020 [3] or possibly even wider in the future).
It is not clear at this stage if the existing MPEG video coding standards are able to efficiently support the needs of future content distribution environments with higher dynamic range and wide colour gamut. MPEG will initiate the effort to see if any changes are needed in current MPEG standards to meet these requirements. This document pulls together the needs of future high quality content distribution systems. In this process, a more complete picture is considered, one that involves the end-to-end chain from video generation to final destination. That chain includes the stages of creation, capture, intermediate (mezzanine) level distribution, and final distribution to the home (see Annex A).
2. Definitions
2.1. Dynamic Range
Overall, the dynamic range of a scene can be described as the ratio of the maximum light intensity to the minimum light intensity [1]. In digital cameras, the most commonly used unit for measuring dynamic range is the f-stop, which describes total light range in powers of 2. The current ad hoc use of the term f-stop refers to the following dynamic ranges:
• 10 f-stops = a difference of 2^10 = 1,024:1 contrast ratio.
• 14 f-stops = a difference of 2^14 = 16,384:1 contrast ratio.
• 16 f-stops = a difference of 2^16 = 65,536:1 contrast ratio.
100,000:1 is normally regarded as approximately the range that the eye can see in a scene with no adaptation.
• 20 f-stops = a difference of 2^20 = 1,048,576:1 contrast ratio.
1,000,000:1 is normally regarded as approximately the range that the eye can see in a scene with minimal (no noticeable) adaptation.
In the categorization of dynamic ranges, the following definitions are typical and will be used in the present document:
• Standard Dynamic Range (SDR) is ≤ 10 f-stops
• Enhanced Dynamic Range (EDR) is > 10 f-stops and ≤ 16 f-stops
• High Dynamic Range (HDR) is > 16 f-stops
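The relationship between luminance range, f-stops and the three categories above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this section; the function names (f_stops, contrast_ratio, classify) are hypothetical and do not come from the MPEG document.

```python
import math


def f_stops(max_luminance, min_luminance):
    """Dynamic range in f-stops: log2 of the max/min luminance ratio."""
    return math.log2(max_luminance / min_luminance)


def contrast_ratio(stops):
    """Contrast ratio (N:1) corresponding to a number of f-stops."""
    return 2.0 ** stops


def classify(stops):
    """Category per the definitions in section 2.1 of the document."""
    if stops <= 10:
        return "SDR"
    if stops <= 16:
        return "EDR"
    return "HDR"


if __name__ == "__main__":
    # Typical SDR range quoted in the Introduction: 0.1 to 100 cd/m2.
    stops = f_stops(100.0, 0.1)
    print(f"{stops:.2f} f-stops -> {classify(stops)}")
    # A 1,000,000:1 scene, as mentioned for content creation.
    stops = f_stops(1_000_000.0, 1.0)
    print(f"{stops:.2f} f-stops -> {classify(stops)}")
```

The 0.1 to 100 cd/m² range from the Introduction is a 1,000:1 ratio, just under 10 f-stops, which is why it falls in the SDR category.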
2.2. Colour Gamut
Colour gamut, also known as colour space, describes the range of colours that can be represented in a particular circumstance, such as the colour space that humans may perceive or the subset of colours supported by a certain output device or video distribution system [5]. Historically, the colour gamut for content in Standard Definition is defined in ITU-R BT.601 [4] and that for content in High Definition is defined in ITU-R BT.709 [2].
With the proliferation of new display technologies (e.g. OLED and quantum dot) and UHDTV, the industry recognized the need to include colours beyond those available in BT.709 [6]-[9]. A colour gamut larger than BT.709 is referred to as Wide Colour Gamut (WCG). Examples of wide colour gamut include ITU-R BT.2020 [3] and Digital Cinema P3.
2.3. Scene Referred and Display Referred pictures
Scene Referred pictures are linearly related to the real luminance values captured from the original scene. In a Scene Referred pipeline the processed image is not directly viewable.
Display Referred values correspond to how an image is rendered on a specific display. The pixel sample values in the captured scene or associated graded sequence may subsequently need to be modified to match the capabilities of the actual display. For example, content represented in BT.2020 colour space would need to be modified to be consistent with the capabilities of a BT.709 display. Similarly, the luminance/luma values in the captured or graded scene may need to be modified to match the dynamic range and the peak luminance capabilities of the actual display.
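As a rough illustration of the BT.2020 to BT.709 example, the sketch below converts linear-light BT.2020 RGB to linear-light BT.709 RGB with a 3x3 matrix and then hard-clips out-of-range values. The coefficients are the rounded values commonly published for this conversion (e.g. in ITU-R BT.2087); they are not given in the MPEG document, and hard clipping is only the crudest possible adaptation, assumed here for brevity. All names are hypothetical.

```python
# Rounded linear-light BT.2020 -> BT.709 matrix (assumed values, as
# commonly published; both gamuts share the D65 white point).
BT2020_TO_BT709 = (
    (1.6605, -0.5876, -0.0728),
    (-0.1246, 1.1329, -0.0083),
    (-0.0182, -0.1006, 1.1187),
)


def bt2020_to_bt709_linear(rgb):
    """Apply the 3x3 matrix to one linear-light (R, G, B) triple."""
    return tuple(sum(m * c for m, c in zip(row, rgb))
                 for row in BT2020_TO_BT709)


def clip01(rgb):
    """Hard-clip each component to [0, 1]; real systems map more gently."""
    return tuple(min(1.0, max(0.0, c)) for c in rgb)


def in_bt709_gamut(rgb2020, eps=1e-3):
    """True if the BT.2020 colour is representable in BT.709 (within eps)."""
    return all(-eps <= c <= 1.0 + eps
               for c in bt2020_to_bt709_linear(rgb2020))


if __name__ == "__main__":
    green_2020 = (0.0, 1.0, 0.0)  # saturated BT.2020 green primary
    converted = bt2020_to_bt709_linear(green_2020)
    print("converted:", converted)          # has negative components
    print("clipped:  ", clip01(converted))  # what a BT.709 display can show
```

Negative or greater-than-one components after the matrix step indicate a colour outside the BT.709 gamut, which is exactly the case where the content "would need to be modified" for a BT.709 display.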
3. Video Level Considerations and Requirements
3.1. Dynamic Range
The dynamic range of the content and the display should be decoupled.
The achievable and desired brightness and dynamic ranges of various displays may be significantly different from those of the capturing and creating devices. For example, a content creation system may be able to create or capture content with a contrast of 1,000,000:1, to allow a cinematographer or artist to select a sub-range from a wider range to draw out desired detail in post-production grading, but it may be neither desirable nor feasible to have displays with that wider range. Some display dependent mapping of the content’s dynamic range may therefore be required. That mapping may be done at the encoding end or the receiving end. This may also be a function of the distribution mechanism, e.g. point-to-point communication or broadcasting.
The standard should allow for content to be adapted to the target display’s dynamic range. This may occur through the signalling of metadata, but solutions are not limited to this approach. Specific limits may be defined in the context of profiles and levels. The standard should be able to migrate from where the content capturing and displaying ranges are today to where they will be in the medium to the long term.
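A display-dependent mapping of this kind can be sketched as a simple tone curve. The example below uses an extended Reinhard-style operator, chosen here purely for illustration; the MPEG document does not prescribe any particular mapping. The parameter names content_peak and display_peak (both in cd/m²) are assumptions, and in a real system such values might be carried as metadata.

```python
def tone_map(luminance, content_peak, display_peak):
    """Map scene/graded luminance (cd/m2) into a display's range.

    Extended Reinhard-style curve (illustrative assumption): near-linear
    in the shadows, compressive in the highlights, and content_peak maps
    exactly to display_peak.
    """
    if content_peak <= display_peak:
        # Display can already show the full range: pass through.
        return min(luminance, display_peak)
    l = luminance / display_peak      # luminance relative to display peak
    w = content_peak / display_peak   # content peak relative to display peak
    return display_peak * l * (1.0 + l / (w * w)) / (1.0 + l)


if __name__ == "__main__":
    # Hypothetical case: content graded to 4000 cd/m2, display peaks at 1000.
    for nits in (0.0, 1.0, 100.0, 1000.0, 4000.0):
        print(f"{nits:7.1f} -> {tone_map(nits, 4000.0, 1000.0):7.2f}")
```

The same function could run at the encoding end (producing a display-specific stream) or at the receiving end (driven by signalled metadata), matching the two options described above.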
3.2. Content Input Types and Bit Depths
The content may be distributed to a consumer electronically (i.e. via a broadcasting or other network connection) or physically (on optical media, flash memory, magnetic storage, etc.). The types of content to be supported include:
• Camera captured video
• Still images
• Computer generated content (e.g. animation)
• High contrast security-camera content
The standard shall support integer (8, 10, 12, and 16 bits) and half-precision floating point (IEEE 754) input video data formats. One or several internal compressed integer formats may be defined. As the standard is likely to operate internally using fixed point arithmetic, mechanisms should be provided that would allow an encoder to map floating point formats to the appropriate integer formats that the encoder considers to be the most efficient.
A mechanism to indicate the mapping used to create the input integer values provided to an encoder should be provided.
A mechanism should be provided that would allow a receiver to map the decoded video format to the one needed for display.
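One possible form of the float-to-integer mapping described above is sketched below: a half-precision linear-light sample is decoded, passed through a non-linear transfer function, and quantized to an integer code value. The choice of the SMPTE ST 2084 (PQ) curve, the 10,000 cd/m² reference peak and the full-range quantization are all assumptions made for this example; the requirements document leaves the mapping open and only asks that the mapping used be signalled.

```python
import struct

# SMPTE ST 2084 (PQ) constants; use of PQ here is an assumption.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK = 10000.0  # cd/m2 represented by a PQ signal value of 1.0


def pq_encode(luminance):
    """Absolute luminance in cd/m2 -> non-linear signal value in [0, 1]."""
    y = min(max(luminance / PEAK, 0.0), 1.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2


def quantize(value, bits):
    """Full-range quantization of a [0, 1] value to an n-bit integer."""
    max_code = (1 << bits) - 1
    return min(max_code, max(0, round(value * max_code)))


def half_to_code(raw, bits=10):
    """Little-endian IEEE 754 half-float luminance sample -> integer code."""
    (luminance,) = struct.unpack("<e", raw)
    return quantize(pq_encode(luminance), bits)


if __name__ == "__main__":
    for nits in (0.0, 0.1, 100.0, 1000.0, 10000.0):
        raw = struct.pack("<e", nits)  # simulate a half-float input sample
        print(f"{nits:8.1f} cd/m2 -> 10-bit code {half_to_code(raw)}")
```

A receiver would apply the inverse of whichever mapping was signalled to recover values suitable for its display, which is the purpose of the two "mechanism" requirements above.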