From f309fbeac0ff3366efbaeee2dd0e187533851534 Mon Sep 17 00:00:00 2001
From: Tristan Matthews
Date: Wed, 2 Nov 2016 13:42:33 -0400
Subject: [PATCH 1/1] draft-cho-netvc-applypvq: proofreading

---
 doc/ietf/draft-cho-netvc-applypvq.xml | 116 +++++++++++++++++-----------------
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/doc/ietf/draft-cho-netvc-applypvq.xml b/doc/ietf/draft-cho-netvc-applypvq.xml
index 0f15c06..e018df4 100644
--- a/doc/ietf/draft-cho-netvc-applypvq.xml
+++ b/doc/ietf/draft-cho-netvc-applypvq.xml
@@ -6,7 +6,7 @@

Applying PVQ Outside Daala
@@ -34,11 +34,11 @@ full title is longer than 39 characters -->

@@ -48,7 +48,7 @@ specify just the year. -->
NETVC (Internet Video Codec)
@@ -61,7 +61,7 @@ output. If you submit your draft to the RFC Editor, the
keywords will be used for the search engine. -->
-This document describes the Perceptual Vector Quantization (PVQ)
+This document describes the Perceptual Vector Quantization (PVQ)
outside of the Daala video codec, where PVQ was originally developed.
It discusses the issues arising while integrating PVQ into a traditional
video codec, AV1.
@@ -74,7 +74,7 @@ video codec, AV1.
Perceptual Vector Quantization (PVQ)

+
has been proposed
as a quantization and coefficient coding tool for an internet video codec.
PVQ was originally developed for the Daala video codec
@@ -84,11 +84,11 @@ of transform coefficients instead of more traditional scalar quantization.
is now commonly expanded as "Perceptual Vector Quantization".)
The most distinguishing idea of PVQ is the way it references a predictor.
-With PVQ, we do not subtract the predictor from the input to produce a residual,
+With PVQ, we do not subtract the predictor from the input to produce a residual,
which is then transformed and coded.
Both the predictor and the input are transformed into the frequency domain.
Then, PVQ applies a reflection to both the predictor and the input such that
-Then, PVQ applies a reflection to both the predictor and the input such that
the prediction vector lies on one of the coordinate axes, and codes the angle between them.
By not subtracting the predictor from the input, the gain of the predictor can be preserved
and is explicitly coded,
@@ -97,29 +97,29 @@ Since DC is not quantized by PVQ, the gain can be viewed as the amount of contrast,
which is an important perceptual parameter.
-Also, an input block of transform coefficients is split into frequency bands
+Also, an input block of transform coefficients is split into frequency bands
based on their spatial orientation and scale.
Then, each band is quantized by PVQ separately.
The 'gain' of a band indicates the amount of contrast in the corresponding orientation and scale.
It is simply the L2 norm of the band. The gain is nonlinearly companded
-and then scalar quantized and coded.
+and then scalar quantized and coded.
The remaining information in the band, the 'shape',
is then defined as a point on the surface of a unit hypersphere.
-Another benefit of PVQ is activity masking based on the gain,
+Another benefit of PVQ is activity masking based on the gain,
which automatically controls the quantization resolution based on the image contrast
without any signaling.
For example, for a smooth image area (i.e. low contrast thus low gain),
-the resolution of quantization will increase, thus fewer qunatization errors will be shown.
-A succint summary on the benefits of PVQ can be found in the Section 2.4 of
+the resolution of quantization will increase, thus fewer quantization errors will be shown.
+A succinct summary on the benefits of PVQ can be found in the Section 2.4 of
.
Since PVQ has only been used in the Daala video codec, which contains many non-traditional
design elements, there has not been any chance to see the relative coding performance of
PVQ compared to scalar quantization in a more traditional codec design.
-We have tried to apply PVQ in the AV1 video codec, which is currently being developed
+We have tried to apply PVQ in the AV1 video codec, which is currently being developed
by Alliance for Open Media (AOM) as an open source and royalty-free video codec.
While most of benefits of using PVQ arise from the enhancement of subjective quality of video,
@@ -131,11 +131,11 @@ These results were achieved optimizing solely for PSNR.
-Adopting PVQ in AV1 requires replacing both the scalar quantization step and
+Adopting PVQ in AV1 requires replacing both the scalar quantization step and
the coefficient coding of AV1 with those of PVQ.
-In terms of inputs to PVQ and the usage of trasnforms
-as shown in and
+In terms of inputs to PVQ and the usage of transforms
+as shown in and
,
the biggest conceptual changes required in a traditional coding system, such as AV1, are
@@ -145,8 +145,8 @@ both intra-predicted pixels and inter-predicted (i.e., motion-compensated) pixels
This is because PVQ references the predictor in the transform domain,
instead of using a pixel-domain residual as in traditional scalar quantization.
Absence of a difference signal (i.e. residue) defined as "input source - predictor".
-Hence AV1 with PVQ does not do any 'subtraction' in order for an input to reference the predictor.
-Instead, PVQ takes a different approach to referencing the predictor
+Hence AV1 with PVQ does not do any 'subtraction' in order for an input to reference the predictor.
+Instead, PVQ takes a different approach to referencing the predictor
which happens in the transform domain.
@@ -166,7 +166,7 @@
[Figure: traditional coding pipeline. The residual signal R (input minus
predictor) is transformed, scalar quantized, and passed through the coefficient
coder to produce the bitstream of coded T(R); decoded R is recovered via the
inverse quantizer and inverse transform.]
@@ -179,8 +179,8 @@
[Figure: PVQ coding pipeline. Input X and the predictor P are each passed
through transform T; T(X) and T(P) feed the PVQ quantizer and the PVQ
coefficient coder.]
@@ -203,10 +203,10 @@
 'format'. -->
-In AV1, a skip flag for a partition block is true if all of quauntized coefficients
+In AV1, a skip flag for a partition block is true if all of the quantized coefficients
in the partition are zeros.
The signaling for the prediction mode in a partition cannot be skipped.
-If the skip flag is true with PVQ, the predicted pixels are the final decoded pixels
+If the skip flag is true with PVQ, the predicted pixels are the final decoded pixels
(plus frame-wise in-loop filtering such as deblocking) as in AV1 then a forward transform of a predictor
is not required.
@@ -228,25 +228,25 @@ The ac_dc_coded flag signals whether DC and/or whole AC coefficients are coded by
PVQ has its own rate-distortion optimization (RDO) that differs from
that of traditional scalar quantization.
-This leads the balance of quality between luma and chroma to be different from
+This leads the balance of quality between luma and chroma to be different from
that of scalar quantization.
When scalar quantization of AV1 is done for a block of coefficients,
RDO, such as trellis coding, can be optionally performed.

+
The second pass of 2-pass encoding in AV1 currently uses trellis coding.
When doing so it appears a different scaling factor is applied
for each of Y'CbCr channels.
-In AV1, to optimize speed, there are inverse transforms that can skip
+In AV1, to optimize speed, there are inverse transforms that can skip
applying certain 1D basis functions based on the distribution of quantized coefficients.
However, this is mostly not possible with PVQ since the inverse transform is applied directly to
-a dequantized input, instead of a dequantized difference (i.e. input source - predictor)
+a dequantized input, instead of a dequantized difference (i.e. input source - predictor)
as in traditional video codec. This is true for both encoder and decoder.
PVQ was originally designed for the 2D DCT,
-while AV1 also uses a hybrid 2D transform consisting of
-a 1D DCT and a 1D ADST. This requires PVQ to have new coefficient scanning orders
+while AV1 also uses a hybrid 2D transform consisting of
+a 1D DCT and a 1D ADST. This requires PVQ to have new coefficient scanning orders
for the two new 2D transforms, DCT-ADST and ADST-DCT
(ADST-ADST uses the same scan order as for DCT-DCT).
Those new scan orders have been produced based on that of AV1,
@@ -264,27 +264,27 @@ for each PVQ-defined band of new 2D transforms.
-With the encoding options specified by both NETVC
+With the encoding options specified by both NETVC
() and
AOM testing for the high latency case,
PVQ gives similar coding efficiency to that of AV1, which is measured in PSNR BD-rate.
Again, PVQ's activity masking is not turned on for this testing.
-Also, scalar quantization has matured over decades,
+Also, scalar quantization has matured over decades,
while video coding with PVQ is much more recent.
We compare the coding efficiency for the IETF test sequence set
"objective-1-fast" defined in ,
which consists of sixteen of 1080p, seven of 720p, and seven of 640x360 sequences
-of various types of content, including slow/high motion of people and objects,
+of various types of content, including slow/high motion of people and objects,
animation, computer games and screen casting.
The encoding is done for the first 30 frames of each sequence.
-The encoding options used is :
+The encoding options used is :
"--end-usage=q --cq-level=x --passes=2 --good --cpu-used=0 --auto-alt-ref=2 --lag-in-frames=25 --limit=30",
which is official test condition of IETF and AOM for high latency encoding except limiting 30 frames only.
For comparison reasons, some of the lambda values used in RDO are adjusted
to match the balance of luma and chroma quality of the PVQ-enabled AV1 to that of
-current AV1.
+current AV1.
Use half the value of lambda during intra prediction for the chroma channels.
@@ -294,7 +294,7 @@ current AV1.
-The results are shown in ,
+The results are shown in ,
which is the BD-Rate change for several image quality metrics.
(The encoders used to generate this result are available from the author's git repository
and
@@ -332,17 +332,17 @@ such as "--passes=2 --good --cpu-used=0 --auto-alt-ref=2 --lag-in-frames=25", are
The significant increase in encoding time is due to
the increase of computation by the PVQ.
The PVQ tries to find asymptotically-optimal codepoints (in RD optimization sense)
-on a hypershpere with a greedy search, which has time complexity close to O(n*n)
+on a hypersphere with a greedy search, which has time complexity close to O(n*n)
for n coefficients. Meanwhile, scalar quantization has time complexity of O(n).
Compared to Daala, the search space for a RDO decision in AV1 is
-far larger because AV1 considers ten intra prediction modes
+far larger because AV1 considers ten intra prediction modes
and four different transforms (for the transform block sizes 4x4, 8x8, and 16x16 only),
-and the transform block size can be smaller than the prediction block size.
-Since the largest transform and the prediction sizes are currently 32x32 and 64x64 in AV1,
-PVQ can be called
-
+and the transform block size can be smaller than the prediction block size.
+Since the largest transform and the prediction sizes are currently 32x32 and 64x64 in AV1,
+PVQ can be called
+
approximately 5,160 times more in AV1 than in Daala.
Also, AV1 applies transform and quantization for each candidate of RDO.
@@ -353,23 +353,23 @@ which corresponds to actual quantizer used for quantization being 38).
So, PVQ was called 165 times more in AV1 than Daala.
shows the frequency of function calls to
-PVQ and scaler quantizers in AV1 at each speed level (where AV1 encoding mode is 'good')
+PVQ and scalar quantizers in AV1 at each speed level (where AV1 encoding mode is 'good')
for the same sequence and the QP as used in the above example.
The first column indicates speed level,
-the second column shows the number of calls to PVQ's search inside each band
-(function pvq_search_rdo_double() in
+the second column shows the number of calls to PVQ's search inside each band
+(function pvq_search_rdo_double() in
),
-the third column shows the number of calls to PVQ quantization of a transfrom block
-(function od_pvq_encode() in
+the third column shows the number of calls to PVQ quantization of a transfrom block
+(function od_pvq_encode() in
),
and the fourth column shows the number of calls to AV1's block quantizer.
Smaller speed level gives slower encoding but better quality for the same rate
by doing more RDO optimizations.
-The major difference from speed level 4 to 3 is enabling a use of the transform block
+The major difference from speed level 4 to 3 is enabling a use of the transform block
smaller than the prediction (i.e. partition) block.
@@ -412,7 +412,7 @@ smaller than the prediction (i.e. partition) block.
Possible future work includes:
-Enable activity masking, which also needs a HVS-tuned quantiztion matrix (band-wise QP scalers).
+Enable activity masking, which also needs a HVS-tuned quantization matrix (band-wise QP scalars).
Adjust the balance between luma and chroma qualities, tuning for subjective quality.
Optimize the speed of the PVQ code, adding SIMD.
RDO with more model-driven decision making, instead of full transform + quantization.
@@ -423,17 +423,17 @@ smaller than the prediction (i.e. partition) block.
-The ongoing work of integrating PVQ into AV1 video codec is located at
+The ongoing work of integrating PVQ into AV1 video codec is located at
the git repository .
-Thanks to Tim Terriberry for his proof reading and valuable comments.
-Also thanks to Guillaume Martres for his contibutions to intergrating PVQ into AV1
-during his intership at Mozilla and Thomas Daede for providing and maintaining
-the testing infrastructure by way of the www.arewecompressedyet.com (AWCY) web site
+Thanks to Tim Terriberry for his proofreading and valuable comments.
+Also thanks to Guillaume Martres for his contributions to integrating PVQ into AV1
+during his internship at Mozilla and Thomas Daede for providing and maintaining
+the testing infrastructure by way of the www.arewecompressedyet.com (AWCY) website.
.
-- 
2.10.2