From: sevana on
Modern standard methods for evaluating quality of transmitted speech

Voice quality is one of the main characteristics of speech transmission
systems. When analyzing voice quality one must consider not only the audio
signal degradation caused by transmission over telecom channels, but also
the specifics of the speaker's voice, the condition of the listener's
hearing and the variation of these parameters over time.

The best-known methods for evaluating the quality of voice transmission
systems were developed by the Telecommunication Standardization Sector of
the International Telecommunication Union (ITU-T) in the mid-1990s. The
results of this work are presented in Recommendation P.800 (P.830) «Methods
for subjective determination of transmission quality» [1, 2]. This document
describes the conditions for voice quality testing, the audio material, the
scoring and the methods used to evaluate results. Typically these methods
are used to obtain a mean subjective quality score on a five-point scale
(Mean Opinion Score, MOS).
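
As a rough illustration of how a MOS is obtained from individual listener
votes, here is a minimal sketch. It assumes the five-point listening-quality
scale (5 = Excellent down to 1 = Bad); the function name, the sample votes
and the normal-approximation confidence interval are assumptions made for
this example and are not taken from the Recommendation.

```python
from math import sqrt
from statistics import mean, stdev

# Category labels of the five-point listening-quality scale.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for a list of 1..5 votes."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r!r} is outside the 1..5 scale")
    mos = mean(ratings)
    # Normal approximation, assuming independent listeners (illustrative).
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical votes from eight listeners for one test condition.
votes = [4, 4, 3, 5, 4, 3, 4, 4]
mos, ci = mean_opinion_score(votes)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The confidence interval makes visible why a small listener panel gives a
score that is only loosely determined.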

Unfortunately, tests performed according to P.800 may lead to ambiguous
results. The Recommendation warns against comparing MOS scores obtained
under different conditions and considers such comparisons incorrect.
Besides that, performing tests according to P.800 takes a lot of time and
requires a large number of testers.

In order to move from subjective (MOS) scores to objective ones and to
automate quality measurement, ITU-T developed Recommendation P.861, which
is based on low-level quantitative measurements [3]. Recommendation P.861
builds on the PSQM method (Perceptual Speech Quality Measure), developed by
KPN Research for the objective analysis of speech codec performance at low
levels of degradation.

However, PSQM cannot be used to evaluate a real communication system,
because the method does not consider all the important factors that
influence human perception. Among these factors are delay, jitter and
packet loss, as well as signal level clipping.

In February 2001 ITU-T issued another recommendation, ITU-T P.862 [4],
which describes a more advanced algorithm for voice quality testing:
PESQ (Perceptual Evaluation of Speech Quality). The algorithm includes
level and time alignment, perceptual modeling and cognitive modeling.
Thanks to these additional operations the approach accounts for signal
amplification and attenuation in a communication system, time delays and
jitter, as well as the spectrum bands that are most significant for human
perception. Based on cognitive modeling, PESQ also maps the objective
quality score to MOS values.
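
The level and time alignment step can be pictured with a small sketch. This
is not the PESQ algorithm itself: it assumes a single constant delay and a
single constant gain between the two signals, and the function names and
sample values are hypothetical. It only shows why alignment has to happen
before two signals can be compared sample by sample.

```python
from math import sqrt


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return sqrt(sum(s * s for s in signal) / len(signal))


def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) where degraded best matches reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(degraded) - lag)
        if overlap <= 0:
            break
        # Cross-correlation of the two signals at this candidate lag.
        score = sum(reference[i] * degraded[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(reference, degraded, max_lag=50):
    """Remove a constant delay and gain difference from the degraded signal."""
    lag = estimate_delay(reference, degraded, max_lag)
    shifted = degraded[lag:lag + len(reference)]
    level = rms(shifted)
    # Leave a silent signal unscaled instead of dividing by zero.
    gain = rms(reference[:len(shifted)]) / level if level > 0 else 1.0
    return lag, gain, [s * gain for s in shifted]


# Hypothetical test signal: delayed by 3 samples and attenuated by half.
reference = [0, 1, 0, -1, 0, 2, 0, -2, 0, 1]
degraded = [0.0] * 3 + [0.5 * s for s in reference]
lag, gain, aligned = align(reference, degraded)
print(lag, round(gain, 3))
```

A real system has delay that varies during the call, which is why a single
global lag such as the one estimated here would not be sufficient in
practice.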

A disadvantage of PESQ, as well as of other similar algorithms, is that
they are based on comparing two signals: the original one and the one
transmitted through a communication system. This approach creates a range
of difficulties in setting up and performing voice quality tests. The
signal has to be recorded on both sides of the telecommunication system,
and the recordings have to be delivered to the test system. Real-time
quality monitoring is also quite difficult with such an approach.

To address these issues ITU-T developed a new recommendation, P.563 [5],
introduced in May 2004. This recommendation defines an algorithm that
evaluates speech quality by listening to communication sessions, without
requiring the original signal. The algorithm takes into account one-way
distortions, vocal tract parameters, noise and speech naturalness. The
developers of P.563 point out that it does not provide an overall quality
estimate of speech transmission: distortions caused by delay, echo and loss
of loudness, and everything else related to two-way interaction, cannot be
taken into account by this method.

It is widely thought that P.563 provides a high level of correlation
between automated and expert quality scores. However, simple tests based on
the ITU-T sound database for codec testing [6] may raise some doubts about
the consistency of the algorithm provided together with its description.
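
The correlation between automated and expert scores mentioned above is
commonly expressed as a Pearson coefficient. The sketch below shows one way
such a check could be set up; the score lists are invented placeholders, not
measurements, and the function name is an assumption of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values only: one pair of scores per test recording.
expert_mos = [4.2, 3.8, 2.9, 2.1, 3.5]
automated_mos = [4.0, 3.9, 3.1, 2.4, 3.2]
print(f"correlation = {pearson(expert_mos, automated_mos):.3f}")
```

A coefficient close to 1 means the automated scores rank the recordings in
much the same way as the experts do; it says nothing about whether the
absolute values agree.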
