From: Ross on
Dear all,

Does anyone know of any research into the roles of phase and amplitude
in frequency-domain representations of sound, in terms of human
perception of timbre?

In images, you can take the Fourier transforms of two images, then
combine the amplitude information from one image with the phase
information from the other. The image that results from the inverse
Fourier transform of this mixed data looks pretty strange, as you'd
expect, but it resembles the image the phase information came from
more than the other, suggesting that in images phase information
dominates over amplitude information.
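(For anyone who wants to try the swap themselves, here is a minimal NumPy sketch of the experiment described above; random arrays stand in for real image data, and the variable names are my own.)

```python
import numpy as np

rng = np.random.default_rng(0)
# Two stand-in "images" (replace with real greyscale image arrays)
img_a = rng.random((64, 64))
img_b = rng.random((64, 64))

Fa = np.fft.fft2(img_a)
Fb = np.fft.fft2(img_b)

# Amplitude from image A, phase from image B
hybrid = np.abs(Fa) * np.exp(1j * np.angle(Fb))

# Both inputs are real, so the hybrid spectrum is Hermitian and the
# inverse transform is real up to rounding error
result = np.real(np.fft.ifft2(hybrid))
```

With real photographs, `result` is the image one would display and inspect; the claim in the thread is that it looks more like `img_b` than `img_a`.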

My wild guess is that for static audio timbres the opposite is true,
but I would very much like to check this out properly. Any ideas/
references/pointers?

I'm guessing that I'm probably asking in the wrong group, but don't
know where to ask. Any recommendations of other places to ask would be
greatly appreciated.
From: Rune Allnor on
On 23 Sep, 13:20, Ross <rossclem...(a)gmail.com> wrote:
> Dear all,
>
> Does anyone know of any research into the roles of phase and amplitude
> of frequency domain representations of sound in terms of human
> perception of timbres.

From http://en.wikipedia.org/wiki/Timbre:

" Timbre has been called ... "the psychoacoustician's
multidimensional wastebasket category for everything that
cannot be qualified as pitch or loudness." "

Rune
From: Richard Dobson on
Ross wrote:
> Dear all,
>
> Does anyone know of any research into the roles of phase and amplitude
> of frequency domain representations of sound in terms of human
> perception of timbres.
>
> In images, you can take the Fourier transform of two images. You then
> use the amplitude information from one image, and the phase
> information from the other. The image that results from the inverse
> Fourier transform of this mixed data looks pretty strange as you'd
> expect, but you see more of the image the phase information came from
> than the other, suggesting that in images phase information dominates
> over amplitude information.
>
> My wild guess is that for static audio timbres the opposite is true,
> but I would very much like to check this out properly. Any ideas/
> references/pointers?
>
> I'm guessing that I'm probably asking in the wrong group, but don't
> know where to ask. Any recommendations of other places to ask would be
> greatly appreciated.


You will find plenty of interest in this question on the musicdsp list.
The significance of phase is pretty well canonical in audio processing,
with respect to any and all combinations of sounds. Processes such as
phasers and flangers combine wet and dry sounds to produce dynamic
cancellation effects. There is a pretty direct audio counterpart to your
image example in various techniques of cross-synthesis, hybridising and
morphing of sounds. The simplest example is phase-vocoder processing
where the bin amplitudes of one sound are combined with the frequency
values of another. Most of the famous "problems" of the phase vocoder
arise through the smearing of phase between bins. Phase relationships
(not least, their preservation) are also central to most
multi-channel production, in either preserving or modifying the "stereo
image". So in the general case, audio applications seek either to
preserve phase relationships or to deliberately distort/modify them.
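A stripped-down version of the cross-synthesis idea can be sketched in a few lines of NumPy: take the bin magnitudes of one sound and the bin phases of another, frame by frame. (This is a simplification of true phase-vocoder processing, which works with instantaneous frequencies derived from phase differences rather than raw bin phases; the function name and test tones below are illustrative only.)

```python
import numpy as np

def cross_synthesize(x, y, frame=1024, hop=256):
    """Naive cross-synthesis sketch: magnitudes of x, phases of y, per frame."""
    n = min(len(x), len(y))
    win = np.hanning(frame)
    out = np.zeros(n)
    norm = np.zeros(n)
    for start in range(0, n - frame, hop):
        X = np.fft.rfft(x[start:start + frame] * win)
        Y = np.fft.rfft(y[start:start + frame] * win)
        # magnitude spectrum of x, phase spectrum of y
        hybrid = np.abs(X) * np.exp(1j * np.angle(Y))
        out[start:start + frame] += np.fft.irfft(hybrid, frame) * win
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)  # overlap-add normalisation

# Two test tones: a sine and a (phase-rich) square-ish wave
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)
y = np.sign(np.sin(2 * np.pi * 330 * t))
z = cross_synthesize(x, y)
```

Listening to `z` against `x` and `y` (after writing it to a sound file) is an easy way to get a feel for how much of each source survives the swap.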

Human perception of timbre is a slightly different topic; it is
generally asserted that we are insensitive to (static) phase - you can
scramble the phases (while keeping amplitudes the same) of the partials
of, say, a square wave or sawtooth wave and the listener will not notice
(though needless to say there are those who claim they can distinguish
them). So in broad terms your guess is correct. The general principle is
that our ears are drawn to anything changing (which of course is what we
experience most of the time): the addition/removal of partials, and changing
phase relationships. The challenge of the subject from a research point
of view is that our hearing tends to be "categorical" - given a
transformation (e.g. in morphing), our perception tends to lock onto one
recognition until a certain point, where it flips to another; somewhat
akin to the famous optical illusions where we flip from seeing a vase to
seeing a face, etc. In the audio case this tends to apply even during a
nominally smooth transformation.
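The phase-scrambling observation is easy to verify numerically: build a sawtooth-like tone additively, randomise the partial phases while keeping the 1/k amplitudes, and confirm the magnitude spectra are identical even though the waveforms differ. (A minimal sketch; the parameters and helper below are my own choices, and the listening comparison itself of course requires playing the two signals back.)

```python
import numpy as np

rng = np.random.default_rng(1)
sr, f0, n_partials = 48000, 220.0, 30
t = np.arange(sr) / sr  # exactly one second, so each partial fills whole bins

def additive(phases):
    # Sawtooth-like spectrum: partial k at frequency k*f0 with amplitude 1/k
    return sum(np.sin(2 * np.pi * f0 * k * t + ph) / k
               for k, ph in zip(range(1, n_partials + 1), phases))

zero_phase = additive(np.zeros(n_partials))           # "textbook" sawtooth-ish wave
scrambled = additive(rng.uniform(0, 2 * np.pi, n_partials))  # same partials, random phases

mag0 = np.abs(np.fft.rfft(zero_phase))
mag1 = np.abs(np.fft.rfft(scrambled))
```

The two waveforms look nothing alike on a scope, yet `mag0` and `mag1` agree bin for bin, which is exactly the situation in which listeners (for a static, steady tone) generally report no difference.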

See (among other references) "Auditory Scene Analysis" by Albert S Bregman.

See also the work of Diana Deutsch (http://deutsch.ucsd.edu); especially
"The Psychology of Music", which discusses auditory illusions, among
many other things.

And: "Music, Cognition, and Computerized Sound", Perry Cook.

The main sound synthesis lists will also be sources of rich and informed
discussions, e.g. for PD, Csound, Max/Msp, Supercollider, etc.

Richard Dobson