From: markt on 6 May 2008 17:32

>I have no clue what you are waiting for, but just stand there until it
>appears to you.

Your apology. Name one thing I said that is incorrect, btw...? And yes, you did imply that I said the problem was solved in Hyvarinen's book.

>Goodbye for good on this topic.

Thank you.

Btw all, enter "cocktail party problem" into Wikipedia and you'll get redirected to the "Source Separation" page, which not surprisingly refers to both principal component analysis (PCA) and ICA as two methods for resolving this issue. Source separation is a big deal, and means abound for approaching the issue. The wiki page also refers you to the Hyvarinen tutorial, which is also not a surprise since Nokia is sponsoring a lot of the research in Helsinki.

The biggest problem with PCA is that it is based on correlations (variances), which offer no insight into independence. ICA allows one to exploit independence using higher-order statistics. Maximizing independence maximizes separation. If you have a vector space defined by a set of independent vectors, and your observations are a linear combination (scaling is allowed, btw) of these vectors, then there is one and only one solution that results in independent outputs. Not so with uncorrelated source vectors, since it is very easy to define linear transformations of those vectors that are also uncorrelated, i.e., your solution may not be unique.

Is ICA the end-all, be-all? No, and I said so in my second post. Is it worth investigating, particularly with this problem? Yes, and it even works, in spite of complaints to the contrary in this thread.

Mark
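The uniqueness point in the post above can be checked numerically. The following is only a rough sketch, not anything taken from the thread or from the Hyvarinen book: it assumes two independent, unit-variance, uniformly distributed sources and uses excess kurtosis as a stand-in for "higher-order statistics." The helper names `excess_kurtosis` and `rotate` are made up for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis of each row (0 for a Gaussian)."""
    x = x - x.mean(axis=1, keepdims=True)
    x = x / x.std(axis=1, keepdims=True)
    return (x ** 4).mean(axis=1) - 3.0

def rotate(x, theta):
    """Rotate two-channel data (2 x N) by the angle theta, in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

rng = np.random.default_rng(0)
n = 200_000

# Assumption: two independent, unit-variance, non-Gaussian (uniform) sources.
sources = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

# A 45-degree rotation is one of infinitely many transforms that keeps
# the two channels uncorrelated.
rotated = rotate(sources, np.pi / 4)

# Second-order view: both versions look equally "good" (correlation near 0).
corr_sources = np.corrcoef(sources)[0, 1]
corr_rotated = np.corrcoef(rotated)[0, 1]

# Fourth-order view: the true sources are further from Gaussian than the
# rotated mixture, which is the handle ICA-style criteria exploit.
kurt_sources = excess_kurtosis(sources)
kurt_rotated = excess_kurtosis(rotated)
```

Under these assumptions both correlations come out near zero, so a correlation-based criterion cannot tell the two apart, while the excess kurtosis is roughly -1.2 for the sources and roughly -0.6 for the rotated channels.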
From: markt on 6 May 2008 15:35

>Can someone give me an answer please?
>How many voices or instruments that are singing or playing at one same
>moment can some ordinary person distinguish? Assume that person just
>listens the recorded mono file and says here are N persons in a choir
>or here are M different instruments playing. A person is, say, an
>ordinary musician, not Mozart. Also let us assume each voice or each
>instrument may play an individual note.
>Thanks in advance.
>Vladimir.
>

Sorry, I answered the other part that was "are there any algorithms?" I don't know how many people an individual person can separate. An algorithm, on the other hand, is a different subject.

With people, having two ears in a room helps with spatial separation, but once the signal is recorded, i.e., a mono track, spatiality is lost other than perhaps an indication of loudness (variance). Humans have the largest neural net known to exist, btw, and the ICA methods (note the plural) essentially represent neural network processors. They also suffer (often) from scale ambiguity so variance differences between speakers or instruments might not be of help.

Frequency (pitch) and formants are the two biggest separators that I can think of in terms of human speech.

Mark
From: markt on 7 May 2008 11:22

>Sorry..I am a rookie here... I want to know what is the difference
>between
>
> 1. signals from two ppl from two different microphones(not present
>in the same scene) mixed linearly
>
> 2. two ppl's speech(spoken simultaneously) recorded on same
>microphone .(does this resemble cocktail party!).
>
> I dont see any difference between these two problems. If i am wrong
>then what is the difference?
>
>regards
>Rajesh
>

Are you saying that, in the first case, the recordings are mixed after being spoken into the microphones? The "cocktail party problem" is simply a mixture of signals, though typically you assume both speakers are in the same scene when using the room analogy.

In general, the biggest difference between these two setups, assuming the answer to my first question is a "yes," is that in the first, each speaker's recording will have independent noise/distortion. In the second, both speakers are going into the same receiver, and hence both are affected by the same noise/distortion.

Mark
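The distinction drawn in the reply above can be written down as a toy signal model. This is a sketch under loud assumptions: the noise is taken to be additive white Gaussian, distortion is ignored entirely, and the function names `mix_separate_mics` and `mix_shared_mic` are invented for illustration.

```python
import numpy as np

def mix_separate_mics(s1, s2, noise_std, rng, gains=(1.0, 1.0)):
    """Setup 1: each talker has a private microphone with its own noise,
    and the two recordings are mixed linearly afterwards."""
    m1 = s1 + noise_std * rng.standard_normal(s1.shape)  # noise of mic 1
    m2 = s2 + noise_std * rng.standard_normal(s2.shape)  # noise of mic 2
    return gains[0] * m1 + gains[1] * m2

def mix_shared_mic(s1, s2, noise_std, rng, gains=(1.0, 1.0)):
    """Setup 2: both talkers reach one microphone, so a single noise
    term is added to the already mixed signal."""
    noise = noise_std * rng.standard_normal(s1.shape)
    return gains[0] * s1 + gains[1] * s2 + noise
```

With `noise_std=0` the two functions return the same mixture, which matches the point that the setups differ mainly in how noise (and, in practice, distortion) enters.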
From: markt on 7 May 2008 11:42

>This basically disqualifies ICA as a tool for the cocktail party
>problem - and yes, that *is* mentioned as one application
>for ICA in that article. If you need at least as many mics as
>there are people, then you can just as well crowd the hall with
>mics and track any conversation on the closest mic. No need
>for ICA.

This is a common misconception/misinterpretation of what the statement means. If you read a bit deeper into the ICA methods, once you have ordered samples, i.e., a time series, the "more observations than sources" requirement can be relaxed by taking a vector of samples and treating each element of the vector as a separate "observation." You can also relax the independence assumptions.

Recall that radio receivers are typically (though not always) a single element, yet multipath (multiple sources) can be resolved quite well via ICA (or similar) methods. The MOE (minimum output energy) detector as well as the decorrelating detector, which work quite well for resolving multipath, are both equivalent to a negentropy-based cost function derivation with regularization.

>C) "Most ICA methods are not able to extract the actual number of
>   source signals, the order of the source signals, nor the signs
>   or the scales of the sources."
>
>Contemplate that one for a bit, particularly in view of
>quote A) at the top: "Most" ICA methods can't do the job
>ICA is supposed to do! Mind you, in line with the great
>academic tradition, the word 'most' is here used in one
>of two functions:

The first part of this is a bit nebulous since there is a point at which some sources are simply lost in the background. You have to have some a priori guess with any blind detection method as to how large your spread is, e.g., you sort of need to know how large your delay spread in a comm channel is before deciding how many multipaths there actually are.
>1) No working algorithm has yet been found, but the author
>   is wise enough not to use the term 'all' as in 'all ICA
>   methods fail.'
>
>2) Those ICA methods which don't fail, work in highly
>   idealized situations.

Heavy research is still being done in a lot of these areas, so statements made in online tutorials may already be out of date.

>The quotes A)-C) above are enough to let the experienced
>data analyst understand that ICA for all intents and
>purposes is a university funding generator rather than an
>operational analysis tool.

Perhaps.

>As somebody already said in this thread, the skill to spot
>idealized test conditions is a valuable one, when looking
>for usable (or even working) methods.
>
>Rune

I agree that most methods that are developed are primarily useful in ideal situations, but that goes for many other algorithms as well. ICA is merely a path for research. Algorithms exist that will do what is required; the question is whether they are practically useful. I've not commented on the quality of the result, just that it exists and can work.

The work I've done performs outstandingly well for low fade-rate applications, and degrades as fading increases (still a rather large increase over a simple RAKE even with perfect channel estimation, however). This is done on multiple, asynchronous users with arbitrary phase and amplitude as well as overall channel impulse response (multipath conditions). I'm not fully blind, as I actually regularize using known code sequences, but I referenced many fully blind applications that have - supposedly - performance similar to what I found.

The fully blind methods I looked at often use the method of Lagrange multipliers with a dual optimization on the channel estimate as well as the weight update. I already had a channel estimate, so I didn't dig too deep into the fully blind methods. They are harder to balance either way.

Mark
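The remark earlier in this post about taking a vector of samples from a single time series and treating each element as a separate "observation" can be illustrated with a delay embedding. This is only one plausible reading of that remark, not the poster's actual method; the function name `delay_embed` and the `width` parameter are assumptions made for the sketch.

```python
import numpy as np

def delay_embed(x, width):
    """Turn a 1-D series into a (width, N - width + 1) observation matrix.

    Row k holds the series advanced by k samples, so every column is one
    length-`width` window of the original single-channel recording.
    """
    x = np.asarray(x)
    if x.ndim != 1 or not 1 <= width <= x.size:
        raise ValueError("x must be 1-D and width must be in 1..len(x)")
    # sliding_window_view yields shape (N - width + 1, width); transpose so
    # each row acts like a separate "sensor" for a multichannel method.
    windows = np.lib.stride_tricks.sliding_window_view(x, width)
    return windows.T.copy()

# A single-channel series of six samples becomes three pseudo-channels.
observations = delay_embed(np.arange(6), 3)
# observations is
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]]
```

The resulting matrix has the channels-by-samples shape that multichannel separation routines usually expect. Whether the rows are informative enough to separate anything depends on the signals and the channel, which this sketch does not address.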
From: markt on 7 May 2008 11:54

>hi,
> I have used PCA for face recognition and the performance wasnt
>that bad .
>
> Is this same as ICA that you are talking about?
>

PCA is principal component analysis and it is based on correlations, i.e., second-order statistics. PCA only requires uncorrelated sources, rather than independent sources. It is a cousin of ICA and is often used to pre-whiten data prior to the ICA estimation method.

The terms PCA and ICA are often used as if they each represent a singular method for resolving a problem when, in fact, they really represent a class of algorithms. Singular value decomposition can be used for PCA, for example. The Hyvarinen book uses SVD to pre-whiten the received data vector in the DS-CDMA detection chapter.

Mark
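As a rough illustration of SVD-based pre-whitening, here is a minimal NumPy sketch. It is not the code from the Hyvarinen book; the function name `whiten_svd` and the channels-by-samples layout are assumptions, and the data matrix is assumed to have full row rank.

```python
import numpy as np

def whiten_svd(x):
    """Whiten a (channels x samples) data matrix using its SVD.

    Returns (z, w) where z = w @ (x - mean) has identity sample covariance.
    Assumes x has full row rank, so no singular value is zero.
    """
    x = x - x.mean(axis=1, keepdims=True)   # remove the mean of each channel
    n = x.shape[1]
    # x = U diag(s) V^T, hence cov(x) = U diag(s**2 / n) U^T.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # W = diag(sqrt(n) / s) U^T rotates onto the principal axes (the PCA
    # step) and rescales each axis to unit variance.
    w = (u / (s / np.sqrt(n))).T
    return w @ x, w

# Example: two correlated channels built from independent Laplacian sources.
rng = np.random.default_rng(1)
mixed = np.array([[1.0, 0.5], [0.3, 2.0]]) @ rng.laplace(size=(2, 5000))
z, w = whiten_svd(mixed)
cov_z = z @ z.T / z.shape[1]   # identity up to rounding error
```

After this step the channels are uncorrelated with unit variance, but any further rotation of `z` is equally white, so an ICA stage is still needed to pick the rotation that makes the outputs independent.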