From: csenges on
Hi,
I run similarity queries (kNN) in text mining, using feature vectors as
the text representation. I'm new to this topic and wonder whether there
are any standard index structures that people use to speed up
similarity/distance queries.

I know kNN in high dimensions is a research topic and there are
sophisticated algorithms and data structures, but I just want to know
what people use today.

kd-trees and R-trees do not work well with this many dimensions. Maybe
locality-sensitive hashing (LSH) with a suitable hash function is an
option?
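
To make the LSH idea concrete, here is a minimal sketch of what I have in
mind, assuming cosine similarity and random-hyperplane hashing (one common
LSH family for that measure). The names LSHIndex, make_hyperplanes and
signature, and the parameters num_bits and num_tables, are my own
placeholders and not taken from any library. It uses plain Python lists, so
it is only meant to show the structure, not to be fast.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    # Each hyperplane is a random Gaussian normal vector of length dim.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LSHIndex:
    """Hypothetical multi-table index; all names here are illustrative."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        # Several independent tables raise the chance that true
        # neighbours share a bucket in at least one of them.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), defaultdict(list))
            for t in range(num_tables)
        ]
        self.vectors = {}

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(key)

    def query(self, vec, k=5):
        # Collect candidates from the matching bucket of every table,
        # then rank only those candidates by exact cosine similarity.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))
        ranked = sorted(
            candidates, key=lambda c: cosine(vec, self.vectors[c]), reverse=True
        )
        return ranked[:k]
```

Usage would be along the lines of index = LSHIndex(dim=...), then
index.add(doc_id, vector) for each document and index.query(vector, k=10).
The result is approximate: a true neighbour that never shares a bucket with
the query in any table is missed, which is the trade-off I am asking about.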

thx for any hints,
chris