Multidimensional binary search trees [Theory]

Prev: Free software JFLAP alternative?
Next: Dynamic cycle Detection

From: kingpin on 17 Apr 2007 05:29

Hi,

i've a set S of points in R^n (a vector of n real numbers, where n, in
pratical applications, is between 10 and 60). The number of points is
in the order of 10^5, 10^6. I would like organize the points in an
efficient search tree. The most important operation that i would like
to do is finding the nearest neighbour point (that is, give a point
P=(a1,...,ak), tell me which is the first m nearest points in the set
S to P). P does not necessarely belong to S.

For n=2,3 i know that there are quadtrees and octrees. For k>3 i've
read about kd-trees, but i'm not sure if k represents the
dimensionality of the points (that is, k = n) or the number of points
per node...

If kd-trees are suitable for a multidimensional problem, are they
efficient? Are there any other data structures that i can use to solve
the problem?
Thank you.

From: Babua on 17 Apr 2007 07:09

On Apr 17, 2:29 pm, king...(a)freemail.it wrote:
> Hi,
>
> i've a set S of points in R^n (a vector of n real numbers, where n, in
> pratical applications, is between 10 and 60). The number of points is
> in the order of 10^5, 10^6. I would like organize the points in an
> efficient search tree. The most important operation that i would like

You have to use mth-order Voronoi diagram for the set S. There
are different ways to compute this diagram either by lifting or
by superimposing lower order Voronoi diagrams. Also you
also have to preprocess that structure for point location query
to determine the m-nearest points of the query point P.

Thanks.

--- Pinaki

===============================================

> to do is finding the nearest neighbour point (that is, give a point
> P=(a1,...,ak), tell me which is the first m nearest points in the set
> S to P). P does not necessarely belong to S.
>
> For n=2,3 i know that there are quadtrees and octrees. For k>3 i've
> read about kd-trees, but i'm not sure if k represents the
> dimensionality of the points (that is, k = n) or the number of points
> per node...
>
> If kd-trees are suitable for a multidimensional problem, are they
> efficient? Are there any other data structures that i can use to solve
> the problem?
> Thank you.

From: kingpin on 17 Apr 2007 09:20

On 17 Apr, 13:09, Babua <pin...(a)iitg.ernet.in> wrote:
> On Apr 17, 2:29 pm, king...(a)freemail.it wrote:
>
> > Hi,
>
> > i've a set S of points in R^n (a vector of n real numbers, where n, in
> > pratical applications, is between 10 and 60). The number of points is
> > in the order of 10^5, 10^6. I would like organize the points in an
> > efficient search tree. The most important operation that i would like
>
> You have to use mth-order Voronoi diagram for the set S. There
> are different ways to compute this diagram either by lifting or
> by superimposing lower order Voronoi diagrams. Also you
> also have to preprocess that structure for point location query
> to determine the m-nearest points of the query point P.
>
> Thanks.
>
> --- Pinaki
>
> ===============================================
>
> > to do is finding the nearest neighbour point (that is, give a point
> > P=(a1,...,ak), tell me which is the first m nearest points in the set
> > S to P). P does not necessarely belong to S.
>
> > For n=2,3 i know that there are quadtrees and octrees. For k>3 i've
> > read about kd-trees, but i'm not sure if k represents the
> > dimensionality of the points (that is, k = n) or the number of points
> > per node...
>
> > If kd-trees are suitable for a multidimensional problem, are they
> > efficient? Are there any other data structures that i can use to solve
> > the problem?
> > Thank you.

Can you be more specific and/or point out some references?

From: Torben =?iso-8859-1?Q?=C6gidius?= Mogensen on 17 Apr 2007 10:55

kingpin(a)freemail.it writes:

> Hi,
>
> i've a set S of points in R^n (a vector of n real numbers, where n, in
> pratical applications, is between 10 and 60). The number of points is
> in the order of 10^5, 10^6. I would like organize the points in an
> efficient search tree. The most important operation that i would like
> to do is finding the nearest neighbour point (that is, give a point
> P=(a1,...,ak), tell me which is the first m nearest points in the set
> S to P). P does not necessarely belong to S.
>
> For n=2,3 i know that there are quadtrees and octrees.

Note that quadtrees and octrees are not binary trees but, as their
name indicates, have branching factors of 4 and 8. Generalisations to
N dimensions would have branching factors of 2^N, which make code
using them more cumbersome and some code less efficient than if a
binary tree is used.

You can make binary variants of these even for N dimensions: Each node
has two subtrees that each represent half the space.

For N=2, you would enclose the original space in an 1 x 2^(1/2)
rectangle. Each cut splits this down the middle of the longest
dimension, creating two rectangles with similar relative proportions
bu oriented differently. To get around this, you use a coordinate
transform in the recursive calls, so the longest dimension is always
X.

For N=3, you have a 1 x 2^(2/3) x 2^(1/3) box that you in each step
split along its longest dimension to create two boxes with similar
relative dimensions but different orientation, so you also here do a
coordiante transform to make X the longest dimension and Y the
second-longest.

For general N, you have an N-dimensional hyperbox with dimensions 1 x
2^((N-1)/N) x 2^((N-2)/N) x ... x2^(1/N) and do the same.

I find that you often get much shorter code using such trees than
using traditional quadtrees and octrees. The downside is that you
don't get integral bounds, so they don't work well for subdividing a
fixed grid (such as a bitmap). For for points in a space with
non-integral coordinates, they work fine.

Torben

From: Paul E. Black on 17 Apr 2007 12:08

On Tuesday 17 April 2007 05:29, kingpin(a)freemail.it wrote:
> i've a set S of points in R^n (a vector of n real numbers, where n, in
> pratical applications, is between 10 and 60). ...
> The most important operation that i would like
> to do is finding the nearest neighbour point ...
>
> If kd-trees are suitable for a multidimensional problem, are they
> efficient? Are there any other data structures that i can use to solve
> the problem?

I suggest Volker Gaede and Oliver G�nther, Multidimensional Access
Methods, ACM Computing Surveys, 30(2):170-231, June 1998.

The article catalogs dozens of data structures and notes which are
good at finding the nearest neighbor point.

-paul-
--
Paul E. Black (p.black(a)acm.org)

| Next | Last
Pages: 1 2
Prev: Free software JFLAP alternative?
Next: Dynamic cycle Detection