From: mohangupta13 on
Hello all,

I am afraid that most of the time, books on algorithms just explain
how the algorithms work and say little about their practical applicability.
From what I have read, we have essentially three linear-time sorting algorithms:

   Name             Input number range              Space    Time
1) Counting sort    0 to k                          O(k)     O(k+n)
2) Radix sort       0 to n^d - 1                    O(n)     O(dn)
3) Bucket sort      0 to k (uniform distribution)   O(n)     O(n)

Now I wanted to know which algorithm is used where in practice (any
example cases?), as all of them look largely the same in terms of the
space, time and input-range trade-offs.

Adding to the confusion, everyone says quicksort is much faster in
practice because of small constant factors, cache performance, etc.

Please Guide,
Thanks
Mohan
From: mohangupta13 on
On Apr 28, 3:29 pm, mohangupta13 <mohangupt...(a)gmail.com> wrote:
> [snip]

Sorry, the format of the table got altered when I posted the message.
Mohan
From: robertwessel2 on
On Apr 28, 5:29 am, mohangupta13 <mohangupt...(a)gmail.com> wrote:
> [snip]


The three sorts you listed work well only with significant constraints
on the input keys. For example, a two or three stage bucket or radix
sort might be a good way to sort by zip code (the US postal code - a
five digit number). But none of those will work well if the key is
large, for example if you wanted to sort by the name of a person.
Without knowing that the range of values is severely restricted, you
cannot use a counting sort to sort a 32 bit key on a 32 bit machine,
since you will not have enough address space to store the counting
array.
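
For a key range that small, though, a counting sort is about as simple
as it gets. A rough, untested sketch in C (assuming the keys are plain
integers already known to lie in 0..99999):

#include <stdlib.h>
#include <string.h>

#define ZIP_RANGE 100000   /* keys assumed to be 0..99999 */

void counting_sort_zip(unsigned *keys, size_t n)
{
    size_t *count = calloc(ZIP_RANGE, sizeof *count);
    unsigned *out = malloc(n * sizeof *out);
    size_t i;

    if (!count || !out) { free(count); free(out); return; }

    for (i = 0; i < n; i++)             /* histogram of key values */
        count[keys[i]]++;
    for (i = 1; i < ZIP_RANGE; i++)     /* prefix sums -> end of each bucket */
        count[i] += count[i - 1];
    for (i = n; i-- > 0; )              /* place back to front, so it's stable */
        out[--count[keys[i]]] = keys[i];

    memcpy(keys, out, n * sizeof *out);
    free(count);
    free(out);
}

The same code is hopeless for a full 32 bit key, since count[] would
need 2^32 entries.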

The problem with almost all non-comparison based sorts is that they do
not work well as general purpose algorithms. Consider doing a 256-way
bucket or radix sort on a 25 character name field (algorithmically the
two are very similar). IOW, do a 256-way distribution 25 times, so
you'll end up doing about N*25 units of work. An O(n*log(n))
comparison sort, like quicksort or merge sort, will do fewer units of
work unless there are at least about 32 million (2^25) input records.
If the sort key is 50 characters long, quicksort will win up to about
1E+15 (2^50) records.
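
To make the cost model concrete, here's a rough, untested C sketch of
that "256-way distribution, once per key byte" idea, i.e. an LSD radix
sort on pointers to fixed-length keys. Every record is touched in
every pass, no matter what the keys look like:

#include <stdlib.h>
#include <string.h>

void lsd_radix_sort(const unsigned char **a, size_t n, size_t keylen)
{
    const unsigned char **tmp = malloc(n * sizeof *tmp);
    size_t count[256];
    size_t i, pos;

    if (!tmp) return;

    for (pos = keylen; pos-- > 0; ) {    /* least significant byte first */
        memset(count, 0, sizeof count);
        for (i = 0; i < n; i++)          /* histogram of this byte position */
            count[a[i][pos]]++;
        for (i = 1; i < 256; i++)        /* prefix sums */
            count[i] += count[i - 1];
        for (i = n; i-- > 0; )           /* stable placement into tmp */
            tmp[--count[a[i][pos]]] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}

With keylen = 25 that outer loop runs 25 times, which is where the
N*25 figure above comes from.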

In addition, a non-binary radix sort, as well as bucket and counting
sorts, usually requires substantial amounts of extra storage. And if
you can afford that much extra storage, a simple merge sort will give
good performance in both the average and worst cases, no matter the
size, content and distribution of the keys. Quicksort and heapsort,
for their part, require only minimal amounts of extra storage.

Which is not to say that one of the above might not be ideal for a
given sorting task; it's just that they're not good choices *unless*
the input data matches the specific requirements of the algorithm.
From: Dann Corbit on
In article <d497933c-f579-4a52-8df2-854e27a94e74
@g11g2000yqe.googlegroups.com>, robertwessel2(a)yahoo.com says...
>
> On Apr 28, 5:29 am, mohangupta13 <mohangupt...(a)gmail.com> wrote:
> > [snip]
To the O.P.:
For sorting literature, this is a very, very good place to perform
searches:
http://citeseerx.ist.psu.edu/

See, for example:
http://citeseerx.ist.psu.edu/search?q=title%3Asorting+AND+keyword%3Acache&sort=cite

> [snip]
> The problem with almost all non-comparison based sorts is that they do
> not work well as a general purpose algorithm. Consider doing a 256-
> way bucket or radix sort on a 25 character name field (and
> algorithmically those will be very similar). IOW, do a 256 way
> distribution 25 times. You'll end up doing N*25 units of work. An
> O(n*log(n)) comparison sort, like Quicksort or Merge sort, will do
> less units of work unless there are at least 32 million input
> records. If the sort key is 50 characters long, Quicksort will win up
> to about 1E+15 records.

A way to make a radix sort general purpose is to pass a "bucket"
function in place of the comparison function that comparison based
sorts take. The bucket function returns radix bucket i of the key.
So, for an array of 8-bit unsigned chars it would return key[i], and
for an unsigned long it would be (key >> (i*8)) & 255 {assuming a
radix of 256}.
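
Untested, and with made-up names, but the shape of that interface
might be something like:

#include <stddef.h>

/* Returns radix-256 digit i of the key; the sort driver decides
   whether it walks i from the most or least significant end. */
typedef unsigned (*bucket_fn)(const void *key, size_t i);

/* Keys that are plain byte strings (e.g. a fixed-width char field):
   digit i is simply key[i]. */
static unsigned bucket_bytes(const void *key, size_t i)
{
    return ((const unsigned char *)key)[i];
}

/* Keys that are unsigned long: digit i is (key >> (i*8)) & 255, as
   described above. */
static unsigned bucket_ulong(const void *key, size_t i)
{
    unsigned long k = *(const unsigned long *)key;
    return (unsigned)((k >> (i * 8)) & 255u);
}

A radix sort written against bucket_fn never needs to know anything
about the key type itself.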

Often, even with long keys (e.g. the 50 char example), the data field
is not fully populated, so something like MSD radix sort can do quite
well. And if the keys are very long and also extremely similar, a
comparison based sort will likewise have to examine a great volume of
key data to differentiate them.

It is also possible to use radix sort ideas as a top level sort (or a
sort on the first field of a multi-key index) and then pass the
smaller subsets off to an O(N*log(N)) comparison sort, as sketched
below.
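
For instance (untested, names made up): a single 256-way pass on the
first byte of the key, then qsort() on each bucket, here over an array
of C strings purely for illustration.

#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

void hybrid_sort(const char **a, size_t n)
{
    size_t count[256] = {0};
    size_t start[256];
    const char **tmp = malloc(n * sizeof *tmp);
    size_t i, b;

    if (!tmp) return;

    for (i = 0; i < n; i++)              /* histogram on first key byte */
        count[(unsigned char)a[i][0]]++;
    for (b = 0, i = 0; b < 256; b++) {   /* bucket start offsets */
        start[b] = i;
        i += count[b];
    }
    for (i = 0; i < n; i++)              /* distribute into buckets */
        tmp[start[(unsigned char)a[i][0]]++] = a[i];
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);

    for (b = 0, i = 0; b < 256; b++) {   /* finish each bucket by comparison */
        if (count[b] > 1)
            qsort(a + i, count[b], sizeof *a, cmp_str);
        i += count[b];
    }
}

The first pass never looks past byte zero, so it costs N regardless of
key length, and each qsort() call then sees a much smaller subset.
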
[snip]