From: Gene on
On May 6, 6:36 am, arnuld <sunr...(a)see.sigs.invalid> wrote:
> PROBLEM:
>
>        I want to read words from standard input and then sort and print
>        the   words in decreasing order of appearance along with the count
>        of how many times each word occurred. The language I am using C. I
>        have come up with following design idea:
>
> 1) Create a singly-linked list and store each word and the count in that
> structure. when the user enters a word which is already seen before, then
> we will just increase the count. That way we will have a list of unique
> words.
>
> 2.) Create an array of pointers to the elements of linked list
> (which are structure actually). Sort the pointers comparing the count of
> each word and print the elements pointed by these pointers. That way
> program will be highly efficient as we will not be sorting any structures,
> we will be sorting pointers only :) . (originally K&R2's idea but I love
> this idea)
>
> I want this program to be fast and consume less memory. I have 2
> questions:
>
>   i.) a double-linked list will be better than this ?
>  ii.) any other data structure that I can use here ?  
>

It's a pretty common mistake for beginning programmers to reach for
linked lists because it's usually hard work to understand them. When
you have a hammer in your hand, everything looks like a nail.

But for this problem, you need to do lookups quickly. Lists are poor
for quick lookups. Rather you want a dictionary data structure that
stores [word, count] pairs.

In pseudocode:

set dictionary D empty
for each word W in the input
look up [W, Count] in D
if lookup succeeds, increment Count
else /*lookup failed*/ insert [W,1] in D
end for
Print the pairs in the dictionary in sorted order

For the dictionary, there are a couple of implementation choices. A
balanced tree like AVL or Red-Black trees is a nice choice because you
can print in sorted order merely by traversing in order. No sorting
needed. You might look at libavl. I've never tried it, but it looks
like an extremely high quality C library for balanced trees. Of
course if you are trying to learn, you may want to roll your own.

Another approach is a hash table, which makes the lookups and
insertions faster, but requires a separate sorting pass.

This is a perfect example of a problem that's best solved with a
programming language designed to solve it. Perl is such a language.
Here's a solution in perl.

sub main {
local $/ = undef;
my %counts;
foreach my $word (split(/\s+/, <>)) { $counts{$word}++ }
print map { "$_->[0] $_->[1]\n" }
sort { $b->[1] <=> $a->[1] }
map { [$_, $counts{$_}] } keys %counts;
}
main;
From: Gene on
On May 6, 6:36 am, arnuld <sunr...(a)see.sigs.invalid> wrote:
> PROBLEM:
>
>        I want to read words from standard input and then sort and print
>        the   words in decreasing order of appearance along with the count
>        of how many times each word occurred. The language I am using C. I
>        have come up with following design idea:
>
> 1) Create a singly-linked list and store each word and the count in that
> structure. when the user enters a word which is already seen before, then
> we will just increase the count. That way we will have a list of unique
> words.
>
> 2.) Create an array of pointers to the elements of linked list
> (which are structure actually). Sort the pointers comparing the count of
> each word and print the elements pointed by these pointers. That way
> program will be highly efficient as we will not be sorting any structures,
> we will be sorting pointers only :) . (originally K&R2's idea but I love
> this idea)
>
> I want this program to be fast and consume less memory. I have 2
> questions:
>
>   i.) a double-linked list will be better than this ?
>  ii.) any other data structure that I can use here ?  

Lists are not what you want here. Lookups will get slower in
proportion to the number of words. A balanced tree or hash table
implementing a dictionary of counts is a better choice. Look at
libavl for a niced balanced tree implementation. Just raverse in
order to print. If you use a hash table, you'll need to sort it in a
separate pass.

I know you said you're using C, but this is a great example of a
problem that's easier to solve in a language meant to solve it. Perl
is such a language:

sub main {
local $/ = undef;
my %counts;
foreach my $word (split(/\s+/, <>)) { $counts{$word}++; }
print map { "$_->[0] $_->[1]\n" }
sort { $b->[1] <=> $a->[1] }
map { [$_, $counts{$_}] } keys %counts;
}
main;
From: Gene on
On May 6, 10:19 pm, Gene <gene.ress...(a)gmail.com> wrote:
> On May 6, 6:36 am, arnuld <sunr...(a)see.sigs.invalid> wrote:
>
>
>
>
>
> > PROBLEM:
>
> >        I want to read words from standard input and then sort and print
> >        the   words in decreasing order of appearance along with the count
> >        of how many times each word occurred. The language I am using C. I
> >        have come up with following design idea:
>
> > 1) Create a singly-linked list and store each word and the count in that
> > structure. when the user enters a word which is already seen before, then
> > we will just increase the count. That way we will have a list of unique
> > words.
>
> > 2.) Create an array of pointers to the elements of linked list
> > (which are structure actually). Sort the pointers comparing the count of
> > each word and print the elements pointed by these pointers. That way
> > program will be highly efficient as we will not be sorting any structures,
> > we will be sorting pointers only :) . (originally K&R2's idea but I love
> > this idea)
>
> > I want this program to be fast and consume less memory. I have 2
> > questions:
>
> >   i.) a double-linked list will be better than this ?
> >  ii.) any other data structure that I can use here ?  
>
> Lists are not what you want here. Lookups will get slower in
> proportion to the number of words.  A balanced tree or hash table
> implementing a dictionary of counts is a better choice.  Look at
> libavl for a niced balanced tree implementation.  Just raverse in
> order to print.  If you use a hash table, you'll need to sort it in a
> separate pass.
>
> I know you said you're using C, but this is a great example of a
> problem that's easier to solve in a language meant to solve it.  Perl
> is such a language:
>
> sub main {
>   local $/ = undef;
>   my %counts;
>   foreach my $word (split(/\s+/, <>)) { $counts{$word}++; }
>   print map { "$_->[0] $_->[1]\n" }
>         sort { $b->[1] <=> $a->[1] }
>         map  { [$_, $counts{$_}] } keys %counts;}
>
> main;- Hide quoted text -

I'm sorry for the dual posts with similar content. The first one
seemed to go to the bit bucket.
Gene
From: arnuld on
> On Tue, 06 May 2008 15:36:27 +0500, arnuld wrote:

> PROBLEM:
>
> I want to read words from standard input and then sort and print
> the words in decreasing order of appearance along with the count
> of how many times each word occurred. The language I am using C. I
> have come up with following design idea:
>
> ... SNIP...


from last 2 hours I am searching comp.lang.c and comp.programming archives
on Google Groups and a few secods before I read your reply I came across
this reply from "Roger Willcocks" to someone who asked the question on
how to reverse the single-linked list:


My take on this is that if you find you need to search a linked
list from its tail, you've got your design wrong. The list
should be chained the other way, or it should be a doubly
linked list. Get the design right and the question just goes away.

One other point - scanning a linked list is inherently
inefficient; if speed is a factor some other data structure - a
(balanced) tree, or hash table for instance - would be
favourite.



original thread:

http://groups.google.com/group/comp.lang.c/browse_thread/thread/7995d740de12a5be/6a91e1d915e179aa?lnk=gst&q=reverse+linked+list#6a91e1d915e179aa



why I gave you this link ?

because banging my head with the function "add_to_snode" I came to know
that it was comparing the word from last node only with the present word
but not with the the words before last node and that is why program was
giving the count of 1 for every word, unless you you enter a word in
succession , say 3 times, but still then because of the array I had, it
will print the word 3 times with count 3 on every line. So I thought I
needed to search the linked list in reverse order in order to see if the
current word was already on the list or not and then I hit that thread.

It only happened when I took a piece of paper and ran the program with my
pen ;) . only then I came to know why the doubly-linked list example from
section 6.5 of K&R2 always returns "p" and never "p->next"

okay, I will use doubly-linked list. Is it fine ?



--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page :)

From: Ben Bacarisse on
arnuld <sunrise(a)see.sigs.invalid> writes:

> PROBLEM:
>
> I want to read words from standard input and then sort and print
> the words in decreasing order of appearance along with the count
> of how many times each word occurred. The language I am using C. I
> have come up with following design idea:
>
>
> 1) Create a singly-linked list and store each word and the count in that
> structure. when the user enters a word which is already seen before, then
> we will just increase the count. That way we will have a list of unique
> words.

Fine, but each access involves a linear scan of the list. If you are
learning C (not algorithms and data structures) this would be
perfectly OK. You still need to find a way to print it in order.

> 2.) Create an array of pointers to the elements of linked list
> (which are structure actually). Sort the pointers comparing the count of
> each word and print the elements pointed by these pointers. That way
> program will be highly efficient as we will not be sorting any structures,
> we will be sorting pointers only :) . (originally K&R2's idea but I love
> this idea)

There is no point in use both. You may use an array of strings and
then sort that before printing, or use a linked list and sort that.
You don't gain anything from using both! Once you have a linked list
of word counts, sorting that list *could* be done with an array, but
you could also use one of the sort algorithms that work well for lists
(quick sort and merge sort come to mind).

> I want this program to be fast and consume less memory. I have 2
> questions:
>
> i.) a double-linked list will be better than this ?

No, but if you think so, say why because it will help to understand
how you think it may help.

> ii.) any other data structure that I can use here ?

Binary search trees and hash tables are the most obvious choices. For
simplicity, you could try to keep a sorted array of string pointers.
This will speed up the accesses when a word is read and will give you
a simple way to print the words at the end.

--
Ben.
 |  Next  |  Last
Pages: 1 2 3
Prev: string splitting
Next: Application for web log analysis