|
Prev: string splitting
Next: Application for web log analysis
From: [Jongware] on 7 May 2008 06:37 arnuld wrote: >> On Tue, 06 May 2008 15:32:58 +0200, [Jongware] wrote: > >> I wrote exactly that same program, using only a binary tree. Printing >> the words was done with a recursive routine -- which is also a pretty >> simple way :-) >> Despite heavy use of generalized C functions such as strdup, strcmp, >> etc., its performance was astounding; a matter of seconds, and most of >> this time was input/output. My test file was an ASCII version of the >> King James Bible, so now I know how often people used to say "thee" in >> those times. > > > I have written the a program that is *nearly* like it before I even posted > my problem but that program stores and prints the words alphabetically and > it uses a binary tree for that. > > But now I have to sort and print the input by the number of occurrences in > the input which is not known in advance ( unlike alphabetical sorting > where you already know p < s ) > > any ideas that you can give me from your code ? Ah, no, because that requirement wasn't there. I guess using the binary tree for storing and counting is still the best way to go (never mind the 9-line perl solutions ...). That's because your *primary* requirement is reading and counting words /fast/. You need the huge speed advantage of an alphabetic sorting binary tree for this -- you tack on the count number for, well, I think for /free/! (That's because for every word /w/ you already have to walk down the tree to see if it's there. If it is, increase w->count; if it isn't, insert it in the tree and initialize w->count to 1.) So that's the first half. The second part, sorting on the count, has nothing in common with this binary tree ... so ditch it! If you keep a running count of how many words you have in your tree, you can create an empty array and move your data into this, using a recursive function. Then re-sort, using the count as primary sort key. [Jw]
From: Ben Bacarisse on 7 May 2008 09:30 "[Jongware]" <sorry(a)no.spam.net> writes: > arnuld wrote: >> I have written the a program that is *nearly* like it before I even posted >> my problem but that program stores and prints the words alphabetically and >> it uses a binary tree for that. >> >> But now I have to sort and print the input by the number of >> occurrences in the input which is not known in advance ( unlike >> alphabetical sorting >> where you already know p < s ) >> >> any ideas that you can give me from your code ? > > Ah, no, because that requirement wasn't there. <snip> > So that's the first half. The second part, sorting on the count, has > nothing in common with this binary tree ... so ditch it! If you keep a > running count of how many words you have in your tree, you can create > an empty array and move your data into this, using a recursive > function. Then re-sort, using the count as primary sort key. Another way is to re-use the binary tree code. After reading and counting the words, make a new tree (sharing the strings of course) ordered by word count rather than by word. It is interesting to see how much code you can share between the two parts of the program. I have just done this as an exercise and I am happy to post the code if this is not someones coursework. -- Ben.
From: [Jongware] on 7 May 2008 09:58 Ben Bacarisse wrote: > "[Jongware]" <sorry(a)no.spam.net> writes: > >> arnuld wrote: >>> I have written the a program that is *nearly* like it before I even posted >>> my problem but that program stores and prints the words alphabetically and >>> it uses a binary tree for that. >>> >>> But now I have to sort and print the input by the number of >>> occurrences in the input which is not known in advance ( unlike >>> alphabetical sorting >>> where you already know p < s ) >>> >>> any ideas that you can give me from your code ? >> Ah, no, because that requirement wasn't there. > <snip> >> So that's the first half. The second part, sorting on the count, has >> nothing in common with this binary tree ... so ditch it! If you keep a >> running count of how many words you have in your tree, you can create >> an empty array and move your data into this, using a recursive >> function. Then re-sort, using the count as primary sort key. > > Another way is to re-use the binary tree code. After reading and > counting the words, make a new tree (sharing the strings of course) > ordered by word count rather than by word. It is interesting to see > how much code you can share between the two parts of the program. Ahh yes -- you might even be able to create /pointers/ to the relevant functions, and use these twice in your main loop. Maybe not optimal (then again, maybe it is), but still a nice exercise. > I > have just done this as an exercise and I am happy to post the code if > this is not someones coursework. <g> It's the mental state -- doing stuff by way of flexing your brain, rather than because it's homework -- that confuses the homework doers. You might receive a number of mails "can i c ur codes plz". [Jw]
From: Ben Pfaff on 7 May 2008 10:42
arnuld <sunrise(a)see.sigs.invalid> writes: > I really don't understand the difference between a Hash-Table and a Binary > Tree because I never came across them. I want to understand and implement > them but when I tried to read it using "Introduction to Algorithms - > Corment et al. 2/e", I see he is using some weired words like "satellite > data" About the one thing in common between hash tables and binary trees is that they both can be used for data storage and retrieval. I'd suggest typing those terms into a friendly search engine and trying to understand them again. -- Ben Pfaff http://benpfaff.org |