From: Rich P on
Actually, I have two questions.

I wrote a program which displays images in a slideshow type manner. The
image files are .jpg and .bmp and .gif images. They are stored in a
variety of subfolders under a parent folder. There are about 14,000
image files in these subfolders. I retrieve each image file as follows:

List<string> myList = new List<string>();
foreach (string str1 in My.Computer.FileSystem.GetFiles("C:\\1A",
FileIO.SearchOption.SearchAllSubDirectories,
"*.jpg", "*.bmp", "*.gif"))
{
myList.Add(str1);
}

Currently, I read all the files into this list object and then display
each image for 1 second before moving on to the next one, in a loop
(it is basically a program for finding something by looking at it). The
image files are named in the following manner:

aaa1.jpg
aaa2.jpg
aaa3.jpg
...
aaa100.jpg

abcbbb1.jpg
abcbbb2.jpg
...
abcbbb77.jpg

cccttttt1.jpg
cccttttt2.jpg
...
cccttttt142.jpg

ddd1.bmp
ddd2.jpg
ddd3.gif
ddd4.jpg
...
ddd95.jpg
...

and the subfolders are named fldA, fldB, fldC, ... fldZ where image
files that begin with "a" will be stored in fldA, image files that begin
with "b" will be stored in fldB, ...

Basically, I have groups of image files where a group has the same
beginning text in the filename (alpha chars) and then followed by a
numeric char (incremented as aaa1, aaa2, aaa3, ...aaa100). So group aaa
may have 100 image files that begin with "aaa" before encountering a
numeric char, group "abcbbb" may have 77 files, ...

What I want to do is this: when a group of images begins displaying -
group "aaa" for example - I want to display the count of files in that
group while that group of images is being displayed. I would have a
label reading "Count of 'aaa' is 100". Then when the next group of
images is displayed the label would change to "Count of 'abcbbb' is 77"
and so on.

I could pick the max number for a given group, or I could do a "Group
By" type query on the group of images currently being displayed. Either
way, for each group I would have to search the filename for the point at
which the chars become numeric and then find the max number value or do
the "Group By" thing based on the alpha portion of the filenames.

In pseudocode I would have something like this:

class myGroup
{
    //alpha part of filename in the group
    public string GroupName;
    //count of files in this group
    public int GroupCount;
}


string s1, s2;
int Lcount = 0;
//store just the group name in another list object
List<myGroup> myGroups = new List<myGroup>();
//myGroups = LinQ magic to get the groups -- parsing out the number part
//of the group filenames from the 14,000 files in myList -- which may be
//only 200 individual groups of image files

for (int i = 0; i < myGroups.Count; i++)
{
    //now get the list of filenames for this group
    //more LinQ magic to get just the "aaa's" then the "abcbbb's", then
    //the "bbb's", ...

    List<string> newFileList = new List<string>();
    //get files from myList where the alpha portion of the filenames
    //matches the current myGroups[i].GroupName

    LabelCount.Text = myGroups[i].GroupCount.ToString() + " files in " +
        myGroups[i].GroupName;

    for (int j = 0; j < newFileList.Count; j++)
    {
        //display image
    }
}


Question 1: Could LinQ do this? If yes - may I ask for an example how?

Question 2: Would it be more efficient to read the subfolders
individually, where I would just loop through each subfolder?

For example, subfolder fldA may store 1000 image files, fldM may have 3000
files, fldQ may have only 50 image files. Right now I am just reading
everything into memory - all 14,000 filenames. Would there be any
performance/efficiency difference between reading everything in one
chunk and reading the subfolders individually?



Rich

*** Sent via Developersdex http://www.developersdex.com ***
From: Peter Duniho on
Rich P wrote:
> Actually, I have two questions.
>
> I wrote a program which displays images in a slideshow type manner. The
> image files are .jpg and .bmp and .gif images. They are stored in a
> variety of subfolders under a parent folder. There are about 14,000
> image files in these subfolders. I retrieve each image file as follows:
>
> List<string> myList = new List<string>();
> foreach (string str1 in My.Computer.FileSystem.GetFiles("C:\\1A",
> FileIO.SearchOption.SearchAllSubDirectories,
> "*.jpg", "*.bmp", "*.gif"))
> {
> myList.Add(str1);
> }

You should rid your code of any VB references. There's really no need
for them in C#, and doing things "the VB way" in C# will only slow you
down in the long run.

Use the System.IO.Directory class, and its GetFiles() method in
particular, to obtain a list of files found at a path.

Also note that the List<T> class has an AddRange() method. It is much
more efficient, especially when adding a large number of items, to use
that method instead of adding items individually.
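
A minimal sketch of both suggestions, assuming the root path and the
three extensions from the original post. The ImageFinder class and its
CollectImageFiles method are made-up names for illustration. Since
Directory.GetFiles() accepts only one search pattern per call, the
sketch calls it once per extension and appends each result with
AddRange():

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ImageFinder
{
    // Hypothetical helper: gathers every file under root (subfolders
    // included) that matches any of the given search patterns.
    public static List<string> CollectImageFiles(string root, params string[] patterns)
    {
        var files = new List<string>();
        foreach (string pattern in patterns)
        {
            // One GetFiles() call per pattern; AddRange() appends the
            // whole returned array in a single step.
            files.AddRange(Directory.GetFiles(root, pattern, SearchOption.AllDirectories));
        }
        return files;
    }
}

class Program
{
    static void Main()
    {
        // Root folder taken from the original post; adjust as needed.
        string root = @"C:\1A";
        if (!Directory.Exists(root))
        {
            Console.WriteLine(root + " not found");
            return;
        }

        List<string> myList = ImageFinder.CollectImageFiles(root, "*.jpg", "*.bmp", "*.gif");
        Console.WriteLine(myList.Count + " image files found");
    }
}
```

Note that the resulting list is ordered pattern by pattern rather than
by name, so call myList.Sort() afterwards if the display order matters.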

For the moment, I'll take for granted that storing in memory the names
of 14,000 files all at once makes sense. But that seems potentially
inefficient as well. :)

So, as for the questions:

> [...]
> Question 1: Could LinQ do this? If yes - may I ask for an example how?

LINQ certainly can group data. One question is, is there a particular
order you need the groups to be presented in? And can you confirm that
you do in fact want to display a given group of pictures together? Or
is it simply that you want the count of pictures in a given group to be
displayed with a given picture from that group?

Assuming you have an enumeration of all the files, you can group them
like this:

char[] _rgchDigits = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
'9' };

var grouped = from filename in myList
    // look for the first digit in the bare file name, not the full path
    let name = Path.GetFileName(filename)
    group filename by
        name.Substring(0, name.IndexOfAny(_rgchDigits));
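
Two details matter for the grouping key. The digit search has to run on
the bare file name, because a path such as C:\1A already contains a
digit, and IndexOfAny() returns -1 when a name has no digit at all,
which would make Substring() throw. Here is a small sketch that covers
both cases and builds the label text from the question. The
GroupKeys.GetGroupName helper is a hypothetical name, the sample paths
are invented, and treating a digit-free name as its own group is an
assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class GroupKeys
{
    static readonly char[] Digits = "0123456789".ToCharArray();

    // Hypothetical helper: returns the part of the file name that
    // precedes the first digit ("aaa" for "aaa100.jpg").
    public static string GetGroupName(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int firstDigit = name.IndexOfAny(Digits);
        // Assumption: a name without any digit forms its own group.
        return firstDigit < 0 ? name : name.Substring(0, firstDigit);
    }
}

class Program
{
    static void Main()
    {
        // Invented sample paths; Path.Combine keeps them portable.
        var myList = new List<string>
        {
            Path.Combine("1A", "fldA", "aaa1.jpg"),
            Path.Combine("1A", "fldA", "aaa2.jpg"),
            Path.Combine("1A", "fldA", "abcbbb1.jpg"),
            Path.Combine("1A", "fldD", "ddd1.bmp"),
            Path.Combine("1A", "fldD", "ddd2.jpg"),
            Path.Combine("1A", "fldD", "ddd3.gif"),
        };

        // GroupBy() yields groups in order of first appearance.
        foreach (var g in myList.GroupBy(f => GroupKeys.GetGroupName(f)))
        {
            Console.WriteLine("Count of '" + g.Key + "' is " + g.Count());
            // Each group also enumerates its own file names, so the
            // images of the group could be displayed from here.
        }
        // Prints:
        // Count of 'aaa' is 2
        // Count of 'abcbbb' is 1
        // Count of 'ddd' is 3
    }
}
```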

> Question 2: would it be more efficient to read the subfolders
> individually? Where I would just loop through each subfolder.

The most efficient thing would be for each group of files to be in their
own folder. It's not feasible to try to retrieve individual groups from
a single folder; to enumerate the files individually by group, you'd
have to generate the group names and filter a file enumeration by that.
You might as well in that case just get all the files for a folder and
then group them.

That said, certainly working on one folder at a time rather than trying
to manage everything all at once could be more _memory_ efficient, if
not performance efficient. User perception of performance could be
better, simply because your program isn't trying to do so much all at
once (the big performance hit being all the i/o involved in retrieving
14,000 file names from the directory structure all at once).

Hope that helps.

Pete
From: Rich P on
Thank you for your reply. And "Enumeration" was the word I believe I
was looking for to describe how I have these image files organized.

When I read the files - they are alphabetic. I read all the A's first,
then the B's, C's, ...Q's, W's, X's, Z's.

Confession (bless me, almighty one, for mixing VB with C# :) - I have been
doing VB/VB.Net for several years and have been migrating to C# for the
last couple of years. So I don't have all the C# stuff down yet.
Question:

My.Computer.FileSystem.GetFiles("C:\\1A",
FileIO.SearchOption.SearchAllSubDirectories...

this will search all the subdirectories. How do I search all
subdirectories with System.IO.Directory class - GetFiles() ? Before
My.Computer... I used to have a recursive routine that would read each
subfolder\subfolder... using Windows APIs. It was pretty fast but way
more lines of code than My.Computer...


Question2:

>
char[] _rgchDigits = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
'9' };

var grouped = from filename in myList
group filename by Path
.GetFilename(filename)
.Substring(0, filename.IndexOfAny(_rgchDigits));
<

Let's say I set up a test scenario where I have a list of test text files
in C:\1A\A, C:\1A\B, C:\1A\C

in subfolder A I have the following test text files (no content)

testA1.txt
testA2.txt
testA3.txt
testAB1.txt
testAB2.txt
testAB3.txt
testAB4.txt

then in subfolder B I have

testB1.txt
testB2.txt
testB3.txt
testBC1.txt
testBC2.txt
testBC3.txt
testBC4.txt

and the same in subfolder C for the C's.

I want to read all of these text file names into a list and group them
by testA, testAB, testB, testBC, testC, testCD, and get a count of each
group where group testA is count = 3, testAB is count = 4, testB count =
3, testBC count = 4, ...

Using System.IO.Directory how can I read each subdirectory to populate
my list of the test text file names? and how can I use linQ to group
this list for something like the following?

foreach (myGroupTestTxt grp in Result of LinQ Magic)
console.WriteLine(groupName + " " + groupCount.ToString());



Rich

From: Peter Duniho on
Rich P wrote:
> [...]
> Question:
>
> My.Computer.FileSystem.GetFiles("C:\\1A",
> FileIO.SearchOption.SearchAllSubDirectories...
>
> this will search all the subdirectories. How do I search all
> subdirectories with System.IO.Directory class - GetFiles() ?

See:
http://msdn.microsoft.com/en-us/library/ms143316.aspx

> [...]
> Question2:
>
> char[] _rgchDigits = { '0', '1', '2', '3', '4', '5', '6', '7', '8',
> '9' };
>
> var grouped = from filename in myList
> group filename by Path
> .GetFilename(filename)
> .Substring(0, filename.IndexOfAny(_rgchDigits));
>
> [...]
> I want to read all of these text file names into a list and group them
> by testA, testAB, testB, testBC, testC, testCD, and get a count of each
> group where group testA is count = 3, testAB is count = 4, testB count =
> 3, testBC count = 4, ...
>
> Using System.IO.Directory how can I read each subdirectory to populate
> my list of the test text file names?

You can either enumerate files one directory at a time (see
Directory.GetDirectories() for getting a list of directories in a
directory), or see above for enumerating all files recursively under a
given path.
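
A sketch of the first option, one directory at a time, assuming the
subfolders sit directly under the root as described. FolderWalker and
FilesByFolder are illustrative names, not part of the framework. Because
the method is an iterator, only one subfolder's file names are fetched
at a time:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FolderWalker
{
    // Hypothetical helper: lazily yields each immediate subfolder of
    // root together with the matching files found directly inside it.
    public static IEnumerable<KeyValuePair<string, string[]>> FilesByFolder(
        string root, string pattern)
    {
        foreach (string dir in Directory.GetDirectories(root))
        {
            string[] files = Directory.GetFiles(dir, pattern);
            yield return new KeyValuePair<string, string[]>(dir, files);
        }
    }
}

class Program
{
    static void Main()
    {
        // Root folder from the test scenario; adjust as needed.
        string root = @"C:\1A";
        if (!Directory.Exists(root))
        {
            Console.WriteLine(root + " not found");
            return;
        }

        foreach (var entry in FolderWalker.FilesByFolder(root, "*.txt"))
        {
            // entry.Key is the subfolder path, entry.Value its files.
            Console.WriteLine(Path.GetFileName(entry.Key) + ": "
                + entry.Value.Length + " files");
        }
    }
}
```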

> and how can I use linQ to group
> this list for something like the following?
>
> foreach (myGroupTestTxt grp in Result of LinQ Magic)
> console.WriteLine(groupName + " " + groupCount.ToString());

Assuming the code I proposed:

foreach (var group in grouped)
{
    // Count() is the LINQ extension method, so it needs parentheses
    Console.WriteLine(group.Key + " " + group.Count().ToString());
}
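
If a list of name/count objects like the myGroup class from the first
post is wanted instead, the groups can be projected with "into". A
sketch using the test file names from the question: the MyGroup members
mirror the pseudocode, while GroupCounter and CountGroups are
illustrative names, and a name without digits is assumed to form its own
group:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class MyGroup
{
    public string GroupName { get; set; }   // alpha part of the filename
    public int GroupCount { get; set; }     // number of files in the group
}

static class GroupCounter
{
    static readonly char[] Digits = "0123456789".ToCharArray();

    // Hypothetical helper: one MyGroup per distinct alpha prefix.
    public static List<MyGroup> CountGroups(IEnumerable<string> fileNames)
    {
        var query =
            from filename in fileNames
            let name = Path.GetFileNameWithoutExtension(filename)
            let firstDigit = name.IndexOfAny(Digits)
            group filename
                by (firstDigit < 0 ? name : name.Substring(0, firstDigit))
                into g
            select new MyGroup { GroupName = g.Key, GroupCount = g.Count() };
        return query.ToList();
    }
}

class Program
{
    static void Main()
    {
        string[] testFiles =
        {
            "testA1.txt", "testA2.txt", "testA3.txt",
            "testAB1.txt", "testAB2.txt", "testAB3.txt", "testAB4.txt",
            "testB1.txt", "testB2.txt", "testB3.txt",
            "testBC1.txt", "testBC2.txt", "testBC3.txt", "testBC4.txt",
        };

        foreach (MyGroup grp in GroupCounter.CountGroups(testFiles))
            Console.WriteLine(grp.GroupName + " " + grp.GroupCount);
        // Prints:
        // testA 3
        // testAB 4
        // testB 3
        // testBC 4
    }
}
```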

This stuff is all in the documentation. Given the code I proposed
earlier, you could have even used VS's Intellisense to see what the
query result was and figure out how to use it, but of course you could
also have started with the Enumerable.GroupBy() method to see what it
returns and followed the chain of class features from there.

Pete
From: Rich P on
As always, thank you very much for your reply. I am now using
System.IO.Directory for drilling into subdirectories -- very nice! And
I am attempting to use the code sample you have proposed. But VS is
complaining. Here is what I have attempted thus far:

private void GroupFiles()
{
DirectoryInfo di = new DirectoryInfo(@"C:\1A\1AA\");
FileInfo[] files = di.GetFiles("*.txt", SearchOption.AllDirectories);
List<string> myList = new List<string>();
foreach (FileInfo file in files)
myList.Add(file.Name);

foreach (string str1 in myList)
Console.WriteLine(str1);

/*only searching 1 dir for now -- here are my test files
test1.txt
test2.txt
test3.txt
test4.txt
testA1.txt
testA2.txt
testA3.txt
*/

char[] _rgchDigits = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};

var grouped = from filename in myList
group filename myPath
.GetFilename(filename) <<<--- VS complains here
.Substring(0, filename.IndexOfAny(_rgchDigits));

}


I apologize in advance for my ignorance on the subject of LinQ, but
when I add your proposed code to the routine above, VS complains as
noted. At this point I don't have enough experience/intuition
to see what is missing or where to go next with the Linq part of the
exercise. Any suggestions would be greatly appreciated on how I could
list the count of each group of my test files, for example:
group "test" has count = 4, and group "testA" has count = 3. How do I
proceed with Linq to obtain this information?

Thanks again for all the help.

Rich
