From: AA2e72E on
Given a list of files, how can I specify the 'where' in a LINQ query to
return the name of the most recent file name?

e.g.

var queryLatestFile = from file in Directory.GetFiles(@"c:\mypath", "*.JPG")
where file.CreationTime== ??
select file.FullName;

Thanks for your help.
From: Peter Duniho on
AA2e72E wrote:
> Given a list of files, how can I specify the 'where' in a LINQ query to
> return the name of the most recent file name?
>
> e.g.
>
> var queryLatestFile = from file in Directory.GetFiles(@"c:\mypath", "*.JPG")
> where file.CreationTime== ??
> select file.FullName;

Your question doesn't make much sense to me. The GetFiles() method
returns a string[]. There's no "CreationTime" or "FullName" property
for string.

Even if I make the assumption that you really mean to use
DirectoryInfo.GetFiles() instead of Directory.GetFiles(), the "where"
clause isn't really a way to get an individual element with a specific
relationship to the other elements. Instead, it seems to me you should
use "orderby":

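// Requires: using System.IO; using System.Linq;
// GetFiles() is an instance method, so create a DirectoryInfo first.
var query = from file in new DirectoryInfo(@"c:\mypath").GetFiles("*.JPG")
            orderby file.CreationTime descending
            select file.FullName;

string str = query.FirstOrDefault();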

Then "str" will either be null (if there are no files), or will contain
the name of the most recent file.
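If you do want to express this with a "where" clause, as in the original question, one possibility is to compute the latest creation time first with Max() and then filter on it. This is only a sketch: the class and method names (LatestFile, NewestByWhere) are hypothetical, and it returns a list because several files could share the same timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LatestFile
{
    // Hypothetical helper: returns the full path(s) of the most recently
    // created file(s) matching the pattern, using a "where" clause.
    public static List<string> NewestByWhere(string path, string pattern)
    {
        FileInfo[] files = new DirectoryInfo(path).GetFiles(pattern);
        if (files.Length == 0)
            return new List<string>(); // Max() throws on an empty sequence

        // First pass: find the latest creation time.
        DateTime latest = files.Max(f => f.CreationTime);

        // Second pass: keep only the file(s) with that time.
        var query = from file in files
                    where file.CreationTime == latest
                    select file.FullName;

        return query.ToList();
    }
}
```

Compared with the "orderby" version, this avoids a sort but makes two passes over the array.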

Pete
From: AA2e72E on
Thanks. However, I don't think I asked the right question. I'll try again:

Given:

System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(@"c:\test");
IEnumerable<System.IO.FileInfo> fileList =
dir.GetFiles("*.txt",System.IO.SearchOption.AllDirectories);

fileList will contain the files in the directory tree c:\test, and files with the
same name may exist in several folders, e.g. c:\test\one\myfile.txt and c:\test\two\myfile.txt.

I would like to be able to pick the myfile.txt that has the latest creation
time. Obviously, when a file exists only once in the tree, i.e. in just one
subdirectory, it has the latest creation time by default and will be picked.

I hope I have explained this adequately: thanks for your help.

From: Peter Duniho on
AA2e72E wrote:
> Thanks. However, I don't think I asked the right question. I'll try again:
>
> Given:
>
> System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(@"c:\test");
> IEnumerable<System.IO.FileInfo> fileList =
> dir.GetFiles("*.txt",System.IO.SearchOption.AllDirectories);
>
> fileList will contain files in the directory tree c:\test and files by the
> same name will exist in c:\test\one\myfile.txt and c:\test\two\myfile.txt etc.
>
> I would like to be able to pick the myfile.txt that has the latest creation
> time; obviously when a file exists uniquely i.e. in one sub directory in the
> tree, it will have the latest creation time (by default) and will get picked.
>
> I hope I have explained this adequately: thanks for your help.

Yes, I think that clarifies things a little more. But I'm still not
entirely sure I understand.

Do you already have a specific filename in mind when you execute this
code? Or are you looking to select _all_ of the files in the
enumeration, but only the most recent for any given filename?

That is, is the operation something like "look for any file named X,
return the one file named X that is the most recent"? Or is it "return
all distinct file names, and for each name return only the path of the
most recent file"?

The former seems to me to be best solved simply by providing file name
"X" as the search pattern to the GetFiles() method (still with
SearchOption.AllDirectories). Then you can use the code I posted
previously to take the most recent file from the beginning of an
ordered enumeration of that search result.
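For the first case, a sketch of what that might look like, assuming the exact file name is passed as the search pattern together with SearchOption.AllDirectories. The LatestByName.MostRecent name is hypothetical.

```csharp
using System.IO;
using System.Linq;

static class LatestByName
{
    // Hypothetical helper: searches the whole tree under "root" for files
    // named "fileName" and returns the full path of the most recently
    // created one, or null if there is none.
    public static string MostRecent(string root, string fileName)
    {
        var query = from file in new DirectoryInfo(root)
                        .GetFiles(fileName, SearchOption.AllDirectories)
                    orderby file.CreationTime descending
                    select file.FullName;

        return query.FirstOrDefault();
    }
}
```

Bear in mind that a "*" or "?" in the name would be treated as a wildcard by GetFiles().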

The latter is more complicated, and I'm not sure that a good solution
will use only LINQ. You can do something like this:

var query = from file in fileList
            orderby file.Name, file.CreationTime descending
            select file;

And then you can run through the list, picking the first unique name as
you go:

string previousFilename = null;
List<string> latest = new List<string>();

// The query is sorted by name, newest first within each name, so the
// first file seen for each name is the most recent one.
foreach (var file in query)
{
    if (previousFilename != file.Name)
    {
        latest.Add(file.FullName);
        previousFilename = file.Name;
    }
}

Then at the end, you'll have a list of the paths to each file within
your original directory search, where any given filename appears only
once, and is the path to the file with the most recent creation
timestamp for that given filename.
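A LINQ-only alternative for the second case is to group the files by name and take the newest entry from each group. This is a sketch; LatestPerName.Latest is a hypothetical name. Grouping by file.Name this way is case-sensitive, so on Windows you may prefer fileList.GroupBy(f => f.Name, StringComparer.OrdinalIgnoreCase), since query syntax has no way to pass a comparer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LatestPerName
{
    // Hypothetical helper: for every distinct file name in the list, returns
    // the full path of the copy with the most recent creation time.
    public static List<string> Latest(IEnumerable<FileInfo> fileList)
    {
        var query = from file in fileList
                    group file by file.Name into sameName
                    // Within each group, the newest file comes first.
                    select sameName
                        .OrderByDescending(f => f.CreationTime)
                        .First()
                        .FullName;

        return query.ToList();
    }
}
```

This sorts each group separately instead of sorting the whole list once, and it produces the same set of paths as the loop above (order aside).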

If that doesn't answer your question, perhaps you should provide a
specific example of what the input might be (that is, the list of files
after you've called GetFiles()), and what output you want to get.

Pete
From: AA2e72E on
Thanks. I am after

'That is, is the operation something like "look for any file named X, return
the one file named X that is the most recent"?'

I am inclined to agree that there may not be a LINQ-only solution, although I
thought the Linq87 (Max - Grouped) example in the 101 LINQ samples might provide
a suitable basis for one.

I have a solution, but it is very time-consuming: I use DirectoryInfo to
extract the path, file name, and file creation date into three columns, which
I write to a SQL Server table, and then I use SQL to get the file names. This
takes just under three hours for 650,000 files spread across 6,200
subfolders, so I wanted to see whether a LINQ solution could do it quicker. I
expected it to be quicker (had it been possible!) because it would avoid the
repeated calls to SQL Server.

Thanks for looking into this.
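Given the three-hour run time, one option that avoids SQL Server entirely is a single pass over the tree that keeps, per file name, the newest file seen so far in a dictionary. This sketch assumes .NET 4.0 or later for DirectoryInfo.EnumerateFiles(), which streams results rather than building one large array first, and assumes case-insensitive names as on Windows. NewestFiles.NewestPerName is a hypothetical name, and any speed-up over the SQL approach would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class NewestFiles
{
    // Hypothetical helper: walks the tree once and returns, for each file
    // name, the FileInfo of the copy with the latest creation time.
    public static Dictionary<string, FileInfo> NewestPerName(string root, string pattern)
    {
        // Assumption: names are compared case-insensitively, as on Windows.
        var newest = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);

        foreach (FileInfo file in new DirectoryInfo(root)
                     .EnumerateFiles(pattern, SearchOption.AllDirectories))
        {
            FileInfo current;
            // Keep this file if the name is new or it is newer than the one stored.
            if (!newest.TryGetValue(file.Name, out current)
                || file.CreationTime > current.CreationTime)
            {
                newest[file.Name] = file;
            }
        }
        return newest;
    }
}
```

For the example above you would call NewestPerName(@"c:\test", "*.txt") and read .Values to get the winning files. EnumerateFiles() may throw UnauthorizedAccessException if it reaches a folder it cannot read.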