From: Peter Sisak on

Given the files in a directory, by what method can I receive the list of
the files that in size is closest to, but not larger than, an arbitrary
x value, given in bytes? Each file can be listed at most once. The file
names and their respective sizes must be read in by Mathematica. (The directory contains files with count of around two-three hundred, thus making a brute force "sum then compare" search of all possible combinations would most probably not finish before the next ice age coming...)
An extension to the above problem: how to find the lists of files, if there are more than one lists permitted, and the aim is to minimise the total difference between the respective groups and the (identical) x value? (Think archiving data using the least amount of identical-type CDs, or wasting the least amount, and leaving the few odd ones out.)

Any ideas how to pull this? Thanks in advance
Peter Sisak
From: Bill Rowe on
On 3/30/10 at 5:02 AM, p-kun80(a)hotmail.com (Peter Sisak) wrote:

>Given the files in a directory, by what method can I receive the
>list of the files that in size is closest to, but not larger than,
>an arbitrary x value, given in bytes?

This part is relatively simple. For example:

Reverse(a)Sort@
Cases[{FileByteCount@#, #} & /@
DeleteCases[
FileNames[], _?(FileType@# === Directory &)], {_?(# <
x &), _}]

will give a list of files with their sizes in descending size
with all files having fewer than x bytes. So, the first file in
the list will be the file whose size is closest to x but not
greater than x

>Each file can be listed at most once. The file names and their
>respective sizes must be read in by Mathematica. (The directory
>contains files with count of around two-three hundred, thus making a
>brute force "sum then compare" search of all possible combinations
>would most probably not finish before the next ice age coming...)

>An extension to the above problem: how to find the lists of files, if
>there are more than one lists permitted, and the aim is to minimise the
>total difference between the respective groups and the (identical) x
>value? (Think archiving data using the least amount of identical-type
>CDs, or wasting the least amount, and leaving the few odd ones out.)

This part is quite difficult. This makes the problem essentially
the knapsack problem which is known to be NP-complete. As far as
I know, there is no algorithm that is both fast and guaranteed
to provide a solution. In fact, the difficulty of this problem
is the basis for some cryptography applications.


From: dr DanW on
There are a number of functions in Mathematica 7 for doing just this
kind of analysis. Look up Nearest[].

Daniel