From: Joseph M. Newcomer on
Back In The Day, late 1980s, a friend was hired by Schlumberger to modify Unix to support
files > 4GB. It turns out that they were the world's largest geological survey company,
and would do things like blow up a truckload of explosives in the middle of the Texas
desert, analyze the signals picked up by sensors covering several hundred square miles,
and come back and say "drill here for oil". As my friend told me, "you may think of 4GB
as a whopping lot of data; I think of it as a 3-minute sample". 2GB has been an
unreasonably small limit since the 1970s (ask anyone at GM who maintained the databases of
every car ever sold, including their manufacturing history; they were running
multi-gigabyte databases in the era when this involved millions of dollars of hard
drives). I'm amazed anyone would consider 4GB a reasonable file limit, let alone a tiny
value like 2GB. We live in an era of TB databases.

Illiac IV had a 2GB hard drive, the largest ever made in that era, mid-1960s. They wanted
more, but that was the largest that could be produced. So for over 40 years, it has been
known that 2GB is at best a "modest" size.

The real annoyance is that you end up having to do a lot of gratuitous casts, e.g.,

CByteArray b;
b.SetSize(...some value...);

WriteFile(h, b.GetData(), b.GetSize(), &bytesWritten, NULL);

will not compile cleanly, because b.GetSize(), or even sizeof(), in Win64 returns a 64-bit
size type, which then has to be truncated by a cast to fit the DWORD parameter. That's
what's really sad: they only did half the job.
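
Roughly, the workaround ends up looking like this (just a sketch, no error checking;
someSize and h stand in for whatever you actually have):

CByteArray b;
b.SetSize(someSize);

// GetSize() is 64 bits wide under Win64, but WriteFile wants a DWORD,
// so you are forced to truncate explicitly.
ASSERT(b.GetSize() <= MAXDWORD); // the cast below would otherwise silently lose bits
DWORD bytesWritten = 0;
WriteFile(h, b.GetData(), (DWORD)b.GetSize(), &bytesWritten, NULL);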

I can't imagine why the 64-bit Windows system (for which all new drivers had to be
written) could not have extended the sizes.
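
In the meantime, the usual workaround is to loop, handing ReadFile pieces that fit in a
DWORD. A rough sketch (no error handling; h, dest and fileSize stand in for whatever you
already have, and the 64MB chunk size is arbitrary):

ULONGLONG remaining = fileSize.QuadPart;   // from GetFileSizeEx
BYTE * cursor = (BYTE *)dest;              // big VirtualAlloc'd buffer, mapped view, etc.
while(remaining > 0)
   {
    DWORD chunk = (DWORD)min(remaining, (ULONGLONG)(64 * 1024 * 1024));
    DWORD bytesRead = 0;
    if(!ReadFile(h, cursor, chunk, &bytesRead, NULL) || bytesRead == 0)
       break;                               // error or EOF; real code would check which
    cursor += bytesRead;
    remaining -= bytesRead;
   }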
joe

On Thu, 14 Jan 2010 23:55:24 -0500, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Alexander Grigoriev wrote:
>
>> Historically, I/O sizes in the kernel drivers have been stored in ULONG. And
>> Length in IO_STACK_LOCATION is ULONG, too. That would be a bit too much
>> hassle to convert everything to SIZE_T...
>>
>
>
>To add to this, the other issue with the misconception of using ULONG
>or DWORD is that even if your own application is "unsigned ready",
>hoping to handle the full positive range, i.e., 0 to 4 gig file sizes
>in a 32 bit memory space, the problem I've seen is interfacing with
>"other" API functions, libraries and/or the WIN32 API itself, where
>things still work, and sometimes naturally so, in a +/- signed integer
>world.
>
>In short, for example, once upon a time when we documented our
>requirements and limits, it used to say, among other things:
>
> o Up to 4 GigaByte file sizes
>
>That was based on our persistent and consistent usage of DWORD within
>our own code. It was a theoretical limit, but empirically the limit
>was 2 GIG because of the various external interfaces. So today, we use
>the 2 gig limit in our support docs and don't bother talking about what
>we theoretically can handle.
>
>My Point?
>
>As we move into the 64 bit world, the positive range would still be a
>theory and not one to rely on in new 64 bit designs. I guess that
>will depend on how an application interfaces with the outside world.
>
>Of course, one might suggest
>
> "Even dealing in a signed 64 bit world is limitless
> and no one should have problems. So design with INT64 in
> mind"
>
>Two things come to mind:
>
> - We thought that was the case when moving to 32 bit,
> - Quoting the late George Carlin:
>
> "More Space, More Junk"
>
>:)
>
>As a side issue.
>
>One of my beefs with Microsoft and .NET (and I have to review this
>issue again) is that under .NET the natural LONG type is 64 bit! I
>have to check again whether that holds for all the .NET languages
>(C#, C++/.NET and/or VB) or only some of them; I seem to recall it was
>not consistent. I thought it was a mistake, and a source of confusion,
>to take the long-established LONG keyword and make it 64 bit even
>within a 32 bit compiled world. I can certainly understand the
>thinking of the design team, but it ignored established C/C++ 32 bit
>engineering considerations.
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
50MB is about 2% of the available 2GB address space. Not even worth discussing. Things
don't get interesting until you start getting an order of magnitude larger, at least.
joe

On Thu, 14 Jan 2010 22:40:19 -0800, "Tom Serface" <tom(a)camaswood.com> wrote:

>I think you need to be really careful not to use up all the real memory as
>well so that you don't start swapping to disk. That is a killer that you
>don't even see coming, although 50MB shouldn't be a problem on most modern
>computers.
>
>Tom
>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>news:686vk59l8chun24uceekvdc8pt2uj4n811(a)4ax.com...
>> By the way, did anyone really notice that ReadFile and WriteFile in Win64
>> cannot read or
>> write more than 4.2GB? Seems really, really strange the length and bytes
>> read did not
>> become DWORD_PTR values...
>> joe
>>
>> On Thu, 14 Jan 2010 16:37:26 -0500, Joseph M. Newcomer
>> <newcomer(a)flounder.com> wrote:
>>
>>>Yes, but the file size was given as 50MB.
>>> joe
>>>
>>>On Thu, 14 Jan 2010 14:24:30 -0600, Stephen Myers
>>><""StephenMyers\"@discussions(a)microsoft.com"> wrote:
>>>
>>>>Just to verify my (admittedly limited) understanding...
>>>>
>>>>I assume that the code posted will fail for files greater than 2GB or so
>>>>with a 32 bit OS due to available address space.
>>>>
>>>>Steve
>>>>
>>>>Joseph M. Newcomer wrote:
>>>>> See below...
>>>>> On Thu, 14 Jan 2010 09:01:47 -0600, "Peter Olcott"
>>>>> <NoSpam(a)SeeScreen.com> wrote:
>>>>>
>>>>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>>>>>> news:OzySgEPlKHA.2132(a)TK2MSFTNGP05.phx.gbl...
>>>>>>> Peter Olcott wrote:
>>>>>>>
>>>>>>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>>>>>> message news:%23OQCOfNlKHA.1824(a)TK2MSFTNGP04.phx.gbl...
>>>>>>>>> Peter Olcott wrote:
>>>>>>>>>
>>>>>>>>> By File Copy, you mean DOS copy command or the
>>>>>>>>> CopyFile() API?
>>>>>>>> I am using the DOS command prompt's copy command. This
>>>>>>>> is fast.
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is the contradiction formed by the fact that
>>>>>>>> reading and writng the file is fast, while reading and
>>>>>>>> not wrting this same file is slow.
>>>>>>>> I am currently using fopen() and fread(); I am using
>>>>>>>> Windows XP.
>>>>>>> True, if the DOS copy command is fast,then I believe the
>>>>>>> code you are using is not optimal. The DOS Copy is using
>>>>>>> the same CreateFile() API which fopen() also finally uses
>>>>>>> in the RTL. So you should be able to match the same
>>>>>>> performance of the DOS Copy command.
>>>>>>>
>>>>>>> Have you tried using setvbuf to set a buffer cache?
>>>>>>>
>>>>>>> Here is a small test code that opens a 50 meg file:
>>>>>>>
>>>>>>> // File: V:\wc7beta\testbufsize.cpp
>>>>>>> // Compile with: cl testbufsize.cpp
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>> #include <windows.h>
>>>>>>>
>>>>>>> void main(char argc, char *argv[])
>>>>>>> {
>>>>>>> char _cache[1024*16] = {0}; // 16K cache
>>>>>>> BYTE buf[1024*1] = {0}; // 1K buffer
>>>>> ****
>>>>> Reading a 50MB file, why such an incredibly tiny buffer?
>>>>> ****
>>>>>>> FILE *fv = fopen("largefile.dat","rb");
>>>>>>> if (fv) {
>>>>>>> int res = setvbuf(fv, _cache, _IOFBF,
>>>>>>> sizeof(_cache));
>>>>>>> DWORD nTotal = 0;
>>>>>>> DWORD nDisks = 0;
>>>>>>> DWORD nLoops = 0;
>>>>>>> DWORD nStart = GetTickCount();
>>>>>>> while (!feof(fv)) {
>>>>>>> nLoops++;
>>>>>>> memset(&buf,sizeof(buf),0);
>>>>> ****
>>>>> The memset is silly. Wastes time, accomplishes nothing. You are
>>>>> setting a buffer to 0
>>>>> right before completely overwriting it! This is like writing
>>>>> int a;
>>>>>
>>>>> a = 0; // make sure a is 0 before assigning b
>>>>> a = b;
>>>>> ****
>>>>>>> int nRead = fread(buf,1,sizeof(buf),fv);
>>>>>>> nTotal +=nRead;
>>>>>>> if (nRead > 0 && !fv->_cnt) nDisks++;
>>>>>>> }
>>>>>>> fclose(fv);
>>>>>>> printf("Time: %d | Size: %d | Reads: %d | Disks:
>>>>>>> %d\n",
>>>>>>> GetTickCount()-nStart,
>>>>>>> nTotal,
>>>>>>> nLoops,
>>>>>>> nDisks);
>>>>>>> }
>>>>>>> }
>>>>> ****
>>>>> If I were reading a small 50MB file, I would do
>>>>>
>>>>> void tmain(int argc, _TCHAR * argv[])
>>>>> {
>>>>> HANDLE h = CreateFile(_T("largefile.dat"), GENERIC_READ, 0, NULL,
>>>>> OPEN_EXISTING,
>>>>> FILE_ATTRIBUTE_NORMAL, NULL);
>>>>>
>>>>> LARGE_INTEGER size;
>>>>>
>>>>> GetFileSizeEx(h, &size);
>>>>>
>>>>> // This code assumes file is < 4.2GB!
>>>>> LPVOID p = VirtualAlloc(NULL, (SIZE_T)size.LowPart, MEM_COMMIT,
>>>>> PAGE_READWRITE);
>>>>> DWORD bytesRead;
>>>>> ReadFile(h, p, size.LowPart, &bytesRead, NULL);
>>>>> ... process data
>>>>> VirtualFree(p, (SIZE_T)size.LowPart, MEM_DECOMMIT);
>>>>> return 0;
>>>>> }
>>>>>
>>>>> Note that the above does not do any error checking; the obvious error
>>>>> checking is left as
>>>>> an Exercise For The Reader. No read loops, no gratuitous memsets, just
>>>>> simple code that
>>>>> does exactly ONE ReadFile.
>>>>> joe
>>>>>
>>>>>>> What this basically shows is the number of disk hits it
>>>>>>> makes
>>>>>>> by checking the fv->_cnt value. It shows that as long as
>>>>>>> the cache size is larger than the read buffer size, you
>>>>>>> get the same number of disk hits. I also spit out the
>>>>>>> milliseconds. Subsequent runs, of course, is faster since
>>>>>>> the OS API CreateFile() is used by the RTL in buffer mode.
>>>>>>>
>>>>>>> Also do you know what protocol you have Samba using?
>>>>>> I am guessing that the code above will work with a file of
>>>>>> any size?
>>>>>> If that is the case, then you solved my problem.
>>>>>> The only Samba protocol that I am aware of is smb.
>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> HLS
>>>>> Joseph M. Newcomer [MVP]
>>>>> email: newcomer(a)flounder.com
>>>>> Web: http://www.flounder.com
>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>Joseph M. Newcomer [MVP]
>>>email: newcomer(a)flounder.com
>>>Web: http://www.flounder.com
>>>MVP Tips: http://www.flounder.com/mvp_tips.htm
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Hector Santos on
Joseph M. Newcomer wrote:

> Back In The Day, late 1980s, a friend was hired by Schlumberger to modify Unix to support
> files > 4GB. It turns out that they were the world's largest geological survey company,
> and would do things like blow up a truckload of explosives in the middle of the Texas
> desert, analyze the signals picked up by sensors covering several hundred square miles,
> and come back and say "drill here for oil". As my friend told me, "you may think of 4GB
> as a whopping lot of data; I think of it as a 3-minute sample". 2GB has been an
> unreasonably small limit since the 1970s (ask anyone at GM who maintained the databases of
> every car ever sold, including their manufacturing history; they were running
> multi-gigabyte databases in the era when this involved millions of dollars of hard
> drives). I'm amazed anyone would consider 4GB a reasonable file limit, let alone a tiny
> value like 2GB. We live in an era of TB databases.



That is a niche situation. It is not the common need. Also it was
what it was. The limits were based on the technology of the day, and
the practical market requirements.

>
> Illiac IV had a 2GB hard drive, the largest ever made in that era, mid-1960s. They wanted
> more, but that was the largest that could be produced. So for over 40 years, it has been
> known that 2GB is at best a "modest" size.


Limits were generally based on the natural word size of the chips used.

> The real annoyance is that you end up having to do a lot of gratuitous casts, e.g.,


Right, which also brings up the point that OOP and variant types were
not the thinking across the board. To satisfy upward compatibility, the
programming world would have to be basically managed: totally
symbolic, no pointers, no need to know anything about "size".

That was certainly unrealistic back then, and still is today.


--
HLS
From: Tom Serface on
Yeah, I confess in my case I was more concerned with the speed of the copy than
with the number of reads and writes. I need to restore data from optical media as
fast as possible and Windows is not very efficient at reading removable
media.

Tom

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:qnc1l5pa33uo1lv0050rik345hp3sq31a1(a)4ax.com...
> 50MB is about 2% of the available 2GB address space. Not even worth
> discussing. Things
> don't get interesting until you start getting an order of magnitude
> larger, at least.
> joe
>
> On Thu, 14 Jan 2010 22:40:19 -0800, "Tom Serface" <tom(a)camaswood.com>
> wrote:
>
>>I think you need to be really careful not to use up all the real memory as
>>well so that you don't start swapping to disk. That is a killer that you
>>don't even see coming, although 50MB shouldn't be a problem on most modern
>>computers.
>>
>>Tom
>>
>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>news:686vk59l8chun24uceekvdc8pt2uj4n811(a)4ax.com...
>>> By the way, did anyone really notice that ReadFile and WriteFile in
>>> Win64
>>> cannot read or
>>> write more than 4.2GB? Seems really, really strange the length and
>>> bytes
>>> read did not
>>> become DWORD_PTR values...
>>> joe
>>>
>>> On Thu, 14 Jan 2010 16:37:26 -0500, Joseph M. Newcomer
>>> <newcomer(a)flounder.com> wrote:
>>>
>>>>Yes, but the file size was given as 50MB.
>>>> joe
>>>>
>>>>On Thu, 14 Jan 2010 14:24:30 -0600, Stephen Myers
>>>><""StephenMyers\"@discussions(a)microsoft.com"> wrote:
>>>>
>>>>>Just to verify my (admittedly limited) understanding...
>>>>>
>>>>>I assume that the code posted will fail for files greater than 2GB or
>>>>>so
>>>>>with a 32 bit OS due to available address space.
>>>>>
>>>>>Steve
>>>>>
>>>>>Joseph M. Newcomer wrote:
>>>>>> See below...
>>>>>> On Thu, 14 Jan 2010 09:01:47 -0600, "Peter Olcott"
>>>>>> <NoSpam(a)SeeScreen.com> wrote:
>>>>>>
>>>>>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>>>>>>> news:OzySgEPlKHA.2132(a)TK2MSFTNGP05.phx.gbl...
>>>>>>>> Peter Olcott wrote:
>>>>>>>>
>>>>>>>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>>>>>>> message news:%23OQCOfNlKHA.1824(a)TK2MSFTNGP04.phx.gbl...
>>>>>>>>>> Peter Olcott wrote:
>>>>>>>>>>
>>>>>>>>>> By File Copy, you mean DOS copy command or the
>>>>>>>>>> CopyFile() API?
>>>>>>>>> I am using the DOS command prompt's copy command. This
>>>>>>>>> is fast.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem is the contradiction formed by the fact that
>>>>>>>>> reading and writng the file is fast, while reading and
>>>>>>>>> not wrting this same file is slow.
>>>>>>>>> I am currently using fopen() and fread(); I am using
>>>>>>>>> Windows XP.
>>>>>>>> True, if the DOS copy command is fast,then I believe the
>>>>>>>> code you are using is not optimal. The DOS Copy is using
>>>>>>>> the same CreateFile() API which fopen() also finally uses
>>>>>>>> in the RTL. So you should be able to match the same
>>>>>>>> performance of the DOS Copy command.
>>>>>>>>
>>>>>>>> Have you tried using setvbuf to set a buffer cache?
>>>>>>>>
>>>>>>>> Here is a small test code that opens a 50 meg file:
>>>>>>>>
>>>>>>>> // File: V:\wc7beta\testbufsize.cpp
>>>>>>>> // Compile with: cl testbufsize.cpp
>>>>>>>>
>>>>>>>> #include <stdio.h>
>>>>>>>> #include <windows.h>
>>>>>>>>
>>>>>>>> void main(char argc, char *argv[])
>>>>>>>> {
>>>>>>>> char _cache[1024*16] = {0}; // 16K cache
>>>>>>>> BYTE buf[1024*1] = {0}; // 1K buffer
>>>>>> ****
>>>>>> Reading a 50MB file, why such an incredibly tiny buffer?
>>>>>> ****
>>>>>>>> FILE *fv = fopen("largefile.dat","rb");
>>>>>>>> if (fv) {
>>>>>>>> int res = setvbuf(fv, _cache, _IOFBF,
>>>>>>>> sizeof(_cache));
>>>>>>>> DWORD nTotal = 0;
>>>>>>>> DWORD nDisks = 0;
>>>>>>>> DWORD nLoops = 0;
>>>>>>>> DWORD nStart = GetTickCount();
>>>>>>>> while (!feof(fv)) {
>>>>>>>> nLoops++;
>>>>>>>> memset(&buf,sizeof(buf),0);
>>>>>> ****
>>>>>> The memset is silly. Wastes time, accomplishes nothing. You are
>>>>>> setting a buffer to 0
>>>>>> right before completely overwriting it! This is like writing
>>>>>> int a;
>>>>>>
>>>>>> a = 0; // make sure a is 0 before assigning b
>>>>>> a = b;
>>>>>> ****
>>>>>>>> int nRead = fread(buf,1,sizeof(buf),fv);
>>>>>>>> nTotal +=nRead;
>>>>>>>> if (nRead > 0 && !fv->_cnt) nDisks++;
>>>>>>>> }
>>>>>>>> fclose(fv);
>>>>>>>> printf("Time: %d | Size: %d | Reads: %d | Disks:
>>>>>>>> %d\n",
>>>>>>>> GetTickCount()-nStart,
>>>>>>>> nTotal,
>>>>>>>> nLoops,
>>>>>>>> nDisks);
>>>>>>>> }
>>>>>>>> }
>>>>>> ****
>>>>>> If I were reading a small 50MB file, I would do
>>>>>>
>>>>>> void tmain(int argc, _TCHAR * argv[])
>>>>>> {
>>>>>> HANDLE h = CreateFile(_T("largefile.dat"), GENERIC_READ, 0, NULL,
>>>>>> OPEN_EXISTING,
>>>>>> FILE_ATTRIBUTE_NORMAL, NULL);
>>>>>>
>>>>>> LARGE_INTEGER size;
>>>>>>
>>>>>> GetFileSizeEx(h, &size);
>>>>>>
>>>>>> // This code assumes file is < 4.2GB!
>>>>>> LPVOID p = VirtualAlloc(NULL, (SIZE_T)size.LowPart, MEM_COMMIT,
>>>>>> PAGE_READWRITE);
>>>>>> DWORD bytesRead;
>>>>>> ReadFile(h, p, size.LowPart, &bytesRead, NULL);
>>>>>> ... process data
>>>>>> VirtualFree(p, (SIZE_T)size.LowPart, MEM_DECOMMIT);
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> Note that the above does not do any error checking; the obvious error
>>>>>> checking is left as
>>>>>> an Exercise For The Reader. No read loops, no gratuitous memsets,
>>>>>> just
>>>>>> simple code that
>>>>>> does exactly ONE ReadFile.
>>>>>> joe
>>>>>>
>>>>>>>> What this basically shows is the number of disk hits it
>>>>>>>> makes
>>>>>>>> by checking the fv->_cnt value. It shows that as long as
>>>>>>>> the cache size is larger than the read buffer size, you
>>>>>>>> get the same number of disk hits. I also spit out the
>>>>>>>> milliseconds. Subsequent runs, of course, is faster since
>>>>>>>> the OS API CreateFile() is used by the RTL in buffer mode.
>>>>>>>>
>>>>>>>> Also do you know what protocol you have Samba using?
>>>>>>> I am guessing that the code above will work with a file of
>>>>>>> any size?
>>>>>>> If that is the case, then you solved my problem.
>>>>>>> The only Samba protocol that I am aware of is smb.
>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> HLS
>>>>>> Joseph M. Newcomer [MVP]
>>>>>> email: newcomer(a)flounder.com
>>>>>> Web: http://www.flounder.com
>>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>Joseph M. Newcomer [MVP]
>>>>email: newcomer(a)flounder.com
>>>>Web: http://www.flounder.com
>>>>MVP Tips: http://www.flounder.com/mvp_tips.htm
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on
It seems to be largely an artifact of delay-to-processing. I've regularly read massive
files in completely and run all over them, and it is vastly easier than trying to do the
computations a buffer at a time, particularly if you have to cross buffer boundaries. I
usually get away
with this because we are looking at files under 100MB, that is, small files. In some
cases, little tiny files that don't exceed 10MB. But for serious files, in the TB range,
I've typically used file mappings unless there is a motivation to allow concurrent access.
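
For what it's worth, the mapping version is only a few lines; a minimal sketch (read-only,
no error checking, and on a 32-bit system you would map a window at a time instead of the
whole file):

HANDLE h = CreateFile(_T("largefile.dat"), GENERIC_READ, FILE_SHARE_READ, NULL,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
HANDLE hMap = CreateFileMapping(h, NULL, PAGE_READONLY, 0, 0, NULL);  // size = file size
const BYTE * p = (const BYTE *)MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0); // whole file
LARGE_INTEGER size;
GetFileSizeEx(h, &size);
// ... run all over p[0] .. p[size.QuadPart - 1] ...
UnmapViewOfFile(p);
CloseHandle(hMap);
CloseHandle(h);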
joe

On Thu, 14 Jan 2010 22:38:34 -0800, "Tom Serface" <tom(a)camaswood.com> wrote:

>Joe,
>
>My experience has been that if the buffer gets too large it starts to slow
>down the operation. In my case, I have to read all sizes of files and I've
>found the optimal buffer to be around 16K (I think that's what OP was
>using).
>
>I use the SDK functions CreateFile, ReadFile, WriteFile, rather than MFC and
>my copy routine is around 2-3x what the DOS copy command does (or CopyFile).
>I had to write my own for a special purpose though because I have to glue
>files back together that span multiple volumes, but I was happy to get
>better performance than I was getting with CopyFile previously.
>
>I think there is a trade-off somewhere, but I'm not entirely sure where. I
>just did trial and error with different scenarios until I got it the best I
>could.
>
>Tom
>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>news:f2suk5dtuqe1drflfbk6e5ol0io4i7qocf(a)4ax.com...
>> See below...
>> On Thu, 14 Jan 2010 09:01:47 -0600, "Peter Olcott" <NoSpam(a)SeeScreen.com>
>> wrote:
>>
>>>
>>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>>>news:OzySgEPlKHA.2132(a)TK2MSFTNGP05.phx.gbl...
>>>> Peter Olcott wrote:
>>>>
>>>>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>>> message news:%23OQCOfNlKHA.1824(a)TK2MSFTNGP04.phx.gbl...
>>>>>> Peter Olcott wrote:
>>>>>>
>>>>>> By File Copy, you mean DOS copy command or the
>>>>>> CopyFile() API?
>>>>
>>>> >
>>>>
>>>>> I am using the DOS command prompt's copy command. This
>>>>> is fast.
>>>>>
>>>>>
>>>>> The problem is the contradiction formed by the fact that
>>>>> reading and writng the file is fast, while reading and
>>>>> not wrting this same file is slow.
>>>>> I am currently using fopen() and fread(); I am using
>>>>> Windows XP.
>>>>
>>>> True, if the DOS copy command is fast,then I believe the
>>>> code you are using is not optimal. The DOS Copy is using
>>>> the same CreateFile() API which fopen() also finally uses
>>>> in the RTL. So you should be able to match the same
>>>> performance of the DOS Copy command.
>>>>
>>>> Have you tried using setvbuf to set a buffer cache?
>>>>
>>>> Here is a small test code that opens a 50 meg file:
>>>>
>>>> // File: V:\wc7beta\testbufsize.cpp
>>>> // Compile with: cl testbufsize.cpp
>>>>
>>>> #include <stdio.h>
>>>> #include <windows.h>
>>>>
>>>> void main(char argc, char *argv[])
>>>> {
>>>> char _cache[1024*16] = {0}; // 16K cache
>>>> BYTE buf[1024*1] = {0}; // 1K buffer
>> ****
>> Reading a 50MB file, why such an incredibly tiny buffer?
>> ****
>>>>
>>>> FILE *fv = fopen("largefile.dat","rb");
>>>> if (fv) {
>>>> int res = setvbuf(fv, _cache, _IOFBF,
>>>> sizeof(_cache));
>>>> DWORD nTotal = 0;
>>>> DWORD nDisks = 0;
>>>> DWORD nLoops = 0;
>>>> DWORD nStart = GetTickCount();
>>>> while (!feof(fv)) {
>>>> nLoops++;
>>>> memset(&buf,sizeof(buf),0);
>> ****
>> The memset is silly. Wastes time, accomplishes nothing. You are setting
>> a buffer to 0
>> right before completely overwriting it! This is like writing
>> int a;
>>
>> a = 0; // make sure a is 0 before assigning b
>> a = b;
>> ****
>>>> int nRead = fread(buf,1,sizeof(buf),fv);
>>>> nTotal +=nRead;
>>>> if (nRead > 0 && !fv->_cnt) nDisks++;
>>>> }
>>>> fclose(fv);
>>>> printf("Time: %d | Size: %d | Reads: %d | Disks:
>>>> %d\n",
>>>> GetTickCount()-nStart,
>>>> nTotal,
>>>> nLoops,
>>>> nDisks);
>>>> }
>>>> }
>> ****
>> If I were reading a small 50MB file, I would do
>>
>> void tmain(int argc, _TCHAR * argv[])
>> {
>> HANDLE h = CreateFile(_T("largefile.dat"), GENERIC_READ, 0, NULL,
>> OPEN_EXISTING,
>> FILE_ATTRIBUTE_NORMAL, NULL);
>>
>> LARGE_INTEGER size;
>>
>> GetFileSizeEx(h, &size);
>>
>> // This code assumes file is < 4.2GB!
>> LPVOID p = VirtualAlloc(NULL, (SIZE_T)size.LowPart, MEM_COMMIT,
>> PAGE_READWRITE);
>> DWORD bytesRead;
>> ReadFile(h, p, size.LowPart, &bytesRead, NULL);
>> ... process data
>> VirtualFree(p, (SIZE_T)size.LowPart, MEM_DECOMMIT);
>> return 0;
>> }
>>
>> Note that the above does not do any error checking; the obvious error
>> checking is left as
>> an Exercise For The Reader. No read loops, no gratuitous memsets, just
>> simple code that
>> does exactly ONE ReadFile.
>> joe
>>
>>>>
>>>> What this basically shows is the number of disk hits it
>>>> makes
>>>> by checking the fv->_cnt value. It shows that as long as
>>>> the cache size is larger than the read buffer size, you
>>>> get the same number of disk hits. I also spit out the
>>>> milliseconds. Subsequent runs, of course, is faster since
>>>> the OS API CreateFile() is used by the RTL in buffer mode.
>>>>
>>>> Also do you know what protocol you have Samba using?
>>>
>>>I am guessing that the code above will work with a file of
>>>any size?
>>>If that is the case, then you solved my problem.
>>>The only Samba protocol that I am aware of is smb.
>>>
>>>>
>>>>
>>>> --
>>>> HLS
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm