From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:blruk59utgoph0al9saun1gu93dooccr60(a)4ax.com...
> See below...
> On Wed, 13 Jan 2010 23:55:56 -0600, "Peter Olcott"
> <NoSpam(a)SeeScreen.com> wrote:
>
>>
>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>message
>>news:%23OQCOfNlKHA.1824(a)TK2MSFTNGP04.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>>
>>>> I am doing block I/O, and it is very fast on the local
>>>> drive but much slower on the 1.0 Gb LAN, yet file
>>>> copies to and from the LAN are still fast.
>>>>
>>>> (1) File copy to and from the LAN is faster than local
>>>> drive copies: 20 seconds for LAN, 25 seconds for local.
>>>> (2) Block I/O is fast on the local drive: 12 seconds
>>>> for 632 MB.
>>>> (3) Block I/O is slow on the LAN: 80 seconds for 632 MB.
>>>> I also tried changing the block size from 4K to 1500
>>>> bytes and 9000 bytes (consistent with Ethernet frame
>>>> sizes); this did not help.
>>>
>>> By File Copy, do you mean the DOS copy command or the
>>> CopyFile() API?
>>I am using the DOS command prompt's copy command. This is
>>fast.
>>
>>>
>>> To me, the above appears to be consistent with a caching
>>> option that your code is not enabling when the file is
>>> first opened. The "File Copy" is doing it, but you are
>>> not. Showing how you are opening the file, i.e. the
>>> CreateFile() function or fopen() call, will probably
>>> help.
>>>
>>> Another thing is maybe to check google
>>>
>>> Search: SAMBA Slow File Copy
>>
>>The problem is the apparent contradiction: copying the
>>file (both reading and writing it) is fast, while merely
>>reading this same file is slow.
>>I am currently using fopen() and fread(); I am using
>>Windows XP.
> ****
> Use of fopen/fread would certainly indicate that you are
> not doing this in anything like
> an optimal fashion.
>
> If you want to read a file that is under 100MB, it is
> usually best just to allocate a
> buffer the size of the file,

I tested this and increasing the buffer size beyond 64K and
4K respectively does not measurably increase speed. Also in
my case I must process files of arbitrary sizes to compute
their MD5.
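As an illustration of that streaming requirement, here is a minimal portable sketch of hashing a file of arbitrary size in fixed-size blocks. The function name is mine, and a simple additive checksum stands in for the MD5 update step; real code would call an MD5_Update-style routine on each block instead.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Stream a file in fixed-size blocks, feeding each block to a running
 * digest.  A trivial additive checksum stands in for MD5 here; a real
 * implementation would update an MD5 context with each block instead. */
static int checksum_file(const char *path, size_t block, uint32_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char *buf = malloc(block);
    if (!buf) { fclose(f); return -1; }
    uint32_t sum = 0;
    size_t n;
    while ((n = fread(buf, 1, block, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];          /* MD5_Update(&ctx, buf, n) in real code */
    free(buf);
    fclose(f);
    *out = sum;
    return 0;
}
```

The digest result is independent of the block size, which is why the block size only affects I/O overhead, not correctness.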

> CreateFile, do a single ReadFile, do your computation, do
> a
> WriteFile, and you are done. You are comparing two
> completely unrelated concepts:
> fopen/fread and a copy command; what you didn't ask was
> "what is the fastest way to read a
> file"; instead, you observe that two completely different
> technologies have different
> performance. You did not actually state this in your
> original question; you just used a
> generic concept of "copy". Details matter!
>
> Note that fread, called thousands of times, is amazingly
> slow in comparison to a single
> ReadFile.
>
> By failing to supply all the critical information, you
> essentially asked "Why is it that I
> can get from city A to city B in 20 minutes, but my friend
> takes two hours?" and neglected
> to mention you took the high-speed train while your friend
> went by bicycle.
> joe
> ****
>>
>>>
>>> There are some interesting hits, in particular if you
>>> are
>>> using Vista, this MS Samba hotfix for Vista,
>>>
>>> http://support.microsoft.com/kb/931770
>>>
>>> There was another 2007 thread that the original poster
>>> said turning off indexing improved the speed.
>>>
>>> --
>>> HLS
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
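As an aside, the single-read approach Joe describes looks roughly like this in portable stdio. This is a sketch with names of my own choosing; a Win32 version would use CreateFile, GetFileSizeEx, and one ReadFile call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire (sub-100MB) file with one allocation and one fread --
 * the stdio analogue of CreateFile plus a single ReadFile. */
static unsigned char *slurp(const char *path, long *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    rewind(f);
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf) *len = size;
    return buf;
}
```

One allocation and one read replaces thousands of fread calls, which is where the per-call overhead Joe mentions disappears.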


From: Hector Santos on
Stephen Myers wrote:

> Just to verify my (admittedly limited) understanding...
>
> I assume that the code posted will fail for files greater than 2GB or so
> with a 32 bit OS due to available address space.
>

I should probably test it, but it should work, because the code is
streaming and not doing any seeking.

Remove the DWORD nTotal counter, or change it to __int64 if he wants
to collect that bit of information.

But as a streaming reader, I don't expect an issue.

The problem, as we have seen it in our 32-bit product when a file
grows beyond 2 GB, is when there is a SEEK beyond the 32-bit range,
such as a file index returning a DWORD position.

This is something that is currently pushing our agenda, as more and
more customers are running into 2 GB+ file size needs. We tested the
extended seeking functions that offer 64-bit quad positions, so I
know that works. What we are debating is whether we should add 64-bit
file I/O (which, again, I know works) or just move to pure 64-bit.
The latter is a big revamping investment, and we are not quite ready
for that.

So, offhand, just starting at the beginning of the file and reading
32-bit chunks should be fine. I don't see a technical reason why that
would not work.

--
HLS
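The 64-bit quad positions mentioned above come down to splitting an offset into the two 32-bit halves that the classic SetFilePointer call takes (the same trick LARGE_INTEGER performs through its LowPart/HighPart members). The arithmetic itself is portable and easy to check; the helper names below are illustrative, not from the posted code.

```c
#include <stdint.h>

/* Split a 64-bit file offset into the low/high 32-bit halves that
 * SetFilePointer(hFile, lo, &hi, method) expects. */
static void split_offset(int64_t off, uint32_t *lo, int32_t *hi)
{
    *lo = (uint32_t)(off & 0xFFFFFFFF);
    *hi = (int32_t)(off >> 32);
}

/* Reassemble the two halves into the original 64-bit offset. */
static int64_t join_offset(uint32_t lo, int32_t hi)
{
    return ((int64_t)hi << 32) | lo;
}
```

Any offset past 4 GB necessarily has a nonzero high half, which is exactly the information a DWORD-only position drops.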
From: Hector Santos on
Joseph M. Newcomer wrote:

> Yes, but the file size was given as 50MB.
> joe

It's amazing how "big" is relative these days. My first computer had
a $1,500 Micropolis 10 MB drive! :)

BTW, it (the code posted) should not be an issue when just streaming
in a file of any size.

The problem begins when seeking within a file.

When seeking is required, we know from using the 64-bit WIN32 file
I/O functions that it works for large 2 GB+ files.

In one part of our server product's file handling, an ISAM database
with four key index files has DWORD position indices. Issues occur as
the database grows and an index goes beyond a 32-bit value. A
documented limitation, but one that no longer holds up.

A simple solution in the works was to use the 64-bit extended file
I/O functions, which offer QUAD (double DWORD) positions. It is about
to be implemented in the next major revision of our server.

For backward single-source compatibility, I produced a header and
wrapper functions. Here are the *.h and *.cpp files:

//------------------------------------------------------------
// File Name : ss64lib.h
//------------------------------------------------------------

#ifndef __SS64LIB_H
#define __SS64LIB_H

#ifndef _WINDOWS_
#include <windows.h>
#endif

#define SUPPORT_FILEIO_64BIT

#ifdef SUPPORT_FILEIO_64BIT
# define TINT INT64
# define TWORD QWORD
# define TFILESIZE INT64
# define MAXQWORD _UI64_MAX
# define MAXINT64 _I64_MAX
#else
# define TINT DWORD
# define TWORD DWORD
# define TFILESIZE DWORD
#endif


TINT ssFileSeek(HANDLE hf, TINT distance,
WORD MoveMethod = FILE_BEGIN);
TINT ssFileEnd(HANDLE hf);
TINT ssFilePos(HANDLE hf);
BOOL ssFileRewind(HANDLE hf);
BOOL ssFileLock(HANDLE hf, TINT Offset, TINT nBytes);
BOOL ssFileUnlock(HANDLE hf, TINT Offset, TINT nBytes);
TFILESIZE ssFileSize(HANDLE hf);

#endif // __SS64LIB_H

//------------------------------------------------------------
// File Name : ss64lib.cpp
//------------------------------------------------------------

#include "ss64lib.h"

TINT ssFileSeek(HANDLE hf, TINT dist, WORD method)
{
    LARGE_INTEGER li;
    li.QuadPart = dist;
    li.LowPart = SetFilePointer(hf, li.LowPart, &li.HighPart, method);
    if (li.LowPart == 0xFFFFFFFF && GetLastError() != NO_ERROR) {
        li.QuadPart = -1;
    }
    return li.QuadPart;
}

BOOL ssFileRewind(HANDLE hf)
{
    return ssFileSeek(hf, 0, FILE_BEGIN) == 0;
}

TINT ssFilePos(HANDLE hf)
{
    return ssFileSeek(hf, 0, FILE_CURRENT);
}

TINT ssFileEnd(HANDLE hf)
{
    return ssFileSeek(hf, 0, FILE_END);
}

BOOL ssFileLock(HANDLE hf, TINT Offset, TINT nBytes)
{
    LARGE_INTEGER fp, nb;
    fp.QuadPart = Offset;
    nb.QuadPart = nBytes;
    return LockFile(hf, fp.LowPart, fp.HighPart, nb.LowPart, nb.HighPart);
}

BOOL ssFileUnlock(HANDLE hf, TINT Offset, TINT nBytes)
{
    LARGE_INTEGER fp, nb;
    fp.QuadPart = Offset;
    nb.QuadPart = nBytes;
    return UnlockFile(hf, fp.LowPart, fp.HighPart, nb.LowPart, nb.HighPart);
}

TFILESIZE ssFileSize(HANDLE hf)
{
    LARGE_INTEGER li;
    li.LowPart = GetFileSize(hf, (DWORD *)&li.HighPart);
    return (TINT)li.QuadPart;
}

--
HLS
From: Hector Santos on
Peter Olcott wrote:

>> char _cache[1024*16] = {0}; // 16K cache
>> BYTE buf[1024*1] = {0}; // 1K buffer
>
> char _cache[1024*64] = {0}; // 64K cache
> BYTE buf[1024*4] = {0}; // 4K buffer
>
> These buffer sizes match the DOS copy speed, and provide the
> best performance.


Ok, thanks for the feedback.

The reason I asked what protocol Samba was set up to use is that it
can use sockets, and when it comes to sockets we generally apply a
telecommunications rule of thumb, the "Bucket Brigade" concept:

Receive HIGH, Write LOW

This helps to minimize contention (pressure) issues. Receivers know
what they can handle for reception, but not what the remote has for
transmission, so overwhelming them with large buffered writes can
(and generally will) increase flow-control issues.

A "smooth" bucket brigade is one where the right buffer (packet)
sizes are used and the brigade works in harmony: every bucket is
full, there are no spillovers, and no bucket handler is saying:

"Yo, slow down, I only have two hands!"

In your case you were receiving, so the receiver side could be
optimized. When writing, it might be a different story.

--
HLS
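The "Receive HIGH, Write LOW" rule above can be sketched as a relay loop that drains a large receive buffer in small fixed-size writes. This is a toy in-memory model of my own, not the product's code: write_fn stands in for send() or WriteFile(), and the sizes are illustrative.

```c
#include <stddef.h>

/* Bucket-brigade sketch: data arrives in one large receive buffer
 * (receive HIGH) and is passed along in small fixed-size writes
 * (write LOW).  Returns the number of small writes issued. */
static size_t relay(const unsigned char *rxbuf, size_t rxlen,
                    size_t write_size,
                    void (*write_fn)(const unsigned char *, size_t))
{
    size_t writes = 0, off = 0;
    while (off < rxlen) {
        size_t n = rxlen - off < write_size ? rxlen - off : write_size;
        write_fn(rxbuf + off, n);       /* send()/WriteFile() in real code */
        off += n;
        writes++;
    }
    return writes;
}
```

Keeping the write size near the path's natural packet size is what keeps the brigade "smooth": no partial buckets except the last one.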
From: Joseph M. Newcomer on
It wouldn't have occurred to me that seeking was a problem, because if I were doing any
serious file-system work, I would not be using anything with a 32-bit limitation. I haven't
really worried about this problem since about 1997, when I did my first database that was
expected to hit terabytes. We just ignored the primitive and outdated C library, since it
did nothing useful for us. I notice you are still using the obsolete SetFilePointer
instead of the more modern SetFilePointerEx, which has been around for a decade.

Also, returning the .QuadPart component should fail in any environment in which TINT is
declared as a DWORD, since the compiler should complain about truncation of a value (you
should always build at /W4). You should use the same cast as for the file size, (TINT).
joe
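The truncation Joe points out is easy to see in isolation. Plain C below, with uint32_t standing in for a DWORD-typed TINT; the helper name is mine, not from the posted library.

```c
#include <stdint.h>

/* When TINT is a 32-bit DWORD, returning a 64-bit QuadPart silently
 * drops the high half -- a position past 4 GB comes back wrapped.
 * The explicit cast here silences the truncation warning that an
 * implicit return would draw at /W4. */
static uint32_t return_as_dword(int64_t quadpart)
{
    return (uint32_t)quadpart;
}
```

This is exactly why the wrappers only help once TINT is actually 64 bits wide; in the DWORD configuration the seek result is lossy for large files.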

On Thu, 14 Jan 2010 21:51:24 -0500, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Joseph M. Newcomer wrote:
>
>> Yes, but the file size was given as 50MB.
>> joe
>
>[...]
>
>TINT ssFileSeek(HANDLE hf, TINT dist, WORD method)
>{
> LARGE_INTEGER li;
> li.QuadPart = dist;
> li.LowPart = SetFilePointer (hf, li.LowPart, &li.HighPart, method);
> if (li.LowPart == 0xFFFFFFFF && GetLastError() != NO_ERROR) {
> li.QuadPart = -1;
> }
> return li.QuadPart;
>}
>
>[...]
>
>TFILESIZE ssFileSize(HANDLE hf)
>{
> LARGE_INTEGER li;
> li.LowPart = GetFileSize(hf,(DWORD *)&li.HighPart);
> return (TINT)li.QuadPart;
>}
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm