From: Ole-Hjalmar Kristensen on
>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus(a)futureapps.de> writes:

I think the problem is not the explicit copy which happens when we hit
the end of the buffer (easily verified, just increase the buffer size
10 times, the copying will happen 1/10 of the time, but execution time
stays the same), but the implicit copy which seems to happen in the statement
return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
or if the compiler could have optimized it away.

The idea of using a storage pool is an interesting one, since it could
allow you to avoid copying when you want to return a string from a
function. But how would you overlay the string on top of the read
buffer? An Ada string is not a C string, you would need to put the
descriptor somewhere as well.

GB> Ole-Hjalmar Kristensen schrieb:
>> astra06:~/ada/div-ada> uname -a
>> SunOS astra06 5.10 Generic_137137-09 sun4u sparc SUNW,Sun-Fire-V440
>> astra06:~/ada/div-ada> ls -l x.txt
>> -rw------- 1 ok145024 clustra 60000000 Sep 1 12:03 x.txt
>> astra06:~/ada/div-ada> time ./get_line_test < x.txt
>> 0.49u 0.09s 0:00.59 98.3%
>>


GB> Comparing the two programs, they are on a par, almost
GB> exactly.

GB> However, yours is IMHO less messy. It also has

GB> return Copy & Get_Line;

GB> which, I guess, solves the problem of long lines, even
GB> at the expense of copying.

GB> Can we reduce the copying overhead?
GB> The copying would be have O(f(Item'Length)) with
GB> f(n) ~= s*(s+1)/2 where s = n/BUFSIZ, so it is in O(n**2).
GB> We could, I think, use a storage pool:

GB> - whenever copying is needed, allocate another buffer
GB> right next to the current buffer. Use the newly allocated
GB> buffer for the recursive call.

GB> - on reaching of line/file, reinterpret the sequence of
GB> buffers in the storage pool as a String. Finally, Free.

GB> Can this be done?

GB> (This won't make a Shootout program any smaller, but
GB> seems good for the general case...)

--
C++: The power, elegance and simplicity of a hand grenade.
From: Georg Bauhaus on
Ole-Hjalmar Kristensen schrieb:
>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus(a)futureapps.de> writes:
>
> I think the problem is not the explicit copy which happens when we hit
> the end of the buffer (easily verified, just increase the buffer size
> 10 times, the copying will happen 1/10 of the time, but execution time
> stays the same), but the implicit copy which seems to happen in the statement
> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
> or if the compiler could have optimized it away.

WRT return In_Buf(Start..I-1), there seems to be a way to have the
bounds shift towards 1 .. Result'Length without copying:

with Ada.Text_IO;
procedure Bnds is
function S return String is
begin
return String'(10 .. 20 => '*');
end S;

X : constant String := S;
begin
Ada.Text_IO.Put_Line(Positive'Image(X'First));
declare
subtype XString is String(1 .. X'Length);
Z : XString;
pragma Import (Ada, Z);
for Z'Address use X'Address;
begin
Ada.Text_IO.Put_Line(Positive'Image(Z'First));
end;
end Bnds;

Is this "phantom sliding" accidental or can it be relied upon?


> The idea of using a storage pool is an interesting one, since it could
> allow you to avoid copying when you want to return a string from a
> function. But how would you overlay the string on top of the read
> buffer? An Ada string is not a C string, you would need to put the
> descriptor somewhere as well.

I'll reread "Accessing Memory as a String",
http://adapower.com/adapower1/lang/accessmem.html

Maybe this leads somewhere regarding this question.
From: Georg Bauhaus on
Ole-Hjalmar Kristensen wrote:
>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus(a)futureapps.de> writes:
>
> I think the problem is not the explicit copy which happens when we hit
> the end of the buffer (easily verified, just increase the buffer size
> 10 times, the copying will happen 1/10 of the time, but execution time
> stays the same), but the implicit copy which seems to happen in the statement
> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
> or if the compiler could have optimized it away.

It looks like copying In_Buf(Start..I-1) be circumvented.
I have added a solution to Line_IO.

read() performance vs. Stream_IO.Read performance seems
to depend on the OS. On a Mac here, read() is faster.
On Ubuntu, Stream_IO.Read is faster.

There is an issue with GNATs and End_File (GNATs other
than I guess the one you have). When comparing N <= 0,
``operator for type "System.Crtl.size_t" is not
directly visible''. To work around this, I wrote

function End_File return Boolean is
type Size_T is mod 2**Standard'Address_Size;
N0 : constant Size_T := Size_T(N);
Zero : constant Size_T := 0;
begin
return N0 <= Zero;
end End_File;

Finally, I think Get_Line_Test can be made more portable among Unix
compilers by changing GNAT specific Interfaces.C_Streams
into Interfaces.C; Could be slightly less convenient to
write, though?
From: Ole-Hjalmar Kristensen on
>>>>> "GB" == Georg Bauhaus <rm.tsoh.plus-bug.bauhaus(a)maps.futureapps.de> writes:

GB> Ole-Hjalmar Kristensen wrote:
>>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus(a)futureapps.de> writes:
>>
>> I think the problem is not the explicit copy which happens when we hit
>> the end of the buffer (easily verified, just increase the buffer size
>> 10 times, the copying will happen 1/10 of the time, but execution time
>> stays the same), but the implicit copy which seems to happen in the statement
>> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
>> or if the compiler could have optimized it away.

GB> It looks like copying In_Buf(Start..I-1) be circumvented.
GB> I have added a solution to Line_IO.

GB> read() performance vs. Stream_IO.Read performance seems
GB> to depend on the OS. On a Mac here, read() is faster.
GB> On Ubuntu, Stream_IO.Read is faster.

Strange. I cannot really imagine how Stream_IO.Read is faster since it has to
do the system call to do the actual read anyway. One possibility is
that In_Buf is not properly aligned, so that read() into In_Buf is
slower than the equivalent read() into the C stdio buffers. The system
call should be the same in any case.

GB> There is an issue with GNATs and End_File (GNATs other
GB> than I guess the one you have). When comparing N <= 0,
GB> ``operator for type "System.Crtl.size_t" is not
GB> directly visible''. To work around this, I wrote

GB> function End_File return Boolean is
GB> type Size_T is mod 2**Standard'Address_Size;
GB> N0 : constant Size_T := Size_T(N);
GB> Zero : constant Size_T := 0;
GB> begin
GB> return N0 <= Zero;
GB> end End_File;

Yes. I was using an old version of GNAT.

GB> Finally, I think Get_Line_Test can be made more portable among Unix
GB> compilers by changing GNAT specific Interfaces.C_Streams
GB> into Interfaces.C; Could be slightly less convenient to
GB> write, though?

I only included Interfaces.C_Streams because it was a convenient place
to get hold of the types needed to import the read() function, so if
you find them in Interfaces.C, that's just fine.

--
C++: The power, elegance and simplicity of a hand grenade.
From: Ole-Hjalmar Kristensen on
>>>>> "GB" == Georg Bauhaus <rm.tsoh.plus-bug.bauhaus(a)maps.futureapps.de> writes:

GB> Ole-Hjalmar Kristensen wrote:
>>>>>>> "GB" == Georg Bauhaus <rm.dash-bauhaus(a)futureapps.de> writes:
>>
>> I think the problem is not the explicit copy which happens when we hit
>> the end of the buffer (easily verified, just increase the buffer size
>> 10 times, the copying will happen 1/10 of the time, but execution time
>> stays the same), but the implicit copy which seems to happen in the statement
>> return In_Buf(Start..I-1); I do not know if this copy is unavoidable,
>> or if the compiler could have optimized it away.

GB> It looks like copying In_Buf(Start..I-1) be circumvented.
GB> I have added a solution to Line_IO.

How did you avoid the copying?

--
C++: The power, elegance and simplicity of a hand grenade.