From: Ersek, Laszlo on
On Wed, 30 Jun 2010, Nicolas George wrote:

> Rainer Weikusat wrote in message <87aaqcsd19.fsf(a)fever.mssgmbh.com>:

>> There is no need to do 'dynamic reallocation' when parsing the contents
>> of some input buffer provided the buffer size is larger than the record
>> size.
>
> I do not know how you like to program, or even if you can program at
> all, but when I design a program, I like it to be able to deal with big
> inputs when necessary, but not allocate huge amounts of memory each
> time it reads a few dozen octets.

Sorry to interrupt, but your arguments don't collide. Each method
described depends on a corresponding condition, and those conditions
exclude each other ("provided the buffer size is larger than the record
size" <-> "deal with big inputs when necessary"). I do think the
expression "record size" (without further qualifications) excludes
megabyte-sized logical records.

lacos
From: Nicolas George on
Rainer Weikusat wrote in message <876310sb2e.fsf(a)fever.mssgmbh.com>:
> Fine. Back to square one: Assuming you send an a priori unknown record
> size which has neither a practical nor a theoretical limit, you
> may need to do 'dynamic buffer reallocation' after having received the
> length, and possibly even while receiving it,

No.

> In the real world

Do not use words you do not understand.

> So far, you have posted a couple of assertions I have refuted

The correct term is "misunderstood".

Now please go away; currently having only one typing hand, I have half as
much time to waste on cheap trolls like you.
From: Nicolas George on
"Ersek, Laszlo" wrote in message
<Pine.LNX.4.64.1006301900350.9224(a)login03.caesar.elte.hu>:
> Sorry to interrupt, but your arguments don't collide. Each method
> described depends on a corresponding condition, and those conditions
> exclude each other ("provided the buffer size is larger than the record
> size" <-> "deal with big inputs when necessary"). I do think the
> expression "record size" (without further qualifications) excludes
> megabyte-sized logical records.

Let me clarify: let us assume that we want to receive a text file. In this
case, "text" means that we can assume that some control character is not
expected to happen and can be safely used as a terminator. The input is
trusted (there is no need to guard against memory exhaustion denial of
service), and must be loaded completely in memory for processing.
Furthermore, the input stream is wrapped in a good library that gracefully
handles network errors, buffering, etc. Most of the files are expected to
be quite small, but a few may be very big.

With the size+payload protocol, the implementation could not be more
straightforward: read the size, allocate the memory, read the file.
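A minimal C sketch of that receive path (the 32-bit network-order length
prefix is an assumption, and error handling is reduced to NULL returns):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly len bytes from fd, looping over short reads. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;          /* error or premature EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive one size-prefixed record: read the size, allocate, read the file. */
char *recv_record(int fd, size_t *out_len)
{
    uint32_t netlen;
    if (read_full(fd, &netlen, sizeof netlen) != 0)
        return NULL;
    size_t len = ntohl(netlen);
    char *buf = malloc(len ? len : 1);  /* keep malloc(0) well defined */
    if (buf == NULL || read_full(fd, buf, len) != 0) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```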

With an end marker, the lazy way is to allocate a big area and read into it
until you find the end marker. If the file does not fit in the area, fail.
The area has to be big enough to fit the biggest expected file? Let us make
it 10 MB. Too bad, there is one 400 MB file. Then let us make it 500 MB.
Well, in that case, we cannot load 8 small files in simultaneous threads on
a 32-bit arch: the address space is exhausted.

Fixed-size receiving areas are rarely good.

To avoid them, the solution is dynamic reallocation: allocate a small area,
read into it; if the marker is found, fine. If it is not, enlarge the area
(doubling is a good way; it achieves amortized linear complexity) and read
some more, and loop.
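The doubling loop can be sketched like this (the starting size is an
assumption, and a real implementation would also keep any bytes read past
the marker for the next record):

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read from fd until the marker byte appears, doubling the area whenever
   it fills up. Returns the buffer; *out_len is the length up to the marker. */
char *read_until_marker(int fd, char marker, size_t *out_len)
{
    size_t cap = 64, used = 0;  /* start small; doubling keeps it O(n) overall */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (used == cap) {      /* area is full: enlarge it */
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        ssize_t n = read(fd, buf + used, cap - used);
        if (n <= 0) {           /* error or EOF before the marker */
            free(buf);
            return NULL;
        }
        char *end = memchr(buf + used, marker, (size_t)n);
        used += (size_t)n;
        if (end != NULL) {
            *out_len = (size_t)(end - buf);
            return buf;
        }
    }
}
```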

This last solution is fine; I have implemented it dozens of times. But what
a waste of time compared to when you already know the size of the data! That
is exactly what I called "annoying".
From: Ersek, Laszlo on
On Wed, 30 Jun 2010, Nicolas George wrote:

> "Ersek, Laszlo" wrote in message
> <Pine.LNX.4.64.1006301900350.9224(a)login03.caesar.elte.hu>:

>> Sorry to interrupt, but your arguments don't collide. Each method
>> described depends on a corresponding condition, and those conditions
>> exclude each other ("provided the buffer size is larger than the record
>> size" <-> "deal with big inputs when necessary"). I do think the
>> expression "record size" (without further qualifications) excludes
>> megabyte-sized logical records.
>
> Let me clarify: let us assume that we want to receive a text file. In
> this case, "text" means that we can assume that some control character
> is not expected to happen and can be safely used as a terminator. The
> input is trusted (there is no need to guard against memory exhaustion
> denial of service), and must be loaded completely in memory for
> processing. Furthermore, the input stream is wrapped in a good library
> that gracefully handles network errors, buffering, etc. Most of the
> files are expected to be quite small, but a few may be very big.
>
> With the size+payload protocol, the implementation could not be more
> straightforward: read the size, allocate the memory, read the file.
>
> With an end marker, the lazy way is to allocate a big area and read into
> it until you find the end marker. If the file does not fit in the area,
> fail. The area has to be big enough to fit the biggest expected file?
> Let us make it 10 MB. Too bad, there is one 400 MB file. Then let us
> make it 500 MB. Well, in that case, we cannot load 8 small files in
> simultaneous threads on a 32-bit arch: the address space is exhausted.
>
> Fixed-size receiving areas are rarely good.
>
> To avoid them, the solution is dynamic reallocation: allocate a small
> area, read into it; if the marker is found, fine. If it is not, enlarge
> the area (doubling is a good way; it achieves amortized linear
> complexity) and read some more, and loop.
>
> This last solution is fine; I have implemented it dozens of times. But
> what a waste of time compared to when you already know the size of the
> data! That is exactly what I called "annoying".

I agree pretty much with what you describe (and extra kudos for making the
effort of typing all that with one hand). However, when Rainer wrote
"record size", I imagined a protocol where the largest message ever can't
be bigger than, say, 16 KB. For that order of magnitude, putting the
complete unsigned char array in the client connection struct may perfectly
suffice, I believe. (Assuming synchronous requests, a hosted platform, and
no special performance needs.)
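For that 16 KB order of magnitude, the embedded-buffer approach might look
like this (the struct members and the append helper are illustrative, not
from the thread):

```c
#include <stddef.h>
#include <string.h>

#define MAX_RECORD 16384   /* protocol-guaranteed upper bound on a record */

struct connection {
    int fd;
    size_t filled;                  /* bytes of the current record received */
    unsigned char buf[MAX_RECORD];  /* complete receive area, no malloc */
};

/* Append n freshly received bytes; fail if the record would exceed the bound. */
static int conn_append(struct connection *c, const unsigned char *p, size_t n)
{
    if (n > MAX_RECORD - c->filled)
        return -1;                  /* peer violated the protocol */
    memcpy(c->buf + c->filled, p, n);
    c->filled += n;
    return 0;
}
```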

Thanks,
lacos
From: Rainer Weikusat on
"Ersek, Laszlo" <lacos(a)caesar.elte.hu> writes:
> On Wed, 30 Jun 2010, Nicolas George wrote:
>> "Ersek, Laszlo" wrote in message
>> <Pine.LNX.4.64.1006301900350.9224(a)login03.caesar.elte.hu>:
>
>>> Sorry to interrupt, but your arguments don't collide. Each method
>>> described depends on a corresponding condition, and those
>>> conditions exclude each other ("provided the buffer size is larger
>>> than the record size" <-> "deal with big inputs when necessary"). I
>>> do think the expression "record size" (without further
>>> qualifications) excludes megabyte-sized logical records.
>>
>> Let me clarify: let us assume that we want to receive a text
>> file.

[...]

> I agree pretty much with what you describe (and extra kudos for making
> the effort of typing all that with one hand). However, when Rainer
> wrote "record size", I imagined a protocol where the largest message
> ever can't be bigger than, say, 16 KB.

The original topic happened to be 'network protocols'. Network
protocols usually don't employ 'arbitrarily large' records and if the
protocol is yet to be designed, as happened to be the case here,
ensuring that the maximum size of a protocol data record is bounded is
a matter of designing the protocol in this way. The receiver needs to
enforce a (fairly small) upper limit on the maximum line length anyway,
to prevent DoS attacks. Nicolas George has now demonstrated
that he is capable of designing at least one protocol which needs to be
handled in the way he wants to handle it (and which seems inherently
prone to DoS attacks in the present version). This communicates
nothing about differently designed protocols and real world network
protocols are designed differently.
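The receiver-side limit described here amounts to a guard on the announced
size before anything is allocated; a sketch, with an illustrative bound:

```c
#include <stdint.h>

#define PROTO_MAX_RECORD 16384u  /* illustrative protocol-defined bound */

/* Reject an announced record length before allocating anything: a peer
   announcing more than the protocol allows is treated as hostile, which
   is what bounds the memory any single client can make the server commit. */
static int record_len_ok(uint32_t announced)
{
    return announced <= PROTO_MAX_RECORD;
}
```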