From: George Neuner on
On Thu, 3 Jun 2010 07:31:45 CST, "gast128(a)hotmail.com"
<gast128(a)hotmail.com> wrote:


>I partially agree. Clearing (byte) buffers with memset isn't less
>readable and might even be performance intensive (e.g. using images or
>bitmaps which often have considerably large pixel buffers).

True, but if the buffer has to be cleared (or in general, uniformly
filled), what choices do you have? std::fill won't be any faster and
may be slower.

There are VMM systems that can zero-fill allocated pages on first
access - which amortizes the fill and may improve performance for a
sparse buffer. But VMM functions are non-portable and the page
granularity of the fill can be wasteful if the access pattern consists
of lots of small sub-arrays.


>Also have a look at some memset implementations. VStudio even uses
>SSE2 (if present) for memset to get the last percent of performance
>improvement. Even without this SSE2 stuff, it seems a non-trivial
>piece of assembly code, which might be hard for an ordinary compiler
>to reproduce.

memset is just template code ... there may be a number of versions if
the user can select ALU vs SIMD and/or the CPU has special zero-fill
capabilities.

In any case, the algorithm is simple:
- byte fill until the address is aligned for long fill
- calculate the # of iterations of long fill
- construct long fill pattern
- do long fills
- byte fill any remaining locations

but it can result in a healthy chunk of code depending on the CPU's
capabilities. Using SIMD registers vs ALU registers really only
changes the alignment and iteration calculation. And even if there is
a special zero-fill version, if the call site uses a variable for the
fill pattern, the compiler has to use the general version unless it
can prove the value of the variable is zero (which sometimes is simple
but often is not - and some compilers don't bother trying).
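
The steps listed above can be sketched in portable C++ roughly as follows. This is only an illustration: the name sketch_memset is hypothetical, a 64-bit word stands in for the "long" fill unit, and a real library routine would normally be hand-tuned assembly for the target CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical name: an illustration of the five steps, not any real
// library's memset.
void* sketch_memset(void* dst, int value, std::size_t n)
{
    assert(dst != nullptr || n == 0);

    using word = std::uint64_t;  // assumed "long fill" unit
    unsigned char* p = static_cast<unsigned char*>(dst);
    const unsigned char byte = static_cast<unsigned char>(value);

    // 1. Byte fill until the address is aligned for word fill.
    while (n > 0 &&
           reinterpret_cast<std::uintptr_t>(p) % alignof(word) != 0) {
        *p++ = byte;
        --n;
    }

    // 2. Calculate the number of word-fill iterations.
    std::size_t words = n / sizeof(word);

    // 3. Construct the word fill pattern: the byte copied into every lane.
    const word pattern = static_cast<word>(byte) * 0x0101010101010101ULL;

    // 4. Do the word fills. A fixed-size memcpy sidesteps aliasing rules;
    //    optimizing compilers usually turn it into a single store.
    for (; words > 0; --words) {
        std::memcpy(p, &pattern, sizeof(word));
        p += sizeof(word);
        n -= sizeof(word);
    }

    // 5. Byte fill any remaining locations.
    while (n > 0) {
        *p++ = byte;
        --n;
    }
    return dst;
}
```

Switching the word type to a SIMD register width would change only the alignment test in step 1 and the iteration count in step 2, which is the point made above.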

George

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Mathias Gaunard on
On 4 June, 06:29, George Neuner <gneun...(a)comcast.net> wrote:
> On Thu, 3 Jun 2010 07:31:45 CST, "gast...(a)hotmail.com"
>
> <gast...(a)hotmail.com> wrote:
> >I partially agree. Clearing (byte) buffers with memset isn't less
> >readable and might even be performance intensive (e.g. using images or
> >bitmaps which often have considerably large pixel buffers).
>
> True, but if the buffer has to be cleared (or in general, uniformly
> filled), what choices do you have? std::fill won't be any faster and
> may be slower.

std::fill falls back to std::memset when the memory is contiguous...
It is called overloading.
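
A minimal sketch of how a library could arrange such a dispatch through overloading. The namespace and functions are hypothetical and are not taken from any particular standard library. Because memset writes individual bytes, the shortcut shown here is limited to byte-sized element types; whether a given implementation does this, and for which types, is a quality-of-implementation matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {  // hypothetical namespace, for illustration only

// General version: an element-by-element loop for any iterator type.
template <typename It, typename T>
void fill(It first, It last, const T& value)
{
    for (; first != last; ++first)
        *first = value;
}

// Overloads for contiguous byte ranges: forward to memset.
// A non-template function is preferred over the template when both
// match the arguments exactly.
inline void fill(char* first, char* last, char value)
{
    assert(first <= last);
    std::memset(first, static_cast<unsigned char>(value),
                static_cast<std::size_t>(last - first));
}

inline void fill(unsigned char* first, unsigned char* last,
                 unsigned char value)
{
    assert(first <= last);
    std::memset(first, value, static_cast<std::size_t>(last - first));
}

}  // namespace sketch
```

With these definitions, sketch::fill(buf, buf + 8, 'x') on a char buffer selects the memset overload, while the same call on an int array uses the loop. Passing an int literal as the value for a char buffer also selects the loop, since the template is then the better match.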



From: Jens Schmidt on
George Neuner wrote:

[zero fill]
> In any case, the algorithm is simple:
> - byte fill until the address is aligned for long fill
> - calculate the # of iterations of long fill
> - construct long fill pattern
> - do long fills
> - byte fill any remaining locations
>
> but it can result in a healthy chunk of code depending on the CPU's
> capabilities.

There are architectures where the code can be reduced to
- byte fill all locations
without any performance penalty. This happens when a) the CPU is
executing instructions much faster than the memory system can write
and b) the memory system uses write combining and special handling
for sequential access.
--
Best regards,
Jens Schmidt



From: Hakusa on
On Jun 4, 10:34 am, Mathias Gaunard <loufo...(a)gmail.com> wrote:

> std::fill falls back to std::memset when the memory is contiguous...
> It is called overloading.

Theoretically, you're right, but I tried that and found it untrue.
Here's an abridged repost:

I'm told that std::fill is specialized for POD types, but the
following code...

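#include <algorithm>  // std::fill
#include <cstring>    // std::memset

// Note: memset stores (unsigned char)value in every byte, so this only
// matches usingbe_fill for values such as 0 or -1.
void usingbe_memset( int* begin, int* end, int value )
{
    std::memset( begin, value, (end-begin)*sizeof(int) );
}

void usingbe_fill( int* begin, int* end, int value )
{
    std::fill( begin, end, value );
}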

Compiled with g++ version 4.4.1, command line options "-c -O3 -S". The
assembly, for brevity, I've posted here: http://pastebin.com/PVUUGGqH

For those who can't read assembly, basically, the std::fill code is a
straight loop, the memset code is done without looping, mostly in one
assembly line (highlighted). Theoretically, there should be no
difference, but in reality, there is.
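
One caveat when comparing the two functions: memset converts its value argument to unsigned char and writes that byte everywhere, so on an int buffer it agrees with std::fill only for values whose bytes are all identical. A small check, with hypothetical names and assuming sizeof(int) > 1 and two's complement integers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Same shape as the two functions in the post.
void using_memset(int* begin, int* end, int value)
{
    std::memset(begin, value, (end - begin) * sizeof(int));
}

void using_fill(int* begin, int* end, int value)
{
    std::fill(begin, end, value);
}

// Hypothetical helper: true if both functions leave a 4-int buffer
// with identical contents for the given value.
bool same_result(int value)
{
    int a[4];
    int b[4];
    using_memset(a, a + 4, value);
    using_fill(b, b + 4, value);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Here same_result(0) and same_result(-1) hold, while same_result(1) does not: memset leaves every byte equal to 0x01, so each int holds a value other than 1. A compiler can therefore replace the std::fill loop with a byte fill only when it can prove the value has such a pattern.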



From: Mathias Gaunard on
On 4 June, 22:34, "Hak...(a)gmail.com" <hak...(a)gmail.com> wrote:

> I'm told that std::fill is specialized for POD types, but the
> following code...
> [...]

> For those who can't read assembly, basically, the std::fill code is a
> straight loop, the memset code is done without looping, mostly in one
> assembly line (highlighted). Theoretically, there should be no
> difference, but in reality, there is.

Looks like a quality of implementation issue.
I suspect the result would be different on MSVC, which has native
support to tell whether a type is a POD or has a trivial assignment.
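
In standard C++11 and later, this kind of type query is available through <type_traits>. A rough sketch of how a fill could use it to choose between memset and a plain loop; the names are hypothetical and the byte-sized-integer test is an assumption chosen for the example, not a description of what any specific library does:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {  // hypothetical namespace, for illustration only

// Selected when T is a byte-sized integer type: memset is safe.
template <typename T>
void fill_impl(T* first, T* last, const T& value, std::true_type)
{
    std::memset(first, static_cast<unsigned char>(value),
                static_cast<std::size_t>(last - first));
}

// Selected for every other type: plain assignment loop.
template <typename T>
void fill_impl(T* first, T* last, const T& value, std::false_type)
{
    for (; first != last; ++first)
        *first = value;
}

// Tag dispatch: the trait is evaluated at compile time and picks
// one of the two overloads above.
template <typename T>
void fill(T* first, T* last, const T& value)
{
    assert(first <= last);
    using byte_like = std::integral_constant<
        bool, std::is_integral<T>::value && sizeof(T) == 1>;
    fill_impl(first, last, value, byte_like());
}

}  // namespace sketch
```

The choice is made entirely at compile time, so the memset path costs nothing for types that cannot use it.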
