From: George Neuner on
On Sat, 05 Jun 2010 22:08:43 -0400, George Neuner
<gneuner2(a)comcast.net> wrote:

>The sequence:
>
> movl 8(%ebp), %edx
> movl 12(%ebp), %ecx
> movb 16(%ebp), %al
> movl %edx, %edi
> rep stosb
>
>sets up the byte pattern in AL, the destination address in EDX and the
>count in ECX. "rep stosb" triggers the loop which implements:
>
> while (--ECX >= 0)
> *EDX++ = AL;

Whoops, "EDX" should have been "EDI" in the above description.
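With that correction applied, the loop can be modeled in C++ roughly as below. This is only a sketch: the StosState struct and the rep_stosb function are names made up for illustration, and it assumes the direction flag is clear so that EDI moves upward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the registers that "rep stosb" reads and updates.
struct StosState {
    std::uint8_t* edi;  // destination pointer (EDI)
    std::size_t   ecx;  // remaining byte count (ECX)
    std::uint8_t  al;   // byte pattern (AL)
};

// Store AL at [EDI], advance EDI, decrement ECX, and repeat until ECX is zero.
// Assumes the direction flag is clear (EDI is incremented, not decremented).
inline void rep_stosb(StosState& s) {
    assert(s.ecx == 0 || s.edi != nullptr);
    while (s.ecx != 0) {
        *s.edi++ = s.al;
        --s.ecx;
    }
}
```

Note that the count is tested before each store, so a count of zero writes nothing, and ECX ends at zero.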

George

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: George Neuner on
On Fri, 4 Jun 2010 15:34:18 CST, "Hakusa(a)gmail.com"
<hakusa(a)gmail.com> wrote:

>I'm told that std::fill is specialized for POD types, but the
>following code...
>
>void usingbe_memset( int* begin, int* end, int value )
>{
> memset( begin, value, (end-begin)*sizeof(int) );
>}
>
>void usingbe_fill( int* begin, int* end, int value )
>{
> std::fill( begin, end, value );
>}
>
>Compiled with g++ version 4.4.1, command line options "-c -O3 -S". The
>assembly, for brevity, I've posted here: http://pastebin.com/PVUUGGqH
>
>For those who can't read assembly, basically, the std::fill code is a
>straight loop, the memset code is done without looping, mostly in one
>assembly line (highlighted). Theoretically, there should be no
>difference, but in reality, there is.

memset *is* looping ... the loop is simply in microcode (or in
whatever now passes for microcode) instead of being explicit in the
instruction stream.

The sequence:

movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movb 16(%ebp), %al
movl %edx, %edi
rep stosb

sets up the byte pattern in AL, the destination address in EDX and the
count in ECX. "rep stosb" triggers the loop which implements:

while (--ECX >= 0)
*EDX++ = AL;

However, for a large buffer, I think this ought to be sub-optimal on
modern x86 processors - particularly on HT processors. Since the
memory bus is 64 bits wide, I would think it would be better to use SSE2 or
maybe even the FPU on 32-bit chips, and quadword (stosq) on 64-bit
chips, so that the write combine buffer is left available for other
stores.

Guess I'll have to try it.
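A starting point for such an experiment might look like the sketch below, which replicates the byte into a 64-bit pattern and stores eight bytes at a time. The function name fill_bytes_wide is made up for illustration, and nothing here claims that this is actually faster than "rep stosb"; that would have to be measured on the target machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical byte fill that writes 8 bytes per store where possible.
inline void fill_bytes_wide(unsigned char* dst, unsigned char value,
                            std::size_t count) {
    assert(count == 0 || dst != nullptr);

    // Replicate the byte into every byte of a 64-bit word.
    const std::uint64_t pattern = 0x0101010101010101ULL * value;

    // Bulk part: memcpy of a fixed 8-byte size avoids alignment problems
    // and is usually compiled down to a single store.
    while (count >= sizeof pattern) {
        std::memcpy(dst, &pattern, sizeof pattern);
        dst += sizeof pattern;
        count -= sizeof pattern;
    }

    // Tail: the remaining 0 to 7 bytes, one at a time.
    for (; count > 0; --count)
        *dst++ = value;
}
```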

George


From: Martin Vejnár on
On 6/4/2010 11:34 PM, Hakusa(a)gmail.com wrote:
> On Jun 4, 10:34 am, Mathias Gaunard<loufo...(a)gmail.com> wrote:
>
>> std::fill falls back to std::memset when the memory is contiguous...
>> It is called overloading.
>
> Theoretically, you're right, but I tried that and found it untrue.
> Here's an abridged repost:
>
> void usingbe_memset( int* begin, int* end, int value )
> {
> memset( begin, value, (end-begin)*sizeof(int) );
> }
>
> void usingbe_fill( int* begin, int* end, int value )
> {
> std::fill( begin, end, value );
> }

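The two functions have different semantics: memset converts value to unsigned char and writes it into every byte of the range, while std::fill assigns value to every int. If you want to compare them, you need to change the type of the range from int to char. I just tested the following code on msvc10: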

char a[42];
std::fill(a, a + 42, 0);

and it resulted in a call to memset as expected.
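To illustrate the difference in semantics for int ranges, here is a small sketch. The function names are made up for illustration; the bodies mirror the two functions from the original post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Mirrors usingbe_memset: memset converts `value` to unsigned char and
// writes that byte into every byte of the range.
inline void fill_with_memset(int* begin, int* end, int value) {
    assert(begin <= end);
    std::memset(begin, value,
                static_cast<std::size_t>(end - begin) * sizeof(int));
}

// Mirrors usingbe_fill: every int in the range is assigned `value`.
inline void fill_with_std_fill(int* begin, int* end, int value) {
    assert(begin <= end);
    std::fill(begin, end, value);
}
```

With a value of 1, fill_with_std_fill leaves every element equal to 1, while fill_with_memset sets every byte to 1, so on a platform with 32-bit int and 8-bit bytes each element becomes 0x01010101. The two only agree for values such as 0 or -1, where every byte of the int is the same.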

--
Martin


From: Andrew on
On 5 June, 20:48, Mathias Gaunard <loufo...(a)gmail.com> wrote:
> On 4 juin, 22:34, "Hak...(a)gmail.com" <hak...(a)gmail.com> wrote:
>
> > I'm told that std::fill is specialized for POD types, but the
> > following code...
> > [...]
> > For those who can't read assembly, basically, the std::fill code is a
> > straight loop, the memset code is done without looping, mostly in one
> > assembly line (highlighted). Theoretically, there should be no
> > difference, but in reality, there is.
>
> Looks like a quality of implementation issue.
> I suspect the result would be different on MSVC, which has native support to
> tell whether a type is a POD or has a trivial assignment.

Indeed. I have been disappointed with the lack of such performance
optimisations in VS 2005. You would think it would specialise common
cases like char arrays, but unfortunately it does not. Something to be
especially wary of is any code that uses iterators to go over a
container with a large number of items. In debug mode it will use
checked iterators, which are extremely slow. It was quite a surprise to
me: the code I wrote at the time started off on GCC, and as
soon as I switched to VS 2005 in debug mode it ground to a halt.

Regards,

Andrew Marlow


From: gast128 on
> The two functions have different semantics; if you want to compare them, you need to change the type of the range from int to char. I just tested the following code on msvc10:
>
> char a[42];
> std::fill(a, a + 42, 0);
>
> and it resulted in a call to memset as expected.

Yes, in Visual Studio 2003 there are overloads of std::fill for
(unsigned) char arguments (e.g. fill(char *_First, char *_Last, int _Val);
see the xutility header file). For these overloads, the implementation
falls back to memset.
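
The general shape of such overloads can be sketched as below. This is not the actual xutility code: the namespace sketch and everything in it are illustrative assumptions, and only the signature style fill(char*, char*, int) is taken from the header mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {

// Generic version: assign element by element.
template <class FwdIt, class T>
void fill(FwdIt first, FwdIt last, const T& value) {
    for (; first != last; ++first)
        *first = value;
}

// Overloads for character pointers: forward to memset.
inline void fill(char* first, char* last, int value) {
    assert(first <= last);
    std::memset(first, value, static_cast<std::size_t>(last - first));
}

inline void fill(unsigned char* first, unsigned char* last, int value) {
    assert(first <= last);
    std::memset(first, value, static_cast<std::size_t>(last - first));
}

}  // namespace sketch
```

A call such as sketch::fill(a, a + 42, 0) on a char array picks the non-template overload, because it matches exactly and non-template functions are preferred over template specializations on a tie. With this particular sketch, passing a value of type char instead of int would select the generic template, since that is then the better match.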

