From: Peter J. Holzer on
On 2010-07-28 15:09, Wolfram Humann <w.c.humann(a)arcor.de> wrote:
> I went with Ilya's proposal but inserted the line a little further
> down, just after
> if (newlen > SvLEN(sv)) { /* need more room? */
>
> So now we have:
> if (newlen > SvLEN(sv)) { /* need more room? */
> newlen += (newlen >> 2) + 10;
> #ifndef Perl_safesysmalloc_size
> newlen = PERL_STRLEN_ROUNDUP(newlen);
> #endif
> if (SvLEN(sv) && s) {
> s = (char*)saferealloc(s, newlen);
> }
>
> The remaining question is by what ratio a string's memory should grow.
> I tried several values from (newlen >> 0) to (newlen >> 6) for the
> best compromise between execution time and memory usage and my
> personal favorite is (newlen >> 2). What do others here think?

That sounds about right. I've used factors between 1.2 and 1.5 in the
past for similar problems. I suggest you base the growth on the old
size, though, something like:

if (newlen > SvLEN(sv)) { /* need more room? */
size_t min = SvLEN(sv) * 5/4 + 10;
if (newlen < min) newlen = min;
...

This gives you the same growth pattern if the increments are small, but
it doesn't allocate extra memory if you append a large chunk.

hp

From: Ilya Zakharevich on
On 2010-07-28, Wolfram Humann <w.c.humann(a)arcor.de> wrote:
> So now we have:
> if (newlen > SvLEN(sv)) { /* need more room? */
> newlen += (newlen >> 2) + 10;
> #ifndef Perl_safesysmalloc_size
> newlen = PERL_STRLEN_ROUNDUP(newlen);
> #endif
> if (SvLEN(sv) && s) {
> s = (char*)saferealloc(s, newlen);
> }

I think you make your life too simple. What you must do is find the
chain of events which sets
Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain. :-(

> I tried several values from (newlen >> 0) to (newlen >> 6) for the
> best compromise between execution time and memory usage and my
> personal favorite is (newlen >> 2). What do others here think?

My approach is never to take responsibility for such decisions.
Make a default value, and shift responsibility to the user. ;-)

#ifndef PERL_STRLEN_ROUNDUP_SHIFT
# define PERL_STRLEN_ROUNDUP_SHIFT 2
#endif

The suggestion to use the OLD length is also very viable...

> 1E5 chars + 1E4 x 1E2 chars: 1.5 ms

I hope your `ms' are actually seconds. It does not make sense to
measure performance on runs shorter than a second (maybe more on Win,
which is doing more unknown stuff in background)...

Yours,
Ilya
From: Wolfram Humann on
On Jul 29, 6:30 am, Ilya Zakharevich <nospam-ab...(a)ilyaz.org> wrote:
> On 2010-07-28, Wolfram Humann <w.c.hum...(a)arcor.de> wrote:
>
> > So now we have:
> >     if (newlen > SvLEN(sv)) {           /* need more room? */
> >    newlen += (newlen >> 2) + 10;
> > #ifndef Perl_safesysmalloc_size
> >    newlen = PERL_STRLEN_ROUNDUP(newlen);
> > #endif
> >    if (SvLEN(sv) && s) {
> >        s = (char*)saferealloc(s, newlen);
> >    }
>
> I think you make your life too simple.  What you must do is find the
> chain of events which sets
> Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain.  :-(

I'm traveling territory that's fairly unknown to me already, so if
this is too simple I should leave it to someone more knowledgeable to
do it right. What I *think* is, that PERL_STRLEN_ROUNDUP is just
concerned with memory boundary alignment, e.g. roundup to the next
multiple of 4. If this is the case, it should be independent of any
string memory expansion strategy.

>
> > I tried several values from (newlen >> 0) to (newlen >> 6) for the
> > best compromise between execution time and memory usage and my
> > personal favorite is (newlen >> 2). What do others here think?
>
> My approach is never to take responsibility for such decisions.
> Make a default value, and shift responsibility to the user.  ;-)
>
> #ifndef PERL_STRLEN_ROUNDUP_SHIFT
> #  define PERL_STRLEN_ROUNDUP_SHIFT 2
> #endif

Given that nobody stumbled across the devastating current state, I
don't envision hoards of users trying to optimize this. But I agree
that it serves well as a reminder for someone reading the code that
this is not *the* correct value but a trade-off and could be decided
differently. However, given that IMO it's a different concept from the
existing PERL_STRLEN_ROUNDUP, I would prefer to give it a different
name. How about PERL_STRLEN_EXPAND_SHIFT?

> The suggestion to use the OLD length is also very viable...

Agreed.

> > 1E5 chars + 1E4 x 1E2 chars:    1.5 ms
>
> I hope your `ms' are actually seconds.  It does not make sense to
> measure performance on runs shorter than a second (maybe more on Win,
> which is doing more unknown stuff in background)...

No, these are milliseconds. Yes, this makes it a lousy benchmark. Yes,
these numbers do vary easily by +-30% and sometimes more from run to
run. I tried to post "average" runs, but that's even more subjective,
of course. However the time differences encountered between different
cases are often a factor of 10 or even much more (in the unmodified
case, many of these do run for several seconds ;-)), so I think this
is a valid base for comparison.

So my current proposal reads like this:

#ifndef PERL_STRLEN_EXPAND_SHIFT
# define PERL_STRLEN_EXPAND_SHIFT 2
#endif

if (newlen > SvLEN(sv)) { /* need more room? */
size_t minlen = SvCUR(sv);
minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;
if (newlen < minlen) newlen = minlen;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

My benchmark script does run slower now. The previous version did
expand allocated memory beyond minimum requirements during the initial
assignment, so that the reallocation count during append was 0 in many
cases. The new version does not do that so that at least 1 realloc is
required when one starts appending to the string.

Wolfram