From: MRAB
Steven D'Aprano wrote:
> On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:
>> Folks:
>> Every couple of years I run into a problem where some Python code that
>> worked well at small scales starts burning up my CPU at larger scales,
>> and the underlying issue turns out to be the idiom of accumulating data
>> by string concatenation.
> I don't mean to discourage you, but the simple way to avoid that is not
> to accumulate data by string concatenation.
> The usual Python idiom is to append substrings to a list, then once, at
> the very end, combine into a single string:
> accumulator = []
> for item in sequence:
>     accumulator.append(process(item))
> string = ''.join(accumulator)
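> The same idiom can also be written as a single expression, using the
> same placeholder names as above:
>
>     string = ''.join(process(item) for item in sequence)
>
> Either way there is a single O(n) join at the end, instead of
> repeated concatenation, which copies the growing string each time
> and is O(n**2) in general.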
>> It just happened again
>> ( ), and as usual it is hard
>> to make the data accumulator efficient without introducing a bunch of
>> bugs into the surrounding code.
> I'm sorry, I don't agree about that at all. I've never come across a
> situation where I wanted to use string concatenation and couldn't easily
> modify it to use the list idiom above.
> [...]
>> Here are some benchmarks generated by running python -OOu -c 'from
>> stringchain.bench import bench; bench.quick_bench()' as instructed by
>> the README.txt file.
> To be taken seriously, I think you need to compare stringchain to the
> list idiom. If stringchain compares favourably to that, then it might
> be worthwhile.
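> A minimal sketch of that kind of comparison with timeit (the chunk
> size and counts are arbitrary, and stringchain itself would be
> plugged in as a third candidate in the same way):
>
>     import timeit
>
>     def concat(chunks):
>         # The naive idiom. NB: some CPython versions optimize += on
>         # a uniquely-referenced str, so the gap varies, but repeated
>         # concatenation is quadratic in general.
>         s = ''
>         for chunk in chunks:
>             s += chunk
>         return s
>
>     def join_idiom(chunks):
>         # The list-accumulator idiom: one O(n) join at the end.
>         accumulator = []
>         for chunk in chunks:
>             accumulator.append(chunk)
>         return ''.join(accumulator)
>
>     chunks = ['x' * 100] * 1000
>     for f in (concat, join_idiom):
>         print('%s: %.4f' % (f.__name__,
>                             timeit.timeit(lambda: f(chunks),
>                                           number=100)))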
IIRC, someone did some work on making concatenation faster by delaying
it until a certain threshold had been reached (in the string class
itself).
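Roughly, the idea was an accumulator whose appends stay cheap until
enough pieces pile up, and only then joins them. A sketch of that idea
(the class name and threshold here are made up for illustration; the
actual work was inside the string type itself):

    class DelayedConcat:
        def __init__(self, threshold=100):
            self._chunks = []
            self._threshold = threshold

        def append(self, s):
            # Appending is cheap; the pending pieces are collapsed
            # with one join whenever the threshold is crossed, so the
            # list of chunks never grows without bound.
            self._chunks.append(s)
            if len(self._chunks) >= self._threshold:
                self._chunks = [''.join(self._chunks)]

        def value(self):
            return ''.join(self._chunks)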