From: Inquisitive Scientist on
I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data:

1. copy.deepcopy can be very slow
2. copy.deepcopy can cause memory errors even when I have plenty of
memory

I think the problem is that the current implementation keeps a memo
entry for everything it copies, even immutable types. In addition to
being slow, this makes the memo dict grow very large when there is
lots of simple numeric data to be copied. For long-running programs,
large memo dicts seem to cause memory fragmentation and result in
memory errors.

It seems like this could be easily fixed by adding the following lines
at the very start of the deepcopy function:

if isinstance(x, (type(None), int, long, float, bool, str)):
    return x

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point
me to the proper procedure for requesting this change in copy.py?
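
Until something like that is applied upstream, the workaround I am
considering is the rough sketch below (the names _ATOMIC and
_patched_deepcopy are just placeholders of mine; the trick relies on
the helpers inside copy.py looking up "deepcopy" in the module
globals, which is an implementation detail of CPython's copy.py):

import copy

_original_deepcopy = copy.deepcopy

try:
    _ATOMIC = (type(None), int, long, float, bool, str)  # Python 2
except NameError:
    _ATOMIC = (type(None), int, float, bool, str)        # Python 3

def _patched_deepcopy(x, memo=None, _nil=[]):
    # Same signature as copy.deepcopy; atomic immutables are returned
    # directly and never get a memo entry.
    if isinstance(x, _ATOMIC):
        return x
    return _original_deepcopy(x, memo)

# Rebinding the module-level name means the recursive calls made
# inside copy.py (e.g. from _deepcopy_list) also hit the short-circuit.
copy.deepcopy = _patched_deepcopy

Since it monkey-patches a standard library module, I treat it as a
stopgap rather than a fix.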

Thanks,
-I.S.
From: Stefan Behnel on
Inquisitive Scientist, 16.07.2010 14:45:
> I am having problems with running copy.deepcopy on very large data
> structures containing lots of numeric data:
>
> 1. copy.deepcopy can be very slow
> 2. copy.deepcopy can cause memory errors even when I have plenty of
> memory
>
> I think the problem is that the current implementation keeps a memo
> entry for everything it copies, even immutable types. In addition to
> being slow, this makes the memo dict grow very large when there is
> lots of simple numeric data to be copied. For long-running programs,
> large memo dicts seem to cause memory fragmentation and result in
> memory errors.
>
> It seems like this could be easily fixed by adding the following lines
> at the very start of the deepcopy function:
>
> if isinstance(x, (type(None), int, long, float, bool, str)):
>     return x
>
> This seems perfectly safe, should speed things up, keep the memo dict
> smaller, and be easy to add.

and - have you tried it?

Stefan

From: Steven D'Aprano on
On Fri, 16 Jul 2010 05:45:50 -0700, Inquisitive Scientist wrote:

> I am having problems with running copy.deepcopy on very large data
> structures containing lots of numeric data:
[...]
> This seems perfectly safe, should speed things up, keep the memo dict
> smaller, and be easy to add. Can someone add this to copy.py or point me
> to the proper procedure for requesting this change in copy.py?

These are the minimum steps you can take:

(1) Go to the Python bug tracker: http://bugs.python.org/

(2) If you don't already have one, create an account.

(3) Create a new bug report, explaining why you think deepcopy is buggy,
the nature of the bug, and your suggested fix.

If you do so, it might be a good idea to post a link to the bug here, for
interested people to follow up.

However, doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious, high-
profile, or affects a developer personally, it is likely to be ignored.
Sometimes for years. Sad but true.

You can improve the odds of having the bug (assuming you are right that
it is a bug) fixed by doing more than the minimum. The more of these you
can do, the better the chances:

(4) Create a test that fails with the current code, following the
examples in the standard library tests. Confirm that it fails with the
existing module. (A rough sketch of such a test follows after these
steps.)

(5) Patch the copy module to fix the bug. Confirm that the new test
passes with your patch, and that you don't cause any regressions (failed
tests).

(6) Create a patch file that adds the new test and the patch. Upload it
to the bug tracker.
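
For step (4), a rough sketch of what such a test might look like (the
class and method names below are placeholders; a real test would go
into Lib/test/test_copy.py and follow the conventions there):

import copy
import unittest

class DeepCopyAtomicMemoTest(unittest.TestCase):
    def test_atomic_values_not_memoized(self):
        # Copying a large list of floats should not leave one memo
        # entry per element; only the container itself (plus some
        # bookkeeping) should be recorded.
        data = [float(i) for i in range(1000)]
        memo = {}
        copy.deepcopy(data, memo)
        self.assertTrue(len(memo) < len(data))

if __name__ == "__main__":
    unittest.main()

Run it against an unpatched copy module to confirm it fails, then
again with the patch applied to confirm it passes (and run the rest of
the test suite to check for regressions).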

There's no point in writing the patch for Python 2.5 or 3.0; don't
waste your time. Version 2.6 *might* be accepted; 2.7 and/or 3.1
should be, provided people agree that it is a bug.

If you do all these things -- demonstrate successfully that this is a
genuine bug, create a test for it, and fix the bug without breaking
anything else -- then you have a good chance of having the fix
accepted.

Good luck! Your first patch is always the hardest.



--
Steven
From: Mark Lawrence on
On 16/07/2010 14:59, Steven D'Aprano wrote:

[snip]

> However, doing the minimum isn't likely to be very useful. Python is
> maintained by volunteers, and there are more bugs than person-hours
> available to fix them. Consequently, unless a bug is serious, high-
> profile, or affects a developer personally, it is likely to be ignored.
> Sometimes for years. Sad but true.
>

To give people an idea, here's the weekly Summary of Python tracker
Issues posted to python-dev, timestamped 17:07 today.

"
2807 open (+44) / 18285 closed (+18) / 21092 total (+62)

Open issues with patches: 1144

Average duration of open issues: 703 days.
Median duration of open issues: 497 days.

Open Issues Breakdown
open 2765 (+42)
languishing 14 ( +0)
pending 27 ( +2)

Issues Created Or Reopened (64)
"

I've spent a lot of time in the last few weeks helping out on the
issue tracker. The oldest open issue I've come across was dated 2001,
and there could be older ones. Unless more volunteers come forward,
particularly to do patch reviews or similar, the situation as I see it
can only get worse.

Kindest regards.

Mark Lawrence.