How to create a shallow copy without calling a constructor? [C++]


From: viboes on 2 Jan 2010 23:54

Hello,

I need to create a cache of a transactional object without using the
new operator or the copy constructor of the class. This cache only
needs to copy the raw memory of the copied instance, which is why I
have called it shallow_clone.

The following creates a deep copy and does not meet my
requirements:

class C {
public:
    C* shallow_clone() {
        return new C(*this);
    }
};

I have looked at uninitialized_copy but, if I have understood it
correctly, it calls the copy constructor.
I have tried this:

class C {
public:
    C* shallow_clone() {
        C* p = reinterpret_cast<C*>(new char[sizeof(C)]);
        if (p==0) {
            throw std::bad_alloc();
        }
        std::memcpy(p, this, sizeof(C));
        return p;
    }
};

But I suspect that this is not correct in general. Is it correct in
some particular cases? If so, in which ones?
Is there a portable way to create such a cache instance without
calling the constructor, using some low-level C++ interface?

Thanks,
Vicente

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Ulrich Eckhardt on 3 Jan 2010 22:19

viboes wrote:
> I need to create a cache of a transactional object without using the
> new operator nor the copy constructor of the class. This cache needs
> only to copy the raw memory of the copied instance, is for that I have
> called it shallow_clone

If you just want to copy the memory of an object, use this:

vector<char> m(sizeof o);
memcpy(&m[0], &o, sizeof o);

However, just having this memory is useless when it contains any pointers to
external storage or internal storage. With that in mind, I wonder where your
requirement to only copy the raw memory comes from.
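
To make the point about internal pointers concrete, here is a minimal
sketch. The Cursor type and the function name are hypothetical and
exist only for this illustration. The struct is trivially copyable, so
the byte copy itself is well-defined, yet the copied pointer still
refers to the original object's buffer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type for illustration: `cur` points into the object's own `buf`.
struct Cursor {
    char  buf[8];
    char* cur;
};

// Returns true if, after a raw byte copy, the copy's pointer still refers
// to the ORIGINAL object's buffer rather than to the copy's own buffer.
inline bool copy_points_into_original() {
    Cursor original;
    std::memset(original.buf, 0, sizeof original.buf);
    original.cur = original.buf + 2;

    Cursor copy;
    std::memcpy(&copy, &original, sizeof original);  // bitwise copy, no constructor runs

    return copy.cur == original.buf + 2 && copy.cur != copy.buf + 2;
}
```

Calling copy_points_into_original() returns true: the bytes were copied
faithfully, but the copy is not a self-contained object.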

> The following will create a deep copy and don't respect my
> requirements
>
> class C {
> public:
> C* shallow_clone() {
> return new C(*this);
> }
> };

If C actually is a POD, this boils down to a simple memcpy(). If C is not a
POD, a memcpy() wouldn't work, as mentioned above. So, you don't gain
anything from not doing it this way - unless of course there is some problem
to solve that you didn't explain here.

Note: It's sometimes better to ask for a solution to a problem than to ask
how to implement a very specific solution to that problem!

> C* shallow_clone() {
> C* p = reinterpret_cast<C>(new char[sizeof(C)]);
> if (p==0) {
> throw std::bad_alloc();
> }
> std::memcpy(p, this, sizeof(C));
> return p;
> }

The returned memory is not a C instance, so it shouldn't be a pointer to C!
I would rather return some handle to the memory and create a similar
function to restore the internal state from this memory, but this isn't
good.
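
A minimal sketch of such a handle, assuming C++11 and a trivially
copyable type. The names raw_snapshot, restore, Point and
roundtrip_demo are made up for this example; nothing here comes from
the original poster's library.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical handle: owns the raw bytes of a T, but is not itself a T.
template <class T>
class raw_snapshot {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only well-defined for trivially copyable types");
    std::vector<unsigned char> bytes_;
public:
    // Saves the object representation of src.
    explicit raw_snapshot(const T& src) : bytes_(sizeof(T)) {
        std::memcpy(bytes_.data(), &src, sizeof(T));
    }
    // Writes the saved bytes back into an existing, live object.
    void restore(T& dst) const {
        std::memcpy(&dst, bytes_.data(), sizeof(T));
    }
};

struct Point { int x; int y; };  // illustrative POD

// Saves a Point, modifies it, restores it, and returns the result.
inline Point roundtrip_demo() {
    Point p = {1, 2};
    raw_snapshot<Point> saved(p);
    p.x = 10;
    p.y = 20;
    saved.restore(p);
    return p;
}
```

The handle never pretends to be a T, so no pointer to T ever refers to
memory in which no T was constructed.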

> But I suspect that this is not correct in general. Is this correct on
> some particular cases? if yes on witch ones?

Apart from the unnecessary use of reinterpret_cast, the idea seems to be the
same as with the use of std::vector above.

> Is there a way to create such a cache instance without calling to the
> constructor in a portable way using some low level C++ interface?

Yes. How to do that depends on what exactly you are trying to achieve, which
isn't clear yet.

Uli


From: viboes on 5 Jan 2010 02:15

On Jan 4, 4:19 pm, Ulrich Eckhardt <dooms...(a)knuut.de> wrote:
> viboes wrote:
> > I need to create a cache of a transactional object without using the
> > new operator nor the copy constructor of the class. This cache needs
> > only to copy the raw memory of the copied instance, is for that I have
> > called it shallow_clone
>
> If you just want to copy the memory of an object, use this:
>
> vector<char> m(sizeof o);
> memcpy(&m[0], &o, sizeof o);

No, I don't just need to copy the memory.

> However, just having this memory is useless when it contains any pointers to
> external storage or internal storage. With that in mind, I wonder where your
> requirement to only copy the raw memory comes from.

My context is Software Transactional Memory (STM). The shared
transactional objects need to be cached in transaction-specific
objects. The pointers these transactional objects contain are
pointers to other transactional objects.

The idea I have in mind is that this cache doesn't need to do a deep
copy of the transactional object, just a shallow copy, as the STM
system will take care of copying the pointee transactional objects
when they are modified. In addition, I want the STM mechanism to
interfere as little as possible with user space. That is why I don't
want to use the copy constructor, the assignment operator, or
class-specific new/delete operators.

> > The following will create a deep copy and don't respect my
> > requirements
>
> > class C {
> > public:
> > C* shallow_clone() {
> > return new C(*this);
> > }
> > };
>
> If C actually is a POD, this boils down to a simple memcpy(). If C is not a
> POD, a memcpy() wouldn't work, as mentioned above. So, you don't gain
> anything from not doing it this way - unless of course there is some problem
> to solve that you didn't explain here.

The only gain I see is in the case where the copy constructor is
private, isn't it?

> Note: It's sometimes better to ask for a solution to a problem than to ask
> how to implement a very specific solution to that problem!
<snip>
> > Is there a way to create such a cache instance without calling to the
> > constructor in a portable way using some low level C++ interface?
>
> Yes. How to do that depends on what exactly you are trying to achieve, which
> isn't clear yet.

I will try to explain what I want as briefly as possible, but
it will still be quite long.

The STM library I'm working on requires that any transactional object
inherits from

class base_transaction_object {
public:
    virtual base_transaction_object*
        make_cache(transaction* t) const = 0;
    virtual void copy_cache(
        base_transaction_object const * const) = 0;
    virtual void delete_cache() = 0;
    virtual ~base_transaction_object() {}
    ...
};

I omit here why the STM library needs this.

The same library defines a mixin class transaction_object using the
copy-constructor, the assignment operator and the new and delete
operators. This class is defined as follows:

template <
    class Final,
    class Base=base_transaction_object>
class transaction_object : public Base {
public:
    base_transaction_object* make_cache(transaction*) const
    {
        // copy-construct from the Final object, not from a pointer
        return new Final(*static_cast<Final const*>(this));
    }
    void delete_cache()
    {
        delete this;
    }
    void copy_cache(
        base_transaction_object const * const rhs)
    {
        *static_cast<Final *>(this) =
            *static_cast<Final const * const>(rhs);
    }
};

Deep copy semantic liabilities: as the preceding transaction_object
class uses the copy constructor, destructor and assignment operator
to manage the cache, if the class requires a deep copy, the STM
system will use the deep copy implementation. While a deep copy is
needed for C++ classes that own their members, it is not needed for
transaction-specific cached objects that own other transactional
objects. Transaction-specific cached objects are not usual C++
objects: from the user's perspective they don't exist at all and are
seen as an implementation detail. So any allocation, deallocation,
copy from or copy to a transaction-specific cached object should not
use the C++ new, delete, copy constructor or assignment operator of
the class.

The typical example of a class that cannot be used with this mixin is
a non-copyable class, and in particular any singleton class. An
example of a class for which the call to the constructor/destructor
interferes with the transactional world is a class that doesn't allow
more than n instances. If the cache instances are counted as instances
of the class, we couldn't work with more than n-1 transactions.

Next follows an example of performance degradation:

struct Node {
    tx::pointer<Node> next_; // this is a transactional pointer
};

class List : public transaction_object<List> {
    std::size_t size_;
    Node head_;
public:
    // makes a deep-copy
    List(List const& rhs);
    // makes a deep-copy
    List& operator=(List const& rhs);
    // ...
};

As transaction_object is based on copy semantics, and the class List
owns all its elements, any change to the field size_ or head_ implies
the allocation of a transaction-specific cache instance, a deep copy
of the complete list to that cache before any modification, the
deallocation of this cached instance after the transaction completes,
and possibly another copy from the cache to the shared transactional
object. Even if the implementation is correct, it is clearly not
efficient.

From what you and others said, we can use std::memcpy instead of
calling the copy constructor and the assignment operator only for
classes that have a trivial copy constructor and a trivial assignment
operator. The STM system could also provide a
trivial_transaction_object mixin applicable to classes for which
has_trivial_copy_semantics<C>::value is true.

template <class T>
struct has_trivial_copy_semantics :
    boost::mpl::and_<
        boost::has_trivial_copy_constructor<T>,
        boost::has_trivial_assign<T>
    >
{};
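
For readers on C++11 or later, a similar trait can be sketched with the
standard <type_traits> header instead of Boost. This is an alternative
formulation, not the code used in the STM library; Pod and Poly are
illustrative types. One consequence worth checking against the design:
a class with virtual functions never has a trivial copy constructor, so
a type that derives from a polymorphic base would not satisfy this
trait.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// True only if both copy construction and copy assignment are trivial.
template <class T>
struct has_trivial_copy_semantics
    : std::integral_constant<bool,
          std::is_trivially_copy_constructible<T>::value &&
          std::is_trivially_copy_assignable<T>::value> {};

struct Pod  { int a; double b; };          // illustrative plain struct
struct Poly { virtual void f() {} int a; }; // illustrative polymorphic class

static_assert(has_trivial_copy_semantics<Pod>::value,
              "plain structs of fundamental types copy trivially");
static_assert(!has_trivial_copy_semantics<std::string>::value,
              "std::string has user-provided copy operations");
static_assert(!has_trivial_copy_semantics<Poly>::value,
              "virtual functions make the copy constructor non-trivial");
```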

template <
    class Final,
    class Base=base_transaction_object
>
class trivial_transaction_object
    : public Base {
public:
    base_transaction_object* make_cache(transaction* t) const
    {
        Final* p = cache_allocate<Final>(t);
        std::memcpy(p,
            static_cast<Final const*>(this),
            sizeof(Final));
        return p;
    }
    void delete_cache()
    {
        // deallocate through the Final pointer that cache_allocate returned
        cache_deallocate(static_cast<Final*>(this));
    }
    void copy_cache(
        base_transaction_object const * const rhs)
    {
        std::memcpy(static_cast<Final *>(this),
            static_cast<Final const * const>(rhs),
            sizeof(Final));
    }
};

where

template <class T>
T* cache_allocate(transaction* /* t, unused here */)
{
    return reinterpret_cast<T*>(
        new char[sizeof(T)]);
}
template <class T>
void cache_deallocate(T* ptr)
{
    delete [] reinterpret_cast<char*>(ptr);
}

IMO there is no need to call the destructor of Final. The reason is
that this object is not a real one; it is a cache of another real
object.
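
The allocate, copy and release steps can be sketched in isolation as
follows. The names raw_clone, raw_release and Counter are hypothetical,
and the sketch is restricted to trivially copyable, trivially
destructible types. As far as I understand the rules, C++20's implicit
object creation makes this pattern well-defined for such types; under
earlier standards it relies on behaviour that mainstream compilers
provide but the standard did not spell out.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Allocates raw storage and copies the bytes of src into it.
// No constructor of T is called.
template <class T>
T* raw_clone(const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    void* mem = ::operator new(sizeof(T));  // throws std::bad_alloc on failure
    std::memcpy(mem, &src, sizeof(T));
    return static_cast<T*>(mem);
}

// Releases the storage without running a destructor.
template <class T>
void raw_release(T* p) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "skipping the destructor is only harmless if it is trivial");
    ::operator delete(p);
}

struct Counter { int hits; int misses; };  // illustrative type

// Clones a Counter, reads it through the clone, and frees the clone.
inline int raw_clone_demo() {
    Counter c = {3, 4};
    Counter* cached = raw_clone(c);
    int sum = cached->hits + cached->misses;
    raw_release(cached);
    return sum;
}
```

Using ::operator new and ::operator delete as a matched pair avoids the
cast to char* on deallocation.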

For classes that have no trivial copy semantics we can still do a
shallow copy, but as I understand it now, not without the help of the
user. I'm wondering whether it is worth requiring the user to define
the following shallow functions:

struct shallow_t {};
const shallow_t shallow = {};

// Shallow copy constructor
C(C const&, shallow_t);
// Shallow assignment
C& shallow_assign(C const&);

STM could then also provide a shallow copy mixin.

template <
    class Final,
    class Base=base_transaction_object
>
class shallow_transaction_object
    : public Base {
public:
    base_transaction_object* make_cache(transaction* t) const
    {
        Final* p = cache_allocate<Final>(t);
        // construct in place from the Final object, not from a pointer
        return new(p) Final(*static_cast<Final const*>(this), shallow);
    }
    void delete_cache()
    {
        cache_deallocate(static_cast<Final*>(this));
    }
    void copy_cache(
        base_transaction_object const * const rhs)
    {
        static_cast<Final *>(this)->shallow_assign(
            *static_cast<Final const * const>(rhs));
    }
};

Note that there is no need to call the destructor. The reason is the
same as in the memcpy case. The only difference is that instead of
doing a single memcpy, we will do several, or the equivalent. That
means that a shallow copy should rely only on copies of fundamental
types or use shallow copies of its members.
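
To illustrate the tag-based shallow constructor next to an ordinary
deep copy constructor, here is a self-contained sketch. Buffer and
shallow_vs_deep_demo are hypothetical names used only for this
example. The shallow copy is built with placement new in raw storage
and its destructor is deliberately never run, mirroring the cache
lifetime described above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct shallow_t {};
const shallow_t shallow = {};

// Hypothetical owner of a heap array, used only for illustration.
class Buffer {
    int*        data_;
    std::size_t size_;
    Buffer& operator=(const Buffer&);  // not needed for this sketch
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    // Deep copy: allocates and duplicates the array.
    Buffer(const Buffer& rhs) : data_(new int[rhs.size_]), size_(rhs.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = rhs.data_[i];
    }
    // Shallow copy: shares the array with rhs; the copy must never be destroyed.
    Buffer(const Buffer& rhs, shallow_t) : data_(rhs.data_), size_(rhs.size_) {}
    ~Buffer() { delete[] data_; }
    const int* data() const { return data_; }
};

// Returns true if the shallow cache shares storage while the deep copy does not.
inline bool shallow_vs_deep_demo() {
    Buffer original(4);
    Buffer deep(original);

    void* mem = ::operator new(sizeof(Buffer));
    Buffer* cache = new (mem) Buffer(original, shallow);
    bool shares           = cache->data() == original.data();
    bool deep_is_separate = deep.data() != original.data();
    ::operator delete(mem);  // storage released, ~Buffer() deliberately not run

    return shares && deep_is_separate;
}
```

Running the destructor on the shallow copy would free the array twice,
which is why the cache is released as raw storage only.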

Now we can define transaction_object that depends on whether the final
class has trivial, shallow or deep copy semantics.

namespace detail {
    template <class Final, typename Base,
        bool hasShallowCopySemantics,
        bool hasTrivialCopySemantics>
    class transactional_object;

    template <class F, class B>
    class transactional_object<F, B, true, false>:
        public shallow_transaction_object<F, B> {};
    template <class F, class B>
    class transactional_object<F, B, true, true>:
        public shallow_transaction_object<F, B> {};
    template <class F, class B>
    class transactional_object<F, B, false, true>:
        public trivial_transaction_object<F, B> {};
    template <class F, typename B>
    class transactional_object<F, B, false, false>:
        public deep_transaction_object<F, B> {};
}
template <
    class Final,
    class Base=base_transaction_object
> class transaction_object
    : public detail::transactional_object<
        Final, Base,
        has_shallow_copy_semantics<Final>::value,
        has_trivial_copy_semantics<Final>::value
    >
{};

(deep_transaction_object corresponds to the initial transaction_object
class)
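
The same two-flag selection can also be sketched with std::conditional
from C++11 instead of partial specializations. The three policy structs
below are stand-ins for the mixins, and select_copy_policy is a
hypothetical name; the priority order (shallow first, then trivial,
then deep) is an assumption read off the specializations above.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for the three mixins; only their identity matters here.
struct deep_policy    {};
struct trivial_policy {};
struct shallow_policy {};

// Picks shallow if available, otherwise trivial, otherwise deep.
template <bool HasShallow, bool HasTrivial>
struct select_copy_policy {
    typedef typename std::conditional<
        HasShallow,
        shallow_policy,
        typename std::conditional<HasTrivial, trivial_policy, deep_policy>::type
    >::type type;
};

static_assert(std::is_same<select_copy_policy<true, false>::type,
                           shallow_policy>::value, "shallow wins");
static_assert(std::is_same<select_copy_policy<false, false>::type,
                           deep_policy>::value, "deep is the fallback");
```

A single nested conditional makes the priority between the flags
explicit, at the cost of being less open to extension than separate
specializations.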

I don't know how to implement has_shallow_copy_semantics<T>::value in
a portable way, so the default is false:

template <class T>
struct has_shallow_copy_semantics : boost::mpl::false_ {};

The user will need to specialize this template as follows

template <>
struct has_shallow_copy_semantics<C> : boost::mpl::true_ {};
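
With C++11 expression SFINAE the trait can, in principle, be detected
instead of specialized by hand. The sketch below checks for a
shallow_assign member; detect_t, WithShallow and Plain are names made
up for the example. Note that the detection needs a complete type, so
it would not give the intended answer when evaluated in the base-class
list of a CRTP hierarchy, where Final is still incomplete.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Maps any well-formed list of types to void (a C++11 stand-in for std::void_t).
template <class...> struct make_void { typedef void type; };
template <class... Ts> using detect_t = typename make_void<Ts...>::type;

// Primary template: no shallow_assign member found.
template <class T, class = void>
struct has_shallow_copy_semantics : std::false_type {};

// Chosen when t.shallow_assign(const T&) is a valid expression.
template <class T>
struct has_shallow_copy_semantics<
    T,
    detect_t<decltype(std::declval<T&>().shallow_assign(std::declval<const T&>()))> >
    : std::true_type {};

struct WithShallow {
    WithShallow& shallow_assign(const WithShallow&) { return *this; }
};
struct Plain { int x; };

static_assert(has_shallow_copy_semantics<WithShallow>::value, "member detected");
static_assert(!has_shallow_copy_semantics<Plain>::value, "no member, default applies");
```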

With all this stuff I can define the list example avoiding the
performance degradation as follows

class List : public transaction_object<List>
{
    std::size_t size_;
    Node head_;
public:
    // shallow copy constructor
    List( List const& rhs, stm::shallow_t)
        : size_(rhs.size_)
        , head_(rhs.head_)
    {}
    // shallow assignment
    List& shallow_assign( List const& rhs) {
        // compare addresses to guard against self-assignment
        if (this != &rhs) {
            size_=rhs.size_;
            head_=rhs.head_;
        }
        return *this;
    }
    // makes a deep-copy
    List(List const& rhs);
    // makes a deep-copy
    List& operator=(List const& rhs);
    // ...
};

template <>
struct has_shallow_copy_semantics<List> : boost::mpl::true_ {};

Note that even though the following are public

List( List const& rhs, stm::shallow_t)
List& shallow_assign( List const& rhs)

they are intended to be used only by
shallow_transaction_object<List>

I hope this explains the problem I want to solve and how I am trying
to solve it now. Please let me know if there is a flaw in this design,
or if you have any hint to make it better or more portable.

Best.
Vicente

P.S. The code included in this post has evidently not been compiled :(


From: Goran on 5 Jan 2010 02:17

On Jan 3, 5:54 pm, viboes <vicente.bo...(a)wanadoo.fr> wrote:
> Hello,
>
> I need to create a cache of a transactional object without using the
> new operator nor the copy constructor of the class. This cache needs
> only to copy the raw memory of the copied instance, is for that I have
> called it shallow_clone
>
> The following will create a deep copy and don't respect my
> requirements
>
> class C {
> public:
> C* shallow_clone() {
> return new C(*this);
> }
>
> };

Let's get terminology clear first... Deep versus shallow copy is
usually explained like this: http://en.wikipedia.org/wiki/Object_copy

What strikes me as strange is that you are employing operator new.
This is unwarranted when speaking only about deep/shallow copy
distinction. So the way to make what is commonly known as a deep copy
is both

C copy(original);

AND

C* pcopy = new C(original);

IOW, storage doesn't matter, at least unless you start with derivation
and polymorphism.

Shallow copy means that you want a copy of only part of the original
instance data. Typically, in a shallow copy, you don't want a copy of
embedded or pointed-to objects. For that, there is no mechanism in
C++, and trying to use memcpy is most usually a HUGE MISTAKE: you are
e.g. creating dangling pointers, you are breaking reference counting
for objects that do it (strings come to mind), etc.

So the problem with shallow copy is that "shallow" means different
things in different contexts, that is, which class members are
"shallow", and which are "deep". That's for your design to decide.
Consequence of that is that you have to create your own shallow copy
function and avoid standard copy constructors and assignments. They
are most usually deep copies, so leave them for that.

Conclusion:

You have only one reasonable path: decide what constitutes a "shallow"
copy and make a function that does that, e.g.

class shallow_piece
{
    int i, j, k; // Shallow
};

class C
{
public:
    shallow_piece shallow;
    vector<other_class> deep;

    C() : shallow() {}
    // Special conversion ctor that acts as a shallow-copy-ctor.
    // If shallow_piece is POD, it's effectively as fast as memcpy.
    C(const shallow_piece& original) : shallow(original) {}
};

and then:

C original;
C shallow_copy(original.shallow);
C deep_copy(original);

I repeat: whether the result is on the heap or not is orthogonal to
the question of deep versus shallow. But, if you have derivation, then
it's very likely that the shallow_piece idea is less useful. You then
have to have a shallow_copy function that returns heap-based
instances, and each derived class decides what members are shallow
(typically, it will take the base class "shallow members" and add one
or two of its own). You can still employ the shallow_piece idea e.g.
like this, at the cost of using the heap for the "shallow" part:

class shallow_piece_base { /* etc. */ };

class C_base
{
protected:
    shallow_piece_base* p_shallow;
    explicit C_base(shallow_piece_base* p) : p_shallow(p) {}
public:
    virtual C_base* shallow_copy() const = 0;
    virtual ~C_base() {}
};

class shallow_piece_derived : public shallow_piece_base { /* etc. */ };

class C : public C_base
{
public:
    // inherited p_shallow always points to shallow_piece_derived.
    C(const shallow_piece_derived& shallow)
        : C_base(new shallow_piece_derived(shallow)) {}
    virtual C_base* shallow_copy() const
    {
        return new C(
            *static_cast<const shallow_piece_derived*>(p_shallow));
    }
};

If your concern is speed, measure which of the two is faster: copying
stuff around or sharing heap pointers. But that __hugely__ depends on
data sizes and on your actual code the way it's running "in
production" (that is, a simple test program might be misleading).

>
> I have looked at uninitialized_copy but if I have understood it
> correctly, it calls the copy constructor.
> I have tried with
>
> class C {
> public:
> C* shallow_clone() {
> C* p = reinterpret_cast<C>(new char[sizeof(C)]);
> if (p==0) {
> throw std::bad_alloc();

That p==0 is just poor C++. new throws unless nothrow is used.
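
A deterministic sketch of the difference between the two forms of new.
The AlwaysFails type is hypothetical: it supplies class-specific
allocation functions that always fail, so the behaviour does not depend
on how much memory the system has.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type whose allocation always fails, for demonstration only.
struct AlwaysFails {
    int payload;
    static void* operator new(std::size_t) { throw std::bad_alloc(); }
    static void* operator new(std::size_t, const std::nothrow_t&) noexcept {
        return nullptr;
    }
    static void operator delete(void*) noexcept {}
    static void operator delete(void*, const std::nothrow_t&) noexcept {}
};

// Plain new reports failure by throwing; the result is never null.
inline bool plain_new_throws() {
    try {
        AlwaysFails* p = new AlwaysFails;
        delete p;
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// Only the nothrow form reports failure through a null pointer.
inline bool nothrow_new_returns_null() {
    AlwaysFails* p = new (std::nothrow) AlwaysFails;
    return p == nullptr;
}
```

A null check after plain new is therefore dead code; it is only
meaningful after new (std::nothrow).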

Goran.


From: viboes on 5 Jan 2010 08:00

On Jan 5, 8:17 pm, Goran <goran.pu...(a)gmail.com> wrote:
> On Jan 3, 5:54 pm, viboes <vicente.bo...(a)wanadoo.fr> wrote:
>
>
>
> > Hello,
>
> > I need to create a cache of a transactional object without using the
> > new operator nor the copy constructor of the class. This cache needs
> > only to copy the raw memory of the copied instance, is for that I have
> > called it shallow_clone
>
> > The following will create a deep copy and don't respect my
> > requirements
>
> > class C {
> > public:
> > C* shallow_clone() {
> > return new C(*this);
> > }
>
> > };
>
> Let's get terminology clear first... Deep versus shallow copy is
> usually explained like this:http://en.wikipedia.org/wiki/Object_copy

I know. This is exactly how I'm using the shallow word.

> What strikes me as strange is that you are employing operator new.
> This is unwarranted when speaking only about deep/shallow copy
> distinction. So the way to make what is commonly known as a deep copy
> is both
>
> C copy(original);
>
> AND
>
> C* pcopy = new C(original);
>
> IOW, storage doesn't matter, at least unless you start with derivation
> and polymorphism.

Sorry if my text led you to think I was saying that storage matters
when distinguishing between deep and shallow copy. That was not my
intention.

> Shallow copy means that you want a copy where only part of original
> instance data. Typically, in a shallow copy, you don't want a copy of
> embedded or pointed-to objects. For that, there is no mechanism in C+
> +, and trying to use memcpy is most usually a HUGE MISTAKE: you are
> e.g. creating dangling pointers, you are breaking reference counting
> for objects that do it, (strings come to mind) etc.

In my use case there are no dangling pointers, as the shallow-copied
objects are caches of other shared objects. In the general case it is
clear that a deep copy should be used when the class owns the
referenced object. In my last post I explain the context at length.

> So the problem with shallow copy is that "shallow" means different
> things in different contexts, that is, which class members are
> "shallow", and which are "deep". That's for your design to decide.
> Consequence of that is that you have to create your own shallow copy
> function and avoid standard copy constructors and assignments. They
> are most usually deep copies, so leave them for that.

I agree. This was one of my motivations for not using them.

> Conclusion:
>
> You have only one reasonable path: decide what constitutes a "shallow"
> copy and make a function that does that, e.g.
>
> class shallow_piece
> {
> int i, j, k; // Shallow
>
> };
>
> class C
> {
> shallow_piece shallow;
> vector<other_class> deep;
>
> // Special conversion ctor that acts as a shallow-copy-ctor.
> // If shallow_piece is POD, it's effectively as fast as memcpy.
> C(const shallow_piece& original) : shallow(original) {}
>
> };
>
> and then:
>
> C original;
> C shallow_copy(original.shallow);
> C deep_copy(original);

This could work. I have opted for a different solution. See my last
post.

> I repeat: whether the result is on the heap or not is orthogonal to
> the question of deep versus shallow. But, if you have derivation, then
> it's very likely that shallow_piece idea is less useful. You then have
> to have shallow_copy function that returns heap-based instances, and
> each derived class decides what members are shallow (typically, it
> will take base class "shallow members" and add one or two of it's
> own). You can still employ shallow_piece idea e.g. like this, at a
> cost of using heap for "shallow" part:
>
> class shallow_piece_base {etc.};
>
> class C_base
> {
> shallow_piece_base* p_shallow;
> virtual C_base* shallow_copy() const;
>
> };
>
> class shallow_piece_derived : public shallow_piece_base {etc.};
>
> class C : public C_base
> {
> // inherited p_shallow always points to shallow_piece_derived.
> C(const shallow_piece_derived& shallow) : p_shallow(new
> shallow_piece_derived(shallow)) {}
> virtual C_base* shallow_copy() const { return new C_base
> (*p_shallow); };
>
> };

This introduces an indirection that I would rather avoid.

> If your concern is speed, there's two things to consider: measure
> whether it's faster to copy stuff around or to share heap pointers.
> But that __hugely__ depends on data sizes and your actual code the way
> it's running "in production" (that is, a simple test program might be
> misleading).

I'm writing a library whose performance could improve when the
template parameter type has ShallowCopy semantics, so I will leave
this decision to the user. Thanks anyway for your suggestions.

> > I have looked at uninitialized_copy but if I have understood it
> > correctly, it calls the copy constructor.
> > I have tried with
>
> > class C {
> > public:
> > C* shallow_clone() {
> > C* p = reinterpret_cast<C>(new char[sizeof(C)]);
> > if (p==0) {
> > throw std::bad_alloc();
>
> That p==0 is just poor C++. new throws unless nothrow is used.

You are right. :)

Could you comment on the approach I presented in my last post?
Best,
Vicente
