From: Goran on
On Sep 4, 11:53 am, Elizabeta <Elizab...(a)discussions.microsoft.com>
wrote:
> >>If your dialog is used to enter/show some data, you may want to create a
> >>C++ class derived from CObject that stores that data (e.g. a CPerson,
> >>with name, surname, address, etc.), and use DECLARE_SERIAL and
> >>IMPLEMENT_SERIAL on this non-GUI class.
>
> Ok, let's say that for dialog template classes I can follow your suggestion, thanks.
>
> But what about template classes that are not dialogs? I want to
> reformulate my question:
> How do I use macros like DECLARE_SERIAL and IMPLEMENT_SERIAL with C++ template
> classes derived from CObject?

You really can't do that. You can only serialize a template
instantiation. So, for example, you could do:

template<typename T>
class C : public CObject
{
public:
    virtual void Serialize(CArchive& ar)
    {
        if (ar.IsStoring()) ar << member; else ar >> member;
    }
    T member;
};

then

class CInt : public C<int>
{ DECLARE_SERIAL(CInt) };
IMPLEMENT_SERIAL(CInt, CObject, schema...)

class CDouble : public C<double>
{ DECLARE_SERIAL(CDouble) };
IMPLEMENT_SERIAL(CDouble, CObject, schema...)

So...
1. you must have a concrete class for XXX_SERIAL macros to even
compile
2. if you ever change the template argument type (e.g. from int to
short), you must change the schema number and deal with the old data
in the loading code (a sketch follows below). This may be tricky for a
beginner ;-)
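
To illustrate point 2, a hedged sketch (CIntHolder and m_value are
hypothetical names): the member was an int in schema 1 and becomes a
short in schema 2, so the schema number is bumped and the loading
branch converts the old representation:

#include <afx.h>

class CIntHolder : public CObject
{
    DECLARE_SERIAL(CIntHolder)
public:
    short m_value; // was int in schema 1

    virtual void Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << m_value; // always write the current format
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema(); // call once, up front
            if (nSchema >= 2)
            {
                ar >> m_value; // schema 2 stores a short
            }
            else
            {
                int nOld; // schema 1 stored an int
                ar >> nOld;
                m_value = (short)nOld;
            }
        }
    }
};

// VERSIONABLE_SCHEMA makes MFC hand older schemas to Serialize instead
// of throwing CArchiveException (badSchema).
IMPLEMENT_SERIAL(CIntHolder, CObject, VERSIONABLE_SCHEMA | 2)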

Also, to be able to serialize a class (e.g. a template), you don't
necessarily need to use the XXX_SERIAL macros, but then:
1. you can't use schema number for format evolution
2. >> and << operators don't work (consequence: you can't serialize
using CObArray).
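
For completeness, a minimal sketch of that macro-less style (CSettings
is a hypothetical name): you just override Serialize and call it
directly on an object you construct yourself, giving up the schema
number and the pointer operators:

#include <afx.h>

class CSettings : public CObject // note: no DECLARE_SERIAL
{
public:
    CString m_name;
    int m_count;

    virtual void Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
            ar << m_name << m_count;
        else
            ar >> m_name >> m_count;
    }
};

void SaveSettings(CArchive& ar, CSettings& settings)
{
    settings.Serialize(ar); // fine: the object already exists
    // ar << &settings;     // no: operator<< needs the runtime-class
    //                      // info that only IMPLEMENT_SERIAL provides
}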

What Joe says about schema migration being impossible is not true.
It's not easy, but it's not impossible. You must know the how-tos and
pitfalls, but it's doable. For example, I have code here at work that
can read stuff serialized more than a decade ago, and that has since
undergone literally thousands of changes in the saved data. There are
constraints, but it is very rare that we break serialization. In fact,
serialization problems are a tiny minority compared to other issues.

HTH,
Goran.
From: Joseph M. Newcomer on
I was overstating the case somewhat. But for someone asking about how to do
serialization, "incredibly difficult" is not that distinguishable from "impossible".
Having done this numerous times, in a variety of languages and environments over the last
46 years, I've found that the code grows in complexity each time you make a schema change,
and after the tenth major change you end up with an unintelligible, not to mention
unmaintainable, mess. I've probably worked on a dozen projects that did this, and while we
could and did
read files saved not-quite-ten years ago, you had to maintain in the comments a
specification of each of the schemata. The real problem was someone making a change that
broke the ability to read the 3.7 release (which was different from 3.6 and 3.8 but only
in trivial ways), and one of my tasks was to find and fix that problem.

In one system, we wrote a tape-copy program that took in tapes in the old schema and wrote
new tapes in the new schema. At General Motors, a schema conversion involved three days of
runtime on an IBM 360/75. Most of the delay was not computational, but in the massive
amount of data in an automobile database; even today we don't have the fantastic I/O
bandwidth of a 1970s huge mainframe, except on the highest-end multiprocessor multi-bus
servers with super-high-end ($2500) disk controller cards. Computationally, we are
orders of magnitude faster than those old mainframes, but I/O bandwidth has *not* doubled
every 18 months! So binary schema migration is essentially the same problem it was in the
1950s, only we can do it with faster computational engines. (So fast, in fact, that
parsing XML is almost not a factor in the cost). One client was upset at their slow
input, but it turned out that the programmer had decided that reading text (not XML, just
plain text) was going to be slower, so he added a progress bar. The progress bar update
took more time than the reading of the data (I discovered this when I moved the reader to
a separate thread and updated the progress bar every 100 records instead of every record;
reading the text took only 2x as long as reading the binary, 30 seconds vs. 15).

In fact, one of the serious defects of MFC is the inability to move serialization easily
into a background thread. Too much of MFC was designed with a single-thread model in
mind.

One interesting C++ solution we used was to treat each schema as a new class instance,
keeping the old classes for serialization. So we would read in using the
CVersion36Data::Serialize if we found (using my previous example) that the header was a
3.6 (well, 0x00036000) version; then a subroutine took the 3.6 version and produced a 3.7
version. We couldn't do this years ago because we didn't have enough memory to hold both
versions, but even in Win16 3.1 this was not a problem. It did mean that we had to write
an n.m => 3.7 converter for each n.m < 3.7. Perhaps surprisingly, this was much easier than
the if(majorversion < 3) and minor-version interlaced tests that had pervaded the 1.x and
2.x code; my predecessor on the project who created version 3 threw up his hands and said
"this is impossible!" and culled through the old versions, extracting the code for each
serialization and building a new class. I got the code about version 7, and was able to
easily create version 7.x+1, with many new features added to the file. (He later told me
that his real failure was that he did not cascade the converters; that is, if you read
version 1.n, just sequentially run the 1.n to 1.n+1, 1.n+1 to 1.n+2, etc. to 1.m to 2.0,
etc.; his thought was that this would mean the first read of an old file might be much
slower, but since there was no backward compatibility requirement, only one output writer
was required. We had somewhat fewer than 25 version converters, each quite simple, so he
said it ultimately didn't matter too much).
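
To make the cascading idea concrete, here is a hedged sketch (the CDataVn names and fields
are hypothetical, and the file version is assumed to have been read from a header already):
one loader per historical schema, one small converter per version boundary, chained so
that only the newest writer needs maintaining.

#include <afx.h>

struct CDataV1 { int count; };
struct CDataV2 { int count; CString label; };               // v2 added a label
struct CDataV3 { int count; CString label; double scale; }; // v3 added a scale

static CDataV2 UpgradeV1ToV2(const CDataV1& old)
{
    CDataV2 v2; v2.count = old.count; v2.label = _T(""); return v2;
}

static CDataV3 UpgradeV2ToV3(const CDataV2& old)
{
    CDataV3 v3; v3.count = old.count; v3.label = old.label; v3.scale = 1.0;
    return v3;
}

// Read with the loader matching the file's version, then cascade upward.
static CDataV3 LoadAnyVersion(CArchive& ar, DWORD fileVersion)
{
    switch (fileVersion)
    {
    case 1: { CDataV1 v1; ar >> v1.count;
              return UpgradeV2ToV3(UpgradeV1ToV2(v1)); }
    case 2: { CDataV2 v2; ar >> v2.count >> v2.label;
              return UpgradeV2ToV3(v2); }
    case 3: { CDataV3 v3; ar >> v3.count >> v3.label >> v3.scale;
              return v3; }
    default:
        AfxThrowArchiveException(CArchiveException::badSchema);
        return CDataV3(); // not reached
    }
}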

In our version of proto-XML, we kept the undefined fields as text and the defined fields
in binary, and had a table-driven algorithm that could convert text-to-binary+text or
binary-to-text. IDL did not support the notion of handling undefined fields, but was also
table-driven. We could read a structure in as text or binary and fix up all the pointers
so we could save arbitrary graphs of objects-pointing-to-objects, even with cyclic lists.
I found this was a particular failure of most MFC serialization techniques. Ultimately, I
got to the point where I'd convert the files to text and nobody noticed or cared that they
were larger or read more slowly, with one exception (which we were able to fix rather
trivially).

Getting multi-schema serialization to work without massive pain requires some careful
up-front design. I have not encountered one example of this being done correctly in all
the serialization code I've had to fix.

The most serious one was the project where we had to add new fields to the structure and
had to maintain backward compatibility to the older binary executables; that's where I
explained that this was not going to happen. We solved it by writing two files, one which
was old-binary-compatible and lacked the richer structure of the new version, and another
which was the "fixup" of the structure which embodied all the new features. This was
actually pretty clean, but took two passes to write two archive files, and we had to use a
derived class of CArchive to indicate which pass we were in.
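
A rough sketch of that two-pass arrangement (CCompatArchive, the pass flag, and CRecord
are all hypothetical names; the real code was of course more involved): the derived
archive carries which of the two files is being written, and each Serialize branches on
it.

#include <afx.h>

class CCompatArchive : public CArchive
{
public:
    enum Pass { OldFormatPass, FixupPass };
    Pass m_pass;

    CCompatArchive(CFile* pFile, UINT nMode, Pass pass)
        : CArchive(pFile, nMode), m_pass(pass) {}
};

class CRecord : public CObject
{
public:
    long m_id;       // a field the old executables understand
    CString m_notes; // a new-version-only field

    virtual void Serialize(CArchive& ar)
    {
        // By convention, both passes always use a CCompatArchive.
        CCompatArchive& compat = (CCompatArchive&)ar;
        if (compat.m_pass == CCompatArchive::OldFormatPass)
            ar << m_id;    // the old-binary-compatible file
        else
            ar << m_notes; // the "fixup" file with the new features
    }
};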

In addition, I have encountered, but not worked on, a fairly large number of projects
whose participants complained about the problems of serialization, at least the binary
serialization that is normally used.

So your experience seems anomalous.
joe

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Goran on
OK, so I'll share my experience and how we do it overall. It is about
a run-of-the-mill editor-type application that has been developed over
the course of a decade (and counting). I am not the original author
(Hi, Serge!), but I see no major problems serialization-wise.
Certainly there are things that could have been done better, but hey,
rarely do I think that my own code is good when looking at it a couple
of years later, either! ;-)

What we do is provide only backwards compatibility. That decision
serves us well, I'd say. We use VERSIONABLE_SCHEMA for particular
classes and also have a global document version number. The canonical
Serialize is:

void CFoo::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
    {
        ar << member1 << m2 << m3 << ...;
    }
    else // loading
    {
        UINT schema = ar.GetObjectSchema();
        ar >> member1 >> m2;
        if (schema >= 2)
        {
            ar >> m3;
            if (schema >= 3)
            {
                ar >> m4;
                // etc
            }
        }
    }
}

For one particular class, I don't think we ever went over 20 or so.
Clearly, that "if" nesting in the loading branch can become deep, but
I don't remember that we ever tried to alleviate it by moving code out
(e.g. into a ReadFromSchema3(ar)); a flattened variant is sketched
below. Frankly, I think it's rather canonical and not complicated at
all.
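
By the way, since schema >= 3 implies schema >= 2, the nesting above
is equivalent to flat, sequential checks, which stay shallow however
many schemas accumulate. A hedged sketch, reusing the hypothetical
members from above:

void CFoo::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
    {
        ar << member1 << m2 << m3 << m4;
    }
    else
    {
        UINT schema = ar.GetObjectSchema(); // valid once per loaded object
        ar >> member1 >> m2;                // schema 1 fields
        if (schema >= 2) ar >> m3;          // added in schema 2
        if (schema >= 3) ar >> m4;          // added in schema 3
    }
}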

About the global document format number: each set of changes, before
going out into the wild, requires a bump in this number. That way, the
code knows whether it is too old to read a particular (newer) file.
Older files it must read anyhow.
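
A minimal sketch of that global number (kFileFormatVersion and CMyDoc
are hypothetical names; the real scheme may differ in detail): the
document writes the current number, and on load refuses anything newer
than itself:

const DWORD kFileFormatVersion = 7; // bumped before each release

void CMyDoc::Serialize(CArchive& ar) // the CDocument-derived class
{
    if (ar.IsStoring())
    {
        ar << kFileFormatVersion;
        // ... store the contents in the current format ...
    }
    else
    {
        DWORD fileVersion;
        ar >> fileVersion;
        if (fileVersion > kFileFormatVersion) // file newer than the code
            AfxThrowArchiveException(CArchiveException::badSchema);
        // ... load, branching on fileVersion where formats differ ...
    }
}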

Sometimes (very rarely, though) we remove something from the program.
For that, we just do ar << unused and ar >> unused, so there's no
schema change for that (see the sketch below). Sometimes we change the
data type of a particular variable. That is handled by either the
"unused" trick or a specific schema check "in the middle". But that is
rare, too.

I'd say, when the above is followed strictly, serialization really is
a breeze!

Now, of course, this is only overall approach, and things indeed do
get messy in places (I'd prefer not to talk about them ;-) ).

A pet peeve of MFC serialization for many must be the inability to
have a schema number on a per-derived-class basis. That is, a schema
change anywhere in a hierarchy demands a schema change for all
participating classes, so that they all carry the same schema number.
Go MS, huh?
Recently, someone here complained that he could not rename a class
because of serialization. Yes, that is a breaking change, but not an
insurmountable one, either (I don't think we ever did this, BTW!)

Goran.
From: Joseph M. Newcomer on
None of the serialization projects I worked on were fortunate enough to have structures
that simple. Data structures were 2-3 levels deep and very complex.
joe

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Goran on
On Sep 5, 5:40 pm, Joseph M. Newcomer <newco...(a)flounder.com> wrote:
> None of the serialization projects I worked on were fortunate enough to have structures
> that simple.  Data structures were 2-3 levels deep and very complex.

Well... that doesn't matter. You have two choices: either you treat
the "substructure" as a full-blown serializable class, in which case
it has a schema version an' all^^^, or you don't, in which case you
can (well, have to) pass the "parent" structure's schema version to
the serialization of those members.

^^^Note: in this case, one is not __obliged__ to use heap allocation
and ar >> pObj / ar << pObj. Instead, ar.SerializeClass() plus
obj.Serialize() produces the same effect for "embedded" structures (a
small sketch follows). AFAIK SerializeClass has a very reasonable
footprint.
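
To make that concrete, a small sketch (CPart and CContainer are
hypothetical names): the member is embedded by value, SerializeClass
stores/loads the class reference including the schema, and
GetObjectSchema then works inside the member's Serialize:

class CPart : public CObject
{
    DECLARE_SERIAL(CPart)
public:
    int m_x;
    virtual void Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
            ar << m_x;
        else
        {
            UINT schema = ar.GetObjectSchema(); // set by SerializeClass
            ar >> m_x;
            ASSERT(schema >= 1); // branch on schema as the format evolves
        }
    }
};
IMPLEMENT_SERIAL(CPart, CObject, VERSIONABLE_SCHEMA | 1)

class CContainer : public CObject
{
public:
    CPart m_part; // embedded by value, no heap allocation

    virtual void Serialize(CArchive& ar)
    {
        ar.SerializeClass(RUNTIME_CLASS(CPart)); // class info + schema
        m_part.Serialize(ar);
    }
};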

Goran.