XML Deserialize of empty value [CSharp]


From: Peter Duniho on 16 Feb 2010 08:04

Ele wrote:
> Hello,
>
> Yes, I understand your point. I did not post it to NG because I didn't want
> to add too much code in my message, making it difficult to read. However,
> here's the code.
> As always, thank you! :-)

Thanks for the sample. It helps make your question much clearer.

Unfortunately, I'm not entirely sure why the code doesn't work. That
is, looking at the output I do have a hint as to the problem, but I
don't completely understand the nature of it. (That may be at least in
part because I've stayed up way too late for other reasons, and have had
the poor judgment to still want to look at your question :) ).

What I can tell you is that a) you appear to be trying to deserialize
data that was not generated by the XmlSerializer class, and b) when you
deserialize that data, if the "value" element is an empty tag, because
it's of type XmlNode and the XmlSerializer class is looking for one of
those, the value of that property winds up being the next node in the
sequence, which happens to be the "distance" element.

On the second point, one _might_ under certain circumstances consider
that a bug. After all, the deserializing process should be looking for
the content _within_ the "value" element, regardless. But...

It appears that instead, there's a reader somewhere that when it hits
the "value" element, it doesn't bother to see whether that's a closed
element or not; it simply starts reading the next XmlNode whatever that
may happen to be, and returns that node for the "value" element.

And for better or worse, this isn't really illegal behavior, because the
XmlSerializer class doesn't promise to be able to deserialize any
arbitrary XML, but rather just XML that was generated by the
XmlSerializer class. And XmlSerializer wouldn't have written out XML
like the XML you're trying to deserialize.

If you want more robust deserialization, you'll probably have to
implement your own deserialization. You might try reporting the
behavior you're seeing as a bug on Microsoft's Connect web site
(http://connect.microsoft.com/) but I suspect they'll just close the bug
either as "By Design" or "Won't Fix".

By the way, in terms of the code you've posted, your serialize and
deserialize methods can be _much_ simpler:

using System.IO;
using System.Xml.Serialization;

public static T DeserializeObject<T>(string XML)
{
    using (StringReader stringreader = new StringReader(XML))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        return (T)serializer.Deserialize(stringreader);
    }
}

public static string SerializeObject<T>(T Source)
{
    XmlSerializer serializer = new XmlSerializer(typeof(T));

    using (StringWriter writer = new StringWriter())
    {
        // An empty prefix/namespace pair suppresses the default
        // xmlns:xsi and xmlns:xsd declarations on the root element.
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        serializer.Serialize(writer, Source, ns);
        return writer.ToString();
    }
}

(Technically, you don't really need the "using" or calls to Dispose(),
because you know that the StringReader/Writer classes don't actually
have unmanaged objects to dispose, but IMHO it's better form to include
them, as you did in your original example).

Note that with the above change, you need to change the call to the
deserializer method too:

el = DeserializeObject<CDataElement>(strXml);

Finally, even if you prefer the more explicit approach your example
showed, you had a bunch of extraneous "null" assignments in those
methods. Regardless of what else you might change, you definitely
should not write code like that. Setting local variables to "null"
accomplishes nothing beneficial, and in some cases it can actually hurt
overall memory performance, because it prevents the JIT compiler from
discarding the objects earlier.

Pete

From: Ele on 16 Feb 2010 09:26

Hello Pete,

First of all, thank you for your complete explanation. Here are my comments...
:-)

Yes, you are right when you say that I'm not deserializing something
generated by XMLSerializer. But, actually, it was generated by
XMLSerializer, saved to a MS SQL Server 2005 database, and then read. When
saving/reading from SQL Server 2005, the XML gets automatically transformed
that way (it's SQL that does that). I think this happens because SQL thinks
the format "<value></value>" is the same as "<value />", and so it
transforms it (and I think SQL is right). So I need to find a workaround for
this, and I'm inclined to consider it a bug (in C#, not SQL). As a workaround (I
want to avoid writing my own deserializer), before deserializing I could
search for all occurrences of "/>" (self-closing nodes) and transform them into
open/close pairs. Then the .NET deserializer should work correctly. What do
you think?
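Ele's idea can be sketched without a raw text search. A plain search for "/>" would also match that sequence inside CDATA sections or comments, so this minimal sketch (assuming .NET 3.5 or later for LINQ to XML) parses the document and gives each empty element an empty string as its content, which makes LINQ to XML write it with an explicit end tag. The class and method names are hypothetical, and the sketch does not check whether XmlSerializer then reads any particular class correctly.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of any library.
public static class XmlWorkarounds
{
    // Rewrites every self-closing element such as <value /> as <value></value>.
    public static string ExpandEmptyElements(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Materialize the matches first so the tree is not modified while it is enumerated.
        foreach (XElement element in doc.Descendants().Where(e => e.IsEmpty).ToList())
        {
            // An empty-string value makes IsEmpty false, so the element is
            // written with a full end tag instead of "/>".
            element.Value = string.Empty;
        }

        // DisableFormatting keeps the output on one line without added indentation.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

CDATA content is left alone, because only element nodes are touched: "<a><![CDATA[x/>y]]></a>" comes back unchanged.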

Thank you for your simpler code for serializing/deserializing. I'm now using
it.

Finally, I don't understand your suggestions about null values, and I would
really like to know more... :-) Back in the VB6 days, I always used to set
objects (e.g. connections) to null when I no longer needed them and
before exiting subs/functions. I have the same habit in C#. So are you
saying this is not correct and I should avoid setting them to null? Where
can I read more documentation? I just want to learn new things... :-)

Thank you again!

From: Peter Duniho on 16 Feb 2010 15:12

Ele wrote:
> Hello Pete,
>
> First of all, thank you for your complete explanation. Here are my comments...
> :-)
>
> Yes, you are right when you say that I'm not deserializing something
> generated by XMLSerializer. But, actually, it was generated by
> XMLSerializer, saved to a MS SQL Server 2005 database, and then read. When
> saving/reading from SQL Server 2005, the XML gets automatically transformed
> that way (it's SQL that does that).

One thing I'm curious about is how you wind up with an empty tag node in
the first place. The XmlSerializer _should_ simply omit the element if
it's null, and if it's non-null I would expect a non-empty element in
the XML to be needed. So your explanation about SQL Server 2005
rewriting the XML doesn't completely account for what seems to be going on.

That said, assuming the above is explainable, IMHO the best solution
would be to not allow SQL Server 2005 to rewrite your XML. Surely you
can simply save the XML in the database as a string, rather than letting
SQL Server 2005 know it's actually supposed to be XML?

> I think this happens because SQL thinks
> the format "<value></value>" is the same as "<value />", and so it
> transforms it (and I think SQL is right).

Yes, SQL Server 2005 is technically correct that the two are equivalent.
And IMHO, XmlSerializer should in fact handle it correctly. But, it's
not too surprising to me that it doesn't (given the likely original
design requirements for XmlSerializer), and I don't have a lot of
confidence Microsoft would think it's worth fixing.

That said, I do think there's value in you reporting the problem as a
bug. You never know, Microsoft could in fact wind up fixing it. And if
they don't, at least you'll have some closure on the question. :)

> So I need to find a workaround for
> this, and I'm inclined to consider it a bug (in C#, not SQL). As a workaround (I
> want to avoid writing my own deserializer), before deserializing I could
> search for all occurrences of "/>" (closed node) and transform them to
> open/close node. Then the .NET deserializer should work correctly. What do
> you think?

If you are against customizing the deserialization to handle the case,
then yes, certainly: transforming the XML into something you know will
work should avoid the problem. In that case, you may find that writing
a custom XmlTextReader is a simple way to handle that. You can delegate
most of the work to the .NET XmlTextReader, but map the empty tag nodes
to a pair of start and end tags.
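Subclassing XmlTextReader so that it reports one empty element as a start/end pair takes a fair amount of bookkeeping. A simpler streaming variant of the same idea, swapped in here for illustration rather than the custom-reader approach itself, reads with XmlReader and re-emits each node through XmlWriter, calling WriteFullEndElement for any element the reader reports as empty. The class and method names are hypothetical; the sketch copies elements, attributes, text, CDATA, whitespace, comments and processing instructions, skips the XML declaration, and does not handle DTDs.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, not part of any library.
public static class StreamingExpander
{
    public static string ExpandEmptyElementsStreaming(string xml)
    {
        var output = new StringBuilder();
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Capture this before copying attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (isEmpty)
                            writer.WriteFullEndElement(); // <value></value>, never <value />
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    // Other node types (e.g. the XML declaration) are dropped in this sketch.
                }
            }
        }

        return output.ToString();
    }
}
```

The returned string can then be handed to DeserializeObject<T>() in place of the raw text read from the database.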

> Thank you for your simpler code for serializing/deserializing. I'm now using
> it.
>
> Finally, I don't understand your suggestions about null values, and I would
> really like to know more... :-) Back in the VB6 days, I always used to set
> objects (e.g. connections) to null, when I did not need them any longer and
> before exiting subs/functions.

I have barely ever used VB6, so I can't speak to whether that was ever
actually necessary even in that environment. I would not expect setting
local variables to null to be useful even there, but perhaps VB6 had
some kind of extra behavior on setting a variable to null, such as
automatically handling disposal or disposal-like behaviors when that
happens.

In C#, though, setting a local variable to null has no such consequence.

> I have the same habit under C#. So are you
> saying this is not correct and I should avoid setting them to null? Where
> can I read more documentation? I just want to learn new things... :-)

I'm not sure if there's a specific piece of documentation that actually
says "there's no need to set local variables to null". But, it's a
natural consequence of how memory management in C# works.

Specifically, C# uses garbage collection (GC). GC works by detecting
the case when a particular object is no longer reachable by any valid
reference in the program. Local variables exist only while the method
is executing, so once the method returns, any local variables that
referenced an object are no longer a way to reach that object, and thus
as long as there are no other ways to reach the object, it can be GC'ed.

So, at best there is no benefit to setting a local variable to null.
The GC system can already detect the case where the object isn't
reachable any more after you return from the method, without you doing that.

Now, there's another little wrinkle: the JIT compiler and GC system work
together, and the JIT compiler is smart enough to know where in the code
is the last place you used a local variable. In _some_ situations, this
means the local variable may cease to exist _before_ the method returns,
and if that happens, the GC is able to collect the object earlier than
it otherwise might have.

But, when you set the local variable to null, that causes the variable
to continue to be in use in the method longer than it otherwise would
have been, _delaying_ the collection of the object to that point.

In C#, the assignment operator is not overloadable, and so I suppose in
the future it's possible the JIT compiler would detect that particular
pattern and ignore a use of the variable if all you're doing is setting
it to null and there are no more uses of the variable after that point.
That would at least avoid any run-time implications to unnecessary
assignments (you'd still have cluttered code, but that's between you and
the compiler :) ).

Pete

From: RayLopez99 on 17 Feb 2010 04:32

On Feb 16, 4:26 pm, "Ele" <nos...(a)nospam.com> wrote:
> Hello Pete,

> Thank you again!

Don't thank Pete, though he does good work here for free, and gets
nothing but abuse from me. Thank yourself for learning. And keep in
mind one thing: there's an XML group devoted to these problems,
comp.text.xml, and you will find that something that sounds easy in
theory doesn't work in practice.

Case in point: going through this long thread you'll find that a
simple solution for appending an XML file did not work "the short
way" (though I got it to work the long way):
http://groups.google.com/group/comp.text.xml/browse_thread/thread/41bc4904526329da/05a9eb8e51fa0bb1?hl=en#05a9eb8e51fa0bb1

I never did find out why it did not work the "short" or "easy" way,
though I had several theories at the time.

RL

From: Ele on 17 Feb 2010 05:26

"Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message
news:usEOWR0rKHA.3800(a)TK2MSFTNGP06.phx.gbl...
> One thing I'm curious about is how you wind up with an empty tag node in
> the first place. The XmlSerializer _should_ simply omit the element if
> it's null, and if it's non-null I would expect a non-empty element in the
> XML to be needed.

You are right, that is what happens for standard nodes. But if we have a
CDATA node and the node is empty, we end up with:

<value><![CDATA[]]></value>

So, in this case, it does not omit the element, it writes it with an empty
value. And when it passes through SQL, the CDATA is removed (<value></value>)
and then the element is shortened (<value />).
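Ele's class definition isn't shown in the thread, so the following sketch assumes the common pattern for emitting CDATA with XmlSerializer: a string property marked [XmlIgnore] plus an XmlCDataSection property that wraps it. The member names (Text, Value, Distance) and the CDataDemo helper are hypothetical. Because the getter always returns a node, the "value" element is written even when the text is empty, which is how the empty CDATA section above can arise.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical shape of the thread's CDataElement class; the real one is not shown.
public class CDataElement
{
    // Plain text; not serialized directly.
    [XmlIgnore]
    public string Text { get; set; }

    // Serialized as <value><![CDATA[...]]></value>; never null, so always written.
    [XmlElement("value")]
    public XmlCDataSection Value
    {
        get { return new XmlDocument().CreateCDataSection(Text ?? string.Empty); }
        set { Text = value == null ? null : value.Value; }
    }

    [XmlElement("distance")]
    public int Distance { get; set; }
}

// Hypothetical helper mirroring the SerializeObject<T>() method from the thread.
public static class CDataDemo
{
    public static string Serialize(CDataElement element)
    {
        var serializer = new XmlSerializer(typeof(CDataElement));
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, element, ns);
            return writer.ToString();
        }
    }
}
```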

I will submit this bug to Microsoft.

I think I have found a way to tell SQL to not convert XML, but it requires
editing the registry, so I cannot ask my users to do that.

And I was thinking of simply storing the value in an nvarchar field rather
than an XML field, as you suggested.

Finally, thank you for your comprehensive explanation about null values, I
understood that!

Thanks! :-)
