From: Le Chaud Lapin on
Hi All,

Do I get any guarantee that the unsigned version of a corresponding
signed type is:

1. the same size as the signed version
2. equivalent to the signed version as far as the bit pattern is
concerned

I especially would like to know if the standard prevents the compiler
from changing the bit pattern for the cast.

Here are some unsigned/signed pairs:

unsigned char/signed char
unsigned int/signed int
unsigned long int/signed long int

TIA,

[I realize this is a silly/trivial question, but my book is not
available right now.]

-Le Chaud Lapin-

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: SG on
On 2 Nov., 22:26, Le Chaud Lapin wrote:
>
> Do I get any guarantee that the unsigned version of a corresponding
> signed type is:
>
> 1. the same size as the signed version

In terms of the sizeof operator: yes. I'm not sure whether that implies
that the value representation uses the same subset of bits (as in
"potential padding at the same places"), but I would be surprised if
any implementation did otherwise.

> 2. equivalent to the signed version as far as the bit pattern is
> concerned

It is if the signed number is non-negative. It will also be the same
bit pattern for negative numbers if the system uses 2's complement (as
opposed to 1's complement or sign+magnitude). This follows directly
from the "modulo rule": the conversion obeys equivalence modulo 2^N,
where N is the number of bits in the target *unsigned* type. The
converse (conversion to signed) is not guaranteed by the standard, but
you can expect implementations that use 2's complement to make this
guarantee. A conversion to signed int where the original value cannot
be represented is implementation-defined and thus has to be documented
by the implementation.
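
A minimal sketch of the modulo rule, for illustration only (the helper
name to_unsigned_mod is made up here, and it assumes the usual case
UINT_MAX >= INT_MAX + 1). It computes by hand what the signed ->
unsigned conversion has to produce, without relying on how negative
numbers are stored:

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: spells out the result that the conversion
// int -> unsigned int must yield, using only unsigned arithmetic.
unsigned int to_unsigned_mod(int value)
{
    if (value >= 0)
        return static_cast<unsigned int>(value);   // representable as is

    // For a negative value v the result is 2^N - |v|. Unsigned
    // arithmetic wraps modulo 2^N, so 0u - |v| gives exactly that.
    // -(value + 1) cannot overflow, even for INT_MIN.
    unsigned int magnitude = static_cast<unsigned int>(-(value + 1)) + 1u;
    return 0u - magnitude;
}

// Compares the hand-written rule against the built-in conversion.
void check_modulo_rule()
{
    assert(to_unsigned_mod(-1) == UINT_MAX);
    assert(to_unsigned_mod(-1) == static_cast<unsigned int>(-1));
    assert(to_unsigned_mod(INT_MIN) == static_cast<unsigned int>(INT_MIN));
    assert(to_unsigned_mod(42) == 42u);
}
```

The result depends only on the value, so it is the same on every
implementation; whether the bits change along the way depends on the
representation of the signed type.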

Cheers,
SG


From: Johannes Schaub (litb) on
Le Chaud Lapin wrote:

> Hi All,
>
> Do I get any guarantee that the unsigned version of a corresponding
> signed type is:
>
> 1. the same size as the signed version
>
3.9.1/3 in the Standard says yes - they have the same storage size making
"sizeof" yield the same value.

> 2. equivalent to the signed version as far as the bit pattern is
> concerned
>
Standard says in the same paragraph: "the value representation of each
corresponding signed/unsigned type shall be the same.", for all non-negative
values of the signed type.
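
Both guarantees can be checked with a short sketch. static_assert needs
C++11 or later, and the helper same_bytes is a made-up name; it
compares whole objects with memcmp, so it additionally assumes the two
types have no differing padding bits, which the quoted paragraph does
not promise:

```cpp
#include <cassert>
#include <cstring>

// Compile-time checks that each signed/unsigned pair has the same size.
static_assert(sizeof(signed char) == sizeof(unsigned char), "char pair differs in size");
static_assert(sizeof(int) == sizeof(unsigned int), "int pair differs in size");
static_assert(sizeof(long) == sizeof(unsigned long), "long pair differs in size");

// Hypothetical helper: returns true if a non-negative int and the same
// value stored in an unsigned int have identical bytes in memory.
// Only meaningful for value >= 0.
bool same_bytes(int value)
{
    unsigned int u = static_cast<unsigned int>(value);
    return std::memcmp(&value, &u, sizeof value) == 0;
}
```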

> I especially would like to know if the standard prevents the compiler
> from changing the bit pattern for the cast.
>
> Here are some unsigned/signed pairs:
>
> unsigned char/signed char
> unsigned int/signed int
> unsigned long int/signed long int
>
According to 4.7/2, converting signed -> unsigned is a mathematical
operation. The resulting value is the least unsigned value congruent to the
source integer (modulo 2**n with n being the number of bits in the
representation of the unsigned integer).

So the bit-pattern could change. For example, (unsigned int)-1 yields
UINT_MAX, because UINT_MAX is the least non-negative value whose
difference from -1, UINT_MAX - (-1), is divisible by 2**BITS_IN_UINT. For
two's complement, there is no change in the bit pattern. But for
sign-magnitude, you go from "1000...01" to "1111...11", and for one's
complement, you go from "1111...110" to all-one and so on.
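
To make the three cases concrete, here is a small simulation restricted
to 8 bits. The names Repr, encode8 and convert_to_unsigned8 are
invented for this sketch; it models the representations with ordinary
arithmetic and does not inspect the real hardware:

```cpp
#include <cassert>
#include <cstdint>

enum class Repr { TwosComplement, OnesComplement, SignMagnitude };

// Hypothetical helper: the 8-bit pattern that stores `value` under `repr`.
// Assumes -127 <= value <= 127 so that every representation can hold it.
std::uint8_t encode8(int value, Repr repr)
{
    if (value >= 0)
        return static_cast<std::uint8_t>(value);   // same pattern in all three

    const unsigned magnitude = static_cast<unsigned>(-value);
    switch (repr) {
    case Repr::TwosComplement: return static_cast<std::uint8_t>(256u - magnitude);
    case Repr::OnesComplement: return static_cast<std::uint8_t>(255u - magnitude); // all bits of the magnitude inverted
    case Repr::SignMagnitude:  return static_cast<std::uint8_t>(0x80u | magnitude); // sign bit plus magnitude
    }
    return 0; // not reached
}

// Pattern of the converted value: the least non-negative value
// congruent to `value` modulo 2^8.
std::uint8_t convert_to_unsigned8(int value)
{
    return static_cast<std::uint8_t>(((value % 256) + 256) % 256);
}
```

For -1 the converted pattern is 0xFF in every case, while the stored
pattern is 0xFF (two's complement), 0xFE (one's complement) or 0x81
(sign-magnitude), so only two's complement leaves the bits untouched.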



From: Le Chaud Lapin on
On Nov 3, 12:57 am, "Johannes Schaub (litb)" <schaub-johan...(a)web.de>
wrote:
> Le Chaud Lapin wrote:
> > Do I get any guarantee that the unsigned version of a corresponding
> > signed type is:
>
> > 2. equivalent to the signed version as far as the bit pattern is
> > concerned
>
> According to 4.7/2, converting signed -> unsigned is a mathematical
> operation. The resulting value is the least unsigned value congruent to the
> source integer (modulo 2**n with n being the number of bits in the
> representation of the unsigned integer).
>
> So the bit-pattern could change. For example, (unsigned int)-1 yields
> UINT_MAX, because the difference UINT_MAX - (-1) is divisible by
> 2**BITS_IN_UINT and is the least positive integer doing so. For two's
> complement, there is no change in the bit pattern. But for sign-magnitude,
> you go from "1000...01" to "1111...11", and for one's complement, you go to
> all-one from "1111...110" and so on.

Thanks for the clear explanation.

So I can see one's complement might be an issue for what I am trying
to do. I am trying to build a UNICODE Buffer object from a string
given by const char *. [Please ignore the fact that the UNICODE Buffer
is templated on 'C'.]:

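template <typename C> struct Buffer
{
    unsigned int length_;
    C *pointer;

    Buffer (const char *string)
    {
        // Count the characters up to, but not including, the terminator.
        length_ = 0;
        while (string[length_])
            ++length_;
        if (length_)
        {
            pointer = new C[length_ + 1];
            // Copy including the terminator; go through unsigned char so
            // that a negative char is not sign-extended when widened.
            for (unsigned int i = 0; i <= length_; ++i)
                pointer[i] = static_cast<C>(static_cast<unsigned char>(string[i]));
        }
        else
            pointer = 0;
    }
} ;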

On platforms where type char is inherently unsigned, the static_cast
in the code above is not necessary, because the type 'C' is always
unsigned in my design. But when type char is inherently signed, the
cast is necessary: a char holding a code above 127 will be negative,
sign extension will occur when it is widened for conversion to an
unsigned type, and the result will be a large positive value, per the
congruence rule that you and SG mentioned. This value will, of course,
not be the UNICODE value that I wanted, even when there is a complete
match between the bit pattern for type char and the bit pattern for
type 'C' when typeof(C) == wchar_t.
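
A short sketch of the effect described above. The function names are
invented, signed char stands in for a plain char that happens to be
signed, and std::uint32_t stands in for the unsigned type 'C'; the
value -25 is what the byte 0xE7 reads as on an 8-bit two's complement
char:

```cpp
#include <cassert>
#include <cstdint>

// Widen a possibly negative signed char directly: a negative value is
// converted modulo 2^32 and becomes a large positive number.
std::uint32_t widen_direct(signed char c)
{
    return static_cast<std::uint32_t>(c);
}

// Go through unsigned char first: the result stays in 0..UCHAR_MAX.
std::uint32_t widen_via_unsigned_char(signed char c)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}

// Assumes CHAR_BIT == 8, as permitted in the post above.
void check_widening()
{
    const signed char c = -25;
    assert(widen_direct(c) == 0xFFFFFFE7u);        // not a usable code point
    assert(widen_via_unsigned_char(c) == 0xE7u);   // 231
}
```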

So without cast,

"plus ça change, plus c'est la même chose"

...becomes

"plus @a change, plus c'est la m@me chose" // @ = value > 255.

I would like to take a char and ensure that, when converted to
wchar_t, it always yields the proper corresponding unsigned value in
wchar_t. I am permitted to assume 1-byte chars, but not 2's-complement.

I thought of using reinterpret_cast on the char to force it to
unsigned char, but wanted to get different opinions before trying
that.
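
One way this idea could look, as a sketch rather than a recommendation:
reinterpret_cast is applied to the pointer, not to the char value, and
each byte is then read through a const unsigned char *, so the stored
bit pattern is used unchanged whatever the representation of signed
char. The name widen_bytes and the use of std::vector are choices made
for this example only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: widens a narrow string to elements of type C by
// reading every byte as unsigned char.
template <typename C>
std::vector<C> widen_bytes(const char *string)
{
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(string);
    const std::size_t length = std::strlen(string);

    std::vector<C> result(length + 1);          // + 1 for the terminator
    for (std::size_t i = 0; i <= length; ++i)
        result[i] = static_cast<C>(bytes[i]);   // value is in 0..UCHAR_MAX
    return result;
}
```

For example, widen_bytes<unsigned long>("\xE7" "a") gives the elements
0xE7, 'a' and 0, whether plain char is signed or unsigned.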

-Le Chaud Lapin-



From: Le Chaud Lapin on
On Nov 3, 5:46 pm, Le Chaud Lapin <jaibudu...(a)gmail.com> wrote:
> I would like to take a char and ensure that, when converted to
> wchar_t, it always yields the proper corresponding unsigned value in
> wchar_t. I am permitted to assume 1-byte chars, but not 2's-complement.
>
> I thought of using reinterpret_cast on the char to force it to
> unsigned char, but wanted to get different opinions before trying
> that.

Seems that these questions and so many others can be answered by the
C++0x draft document:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf

A trivial fact for those who wonder, as I did, what the 'x' in 'C++0x'
means: it is a placeholder for a single decimal digit 0-9, indicating
the year of release of the "specification", 2000-2009 respectively.

Thanks to Seungbeom Kim and others for passively referring to the
standard in my previous posts, which finally induced me to take a
look.

It is worth the effort and not as painful as I imagined it would be. :)

-Le Chaud Lapin-

