From: already5chosen on
On Apr 7, 8:25 pm, "Wilco Dijkstra" <Wilco_dot_Dijks...(a)ntlworld.com>
wrote:
> "Duane Rettig" <du...(a)franz.com> wrote in messagenews:o0iqyttzes.fsf(a)gemini.franz.com...
> > "Wilco Dijkstra" <Wilco_dot_Dijks...(a)ntlworld.com> writes:
>
> >> "Nick Maclaren" <n...(a)cus.cam.ac.uk> wrote in messagenews:ftd6np$fhn$1(a)gemini.csx.cam.ac.uk...
>
> >>> In article <cNoKj.13308$h65.12...(a)newsfe2-gui.ntli.net>,
> >>> "Wilco Dijkstra" <Wilco_dot_Dijks...(a)ntlworld.com> writes:
> >>> |>
> >>> |> > "Adding a macro here and there" is precisely what I meant by doing it
> >>> |> > wrong. And, no, endianness does not change the sizes of anything - but
> >>> |> > genuinely portable code can handle size, endian and other differences.
> >>> |>
> >>> |> Changing endian does change structure size, this is exactly the kind of thing
> >>> |> that catches the less experienced people :-)
>
> >>> Just exactly HOW does reordering bits change the space they take or
> >>> their alignment?
>
> >> Combine non-pure endian types with bitfields. It should be easy to work out
> >> an example (remember you thought it was trivial?). Alignment never changes.
>
> > I'd like to see this. So go ahead and work us out an example, since
> > it's so trivial. I think you're going to have a hard time, unless you
> > introduce other factors that were not directly related to endianness.
>
> No problem:
>
> struct X {
> int x : 16;
> long long y : 33;
> long long z : 15;
>
> };
>
> Assuming 32-bit int, 64-bit long long, natural alignment and commonly used bitfield
> packing rules this structure takes 8 bytes normally. However it needs either 8 or 16
> bytes if long longs are mixed endian. A mixed endian type has its low and high parts
> always at the same offset, but the bytes in each part are swapped normally.
>

Doesn't the recommendation to avoid bit fields, if you are interested
in a portable data layout, appear in every C book in existence?
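
For reference, a minimal sketch of the usual alternative to bit fields:
keep the three values in one 64-bit word and extract them with explicit
shifts and masks, so that the code rather than the compiler fixes the
layout. The bit positions, the unsigned treatment of the fields, and the
names field_mask, pack, get_x, get_y and get_z are assumptions made for
this illustration; nothing in the thread prescribes them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout chosen for this sketch:
   x in bits 0-15, y in bits 16-48, z in bits 49-63. */
#define X_BITS 16
#define Y_BITS 33
#define Z_BITS 15

/* Mask with the low `bits` bits set; valid for widths below 64. */
static uint64_t field_mask(unsigned bits)
{
    assert(bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/* Combine the three fields into one word, truncating each to its width. */
static uint64_t pack(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & field_mask(X_BITS))
         | ((y & field_mask(Y_BITS)) << X_BITS)
         | ((z & field_mask(Z_BITS)) << (X_BITS + Y_BITS));
}

/* Extract each field again by shifting it down and masking. */
static uint64_t get_x(uint64_t w) { return w & field_mask(X_BITS); }
static uint64_t get_y(uint64_t w) { return (w >> X_BITS) & field_mask(Y_BITS); }
static uint64_t get_z(uint64_t w) { return (w >> (X_BITS + Y_BITS)) & field_mask(Z_BITS); }
```

The position of each field within the value is then the same on every
implementation that provides uint64_t. The order of the bytes of that
value in memory still depends on the machine, so an external format
would also have to write the word out byte by byte.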
From: Duane Rettig on
"Wilco Dijkstra" <Wilco_dot_Dijkstra(a)ntlworld.com> writes:

> "Duane Rettig" <duane(a)franz.com> wrote in message news:o0iqyttzes.fsf(a)gemini.franz.com...
>> "Wilco Dijkstra" <Wilco_dot_Dijkstra(a)ntlworld.com> writes:
>>
>>> "Nick Maclaren" <nmm1(a)cus.cam.ac.uk> wrote in message news:ftd6np$fhn$1(a)gemini.csx.cam.ac.uk...
>>>>
>>>> In article <cNoKj.13308$h65.12966(a)newsfe2-gui.ntli.net>,
>>>> "Wilco Dijkstra" <Wilco_dot_Dijkstra(a)ntlworld.com> writes:
>>>> |>
>>>> |> > "Adding a macro here and there" is precisely what I meant by doing it
>>>> |> > wrong. And, no, endianness does not change the sizes of anything - but
>>>> |> > genuinely portable code can handle size, endian and other differences.
>>>> |>
>>>> |> Changing endian does change structure size, this is exactly the kind of thing
>>>> |> that catches the less experienced people :-)
>>>>
>>>> Just exactly HOW does reordering bits change the space they take or
>>>> their alignment?
>>>
>>> Combine non-pure endian types with bitfields. It should be easy to work out
>>> an example (remember you thought it was trivial?). Alignment never changes.
>>
>> I'd like to see this. So go ahead and work us out an example, since
>> it's so trivial. I think you're going to have a hard time, unless you
>> introduce other factors that were not directly related to endianness.
>
> No problem:
>
> struct X {
> int x : 16;
> long long y : 33;
> long long z : 15;
> };

Ah, so you're giving me non-portable code to prove non-portability. I
see.

Anyone can indeed write non-portable code, in any language. That
point is not at issue.

>> Be careful about using absolutes. Not _everybody_ uses fixed-width
>> types. My product, for example, which uses some C code to interface
>> to system libraries, and which has a "foreign function" interface to
>> enable users to load and call libraries, and we strive always to move
>> away from any reference to size. Instead, we match the machine model
>> (e.g. ILP32, LP64, IL32P64, etc) with the names of types, and get
>> ourselves away from sizes in type names.
>
> I'm not sure what you mean by matching, but if you assume a particular model
> then you are effectively working with sized types, whatever name you use.

Well of course every type has a size. That's also not at issue. The
question is if the size matters, and whether the abstraction at the
source level can be made portable. For example, we use a typedef
called "nat", which is meant to be the natural word size of the
program being run. We might have used "long" to represent this type,
except that Windows XP-64 uses an IL32P64 model, which means that
longs are still 32-bit. But by defining the nat type, we can say what
we mean (an integer that has the same size as a pointer) and we can
avoid code that is non-portable and full of conditionalizations. In
that sense, we do not use a fixed-width type; we use a type that
describes the problem space we're working on in an abstract manner,
without it being necessary to reveal its size.
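
A minimal sketch of what such a typedef could look like. The actual
definition of "nat" is not shown in this thread, so everything here is an
assumption for illustration: it builds on C99's optional intptr_t, and
the helper names ptr_to_nat and nat_to_ptr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical definition: an integer type meant to be as wide as a pointer. */
typedef intptr_t nat;

/* Compile-time check in C99 style: the array size becomes -1, and the
   build fails, if nat and void * differ in size on this platform. */
typedef char nat_matches_pointer[(sizeof(nat) == sizeof(void *)) ? 1 : -1];

/* Convert a pointer to nat; intptr_t guarantees the round trip for void *. */
static nat ptr_to_nat(void *p)
{
    nat n = (nat)p;
    assert((void *)n == p);
    return n;
}

/* Convert a nat obtained from ptr_to_nat back to the original pointer. */
static void *nat_to_ptr(nat n)
{
    return (void *)n;
}
```

Code written against nat never names a width, so the same source can be
built for ILP32, LP64 or IL32P64 without conditionals at the point of use.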

>>> Another well known example is trying to represent
>>> N languages and M target architectures using a single intermediate language.
>>> Unfortunately it doesn't work like that in the real world...
>>
>> We've had experience with this, and it's not the case of intermediate
>> languages being inherently problematic - it's more the case where the
>> intermediate language is simply not powerful enough to take on some of
>> the desired target languages in an efficient manner.
>
> So you add some more specific intermediate instructions and semantics. Do this
> for several languages and targets, and you either end up with a huge intermediate
> language

but "huge" is so relative...

> or complex intermediate instructions whose semantics varies with the context.

or if you spend the time to refactor the intermediate language, you
start refining it to the point where it is able to take on more
targets with less added complexity.

> The current compiler I'm working on supports 3 different exception models in the
> intermediate language, and that's just 3 languages so far... To make matters worse,
> each target encodes exception tables differently enough that very little can be shared.

Perhaps you're just not abstracting your intermediate model deeply enough.

--
Duane Rettig duane(a)franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182
From: Paul Gotch on
Duane Rettig <duane(a)franz.com> wrote:
> > struct X {
> > int x : 16;
> > long long y : 33;
> > long long z : 15;
> > };

> Ah, so you're giving me non-portable code to prove non-portability. I
> see.

That code is standard C99; however, how it's represented differs
depending on how the compiler chooses to implement bitfields, just as a C++
compiler can choose the size of enums to fit the declared values within
them.

One way of taking a buffer of bytes and interpreting them as a protocol is to
cast the pointer to a packed struct; however, as soon as you do this (in fact as
soon as you use a compiler extension such as a packed struct) you are
making assumptions about the compiler implementation and the underlying
machine. This is often worth it if it saves you hundreds or thousands of
instructions' worth of byte accesses, compares and branches.
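
For comparison, a minimal sketch of the byte-access route that the
packed-struct cast avoids. The wire format (a 16-bit type followed by a
32-bit length, both big-endian, six bytes in total) and the names
struct header, load_be16, load_be32 and parse_header are hypothetical,
chosen only to illustrate the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header: 16-bit type, then 32-bit length,
   both big-endian (network byte order), with no padding. */
#define HEADER_WIRE_SIZE 6

/* In-memory form; its layout is the compiler's business and never
   needs to match the wire format. */
struct header {
    uint16_t type;
    uint32_t length;
};

/* Assemble a 16-bit big-endian value from two bytes. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Assemble a 32-bit big-endian value from four bytes. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 and fills *out if buf holds a full header, 0 otherwise. */
static int parse_header(const unsigned char *buf, size_t len, struct header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < HEADER_WIRE_SIZE)
        return 0;
    out->type = load_be16(buf);
    out->length = load_be32(buf + 2);
    return 1;
}
```

This makes no assumption about host endianness, struct padding or
alignment of the buffer, at the cost of the extra shifts and ORs.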

Unfortunately, when dealing with something where the typing is poorly defined
at compile time, somewhere along the line you end up casting it through a
void* and making assumptions about the memory layout. Now this may
be hidden inside an XDR library, or an ASN.1 library, etc.; however, it's still
there.

-p
--
"Unix is user friendly, it's just picky about who its friends are."
- Anonymous
--------------------------------------------------------------------
From: Nick Maclaren on

In article <nlm*odR-r(a)news.chiark.greenend.org.uk>,
Paul Gotch <paulg(a)at-cantab-dot.net> writes:
|> Duane Rettig <duane(a)franz.com> wrote:
|> > > struct X {
|> > > int x : 16;
|> > > long long y : 33;
|> > > long long z : 15;
|> > > };
|>
|> > Ah, so you're giving me non-portable code to prove non-portability. I
|> > see.
|>
|> That code is standard C99; however, how it's represented differs
|> depending on how the compiler chooses to implement bitfields, just as a C++
|> compiler can choose the size of enums to fit the declared values within
|> them.

I said that I was going to drop out, but you are a new poster on this
thread, and I can't let that pass :-(

No, it isn't, not even remotely. Even in its syntax alone, there are
two cases where its very validity depends on implementation-defined
features (see 6.7.2 #5, 6.7.2.1 #4, #9 and others). In one case, the
above code could quietly give wrong answers if its assumption was
false - in the other, a compiler message is more-or-less required.

Duane's remarks about its representation considerably understate the
case - there was considerable debate on the SC22WG14 reflector about
bit-fields, and the decision (I won't say consensus) was to specify
their syntax and leave almost all other aspects implementation-defined
or unspecified.

In particular, the standard is VERY clear that the order of storing
bits in bit-fields is completely unspecified - there is nothing
stopping a compiler from using a completely different convention for
'int' bit-fields and plain 'int'. Someone stated that in so many
words, too.

Incidentally, as I read the standard, it is unspecified - not even
implementation-defined - whether the bits in the 'int' and first
'long long' will share a storage unit.
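
A minimal sketch of how one might probe what a given compiler does with
the struct under discussion. It assumes the compiler accepts 'long long'
as a bit-field type at all, which is itself implementation-defined; the
helper names min_bytes_for_fields and struct_x_size are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct X {
    int x : 16;        /* signedness of a plain int bit-field is implementation-defined */
    long long y : 33;  /* long long as a bit-field type is implementation-defined */
    long long z : 15;
};

/* Smallest number of bytes that can hold 16 + 33 + 15 bits. */
static size_t min_bytes_for_fields(void)
{
    return (16 + 33 + 15 + CHAR_BIT - 1) / CHAR_BIT;
}

/* Size this particular compiler and ABI actually chose for struct X. */
static size_t struct_x_size(void)
{
    size_t n = sizeof(struct X);
    /* The only thing that can be relied on: the fields have to fit. */
    assert(n >= min_bytes_for_fields());
    return n;
}
```

Printing struct_x_size() with %zu shows the local answer. A compiler that
lets the 'int' and 'long long' fields share a storage unit can report 8,
while one that starts a new unit when the declared type changes would
report 16; the code can only assert the lower bound.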

There's more, but I should need to explain the interaction with other
parts of C99 to explain the situation, so I shall stop.

From: Wilco Dijkstra on

"Duane Rettig" <duane(a)franz.com> wrote in message news:o0bq4ltf71.fsf(a)gemini.franz.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra(a)ntlworld.com> writes:
>
>> "Duane Rettig" <duane(a)franz.com> wrote in message news:o0iqyttzes.fsf(a)gemini.franz.com...
>>> "Wilco Dijkstra" <Wilco_dot_Dijkstra(a)ntlworld.com> writes:

>> struct X {
>> int x : 16;
>> long long y : 33;
>> long long z : 15;
>> };
>
> Ah, so you're giving me non-portable code to prove non-portability. I
> see.

That shows how little you know about writing portable code and the C standard...

This code is 100% portable to any compiler that implements the C99 standard.
In fact even most non-C99 compilers can compile this code flawlessly - long long
was a de facto standard well before C99.

> Well of course every type has a size. That's also not at issue. The
> question is if the size matters, and whether the abstraction at the
> source level can be made portable. For example, we use a typedef
> called "nat", which is meant to be the natural word size of the
> program being run.

OK, so you do exactly the same as everybody else: using typedefs for types rather
than using the standard ones. So I think we all agree the C builtin types are useless
and non-portable. The models you mentioned have the same sizes for most types
apart from pointers (and sometimes long), so it's hard to argue they are evil. You may
not rely on their exact width but you still require them to be a specific minimum width.

>> So you add some more specific intermediate instructions and semantics. Do this
>> for several languages and targets, and you either end up with a huge intermediate
>> language
>
> but "huge" is so relative...
>
>> or complex intermediate instructions whose semantics varies with the context.
>
> or if you spend the time to refactor the intermediate language, you
> start refining it to the point where it is able to take on more
> targets with less added complexity.

It's a nice ideal indeed, but has anyone ever succeeded in doing it? You can spend
a lot of time on refactoring, but there will always be languages that still don't fit in...

>> The current compiler I'm working on supports 3 different exception models in the
>> intermediate language, and that's just 3 languages so far... To make matters worse,
>> each target encodes exception tables differently enough that very little can be shared.
>
> Perhaps you're just not abstracting your intermediate model deeply enough.

It has been abstracted as much as feasible. The problem is the semantics of various
exception models are so different you can't find a common high-level abstraction that
supports all. Of course at the lowest level you can represent exceptions as special
control flow edges in a CFG, but at that point you've lost most high-level info - and
thus the ability to optimise.
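
To make the lowest-level representation concrete, a minimal sketch of a
CFG whose edges are tagged as normal or exceptional. All names here
(edge_kind, struct edge, example_cfg, count_edges) are hypothetical and
not taken from any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* An edge records only that control may transfer, and whether that
   happens by normal flow or by an exception being raised. */
enum edge_kind { EDGE_NORMAL, EDGE_EXCEPTION };

struct edge {
    int from;             /* source basic block */
    int to;               /* destination basic block */
    enum edge_kind kind;
};

/* Hypothetical CFG: block 0 contains a call that may throw, block 1 is
   the normal continuation, block 2 the handler, block 3 the join. */
static const struct edge example_cfg[] = {
    {0, 1, EDGE_NORMAL},
    {0, 2, EDGE_EXCEPTION},
    {1, 3, EDGE_NORMAL},
    {2, 3, EDGE_NORMAL},
};

/* Count the edges of one kind in an edge list. */
static size_t count_edges(const struct edge *edges, size_t n, enum edge_kind kind)
{
    size_t count = 0;
    assert(edges != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (edges[i].kind == kind)
            count++;
    return count;
}
```

Nothing in such an edge says which language's rules govern the transfer
(what is unwound, which handlers are searched, whether execution can
resume), which is the high-level information lost at this level.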

Wilco