From: Alexei A. Frounze on 6 Dec 2008 05:25
On Dec 6, 3:13 am, "robertwess...(a)yahoo.com" <robertwess...(a)yahoo.com>
> On Dec 5, 5:39 pm, "Alexei A. Frounze" <alexfrun...(a)gmail.com> wrote:
> > On Dec 6, 1:55 am, "robertwess...(a)yahoo.com" <robertwess...(a)yahoo.com>
> > wrote:
> > ...
> > > > It was common (but not universal) in the past for word-addressed
> > > > machines to assign sequential words consecutive addresses. IOW, the
> > > addresses of the first few words on the machine are 0, 1, 2, 3...
> > > That requires mangling your char (and void) pointers so that you can
> > > store the byte offset somewhere. Either it's added to the bottom,
> > > leaving you with "natural" looking addresses, that you have to shift
> > > right before actually using to address storage, or you tuck them in
> > > the high end (assuming there's room), where you have to mask them off
> > > and then they're hard to use for the shifting of the word itself, or
> > > you add them on as an extension, and then you have char and void
> > > pointers being longer than other types of pointers. It can certainly
> > > be made to work, but is uglier than if the first (and now much more
> > > common) scheme is used.
> > Or one could implement C in such a way that chars and ints are of the
> > machine word size (>= 16 bits). That way the pointers to int and char
> > don't have to be of different size. Example: the compiler for TI's
> > TMS320C54xx series.
> Of course, but excessively large chars are not pleasant if you have to
> store a lot of them - or interoperate with a lot of other stuff in the
> world. So there's strong pressure to implement 8-bit chars, even if
> it's not particularly efficient or pretty. And the eternal issue that
> oddly sized chars break a lot of code.
That's possible since a lot of code blatantly ignores the fact that a
char can be larger than 8 bits. I'm usually writing such code too. :)
> It's also arguable that a hosted implementation cannot have sizeof
> (char) == sizeof(int), because of assumptions in the library (notably
> you cease being able to assign a unique value to EOF that cannot be
> returned by the character I/O functions). A freestanding
> implementation (common, of course, on DSPs), doesn't have those
If I'm not mistaken, in that implementation the characters were
truncated to 8 bits when they were stored to a text file (through fputc
(), fputs(), fprintf(), etc) and the values read back were 8-bit, so
EOF was probably OK. Of course, binary files gave some headache -- we
needed to repack things.
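
[Editor's sketch] The EOF concern quoted above can be made concrete. The usual read loop keeps the result of fgetc() in an int and relies on EOF being different from every unsigned char value, which is only guaranteed when UCHAR_MAX fits in an int. The helper names eof_is_distinguishable and count_chars are made up for illustration; nothing here describes the TI library itself.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 when every unsigned char value fits in a non-negative int,
   so that the negative EOF can never collide with a real character. */
int eof_is_distinguishable(void)
{
    return UCHAR_MAX <= INT_MAX;
}

/* Counts characters until fgetc() returns EOF (end of file or error).
   The result is kept in an int, not a char, so EOF stays distinct. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

On an implementation with sizeof(char) == sizeof(int), the first function would return 0 and the loop could stop early on a legitimate character, unless the library truncates stored characters to 8 bits as described above.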
From: Alexei A. Frounze on 6 Dec 2008 05:59
On Dec 5, 9:51 pm, NathanCBa...(a)gmail.com wrote:
[hijacking your discussion with Rod]
> I am still not convinced that C is an assembly language.
Although it appears that C was designed after an assembler (directly
or indirectly -- doesn't matter), C-- is more of an assembler than C.
Just yesterday I read some of C-- docs and liked it.
> So, you are saying that C is a poorly designed language?
There're a few bad things about it. A few programmer-unfriendly things
here, a few subtle things there. To program in it correctly one needs
to read the standard or something equivalent (which many books
aren't). While this isn't different from reading the CPU manuals when
programming in assembly, C's slightly more abstract and high-level
nature (in comparison to asm) dupes many people. They either don't
read the standard (or an equivalent in simpler words) at all, or do so
selectively, missing some important facts about C and that leads to
bugs. What happens is that people come to C with a different set of
concepts and assumptions than those actually in C. The sets somewhat
overlap which permits happy writing of crappy code for a while and the
C compiler often doesn't mind due to C's nature (akin to that of asm)
or disabled warnings/errors. Such bad code even appears to work, until
pushed beyond the intersection of the concept and assumption sets. At
that point bugs start pointing at the discrepancy and suggest reading
with attention the standard or a really good book based on it (went
through it myself due to lack of good literature when I started C).
Surprisingly, some people don't do it anyway.
From: Phil Carmody on 6 Dec 2008 07:17
"robertwessel2(a)yahoo.com" <robertwessel2(a)yahoo.com> writes:
> On Dec 6, 3:36 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote:
>> Oh my!  It seems we're *BOTH* wrong here.  (Where's Phil when you need him?)
This thread *doesn't* need me. Robert seems to be saying everything
that I'd say.
> And I want to mention that I quote from the C99 standard more often
> only because I have that in electronic form, and only hardcopies of
> C89, which makes for less typing...
If you ask on c.l.c, someone will furnish you with pointers to
older versions, I'm sure. I used to have copies of various things
pre-C99, but don't any more, alas.
I tried the Vista speech recognition by running the tutorial. I was
amazed, it was awesome, recognised every word I said. Then I said the
wrong word ... and it typed the right one. It was actually just
detecting a sound and printing the expected word! -- pbhj on /.
From: H. Peter Anvin on 6 Dec 2008 14:02
Alexei A. Frounze wrote:
> That's possible since a lot of code blatantly ignores the fact that a
> char can be larger than 8 bits. I'm usually writing such code too. :)
For what it's worth, a POSIX implementation requires 8 bit chars. This
is due to the fact that the uint8_t type is required in certain structures.
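
[Editor's sketch] Code that depends on 8-bit chars can state that assumption so the build fails elsewhere instead of misbehaving. This is a minimal example, assuming a C99 compiler that provides uint8_t; the function name uint8_bits is made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* uint8_t is an optional exact-width type: it exists only if the
   implementation has an 8-bit type with no padding bits. Since no
   object can be smaller than a char, that forces CHAR_BIT == 8. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit chars (for example, a POSIX system)"
#endif

/* Storage bits in a uint8_t, derived from its size in chars. */
int uint8_bits(void)
{
    return (int)(sizeof(uint8_t) * CHAR_BIT);
}
```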
From: Rod Pemberton on 7 Dec 2008 01:20
<robertwessel2(a)yahoo.com> wrote in message
On Dec 6, 3:36 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote:
> <robertwess...(a)yahoo.com> wrote in message
> > > "Values stored in non-bit-field objects of any other object type
> > > consist of n * CHAR_BIT bits, where n is the size of an object of that
> > > type, in bytes. The value may be copied into an object of type
> > > unsigned char [n] (e.g., by memcpy); the resulting set of bytes is
> > > called the object representation of the value."
> > > Which clearly requires the equivalence of bytes and chars.
> > As I see it, no. They're only partially equivalent in one direction.
> > bytes must be greater or equal to chars in size. If a byte is 9-bits, a
> > char can be 8-bits. I.e., a char is not equivalent to a byte. But, a
> > byte can represent a char and then some.
> > > It clearly
> > > says that N bytes can be stored in N (unsigned) chars.
> > Reversed? I think that says N (whatever) chars fit in N bytes. Doesn't
> No, it says you can store an object of N bytes in an array of N
Where? Unless you misquoted, it says:
1) the object consists of n*CHAR_BIT bits
2) n is the number of bytes needed to build that object
3) the resulting set of bytes is the object's representation
4) an object comprised of some char's can be copied into a bunch of bytes
Their truth states:
1) is true regardless of the size of a byte
2) is true as long as a byte is larger than or equal to a char in bits
3) is true as long as a byte is larger than or equal to a char in bits
4) is true regardless of the size of a byte and is true as long as a byte
is larger than or equal to a char in bits
> So a byte must fit in a char. And you've acknowledged that a
> char must fit in a byte.
Illogical conclusion. Your basis is based upon your misunderstanding of
what was stated.
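
[Editor's sketch] The quoted paragraph on object representations can be tried directly. The example below copies an unsigned int into an array of sizeof(unsigned int) unsigned chars with memcpy, as the quoted text describes; sizeof(unsigned char) is 1 by definition, so the array has exactly n elements. The function names object_bytes and object_bits are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Copies the object representation of an unsigned int into buf, which
   must have room for sizeof(unsigned int) elements, and returns the
   number of unsigned chars written. */
size_t object_bytes(unsigned int value, unsigned char *buf)
{
    memcpy(buf, &value, sizeof value);
    return sizeof value;
}

/* Total bits in the representation: n * CHAR_BIT, per the quoted text. */
size_t object_bits(void)
{
    return sizeof(unsigned int) * CHAR_BIT;
}
```

Copying the array back with memcpy reproduces the original value, which is the round trip the quoted paragraph relies on.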
> > > 5.2.4.2.1 of C99 ("Sizes of integer types") says in the definition of
> > > CHAR_BIT, "number of bits for smallest object that is not a bit-field
> > > (byte)". And further specified that CHAR_BIT be at least 8.
> > > Footnote 40 of 6.2.6.1 (C99): "A byte contains CHAR_BIT bits." Which
> > > happens to be exactly the same number of bits a char contains.
> > > There are numerous other such statements.
> > I haven't looked at those. But, I'd think most of these are likely
> > "incorrect" from the abstraction of C from mostly 8-bit architectures
> > was done for C89. Or, it's "understood" to currently be clarified by 3.6
> > and 3.7.1.
> How is the statement, taken directly from the standard, that a byte
> contains CHAR_BIT bits in any way related to eight bit
> implementations, or in any way ambiguous as to the exact size of a C
> byte (IOW, it's CHAR_BIT)?
Because, they clearly state exactly what I stated at the beginning of this
discussion, which you demonstrated was incorrect, specifically reversed in
terms of my statement of byte and char. Therefore, it's only logical to
assume these are incorrect too and have their incorrectness based upon
historical abstractions from working versions of C.
> > There is no smaller unit of addressability in C than a char. Which is
> > the same as a byte.
> False. You just quoted C99 above! It said a char must fit in a byte.
> I.e., a byte can be larger than a char. It said the byte is the smallest
> addressable unit from C's perspective. I admit I got them reversed, but
> didn't grasp what you quoted!
> A byte fits in a char,
> and a char fits in a byte.
> If you can find
> wiggle room for different sizes in there, you're cleverer than I am.
There's no wiggle room. You proved one is False and the other is True by
quoting 3.6 and 3.7.1. If the other sections apply as you state, then one
or both of the definitions for 3.6 and/or 3.7.1 must be False.
> > > Your statement "if the smallest native addressable unit is 4-bits,
> > > that's a C byte. And a C char must be at least 8-bits, therefore it's
> > > at least two C bytes." is flatly wrong. There is not, without a non-
> > > standard extension, any addressability to anything smaller than a
> > > char. And C bytes may not be 4 bits. There is no type "byte" in C,
> > > it exists in the C mostly to distinguish the notion of the physically
> > > stored data in memory from the logical type char.
> > You're correct. This is all backwards. Think about it...
> > > That hardware bytes (for lack of a better term for the smallest
> > > addressable unit of storage) are commonly 8 bits these days is wholly
> > > irrelevant. Hardware bytes, whatever those may be, are *not*
> > > addressed by the C standard.
> > They are partially addressed by the C standard. What do you think
> > "addressable unit of data storage" really refers to? It refers to the fact
> > that C's byte, the smallest addressable unit of storage, must map onto
> > hardware's addressable unit or units.
> I have no clue what you're trying to say here.
Yup. Not to be offensive, but that's part of the problem.
> Obviously a C char or
> byte must eventually be stored in real memory, presumably in whatever
> physically addressable units that the hardware actually provides (the
> "hardware byte" under discussion). The C standard continues to impose
> no required relationship between the hardware byte and the C byte/
Explicitly, no. But, you can see remnants of it in the spec, if you look. A
char being a minimum of 8-bits in limits.h is one such case. 3.6 and 3.7.1
don't say it must be 8-bits or larger. Technically, at the time C89 was
defined only ASCII and EBCDIC were in use. I.e., a char could've been
defined with a minimum of 7-bits. So, why do you think it's 8-bits? I
think it's 8-bits because C's with 8-bit chars and 8-bit bytes were used to
> Most real implementations will, of course, attempt a mapping
> between the two that is simple and efficient (eg. a C char/byte is
> implemented as a conventional 8-bit hardware byte), unless there is
> some really compelling reason to do otherwise.
A char's value is accessible in C, but a char is not addressable according
to the spec. you quoted. A byte is addressable according to the spec. you
quoted, but a byte's value may not be entirely accessible in C. Only the
part of byte which overlaps with a char is accessible. The byte represents
the hardware addressability issue which has to be solved by a real
implementation to implement objects in C comprised of C char's as contiguous
sequences of bytes on hardware. Does that make more sense?
> > > (...)
> >> A word addressed machine
> >> with 32 bit words (or hardware bytes), would need to generate code to
> >> pack and unpack four C chars (again assuming we wanted the
> >> implementation to have 8 bit C chars), from a single word as needed.
> >Not necessarily. It could implement chars as 32 bit words or some other
> >combination larger than 8-bits.
> What part of "assuming we wanted the implementation to have 8 bit C
> chars" did you miss in the above?
Nothing AFAICT. You can use a single 32-bit word to implement a single
8-bit char if you choose... It might be wasteful of space but quickest or
easiest to implement. In which case, there is no need to pack and unpack
four C chars, which clearly explains the "Not necessarily."
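
[Editor's sketch] The two options discussed here can be compared in code. The example models a word-addressed memory as an array of uint32_t and shows the shift-and-mask work needed for four packed 8-bit chars per word, next to the one-char-per-word alternative. All function names are made up for illustration, and placing char 0 in the low-order bits of the word is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads 8-bit char number idx, with four chars packed per 32-bit word.
   idx >> 2 selects the word; the low two bits of idx select the
   position inside it. */
unsigned get_packed_char(const uint32_t *words, size_t idx)
{
    unsigned shift = (unsigned)(idx & 3u) * 8u;
    return (unsigned)((words[idx >> 2] >> shift) & 0xFFu);
}

/* Writes 8-bit char number idx without disturbing its three neighbours. */
void put_packed_char(uint32_t *words, size_t idx, unsigned ch)
{
    unsigned shift = (unsigned)(idx & 3u) * 8u;
    uint32_t mask = (uint32_t)0xFFu << shift;

    words[idx >> 2] = (words[idx >> 2] & ~mask)
                    | (((uint32_t)ch & 0xFFu) << shift);
}

/* The alternative: one char per word, so idx is the word address and
   only a mask is needed (the upper 24 bits are wasted). */
unsigned get_wide_char(const uint32_t *words, size_t idx)
{
    return (unsigned)(words[idx] & 0xFFu);
}
```

The packed index plays the role of the "mangled" char pointer described earlier in the thread: a word address with the byte offset kept in the low bits.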
> > > All of which is irrelevant, except to implementation.
> > So, why'd you bring it up?
> Because you did, by appearing to conflate hardware bytes and C bytes.
!?!?!... (Interesting, Phil likes to use "conflate" too...)
> > > If you wanted to implement a system with 16 bit C bytes (and thus 16
> > > bit C chars), on a 8-bit-byte addressed machine, the compiler will
> > > have to generate code so that all char accesses address a pair of 8-
> > > bit hardware bytes. And the smallest addressable unit in the C
> > > program will be that 16 bit C char.
> > > Nor is your assertion that hardware with a 9-bit hardware byte
> > > requires a 9-bit C byte and char true.
> > Nowhere did I say that... Reread.
> "If the smallest native addressable unit is 9-bits, that's a C byte"
> appears to refer to hardware bytes, both in isolation and in context.
> If that's not what you meant, then my comment was superfluous.
That's exactly what was meant. Nowhere did I say this was "required".
Nowhere did I "assert". These are extra attributes you applied to the
example in the discussion. Reread.
> > > While that might well make for
> > > a convenient implementation on the machine, there is no reason that
> > > the implementation might not expose 8 bit C bytes and chars, and
> > > synthesize those out of the underlying 9-bit hardware bytes.
> > True. Haven't we been over this? Either this time or last time? May of
> > last year...
> Yes. And you basically refused to acknowledge that the C standard is
> not described in terms of real hardware,
Did I? (From the same para even that your FWIW came from...)
FWIW: RP: The "minimum model" requirements for C aren't part of the
definition of a "virtual machine," or of C's "abstract machine," or even
included in the C standards...
> Yes. And you basically refused to acknowledge that the C standard is
> not described in terms of real hardware,
Wrong. I said it's impossible to entirely abstract C from real hardware.
I've also said (maybe not in that thread...) that it's impossible to
understand C completely without understanding how it fits onto real
hardware. You said C was implemented on some "virtual machine"... What a
crock! A complete bastardization of 5.1.2.3 that you and Phil used as a
justification. And the "abstract machine" in 5.1.2.3 doesn't refer to an
"abstract machine" in the normal sense or a "virtual machine." Both of
these are execution or interpretation environments implemented on real
hardware, usually in software. An "abstract machine" in 5.1.2.3 refers to
an imaginary unimplementable context that ensures proper C program
> and went away in a huff...
> FWIW: RP: "This, of course, is due to his continued belief in the pure
> abstraction of C from the underlying hardware and assembly: a
I "went away in a huff..."? Where do you get that from? You bailed out.
Phil insulted and bailed out. My last post is after you two.
> Any given implementation of course relates the two, but the C standard
> itself does not.
If the C spec. does not relate the two, then the C spec. itself is
unimplementable. No version of C can comply with the spec without this
relationship being defined. What is unimplementable is worthless. There is
no exception to this fact. You need to learn to read between the lines of
the C spec or add in historical context.
> Also obviously hardware with odd parameters may make
> C difficult to implement in various ways, and also clearly one reason
> that C is broadly popular is that most hardware does *not* produce
> significant difficulties for a C implementer.
Yes, exactly as I described in May 07:
FWIW: RP: I hinted at the truth by referring to statements by Alex Stepanov
in 1995, the primary creator of the C++ STL. He stated that Dennis Ritchie
designed C around a minimum model of computers which were well designed to
solve numerical problems: byte addressable memory, flat address spaces, and
pointers. He claimed that this minimum model, developed over many decades
using real computers, is the reason C is a success.
> The reverse is true as
> well, it's hard to imagine a modern hardware designer not taking ease of
> C implementation into account when designing an architecture.
Does the x64 instruction set support 8-bit bytes?
> > What "extended character set" ? The 3.6 section that you quoted only
> > supports the "basic character set"... How do you rationalize inserting
> > "extended character set" into the discussion if not supported by 3.6?
> Character sets are defined in 5.2 (C99). The basic set includes the
> upper and lower case letters, digits, 29 punctuation marks, space and
> several control characters. The extended set also includes all other
> characters an implementation provides. For example, on ASCII
> implementations, the at-sign is a character you'd find in the extended
> set, but not in the basic. Extended characters that are not multi-
> byte characters (I omitted the "not multi-byte" condition in my first
> post), need to fit in a byte/char, but are not required to be positive
> values in a char.
This has no bearing whatsoever on the question asked. You said: "It [a
char] has the additional requirement of needing to be able to store all of
the characters in the extended character set." This contradicts 3.6 which
says a char only has to be able to store all of the characters in the "basic
character set." So, I asked how do you get a char needing to represent an
"extended character set" from 3.6 which says a char only needs to
represent the "basic character set." Nothing in your response relates to
> The bottom line is this: The C standard uses the terms byte and char
> essentially synonymously,
That may be true. And, I think that the fact that most C's used to derive
C89 had 8-bit bytes and 8-bit chars is likely the reason, which I stated
previously. But, some part of the spec. must accurately describe things.
If there's a discrepancy, then the issue must be resolved by the more
"authoritative" section. In this case, 3.6 and 3.7.1, the sections which
actually define the terms in question, should be considered "authoritative,"
IMO. Don't you agree?
> and further [a byte and char] must appear to be the same size
> from a C program's perspective.
False. A byte is not an accessible unit from C's perspective. A char is
accessible. If bytes are 9-bits, and chars are 8-bits, there is no way for
me to access the 9th bit of a byte from C. You can only access the lower 8
bits of the byte, which are the 8-bit char in this case.
> A byte is the unit of storage, the
> type is a char.
> A char must fit in a byte
> and a byte must fit in a