From: robertwessel2 on 8 Dec 2008 18:50 On Dec 7, 12:20 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > <robertwess...(a)yahoo.com> wrote in message > > news:8d725647-529b-40ae-a97e-ad6d296e10c4(a)33g2000yqm.googlegroups.com... > On Dec 6, 3:36 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > > > <robertwess...(a)yahoo.com> wrote in message > > > > > "Values stored in non-bit-field objects of any other object type > > > > consist of n * CHAR_BIT bits, where n is the size of an object of that > > > > type, in bytes. The value may be copied into an object of type > > > > unsigned char [n] (e.g., by memcpy); the resulting set of bytes is > > > > called the object representation of the value." > > > > > Which clearly requires the equivalence of bytes and chars. > > > > As I see it, no. They're only partially equivalent in one direction. > I.e., > > > bytes must be greater or equal to chars in size. If a byte is 9-bits, a > > > char can be 8-bits. I.e., a char is not equivalent to a byte. But, a > byte > > > can represent a char and then some. > > > > > It clearly > > > > says that N bytes can be stored in N (unsigned) chars. > > > > Reversed? I think that says N (whatever) chars fit in N bytes. Doesn't > it? > > > No, it says you can store an object of N bytes in an array of N > > chars. > > Where? Unless you misquoted, it says: > 1) the object consists of n*CHAR_BITS > 2) n is the number of bytes needed to build that object > 3) the resulting set of bytes is the objects representation > 4) an object comprised of some char's can be copied into a bunch of bytes > > Their truth states: > 1) is true regardless of the size of a byte > 2) is true as long as a byte is larger than or equal to a char in bits > 3) is true as long as a byte is larger than or equal to a char in bits > 4) is true regardless of the size of a byte and is true as long as a byte > is larger than or equal to a char in bits Ugh. Your first #4 is plainly wrong. 
The standard says the N bytes of an object can be copied into an array N unsigned chars. Those characters are also stored in bytes, of course. Given the utterly plain language that you are misinterpreting, I'm beginning to wonder if you're serious. Here's the statement broken apart a bit: "Values stored in non-bit-field objects of any other object type consist of n * CHAR_BIT bits, where n is the size of an object of that type, in bytes." - an object is stored in N bytes, each with CHAR_BIT bits "The value may be copied into an object of type unsigned char [n] (e.g., by memcpy);" - Those N bytes, or at least the values contained therein, can be copied in to an array of N chars. "the resulting set of bytes is called the object representation of the value." - that array of chars is also a bunch of bytes And note that memcpy() is defined to move *chars*. So memcpy, which moves chars, also happens to move the same number of bytes (by the above). > > So a byte must fit in a char. And you've acknowledged that a > > char must fit in a byte. > > Illogical conclusion. You're basis is based upon you're misunderstanding of > what was stated. > > > > > > > 5.2.4.2.1 of C99 ("Sizes of integer type" ) says in the definition of > > > > CHAR_BIT, "number of bits for smallest object that is not a bit-field > > > > (byte)". And further specified that CHAR_BIT be at least 8. > > > > > Footnote 40 of 6.2.6.1 (C99): "A byte contains CHAR_BIT bits." Which > > > > happens to be exactly the same number of bit a char contains. > > > > > There are numerous other such statements. > > > > I haven't looked at those. But, I'd think most of these are likely > > > "incorrect" from the abstraction of C from mostly 8-bit architectures > that > > > was done for C89. Or, it's "understood" to currently be clarified by 3.6 > > > and 3.7.1. 
> > > How is the statement, taken directly from the standard, that a byte > > contains CHAR_BIT bits in any way related to eight bit > > implementations, or in any way ambiguous as to the exact size of a C > > byte (IOW, it's CHAR_BITS)? > > Because, they clearly state exactly what I stated at the begining of this > discussion, which you demonstrated was incorrect, specifically reversed in > terms of my statement of byte and char. Therefore, it's only logical to > assume these are incorrect too and have their incorrectness based upon > historical abstractions from working versions of C. > > > > There is no smaller unit of addressability in C than a char. Which is > > > the same as a byte. > > > False. You just quoted C99 above! It said a char must fit in a byte. > > I.e., a byte can be larger than a char. It said the byte is the smallest > > addressable unit from C's perspective. I admit I got them reversed, but > you > > didn't grasp what you quoted! > > ... It's completely unclear how you think 3.6 and 3.7.1 contradict each other. And while "A fits in B" can mean "B is larger than A" ("the baseball fits in the shoebox"), it's also perfectly valid to use that form to mean "B (exactly) fits in A". For example "bolt A fits in nut B." Despite the fact that a 1/8 inch bolt will actually "fit" into a 1/4 inch nut, the plain meaning is that the bolt *exactly* matches the nut. While the two sections were talking about may not make it completely clear which of the two forms of fit they mean, numerous other places in the standard do. Nice selective quoting, BTW. Why not at least address the even plainer extended quote from footnote 40: "A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2**CHAR_BIT - 1." So a byte contains CHAR_BIT bits. And the numbers that you can put in an unsigned char exactly correspond to that. It's not "unsigned char ranges from zero to no more than (2**CHAR_BIT - 1)" - rather the range is exact. 
So a byte contains exactly the number of bits that can fit in an unsigned char. And an unsigned char can hold exactly the number of different values that can fit in a byte. > > A byte fits in a char, > > False. > > > and a char fits in a byte. > > True. > > > If you can find > > wiggle room for different sizes in there, you're cleverer than I am. > > There's no wiggle room. You proved one is False and the other is True by > quoting 3.6 and 3.7.1. If the other sections apply as you state, then one > or both of the definitions for 3.6 and/or 3.7.1 must be False. I'm utterly baffled. > > > > Your statement "if the smallest native addressable unit is 4-bits, > > > > that's a C byte. And a C char must be at least 8-bits, therefore it's > > > > at least two C bytes." is flatly wrong. There is not, without a non- > > > > standard extension, any addressability to anything smaller than a > > > > char. And C bytes may not be 4 bits. There is no type "byte" in C, > > > > it exists in the C mostly to distinguish the notion of the physically > > > > stored data in memory from the logical type char. > > > > You're correct. This is all backwards. Think about it... > > > > > That hardware bytes (for lack of a better term for the smallest > > > > addressable unit of storage) are commonly 8 bits these days is wholly > > > > irrelevant. Hardware bytes, whatever those may be, are *not* > > > > addressed by the C standard. > > > > They are partially addressed by the C standard. What do you think > > > "addressable unit of data storage" really refers to? It refers to the > fact > > > that C's byte, the smallest addressable unit of storage, must map onto > the > > > hardware's addressable unit or units. > > > I have no clue what you're trying to say here. > > Yup. Not to be offensive, but that's part of the problem. If the reader is baffled, it may be incompetence on either the part of the reader, or the writer. Or both, of course. 
> > Obviously a C char or > > byte must eventually be stored in real memory, presumably in whatever > > physically addressable units that the hardware actually provides (the > > "hardware byte" under discussion). The C standard continues to impose > > no required relationship between the hardware byte and the C byte/ > > char. > > Explicitly, no. But, you can see remnants of it in the spec, if you look. A > char being a minimum of 8-bits in limits.h is one such case. 3.6 and 3.7.1 > don't say it must be 8-bits or larger. Technically, at the time C89 was > defined only ASCII and EBCDIC were in use. I.e., a char could've been > defined with a minimum of 7-bits. So, why do you think it's 8-bits? I > think it's 8-bits because C's with 8-bit chars and 8-bit bytes were used to > create C89. Clearly much of C89 was an attempt to codify existing practice. True enough, 3.6 and 3.7 don't require a minimum of 8 bits but that happens elsewhere. So what? And machines with six, nine and ten bit characters were in (reasonably) common use at the time C89 was being written. Some even had C implementations. They set some minimums, because that's useful. Obviously other minimums are implied in various ways (for example, you couldn't have six bit chars, because there are too many required characters in the basic set). Why did they settle on eight bits when seven would have done? No existing practice, for one, and little point for another (what would be the odds that someone would actually build such a machine?). It also helps the programmer by setting a useful minimum - an eight bit byte will, in fact, accommodate the vast majority of the world's stored data (at least at that time - now there's a bunch stored in Unicode too). A related example, why did they set the minimum size of a long to be 32 bits?
The changes in the standard required to make it 16 bits would be trivial (and there are plenty of machines on which 32 bits is *not* a natural type), but I'm happy they chose the larger value, since it makes my life easier. > > Most real implementations will, of course, attempt a mapping > > between the two that is simple and efficient (eg. a C char/byte is > > implemented as a conventional 8-bit hardware byte), unless there is > > some really compelling reason to do otherwise. > > A char's value is accessible in C, but a char is not addressable according > to the spec. you quoted. A byte is addressable according to the spec. you > quoted, but a byte's value may not be entirely accessible in C. Only the > part of byte which overlaps with a char is accessible. The byte represents > the hardware addressability issue which has to be solved by a real > implementation to implement objects in C comprised of C char's as contiguous > sequences of bytes on hardware. Does that make more sense? That's true of a hardware byte. Not a C byte. The implementation creates some mapping between C bytes and hardware bytes, not necessarily 1:1 (an implementation on a 9 bit machine could map nine 8 bit C bytes into eight 9-bit hardware bytes), or even using the entirety of the hardware bytes (an implementation on a 9 bit machine might ignore one bit of each hardware byte, thus making the mapping of eight bit C bytes onto the hardware bytes appear to be 1:1, while leaving that ninth bit completely hidden from the C program). But you cannot use those hidden bits in other types, either - IOW, if you've hidden that bit from a char, you cannot use it in an int. They also require that the C types are binary, despite there (historically) having been numerous decimal machines.
This would make an implementation of C on a decimal machine quite painful, although not impossible - you could for example, map three 8 bit C chars onto a eight decimal digit word (aka hardware byte), and ignore the extra range. The packing and unpacking of those values would be painful, to say the least. Again, the exact size (or representation) of hardware bytes is *not* specified by the C standard, which defines only C bytes. The implementation must establish some mapping between the two. Obviously we'd prefer such a mapping is easy and efficient (as presumably would the folks on the C committee), so it's not surprising that the requirements for C mapping fairly well onto common hardware (and that, of course, works in both directions). > > > > (...) > > >> A word addressed machine > > >> with 32 bit words (or hardware bytes), would need to generate code to > > >> pack and unpack four C chars (again assuming we wanted the > > >> implementation to have 8 bit C chars), from a single word as needed. > > > >Not necessarily. It could implement chars as 32 bit words or some other > > >combination larger than 8-bits. > > > What part of "assuming we wanted the implementation to have 8 bit C > > chars" did you miss in the above? > > Nothing AFAICT. You can use a single 32-bit word to implement a single > 8-bit char if you choose... It might be wasteful of space but quickest or > easiest to implement. In which case, there is no need to pack and unpack > four C chars, which clearly explains the "Not necessarily." Unused bytes *between* objects (whether for alignment, some other whim of the compiler), are a different (and mostly irrelevant) issue. You can, absolutely, create an implementation where only eight bits of each (let's say 32 bit for the sake of discussion) word are used. That would lead to an array of chars being a sequence of (32 bit) words. You cannot, however, then put an int (or long) into a single 32 bit word. 
That would break the ability to copy the array of bytes that make up the int into an array of chars, and leave the values intact (see the extended quote from footnote 40 above, for example), and would also make the result of sizeof illogical. > > > > All of which is irrelevant, except to implementation. > > > > So, why'd you bring it up? > > > Because you did, by appearing to conflate hardware bytes and C bytes. > > !?!?!... (Interesting, Phil likes to use "conflate" too...) Good for Phil. "Conflate" is an excellent word, of the best quality. It should be used more often. > > > > If you wanted to implement a system with 16 bit C bytes (and thus 16 > > > > bit C chars), on a 8-bit-byte addressed machine, the compiler will > > > > have to generate code so that all char accesses address a pair of 8- > > > > bit hardware bytes. And the smallest addressable unit in the C > > > > program will be that 16 bit C char. > > > > > Nor is your assertion that hardware with a 9-bit hardware byte > > > > requires a 9-bit C byte and char true. > > > > Nowhere did I say that... Reread. > > > "If the smallest native addressable unit is 9-bits, that's a C byte" > > appears to refer to hardware bytes, both in isolation and in context. > > True. > > > If that's not what you meant, then my comment was superfluous. > > That's exactly what was meant. Nowhere did I say this was "required". > Nowhere did I "assert". These are extra attributes you applied to the > example in the discussion. Reread. > > > > > While that might well make for > > > > a convenient implementation on the machine, there is no reason that > > > > the implementation might not expose 8 bit C bytes and chars, and > > > > synthesize those out of the underlying 9-bit hardware bytes. > > > > True. Haven't we been over this? Either this time or last time? May of > > > last year... > > > Yes. And you basically refused to acknowledge that the C standard is > > not described in terms of real hardware, > > Did I? 
(From the same para even that your FWIW came from...) > > FWIW: RP: The "minimum model" requirements for C aren't part of the > definition of a "virtual machine," or of C's "abstract machine," or even > included in the C standards... > > > Yes. And you basically refused to acknowledge that the C standard is > > not described in terms of real hardware, > > Wrong. I said it's impossible to entirely abstract C from real hardware. > I've also said (maybe not in that thread...) that it's impossible to > understand C completely without understanding how it fits onto real > hardware. You said C was implemented on some "virtual machine"... What a > crock! A complete bastardization of 5.1.2.3 that you and Phil used as a > justification. An the "abstract machine" in 5.1.2.3 doens't refer to an > "abstract machine" in the normal sense or a "virtual machine." Both of > these are execution or interpretation environments implemented on real > hardware, usually in software. An "abstract machine" in 5.1.2.3 refers to > an imaginary unimplementable context that ensures proper C program > execution. > > > and went away in a huff... > > > FWIW: RP: "This, of course, is due to his continued belief in the pure > > abstraction of C from the underlying hardware and assembly: a > > fallacy." > > I "went away in a huff..."? Where do you get that from? You bailed out. > Phil insulted and bailed out. My last post is after you two. When Phil wondered why you didn't respond to my post, you said "No. I decided it wasn't in my best interest to pursue the conversation with RW." If you've decided not to pursue the conversation with me, do you expect me to continue talking to myself? But I have to simply disagree about the need to consider history when reading the C standard. It may help you to understand why certain things are the way they are, may illustrate which of several choices an implementation may make might be the "better" one, and may well help you understand the standard. 
The whole point of the C standard is to provide the complete* definition, explicitly so that knowledge of history is not required. They may not succeed 100%, but they come pretty darn close. *Complete within the context in which the standard was written. It does not, for example, define the term "binary," or much other industry jargon. > > Any given implementation of course relates the two, but the C standard > > itself does not. > > If the C spec. does not relate the two, then the C spec. itself is > unimplementable. No version of C can comply with the spec without this > relationship being defined. What is unimplementable is worthless. There is > no exception to this fact. You need to learn to read between the lines of > the C spec or add in historical context. There you go, conflating again... ;-) The implementation creates the mapping. It defines the relationship. That's its job. Consider the IEEE math standard. It also does not talk about any physical implementation, at best it talks about patterns of bits, and what various operations do to those patterns of bits. Sort of like the C standard. In both cases I can write a program that does something well defined, with no reference to any particular implementation. If I actually hope to run my program, I'm going to have to find an implementation. In both cases, various attributes of a particular implementation might well be visible to me, and I might well make use of such information. Take the exact sizes of various types: for example, the number of bits in an IEEE extended double, or the order in which those bits are stored, or with what padding. Those details might well be important to a particular program, but assuming I write a program that depends on a 128 bit extended double (as opposed to the minimum 80 bit), I clearly restrict its portability (even more so than just assuming that a minimum extended double exists at all, since it's an optional type).
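A dependency on a wider-than-minimum floating type can be checked rather than assumed. The following is a minimal sketch, assuming a C99 compiler and assuming that long double is how the implementation exposes its extended format (nothing guarantees that). <float.h> reports the significand width as LDBL_MANT_DIG, which is 64 for the 80-bit x87 format and 113 for IEEE binary128. The function names are made up for illustration.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if long double is a radix-2 format with at
   least `bits` significand bits on this implementation. */
int long_double_has_mantissa_bits(int bits)
{
    return FLT_RADIX == 2 && LDBL_MANT_DIG >= bits;
}

/* Hypothetical helper: print what this implementation actually provides. */
void report_long_double(void)
{
    printf("long double: %d significand digits (radix %d), %lu bytes of storage\n",
           LDBL_MANT_DIG, FLT_RADIX, (unsigned long)sizeof(long double));

    /* 113 significand bits is what a binary128-class type would report. */
    if (!long_double_has_mantissa_bits(113))
        printf("no binary128-class long double on this target\n");
}
```

A program that needs the wider type could call long_double_has_mantissa_bits(113) at startup and refuse to run otherwise, which makes the portability restriction explicit instead of silent.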
> > Also obviously hardware with odd parameters may make > > C difficult to implement in various ways, and also clearly one reason > > that C is broadly popular is that most hardware does *not* produce > > significant difficulties for a C implementer. > > Yes, exactly as I described in May 07: > > FWIW: RP: I hinted at the truth by referring to statements by Alex Stepanov > in 1995, the primary creator of the C++ STL. He stated that Dennis Ritchie > designed C around a minimum model of computers which were well designed to > solve numerical problems: byte addressable memory, flat address spaces, and > pointers. He claimed that this minimum model, developed over many decades > using real computers, is the reason C is a success. > > > The reverse is true as > > well, it's hard to image a modern hardware designer not taking ease of > > C implementation into account when designing an architecture. Which is all true, but all irrelevant. C has certainly, and usefully, been implemented on machines which do not meet that minimal model. Ask anyone who's ever compiled a large model C program on 16 bit x86 (where pointers and the address space are most assuredly not flat). Or someone who's used C on a non-byte addressable microcontroller where the implementers decided to implement large chars (instead of synthesizing them out of the actual words). > Does the x64 instruction set support 8-bit bytes? Assuming you mean x86-64, yes. > > The bottom line is this: The C standard uses the terms byte and char > > essentially synonymously, > > That may be true. And, I think that the fact that most C's used to derive > C89 had 8-bit bytes and 8-bit chars is likely the reason, which I stated > previously. But, some part of the spec. must accurately describe things. > If there's a discrepancy, then the issue must be resolved by the more > "authoratative" section. 
In this case, 3.6 and 3.7.1, the sections which > actually define the terms in question, should be considered "authoratative," > IMO. Don't you agree? > > > and further [a byte and char] must appear to be the same size > > from a C program's perspective. > > False. A byte is not an accessible unit from C's perspective. A char is > accessible. If bytes are 9-bits, and chars are 8-bits, there is no way for > me to access the 9th bit of a byte from C. You can only access the lower 8 > bits of the byte, which are the 8-bit char in this case. > > > A byte is the unit of storage, the > > type is a char. > > True. > > > A char must fit in a byte > > True. > > > and a byte must fit in a > > char > > False. Again you insist on conflating the C notion of a byte and the hardware concept. The C standard does not talk about the implementation, it defines what a C program (and thus a C compiler, aka implementation) needs to appear to do. It makes those definitions in terms of an abstract machine, within which it defines various local terms like "byte". The implementation maps that onto real (for some definition of real) hardware. You appear to have some fundamental objection to that state of affairs, which I am failing to understand, but that is the way it is.
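The object-representation argument above can be made concrete. The following is a minimal sketch, assuming a hosted C99 implementation. It copies a long into an array of sizeof(long) unsigned chars with memcpy and back again, and checks that UCHAR_MAX is exactly 2**CHAR_BIT - 1, as the footnote quoted earlier states. The function names are made up for illustration.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copy a long into sizeof(long) unsigned chars and
   back again, returning the reconstructed value. */
long roundtrip_long(long value)
{
    unsigned char rep[sizeof value];    /* holds the object representation */
    long copy;

    memcpy(rep, &value, sizeof value);  /* object -> n unsigned chars */
    memcpy(&copy, rep, sizeof copy);    /* n unsigned chars -> object */
    return copy;
}

/* Hypothetical helper: nonzero if UCHAR_MAX is exactly 2**CHAR_BIT - 1,
   i.e. unsigned char uses every bit of a byte as a value bit. */
int uchar_uses_all_bits(void)
{
    unsigned int v = UCHAR_MAX;
    int bits = 0;

    while (v != 0) {
        if ((v & 1u) == 0)
            return 0;                   /* a gap: not of the form 2**k - 1 */
        bits++;
        v >>= 1;
    }
    return bits == CHAR_BIT;
}

/* Hypothetical helper: storage bits occupied by a long, n * CHAR_BIT. */
unsigned long long_storage_bits(void)
{
    return (unsigned long)sizeof(long) * CHAR_BIT;
}
```

On a conforming implementation roundtrip_long should return its argument unchanged and uchar_uses_all_bits should return nonzero, whatever value CHAR_BIT happens to have on that implementation.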
From: robertwessel2 on 8 Dec 2008 19:04

On Dec 6, 6:17 am, Phil Carmody <thefatphil_demun...(a)yahoo.co.uk> wrote:
> "robertwess...(a)yahoo.com" <robertwess...(a)yahoo.com> writes:
> > And I want to mention that I quote from the C99 standard more often
> > only because I have that in electronic form, and only hardcopies of
> > C89, which makes for less typing...
>
> If you ask on c.l.c, someone will furnish you with pointers to
> older versions, I'm sure. I used to have copies of various things
> pre-C99, but don't any more, alas.

I have the common copy of the C89 draft, but I hesitate to quote from it because the section numbering is quite different. And there are a few (minor) substantive changes, too.
From: Rod Pemberton on 9 Dec 2008 15:00 <robertwessel2(a)yahoo.com> wrote in message news:b8426bcf-7cca-474c-85f3-67daab581e07(a)x38g2000yqj.googlegroups.com... > On Dec 7, 12:20 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > > <robertwess...(a)yahoo.com> wrote in message > > > > news:8d725647-529b-40ae-a97e-ad6d296e10c4(a)33g2000yqm.googlegroups.com... > > On Dec 6, 3:36 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > > > > > <robertwess...(a)yahoo.com> wrote in message > > > > > > > "Values stored in non-bit-field objects of any other object type > > > > > consist of n * CHAR_BIT bits, where n is the size of an object of > that > > > > > type, in bytes. The value may be copied into an object of type > > > > > unsigned char [n] (e.g., by memcpy); the resulting set of bytes is > > > > > called the object representation of the value." [...] > > > > Where? Unless you misquoted, it says: > > 1) the object consists of n*CHAR_BITS > > 2) n is the number of bytes needed to build that object > > 3) the resulting set of bytes is the objects representation > > 4) an object comprised of some char's can be copied into a bunch of bytes > > > > Ugh. Your first #4 is plainly wrong. No. > The standard says the N bytes > of an object can be copied into an array N unsigned chars. It might. But, it's definately not part of what you quoted above... > "Values stored in non-bit-field objects of any other object type > consist of n * CHAR_BIT bits, where n is the size of an object of that > type, in bytes." > > - an object is stored in N bytes, #2) > each with CHAR_BIT bits No. First, "object" definately isn't CHAR_BITS in size. Second, it doesn't say "bytes" is CHAR_BITS in size either. "Object" and "bytes" are the only two nouns in the first part of your phrase, one of which much represent "each." I took it to be "bytes" due to context and proximity. 
What that does say is the values, of certain types of objects, are n*CHAR_BITS bits with n being obtained from the object's size in bytes. I.e., if sizeof(long)==4, then the values occupy total bits of 4*CHAR_BITS. While the number of bytes needed, N, is mentioned, there is no mention of bytes being a certain number of bits, AFAICT. (BTW, do you diagram sentences as you read them?) > "The value may be copied into an object of type unsigned char [n] > (e.g., by memcpy);" > > - Those N bytes, No. It doesn't say "N bytes". It says, "The value... may be copied." > or at least the values contained therein, Yes. > can be > copied in to an array of N chars. No. It doesn't say into an array of "N chars" anywhere. It says "The value... may be copied" into "an object" which is "of type unsigned char[n]". An object of type unsigned char[n] is comprised of N bytes, see #2. > "the resulting set of bytes is called the object representation of the > value." > > - that array of chars is also a bunch of bytes No. An "array" (in quotes since C doesn't actually have arrays, only array declarations...), uh, never mind... A sequence of C chars has values which are storable in a sequence of C bytes. But, a sequence of C bytes has values which aren't necessarily storable in a sequence of C chars. > And note that memcpy() is defined to move *chars*. .... > So memcpy, which > moves chars, also happens to move the same number of bytes (by the > above). No. It only has to move the char's values, not bytes. I.e., if both C and hardware bytes are 9-bits and C char's are 8-bits, then the C char's values are 8-bits. It only has to copy the 8-bit values into a new set of 9-bit C or hardware bytes. I.e., it could clear the 9th bit, logical or 8-bits leave whatever garbage is in the 9th bit, or set the 9th bit because the 9th bit can't be accessed via C. [AFAICT, snip unrelated] > It's completely unclear how you think 3.6 and 3.7.1 contradict each > other. What? 
That response would've made sense further below in your statements, but not here... > "A byte contains CHAR_BIT bits, and the values of type unsigned char > range from 0 to 2**CHAR_BIT - 1." So a byte contains CHAR_BIT bits. > And the numbers that you can put in an unsigned char exactly > correspond to that. It's not "unsigned char ranges from zero to no > more than (2**CHAR_BIT - 1)" - rather the range is exact. So a byte > contains exactly the number of bits that can fit in an unsigned char. > And an unsigned char can hold exactly the number of different values > that can fit in a byte. "(Adapted from the American National Dictionary for Information Processing Systems.)" It probably should read: A) "An unsigned char contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2CHAR_BIT - 1." B) "If a byte contains CHAR_BIT bits, then the values of type unsigned char range from 0 to 2CHAR_BIT - 1." I think they used "unsigned char" as a synonym for a byte. It's a typo. > > > Most real implementations will, of course, attempt a mapping > > > between the two that is simple and efficient (eg. a C char/byte is > > > implemented as a conventional 8-bit hardware byte), unless there is > > > some really compelling reason to do otherwise. > > > > A char's value is accessible in C, but a char is not addressable according > > to the spec. you quoted. A byte is addressable according to the spec. you > > quoted, but a byte's value may not be entirely accessible in C. Only the > > part of byte which overlaps with a char is accessible. The byte represents > > the hardware addessability issue which has to be solved by a real > > implementation to implement objects in C comprised of C char's as > contiguous > > sequences of bytes on hardware. Does that make more sense? > > That's true of a hardware byte. Not a C byte. It's true of a C byte too. 
> The implementation > creates some mapping between C bytes and hardware bytes, not > necessarily 1:1 (an implementation on a 9 bit machine could map nine 8 > bit C bytes into eight 9-bit hardware bytes), or even using the > entirety of the hardware bytes (an implementation on a 9 bit machine > might ignore one bit of each hardware byte, thus making the mapping of > eight bit C bytes onto the hardware bytes appear to be 1:1, while > leaving that ninth bit completely hidden from the C program). But you > cannot use those hidden bit in other types, either - IOW, if you've > hidden that bit from a char, you cannot use it in an int. Let's say the C byte is 9-bits because the hardware byte is 9-bits. But, the C char is 8-bits. How do you access the ninth bit of the C byte in C? (You can't.) The byte is the addressable unit which can address 9-bits both in C and on hardware. But, the char is the "value unit" which can only access 8 of those 9-bits. I.e., your "byte must fit in a char" doesn't work in this legal example. This is entirely *independent* of whether you think I'm "conflating" C bytes and hardware bytes. > Unused bytes *between* objects (whether for alignment, some other whim > of the compiler), are a different (and mostly irrelevant) issue. Not irrelevant. It directly affects your understanding of 3.6 and 3.7.1. > You > can, absolutely, create an implementation where only eight bits of > each (let's say 32 bit for the sake of discussion) word are used. > That would lead to an array of chars being a sequence of (32 bit) > words. You cannot, however, then put an int (or long) into a single > 32 bit word. You could, but not spec. compliantly. Every object being representable as a sequence of char's would be broken and you'd have to ensure memcpy() copied 32-bits behind the scenes instead of C chars. You'd probably want to limit int, long etc. to 32-bits or whatever size the modified memcpy() was using. 
Int's and long's would be only accessible as int's and long's, not via char's. The offset operator should still work allowing "arrays"... > That would break the ability to copy the array of bytes > that make up the int into an array of chars, and leave the values > intact (see the extended quote from footnote 40 above, for example), True. > and would also make the result of sizeof illogical. No. If sizeof(long)==4 and long is 32-bits, then given a long "broken up" over 4 8-bit char's which consume 32-bits each, sizeof(long)==4 is still valid. I.e., it's still four char's. The value returned by sizeof would only be illogical if you don't "break up" long's and int's into char's as required. > Good for Phil. "Conflate" is an excellent word, of the best quality. > It should be used more often. It's also used much by one individual on comp.lang.c. > When Phil wondered why you didn't respond to my post, you said "No. I > decided it wasn't in my best interest to pursue the conversation with > RW." I'm still not sure it's in my best interest... ;) I recall getting into a number of long drawn out conversations on C in threads and NG's unrelated to C and was really tired of that. I don't immediately recall if one of those was with you. I do know I've a few with Phil, and a few with others who frequent comp.lang.c. They tend to harass those even off c.l.c. with their frequently incorrect "understanding" of C, IMO. The usual c.l.c. response: 100+ individuals claim you are wrong without being able to prove it 50+ individuals insult you without remorse 10 individuals claim you are wrong but use faulty logic 1 individual attempts to prove you are wrong but can't do so 1 individual makes an almost correct proof, by ignoring a fact or two > If you've decided not to pursue the conversation with me, do you > expect me to continue talking to myself? Continue "writing" to yourself...? Are you currently talking to yourself? If so, then I expect you'll likely continue... 
;) You seemed to have waited over a year to resume this conversational topic with me on C in an assembly NG. Coincidence? > But I have to simply disagree about the need to consider history when > reading the C standard. Really? BTW, which standard?... There are at least four, IMO. And, you can't decide which to *one* of them to read without "need[ing] to consider history." There are sufficiently large differences between them. > It may help you to understand why certain > things are the way they are, may illustrate which of several choices > an implementation may make might be the "better" one, and may well > help you understand the standard. The whole point of the C standard > is to provide the complete* definition, That's the problem: one can't provide a complete definition for C unless the hardware is 100% identical on every platform. One can only provide a somewhat complete definition if the language uses the bare minimum de-facto features of the computing hardware: basic arithmetic, addresses, integers, contiguous memory, byte-sized memory, etc. I.e., these underlying characteristics are implicitly standardized by the C standard. It doesn't matter if the C spec. doesn't mention them explicitly. They are a requirement to implementing C. > explicitly so that knowledge > of history is not required. They may not succeed 100%, but they come > pretty darn close. How do setjmp and longjmp fit? How do arg's passed from the environment fit? How do you implement realloc without access to underlying OSes memory allocator? How do you implement the C library if you don't have a *nix concept of files and equivalents to unistd.h like functions: open,close,read,write,lseek? They succeeded in abstracting that which was abstractable: grammar syntax, arithmetic, types, and or portable due to de-facto hardware standardization. 
If C didn't have 15+ years of usage prior to C89 which proved some "portability" or "adaptability" of the language, do you think C would have ever been standardized? Of course not, it'd have died out. > *Complete within the context that the standard was written. It does > not, for example, define the term "binary," or much other industry > jargon. Nor, does it properly define a byte or an abstract machine or many other standardized terms ... etc. > > > Any given implementation of course relates the two, but the C standard > > > itself does not. > > > > If the C spec. does not relate the two, then the C spec. itself is > > unimplementable. No version of C can comply with the spec without this > > relationship being defined. What is unimplementable is worthless. There is > > no exception to this fact. You need to learn to read between the lines of > > the C spec or add in historical context. > > There you go, conflating again... ;-) No. I believe I stated the truth accurately. > The implementation creates the mapping. It defines the relationship > That's it's job. Consider the IEEE math standard. It also does not > talk about any physical implementation, at best it talks about > patterns of bits, and what various operations do to those patterns of > bits. Sort of like the C standard. In both cases I can write a > program that does something well defined, with no reference to any > particular implementation. The last sentence is true but only to a limited degree. It is nowhere near true in it's entirety. You can't expect to use argv/argc parameters to main portably, realloc portably, setjmp/longjmp portably, files portably, structures accessed as "arrays" portably, escape characters portably, int portably, getenv portably, signals portably, offsetof portably, errno portably, exit portably, etc. In the case of C, C is constrained by that which is common among differing computer architectures. 
Yet, when I tell you that by providing specific quotes of Alex Stepanov or perhaps if I said "C captures the essence of RISC but not CISC," you blatantly declare it to be true but irrelevant. E.g., RW said: "Which is all true, but all irrelevant." So, your point here must be irrelevant too, from your perspective. I.e., your current point CANNOT be RELEVANT *AND* at the same time BE IRRELEVANT for my other point when they are both based on platform commonality. That's irrational, illogical, and contradictory.

> > False. A byte is not an accessible unit from C's perspective. A char is
> > accessible. If bytes are 9-bits, and chars are 8-bits, there is no way for
> > me to access the 9th bit of a byte from C. You can only access the lower 8
> > bits of the byte, which are the 8-bit char in this case.
> >
> > > A byte is the unit of storage, the
> > > type is a char.
> >
> > True.
> >
> > > A char must fit in a byte
> >
> > True.
> >
> > > and a byte must fit in a
> > > char
> >
> > False.
> >
> Again you insist on conflating the C notion of a byte and the hardware
> concept.

No. This is what 3.6 and 3.7.1 say:

1) byte is the (addressable) unit of storage - "byte: addressable unit of data storage"
2) char must fit in a byte - "character ... bit representation that fits in a byte"
3) byte can't be smaller than a char, i.e., it must be equal or larger in size - "byte: ... large enough to hold"

"3.6: byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment."

"3.7.1: character - single-byte character <C> bit representation that fits in a byte."

> The C standard does not talk about the implementation, it defines what
> a C program (and thus a C compiler, aka implementation) needs to
> appear to do. It makes those definitions in terms of an abstract
> machine, within which it defines various local terms like "byte". The
> implementation maps that onto real (for some definition of real)
> hardware.
You appear to have some fundamental objection to that state
> of affairs,

I think you didn't fully understand what you read...

Rod Pemberton
From: Glen Herrmannsfeldt on 9 Dec 2008 18:34

robertwessel2(a)yahoo.com wrote:
(snip)

> Nor is your assertion that hardware with a 9-bit hardware byte
> requires a 9-bit C byte and char true. While that might well make for
> a convenient implementation on the machine, there is no reason that
> the implementation might not expose 8 bit C bytes and chars, and
> synthesize those out of the underlying 9-bit hardware bytes. It
> could, for example, store 9 C chars in 8 hardware bytes, or store one
> C char per hardware byte, and ignore one bit of each hardware byte.
> In fact, you might be tempted to do such a thing if you wanted to port
> much existing C code to your 9-bit-byte machine, simply because so
> much code will break if CHAR_BIT is not 8.

Well, sizeof(int) must be an integer, so on a machine with 9-bit hardware bytes and, for example, a 36-bit int, the C byte could not be 8 bits.

More specifically, the PDP-10 is a 36-bit word-addressed machine which has the ability to load/store "bytes" smaller than 36 bits. The possible CHAR_BIT values for such machines are 9, 12, 18, and 36. Since it does have operations on 18-bit halfwords, it is likely that short would be 18 bits, leaving 9 and 18 bits for C char.

> It's also arguable that a hosted implementation cannot have sizeof
> (char) == sizeof(int), because of assumptions in the library (notably
> you cease being able to assign a unique value to EOF that cannot be
> returned by the character I/O functions). A freestanding
> implementation (common, of course, on DSPs), doesn't have those
> issues.

I believe it is done on word-addressed machines. In most cases, it is important that the EOF value not be the value of a character in the character set. Machines with 32-bit char most likely don't actually use all the values. (UTF-32 is pretty rare.)

-- glen
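For readers following the arithmetic in the post above, here is a minimal sketch of the divisibility argument. It assumes an int that fills exactly one machine word with no padding bits, which is how the post frames it; the function names `char_bit_fits_word` and `list_char_bits` are hypothetical helpers invented for illustration and appear nowhere in the thread or the standard.

```c
/* Hypothetical helper (an assumption for illustration): returns 1 if
 * char_bit could serve as CHAR_BIT on a machine whose int fills one
 * word of word_bits bits with no padding, and 0 otherwise. */
int char_bit_fits_word(unsigned word_bits, unsigned char_bit)
{
    if (char_bit < 8)                  /* the standard requires CHAR_BIT >= 8 */
        return 0;
    return word_bits % char_bit == 0;  /* sizeof(int) must be a whole number */
}

/* Stores each workable CHAR_BIT for word_bits in out (capacity cap),
 * in ascending order, and returns how many candidates were found. */
unsigned list_char_bits(unsigned word_bits, unsigned *out, unsigned cap)
{
    unsigned count = 0;
    unsigned cb;

    for (cb = 8; cb <= word_bits; cb++) {
        if (char_bit_fits_word(word_bits, cb)) {
            if (count < cap)
                out[count] = cb;
            count++;
        }
    }
    return count;
}
```

Calling `list_char_bits(36, out, 8)` enumerates the candidates for a 36-bit word, and `char_bit_fits_word(36, 8)` shows why an 8-bit C byte is ruled out under these assumptions: 36 is not a multiple of 8.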
From: robertwessel2 on 9 Dec 2008 18:40
On Dec 9, 2:00 pm, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > <robertwess...(a)yahoo.com> wrote in message > > news:b8426bcf-7cca-474c-85f3-67daab581e07(a)x38g2000yqj.googlegroups.com... > > > On Dec 7, 12:20 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > > > <robertwess...(a)yahoo.com> wrote in message > > > >news:8d725647-529b-40ae-a97e-ad6d296e10c4(a)33g2000yqm.googlegroups.com.... > > > On Dec 6, 3:36 am, "Rod Pemberton" <do_not_h...(a)nohavenot.cmm> wrote: > > > > > <robertwess...(a)yahoo.com> wrote in message > > > > > > > "Values stored in non-bit-field objects of any other object type > > > > > > consist of n * CHAR_BIT bits, where n is the size of an object of > > that > > > > > > type, in bytes. The value may be copied into an object of type > > > > > > unsigned char [n] (e.g., by memcpy); the resulting set of bytes is > > > > > > called the object representation of the value." > [...] > > > > Where? Unless you misquoted, it says: > > > 1) the object consists of n*CHAR_BITS > > > 2) n is the number of bytes needed to build that object > > > 3) the resulting set of bytes is the objects representation > > > 4) an object comprised of some char's can be copied into a bunch of > bytes > > > Ugh. Your first #4 is plainly wrong. > > No. > > > The standard says the N bytes > > of an object can be copied into an array N unsigned chars. > > It might. But, it's definately not part of what you quoted above... > > > "Values stored in non-bit-field objects of any other object type > > consist of n * CHAR_BIT bits, where n is the size of an object of that > > type, in bytes." > > > - an object is stored in N bytes, > > #2) > > > each with CHAR_BIT bits > > No. First, "object" definately isn't CHAR_BITS in size. Second, it doesn't > say "bytes" is CHAR_BITS in size either. "Object" and "bytes" are the > only two nouns in the first part of your phrase, one of which much represent > "each." I took it to be "bytes" due to context and proximity. 
What that
> does say is the values, of certain types of objects, are n*CHAR_BITS bits
> with n being obtained from the object's size in bytes. I.e., if
> sizeof(long)==4, then the values occupy total bits of 4*CHAR_BITS. While
> the number of bytes needed, N, is mentioned, there is no mention of bytes
> being a certain number of bits, AFAICT. (BTW, do you diagram sentences as
> you read them?)
>
> > "The value may be copied into an object of type unsigned char [n]
> > (e.g., by memcpy);"
>
> > - Those N bytes,
>
> No. It doesn't say "N bytes". It says, "The value... may be copied."
>
> > or at least the values contained therein,
>
> Yes.
>
> > can be
> > copied in to an array of N chars.
>
> No. It doesn't say into an array of "N chars" anywhere. It says "The
> value... may be copied" into "an object" which is "of type unsigned
> char[n]". An object of type unsigned char[n] is comprised of N bytes, see
> #2.

How is "an array of N chars" different from "an object of type unsigned char[n]"? (Ignoring the signedness, which I've explicitly ignored several times.)

So an object has a value consisting of N*CHAR_BIT bits ("Values...consist of n * CHAR_BIT bits"), the same object is N bytes long ("where n is the size of an object of that type, in bytes"), the value can be stored in an array of N unsigned chars ("The value may be copied into an object of type unsigned char [n]"), and further, CHAR_BIT is defined as specifying the size of a byte (in 5.2.4.2.1: "number of bits for smallest object that is not a bit-field (byte)"). I simply cannot see where there's any wiggle room. Unless you want to declare a second typo in the CHAR_BIT definition, and even then it requires a very unusual reading of the first three parts to get to your position.

> > "A byte contains CHAR_BIT bits, and the values of type unsigned char
> > range from 0 to 2**CHAR_BIT - 1." So a byte contains CHAR_BIT bits.
> > And the numbers that you can put in an unsigned char exactly
> > correspond to that.
It's not "unsigned char ranges from zero to no
> > more than (2**CHAR_BIT - 1)" - rather the range is exact. So a byte
> > contains exactly the number of bits that can fit in an unsigned char.
> > And an unsigned char can hold exactly the number of different values
> > that can fit in a byte.
>
> "(Adapted from the American National Dictionary for Information Processing
> Systems.)"
>
> It probably should read:
> A) "An unsigned char contains CHAR_BIT bits, and the values of type
> unsigned char range from 0 to 2**CHAR_BIT - 1."
> B) "If a byte contains CHAR_BIT bits, then the values of type unsigned
> char range from 0 to 2**CHAR_BIT - 1."
>
> I think they used "unsigned char" as a synonym for a byte. It's a typo..

Seriously? A sentence for which you cannot find a convoluted parsing that supports your position is a typo?! And your additional quote is flatly incorrect. The part you added ("Adapted from...") applies to the preceding sentence.

> Let's say the C byte is 9-bits because the hardware byte is 9-bits. But,
> the C char is 8-bits. How do you access the ninth bit of the C byte in C?
> (You can't.) The byte is the addressable unit which can address 9-bits both
> in C and on hardware. But, the char is the "value unit" which can only
> access 8 of those 9-bits. I.e., your "byte must fit in a char" doesn't work
> in this legal example. This is entirely *independent* of whether you think
> I'm "conflating" C bytes and hardware bytes.

If the ninth bit is completely inaccessible, it's not meaningful from the perspective of the C program, no? If the C byte has extra bits in it, how do they affect the C program? If they do not, then they may as well not exist.

> > When Phil wondered why you didn't respond to my post, you said "No. I
> > decided it wasn't in my best interest to pursue the conversation with
> > RW."
>
> I'm still not sure it's in my best interest...
;)
>
> I recall getting into a number of long drawn out conversations on C in
> threads and NG's unrelated to C and was really tired of that. I don't
> immediately recall if one of those was with you. I do know I've a few with
> Phil, and a few with others who frequent comp.lang.c. They tend to harass
> those even off c.l.c. with their frequently incorrect "understanding" of C,
> IMO. The usual c.l.c. response:
>
> 100+ individuals claim you are wrong without being able to prove it
> 50+ individuals insult you without remorse
> 10 individuals claim you are wrong but use faulty logic
> 1 individual attempts to prove you are wrong but can't do so
> 1 individual makes an almost correct proof, by ignoring a fact or two
>
> > If you've decided not to pursue the conversation with me, do you
> > expect me to continue talking to myself?
>
> Continue "writing" to yourself...? Are you currently talking to yourself?
> If so, then I expect you'll likely continue... ;)
>
> You seemed to have waited over a year to resume this conversational topic
> with me on C in an assembly NG. Coincidence?

I didn't choose to resurrect a long-forgotten conversation with you. You posted, in a new thread, incorrect information, which happened to be similar to that of the 5-07 thread. I responded to that. You brought up the old thread.

> > But I have to simply disagree about the need to consider history when
> > reading the C standard.
>
> Really? BTW, which standard?... There are at least four, IMO. And, you
> can't decide which to *one* of them to read without "need[ing] to consider
> history." There are sufficiently large differences between them.
>
> > It may help you to understand why certain
> > things are the way they are, may illustrate which of several choices
> > an implementation may make might be the "better" one, and may well
> > help you understand the standard.
The whole point of the C standard
> > is to provide the complete* definition,
>
> That's the problem: one can't provide a complete definition for C unless the
> hardware is 100% identical on every platform. One can only provide a
> somewhat complete definition if the language uses the bare minimum de-facto
> features of the computing hardware: basic arithmetic, addresses, integers,
> contiguous memory, byte-sized memory, etc. I.e., these underlying
> characteristics are implicitly standardized by the C standard. It doesn't
> matter if the C spec. doesn't mention them explicitly. They are a
> requirement to implementing C.
>
> > explicitly so that knowledge
> > of history is not required. They may not succeed 100%, but they come
> > pretty darn close.
>
> How do setjmp and longjmp fit? How do arg's passed from the environment
> fit? How do you implement realloc without access to underlying OSes memory
> allocator? How do you implement the C library if you don't have a *nix
> concept of files and equivalents to unistd.h like functions:
> open,close,read,write,lseek?

In what sense are the functions of setjmp and longjmp ambiguous? How they are implemented is clearly very implementation dependent (in fact most implementations require a couple of short snippets of assembler, unlike most of the rest of the standard C library), but they function the same in all versions of C (modulo bugs and non-conformance). Same with realloc: it has to do something specific, and it will presumably invoke an OS service to allocate memory in many environments, on at least some occasions. So what? And the semantics of C files are what they are; the implementation clearly needs to figure out how to map that onto something that's considered valuable in the local environment.

> They succeeded in abstracting that which was abstractable: grammar syntax,
> arithmetic, types, and or portable due to de-facto hardware standardization.
> If C didn't have 15+ years of usage prior to C89 which proved some
> "portability" or "adaptability" of the language, do you think C would have
> ever been standardized? Of course not, it'd have died out.
>
> > *Complete within the context that the standard was written. It does
> > not, for example, define the term "binary," or much other industry
> > jargon.
>
> Nor, does it properly define a byte or an abstract machine or many other
> standardized terms ... etc.

Clearly it does not define them to your satisfaction.

> > The implementation creates the mapping. It defines the relationship
> > That's it's job. Consider the IEEE math standard. It also does not
> > talk about any physical implementation, at best it talks about
> > patterns of bits, and what various operations do to those patterns of
> > bits. Sort of like the C standard. In both cases I can write a
> > program that does something well defined, with no reference to any
> > particular implementation.
>
> The last sentence is true but only to a limited degree. It is nowhere near
> true in it's entirety. You can't expect to use argv/argc parameters to main
> portably, realloc portably, setjmp/longjmp portably, files portably,
> structures accessed as "arrays" portably, escape characters portably, int
> portably, getenv portably, signals portably, offsetof portably, errno
> portably, exit portably, etc.

The argc/argv parameters are quite portable. How they get specified when the program is run is quite implementation specific (as is how one actually runs a program), including how the specified input is parsed into those parameters, but there's nothing unportable about accessing the values passed into main.

And most of the other examples are simply wrong. A couple I've already addressed, and the others (offsetof, for example) are quite possible to use portably, if you observe the specified restrictions. Same with ints.
a=b+c; has a perfectly portable meaning so long as b, c, and the sum of the two are within -32767..+32767. It also has a perfectly defined meaning if the three values are between INT_MIN and INT_MAX. Code can, of course, assume behavior when those limits are exceeded, and thus tie itself to a particular implementation (or set thereof).

> > Again you insist on conflating the C notion of a byte and the hardware
> > concept.
>
> No.

Ah, so you concede my point, since this is obviously a typo, and you meant "yes"...

> > The C standard does not talk about the implementation, it defines what
> > a C program (and thus a C compiler, aka implementation) needs to
> > appear to do. It makes those definitions in terms of an abstract
> > machine, within which it defines various local terms like "byte". The
> > implementation maps that onto real (for some definition of real)
> > hardware. You appear to have some fundamental objection to that state
> > of affairs,
>
> I think you didn't fully understand what you read...

Someone certainly didn't.

I'm not sure this is worth pursuing. You appear fully convinced of your position, to the point where you appear to be willing to make obviously false and illogical statements. Conversely, you accuse me of doing the same. If you think either position might be changed, I am willing to continue for a reasonable time.
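The two properties argued over in this post (an object can be copied into `unsigned char[n]` and back, and `unsigned char` covers exactly 0 to 2**CHAR_BIT - 1) can be probed on any given implementation with a short sketch like the one below. It is only an illustration: the function names `long_round_trips` and `uchar_fills_byte` are hypothetical, and a passing check on one compiler shows what that implementation does, not what the standard requires.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical check: copies a long into unsigned char[sizeof(long)]
 * with memcpy, copies it back, and returns 1 if the value survived. */
int long_round_trips(long value)
{
    unsigned char bytes[sizeof(long)];  /* the object representation */
    long copy;

    memcpy(bytes, &value, sizeof value);
    memcpy(&copy, bytes, sizeof copy);
    return copy == value;
}

/* Hypothetical check: returns 1 if UCHAR_MAX has exactly CHAR_BIT
 * one-bits and no gaps, i.e. UCHAR_MAX == 2**CHAR_BIT - 1.
 * Counting bits avoids shifting by CHAR_BIT, which could overflow
 * on an implementation with a very wide char. */
int uchar_fills_byte(void)
{
    unsigned long max = UCHAR_MAX;
    int bits = 0;

    while (max != 0) {
        if ((max & 1UL) == 0)
            return 0;       /* a zero below the top bit: not 2**k - 1 */
        bits++;
        max >>= 1;
    }
    return bits == CHAR_BIT;
}
```

A conforming implementation should return 1 from both functions for every input, which is the "no hidden ninth bit" position taken in the post: whatever bits a C byte has, `unsigned char` sees all of them.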