Why Is Escaping Data Considered So Magical? [Python]

From: Cameron Simpson on 30 Jun 2010 05:55

On 29Jun2010 21:49, Carl Banks <pavlovevidence(a)gmail.com> wrote:
| On Jun 28, 2:44 am, Gregory Ewing <greg.ew...(a)canterbury.ac.nz> wrote:
| > Carl Banks wrote:
| > > Indeed, strncpy does not copy that final NUL if it's at or beyond the
| > > nth element. Probably the most mind-bogglingly stupid thing about the
| > > standard C library, which has lots of mind-boggling stupidity.
| >
| > I don't think it was as stupid as that back when C was
| > designed. Every byte of memory was precious in those days,
| > and if you had, say, 10 bytes allocated for a string, you
| > wanted to be able to use all 10 of them for useful data.
| >
| > So the convention was that a NUL byte was used to mark
| > the end of the string *if it didn't fill all the available
| > space*.
|
| I can't think of any function in the standard library that observes
| that convention, which inclines me to disbelieve this convention ever
| really existed. If it did, there would be functions to support it.
|
| For that matter, I'm not really inclined to believe bytes were *that*
| precious in those days.

Jeez. PDP-11s, 16 bit addressing, tiny tiny disc drives!

The original V7 (and probably earlier) UNIX filesystem has 16 byte directory
entries: 2 bytes for an inode and 14 bytes for the name. You could use 14
bytes of that name, and strncpy makes it effective to work with that data
structure.

Shortening something already only 14 bytes (the name) _is_ a big ask,
and it is well worth the unusual convention in play.
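
A rough sketch of the directory entry described above, for readers who have not seen this layout. The struct and helper names are illustrative rather than the historical source, and the 16-byte total assumes a 2-byte unsigned short. It shows strncpy filling a fixed-width name field that has no terminator when all 14 bytes are in use.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14  /* width of the name field in the 16-byte entry */

/* Illustrative layout: 2-byte inode number followed by a 14-byte name. */
struct v7_dirent {
    unsigned short d_ino;
    char d_name[DIRSIZ];  /* NUL-padded; no terminator if all 14 bytes are used */
};

/* Fill the fixed-width name field. strncpy copies at most DIRSIZ bytes
 * and pads the rest of the field with NULs when the name is shorter. */
static void set_name(struct v7_dirent *de, const char *name)
{
    assert(de != NULL && name != NULL);
    strncpy(de->d_name, name, DIRSIZ);
}

/* Copy the name out as a real C string; out must hold DIRSIZ + 1 bytes. */
static void get_name(const struct v7_dirent *de, char out[DIRSIZ + 1])
{
    assert(de != NULL && out != NULL);
    memcpy(out, de->d_name, DIRSIZ);
    out[DIRSIZ] = '\0';
}
```

A 14-character name fills the field completely, so anything that reads d_name has to bound the read by DIRSIZ, as get_name does.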

| The obvious rationale behind strncpy's stupid behavior is that it's
| not a string function at all, but a memory block function, that stops
| at a NUL in case you don't care what's after the NUL in a block. But
| it leads you to believe it's a string function by its name.

Bah. It's for copying a _string_ into a _buffer_! Strangely, since it
starts with a string (NUL-terminated byte sequence) it begins with
"str". And it _is_ copying, but not into another string.

It is special purpose but perfectly reasonable for the problem at hand.
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

If it ain't broken, keep playing with it.

From: Roy Smith on 30 Jun 2010 08:42

In article <mailman.14.1277891765.1673.python-list(a)python.org>,
Cameron Simpson <cs(a)zip.com.au> wrote:

> Jeez. PDP-11s, 16 bit addressing, tiny tiny disc drives!

What you talking about, tiny? An RK-05 was huge! Why would anybody
ever need more than that?

> The original V7 (and probably earlier) UNIX filesystem has 16 byte directory
> entries

Certainly earlier. I used v6, and it was like that there. I'm
reasonably sure it pre-dated v6, however.

From: Michael Torrie on 30 Jun 2010 10:02

On 06/30/2010 03:00 AM, Jorgen Grahn wrote:
> On Wed, 2010-06-30, Michael Torrie wrote:
>> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char ** argv)
>>>> {
>>>> char *buf = malloc(512 * sizeof(char));
>>>> const int a = 2, b = 3;
>>>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>> ^^^^^^^^^^
>>> Make that 512*sizeof(buf)
>>
>> Sigh. Try again. How about "512 * sizeof(char)" ? Still doesn't make
>> a difference. The code still crashes because the &buf is incorrect.
>
> I haven't tried to understand the rest ... but never write
> 'sizeof(char)' unless you might change the type later. 'sizeof(char)'
> is by definition 1 -- even on odd-ball architectures where a char is
> e.g. 16 bits.

You're right. I normally don't use sizeof(char). This is obviously a
contrived example; I just wanted to make the example such that there's
no way the original poster could argue that the crash is caused by
something other than &buf.
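
For comparison, a minimal sketch of how the quoted example could be written so that it does not crash. The function names format_sum and make_sum are made up for illustration. The two changes are passing buf rather than &buf, and passing the allocation size explicitly, because sizeof applied to a pointer gives the size of the pointer and not of the block it points to.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 512  /* same capacity as the example above */

/* Format "a + b = sum\n" into buf, which has room for size bytes.
 * Returns what snprintf returns: the length the full output needs. */
static int format_sum(char *buf, size_t size, int a, int b)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%d + %d = %d\n", a, b, a + b);
}

/* Heap variant. Returns a malloc'ed string, or NULL on failure.
 * The caller frees the result. */
static char *make_sum(int a, int b)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return NULL;
    /* Pass buf itself (a char *), not &buf (a char **), and pass the
     * allocation size, since sizeof buf is only the size of a pointer. */
    int n = format_sum(buf, BUF_SIZE, a, b);
    if (n < 0 || n >= BUF_SIZE) {  /* output error or truncation */
        free(buf);
        return NULL;
    }
    return buf;
}
```

With a true array, such as char small[32], sizeof small does give the capacity, so format_sum(small, sizeof small, 2, 3) is fine; the pitfall only applies to pointers.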

Then again, it's always a bad idea in C to make assumptions about
anything. If you're on Windows and want to use the Unicode versions of
everything, you'd need to do sizeof(). So using it here would remind
you that when you move to the 16-bit Microsoft Unicode versions of
snprintf, you need to change the sizeof(char) lines to sizeof(wchar_t)
as well.
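
A sketch of the wide-character case, assuming the standard C99 swprintf from <wchar.h> (older Microsoft runtimes used a different signature, and wchar_t is 16 bits on Windows but commonly 32 bits elsewhere). The function name format_sum_wide is hypothetical. Writing the allocation as count * sizeof *buf keeps the byte size right whatever the element type is.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Allocate room for count wide characters and format a sum into it.
 * Returns NULL on allocation failure or if the output does not fit.
 * The caller frees the result. */
static wchar_t *format_sum_wide(size_t count, int a, int b)
{
    assert(count > 0);
    /* The byte size scales with the element type. sizeof *buf follows
     * the declared type, so it stays correct if that type changes. */
    wchar_t *buf = malloc(count * sizeof *buf);
    if (buf == NULL)
        return NULL;
    /* swprintf takes the capacity in wide characters, not in bytes,
     * and returns a negative value if the output does not fit. */
    if (swprintf(buf, count, L"%d + %d = %d\n", a, b, a + b) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```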

From: Carl Banks on 30 Jun 2010 15:03

On Jun 30, 2:55 am, Cameron Simpson <c...(a)zip.com.au> wrote:
> On 29Jun2010 21:49, Carl Banks <pavlovevide...(a)gmail.com> wrote:
> | On Jun 28, 2:44 am, Gregory Ewing <greg.ew...(a)canterbury.ac.nz> wrote:
> | > Carl Banks wrote:
> | > > Indeed, strncpy does not copy that final NUL if it's at or beyond the
> | > > nth element. Probably the most mind-bogglingly stupid thing about the
> | > > standard C library, which has lots of mind-boggling stupidity.
> | >
> | > I don't think it was as stupid as that back when C was
> | > designed. Every byte of memory was precious in those days,
> | > and if you had, say, 10 bytes allocated for a string, you
> | > wanted to be able to use all 10 of them for useful data.
> | >
> | > So the convention was that a NUL byte was used to mark
> | > the end of the string *if it didn't fill all the available
> | > space*.
> |
> | I can't think of any function in the standard library that observes
> | that convention, which inclines me to disbelieve this convention ever
> | really existed. If it did, there would be functions to support it.
> |
> | For that matter, I'm not really inclined to believe bytes were *that*
> | precious in those days.
>
> Jeez. PDP-11s, 16 bit addressing, tiny tiny disc drives!
>
> The original V7 (and probably earlier) UNIX filesystem has 16 byte directory
> entries: 2 bytes for an inode and 14 bytes for the name. You could use 14
> bytes of that name, and strncpy makes it effective to work with that data
> structure.
>
> Shortening something already only 14 bytes (the name) _is_ a big ask,
> and it is well worth the unusual convention in play.

You are talking about fixed-length memory records, not strings.

I'm saying that bytes were not so precious that, when you operate on
*actual strings*, you need to desperately cut off NUL terminators
to save space.

> | The obvious rationale behind strncpy's stupid behavior is that it's
> | not a string function at all, but a memory block function, that stops
> | at a NUL in case you don't care what's after the NUL in a block. But
> | it leads you to believe it's a string function by its name.
>
> Bah. It's for copying a _string_ into a _buffer_! Strangely, since it
> starts with a string (NUL-terminated byte sequence) it begins with
> "str". And it _is_ copying, but not into another string.

I'm going to disagree. The input of strncpy can be either a string or
a memory block, and the output can only be a memory block. In other
words, neither the source nor destination has to be a string. This is
a memory block function, not a string function. The correct name for
this function should have been memcpytonul.
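
The point about source and destination can be checked with a small sketch. The field width and helper names are arbitrary choices for illustration. strncpy stops at the first NUL in the source or after n bytes, whichever comes first, so the source may be an unterminated block of exactly n bytes, and the destination ends up without a terminator whenever the source fills it.

```c
#include <assert.h>
#include <string.h>

#define FIELD 8  /* arbitrary fixed field width for illustration */

/* Copy into a fixed-width field: stops at the first NUL in src or after
 * FIELD bytes, whichever comes first, and NUL-pads any remainder. */
static void fill_field(char dst[FIELD], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, FIELD);
}

/* Returns 1 if the field contains a NUL and so can be read as a C string. */
static int field_is_string(const char field[FIELD])
{
    return memchr(field, '\0', FIELD) != NULL;
}
```

Filling the field from "abc" leaves a readable string padded with NULs. Filling it from an 8-byte array with no NUL, or from a longer string, leaves 8 bytes of data and no terminator.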

Even if you disagree, then you must admit it should have been called
strcpytobuf. Nothing about the name strncpy gives the slightest
suggestion that the destination is not a string. Based on analogy
from other str functions, none of which have any sources or
destinations that are memory blocks, one would logically expect that
strncpy's destination was a string. It defies common sense.

And there should have been an actual, correctly working strncpy in the
standard library that copies and truncates actual strings.
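
One possible shape for such a function, as a sketch only. The name str_copy_trunc is hypothetical and not part of the standard library; it is similar in spirit to the BSD strlcpy, though the return value here is the number of characters actually copied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not a standard function: copy src into dst, which
 * has room for size bytes, truncating if needed and always terminating.
 * Returns the number of characters copied, excluding the terminator. */
static size_t str_copy_trunc(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    size_t len = strlen(src);
    if (len >= size)
        len = size - 1;  /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Unlike strncpy, the destination is always a valid string afterwards, and no time is spent padding the rest of the buffer.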

> It is special purpose but perfectly reasonable for the problem at hand.

The usefulness of strncpy's behavior for writing fixed-length memory
blocks is not in question here. The thing that's mind-bogglingly
stupid is that the function that does this is called "strncpy".

Carl Banks

From: Paul Rubin on 30 Jun 2010 15:19

Cameron Simpson <cs(a)zip.com.au> writes:
> The original V7 (and probably earlier) UNIX filesystem has 16 byte directory
> entries: 2 bytes for an inode and 14 bytes for the name. You could use 14
> bytes of that name, and strncpy makes it effective to work with that data
> structure.

Why not use memcpy for that?
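
A sketch of what the memcpy route would involve, to make the comparison concrete. The names here are illustrative. A bare memcpy(dst, src, 14) reads 14 bytes regardless, which runs past the end of a shorter source string, so the length has to be measured and clamped first and the NUL padding done by hand.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 14  /* width of the name field discussed above */

/* memcpy-based equivalent of strncpy(dst, src, NAME_LEN) for a
 * NUL-terminated src: clamp the length, pad with NULs, then copy. */
static void set_name_memcpy(char dst[NAME_LEN], const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len > NAME_LEN)
        len = NAME_LEN;
    memset(dst, 0, NAME_LEN);  /* the padding strncpy does by itself */
    memcpy(dst, src, len);
}
```

The result is byte-for-byte what strncpy produces, at the cost of three calls instead of one. Note that strlen also requires src to be terminated, which strncpy does not when src is at least NAME_LEN bytes long.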
