Why Is Escaping Data Considered So Magical? [Python]


From: John Nagle on 25 Jun 2010 14:58

On 6/25/2010 12:09 AM, Paul Rubin wrote:
> Nobody <nobody(a)nowhere.com> writes:
>> More generally, as a program gets more complex, "this will work so long as
>> we do X every time without fail" approaches "this won't work".

Yes. I was just looking at some of my own code. Out of about 100
SQL statements, I'd used manual escaping once, in code where the WHERE
clause is built up depending on what information is available for the
search. It's done properly, using "MySQLdb.escape_string(s)", which
is what's used inside "cursor.execute". Looking at the code, I
now realize that it would have been better to
add sections to the SQL string with standard escapes, and at the same
time, append the key items to a list. Then the list can be
converted to a tuple for submission to "cursor.execute".

John Nagle
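
[Editor's sketch] The approach described above, appending placeholder fragments to the SQL text while collecting the values in a list, might look roughly like the following. This is a minimal sketch under stated assumptions, not the poster's actual code: it uses the standard-library sqlite3 module so that it runs anywhere, and sqlite3 takes "?" placeholders where MySQLdb takes "%s". The `books` table and the `build_search` helper are hypothetical names made up for illustration.

```python
import sqlite3


def build_search(author=None, min_year=None, max_year=None):
    """Return (sql, params) for a search over the hypothetical `books` table."""
    clauses = []  # SQL fragments that contain only placeholders
    params = []   # the matching values, kept in the same order
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if min_year is not None:
        clauses.append("year >= ?")
        params.append(min_year)
    if max_year is not None:
        clauses.append("year <= ?")
        params.append(max_year)
    sql = "SELECT title FROM books"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # The list becomes a tuple only at the point of submission.
    return sql, tuple(params)


# Demo data (assumed, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("O'Brien", "At Swim-Two-Birds", 1939), ("Orwell", "1984", 1949)],
)

sql, params = build_search(author="O'Brien", min_year=1930)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The SQL text never contains user data, so the apostrophe in "O'Brien" needs no special handling by the caller.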

From: Nobody on 25 Jun 2010 19:17

On Fri, 25 Jun 2010 12:15:08 +0000, Jorgen Grahn wrote:

> I don't do SQL and I don't even understand the terminology properly
> ... but the discussion around it bothers me.
>
> Do those people really do this?

Yes. And then some.

Among web developers, the median level of programming knowledge amounts to
the first 3 chapters of "Learn PHP in 7 Days".

It doesn't help that the guy who wrote PHP itself wasn't much better.

> - accept untrusted user data
> - try to sanitize the data (escaping certain characters etc)
> - turn this data into executable code (SQL)
> - executing it
>
> Like the example in the article
>
> SELECT * FROM hotels WHERE city = '<untrusted>';

Yep. Search the BugTraq archives for "SQL injection". And most of those
are for widely-deployed middleware; the zillions of bespoke site-specific
scripts are likely to be worse.

Also: http://xkcd.com/327/
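
[Editor's sketch] For readers who, like the quoted poster, don't do SQL: the hotel query can be tried out in a few lines. This is a minimal sketch assuming the standard-library sqlite3 module and an invented two-row `hotels` table; the function names are hypothetical. The first function pastes the untrusted value into the SQL text, the second passes it separately as a parameter.

```python
import sqlite3


def find_hotels_unsafe(conn, city):
    # Vulnerable: the untrusted value becomes part of the SQL text itself.
    query = "SELECT name FROM hotels WHERE city = '%s'" % city
    return [row[0] for row in conn.execute(query)]


def find_hotels_safe(conn, city):
    # The value travels separately from the SQL text ("?" is sqlite3's placeholder).
    query = "SELECT name FROM hotels WHERE city = ?"
    return [row[0] for row in conn.execute(query, (city,))]


# Demo data (assumed, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?)",
    [("Ritz", "Paris"), ("Savoy", "London")],
)

hostile = "nowhere' OR '1'='1"
print(find_hotels_unsafe(conn, hostile))  # the OR clause matches every row
print(find_hotels_safe(conn, hostile))    # no city has that literal name
```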

> I thought it was well-known that the solution is *not* to try to
> sanitize the input

Well known by anyone with a reasonable understanding of the principles of
programming, but somewhat less well known by the other 98% of web
developers.

> Am I missing something?

There's a world of difference between a skilled chef and the people
flipping burgers for a minimum wage. And between a chartered civil
engineer and the people laying the asphalt. And between what you
probably consider a programmer and the people doing most web development.

> If not, I can go back to sleep -- and keep
> avoiding SQL and web programming like the plague until that community
> has entered the 21st century.

Don't hold your breath.

Of course, there's no fundamental reason why you can't apply sound
practices to web development. Well, other than the fact that you're
competing against an infinite number of (code-) monkeys for lowest-bidder
contracts.

To be fair, it isn't actually limited to web developers. I've seen the
following in scientific code written in C (or, more likely, ported to C
from Fortran) for Unix:

sprintf(buff, "rm -f %s", filename);
system(buff);

Why bother learning the Unix API when you already know system()?
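
[Editor's sketch] Since this is a Python list, here is roughly what "using the API instead of system()" looks like in Python. It is a sketch only; `remove_quietly` and `run_rm` are hypothetical helper names. The first calls the operating system's unlink directly, the second still runs rm but passes the filename as a single argument with no shell in between.

```python
import os
import subprocess
import tempfile


def remove_quietly(path):
    """Rough equivalent of `rm -f path`, with no shell and no child process."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # like rm -f, a missing file is not an error


def run_rm(path):
    """If rm really must be run: pass an argument list, never a shell string."""
    # "--" stops option parsing, so a name starting with "-" is not read as a flag.
    subprocess.run(["rm", "-f", "--", path], check=True)


# A filename that would do damage if pasted into a shell command line.
with tempfile.TemporaryDirectory() as tmpdir:
    hostile = os.path.join(tmpdir, "data; echo pwned")
    with open(hostile, "w") as f:
        f.write("x")
    remove_quietly(hostile)
    removed = not os.path.exists(hostile)
    remove_quietly(hostile)  # second call: file already gone, still no error

print(removed)
```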

From: Ian Kelly on 25 Jun 2010 20:25

On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody(a)nowhere.com> wrote:
> To be fair, it isn't actually limited to web developers. I've seen the
> following in scientific code written in C (or, more likely, ported to C
> from Fortran) for Unix:
>
> sprintf(buff, "rm -f %s", filename);
> system(buff);

Tsk, tsk. And it's so easy to fix, too:

#define BUFSIZE 1000000
char buff[BUFSIZE];
if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
    printf("No buffer overflow for you!\n");
} else {
    system(buff);
}

There, that's much more secure.

From: Lawrence D'Oliveiro on 25 Jun 2010 20:40

In message <pan.2010.06.25.06.47.34.297000(a)nowhere.com>, Nobody wrote:

> On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:
>
>> I construct ad-hoc queries all the time. It really isn't that hard to
>> do safely.
>
> Wrong.
>
> Even if you get the quoting absolutely correct (which is a very big "if"),
> you have to remember to perform it every time, without exception.
>
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".

That's a content-free claim. Why? Because it applies equally to everything.
Replace “quoting” with something like “arithmetic”, and you'll see what I
mean:

Even if you get the arithmetic absolutely correct (which is a very big
"if"), you have to remember to perform it every time, without exception.

More generally, as a program gets more complex, "this will work so long
as we do X every time without fail" approaches "this won't work".

From which we can conclude, according to your logic, that one shouldn't be
doing arithmetic.

Next time, try to avoid fallacious arguments.

> And you need to perform it exactly once. As the program gets more complex,
> ensuring that it's done in the correct place, and only there, gets harder.

Nonsense. It only needs to be done at the boundary to the appropriate
component (MySQL, HTML, JavaScript, whatever). That's the only place which
needs to have knowledge of what's on the other side. Everything else can
work with arbitrary data without having to worry about such things.

Go back to my example, and you'll see this: the original updates two dozen
different fields in a database table, yet it only needs two calls to
SQLString: one deals with all the fields requiring updating, while the other
one deals with the key-matching. That's it. Instead of two dozen different
places needing checking, you only have two.

That's what “maintainability” is all about.
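
[Editor's sketch] The "escape only at the boundary" idea can be illustrated with the HTML case mentioned above. This is a minimal sketch assuming Python 3's standard html.escape; it is not the SQLString function from the earlier post, and `normalise_name` and `render_greeting` are hypothetical names. Internal code handles the raw string; only the function that emits HTML knows about HTML quoting.

```python
import html


def normalise_name(name):
    # Internal processing works on raw data and knows nothing about HTML.
    return " ".join(name.split())


def render_greeting(name):
    # The single boundary where data crosses into HTML: escape here, once.
    return "<p>Hello, {}!</p>".format(html.escape(name))


raw = "  <b>Tom &   Jerry</b> "
page = render_greeting(normalise_name(raw))
print(page)
```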

From: Roy Smith on 25 Jun 2010 20:43

In article <mailman.2117.1277511935.32709.python-list(a)python.org>,
Ian Kelly <ian.g.kelly(a)gmail.com> wrote:

> On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody(a)nowhere.com> wrote:
> > To be fair, it isn't actually limited to web developers. I've seen the
> > following in scientific code written in C (or, more likely, ported to C
> > from Fortran) for Unix:
> >
> > sprintf(buff, "rm -f %s", filename);
> > system(buff);
>
> Tsk, tsk. And it's so easy to fix, too:
>
> #define BUFSIZE 1000000
> char buff[BUFSIZE];
> if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
>     printf("No buffer overflow for you!\n");
> } else {
>     system(buff);
> }
>
> There, that's much more secure.

I recently fixed a bug in some production code. The programmer was
careful to use snprintf() to avoid buffer overflows. The only problem
is, he wrote something along the lines of:

snprintf(buf, strlen(foo), foo);

I'm sure the code got reviewed originally, and probably looked at dozens
of times over the years. Nobody caught the problem until we ran a
static code analysis tool (Coverity) over it.

To bring this back to something remotely Python related, the point of
all this is that security is hard. A lot of the security best practices
(such as "don't compose SQL queries on the fly with externally tainted
strings") exist because they address ways that people have gotten burned
in the past. It is foolish to think that you're smarter than everybody
else and have thought of every possibility to avoid getting burned by
doing the things that have gotten other people in trouble.
