From: Thomas Gagne on
I was re-reading some of the messages under the subject "Databases as
objects" and ran across something Frans Bouma wrote:

Frans Bouma wrote:
> Thomas Gagne wrote:
>
>
>
>> But to the point, if a program was able to store improperly-formatted
>> zipcode inside the DB then whose fault is that?
>>
>
> what's an improperly formatted zipcode? In the US, you have 5 digits,
> in the netherlands you have 4 digits and 2 characters. A zipcode of
> 1234 AA is properly formatted for a dutch user of the application, but
> not correctly formatted for the US user of the program. Hence: context.
>
> This thus means that if the db stores '1234AA', it can do so, and the
> dutch user will happily use it. The US user can't because for the US
> user it's just data, 1234 and 2 characters, it's not information
> (zipcode).
If you think about it, neither zip code means anything to the database.
They're simply characters in a field. It's the /reader/ who derives
information from them. The only way a DB might know something about zip
codes is if the field were a FK to known zip codes, then the DB might
assert referential integrity, but still know nothing of the meaning of
"1234 AA." Only the post office and its customers are impacted by its
meaning.

So that takes us back to whether or not the database is correct beyond
its integrity checks. If it stores exactly what it should store then it
is correct. An incorrect or improperly formatted zip code isn't the
database's responsibility if that's what a human or application told it
to store. A DB wouldn't know it was correct or not until it tried
delivering the mail--or performing some function with the data.

This is the same problem with adding apples and oranges, or USD and
CAD. Where I able to store $1USD and $2CAD in an attribute that by
itself isn't necessarily wrong. It may, however, be an inability of a
function to find meaning to adding them together, or the unwillingness
of a bank to ACH 1USD+1CAD, even though people familiar with both
algebra and currency codes recognize exactly what the expression is and
what it means. After crossing the border to Windsor I may end up with
both USD and CAD in my wallet.

This reinforces the DB can be responsible for structural, type, and
referential integrity, but it can not give meaning to its data.
"Incorrect" data, having passed the three tests the DB can apply, is
only incorrect to those is has meaning for.

So this takes us back to our responsibility as programmers and designers
to guard our database's integrity. In the same way OO programmers guard
their objects' integrity by restricting access to object state, why
would they regard a DB's state any less by allowing any and all to
manipulate rows and attributes however and wherever they wish?

--
Visit <http://blogs.instreamfinancial.com/anything.php>
to read my rants on technology and the finance industry.
From: frebe on
> >> But to the point, if a program was able to store improperly-formatted
> >> zipcode inside the DB then whose fault is that?
>
> > what's an improperly formatted zipcode? In the US, you have 5 digits,
> > in the netherlands you have 4 digits and 2 characters. A zipcode of
> > 1234 AA is properly formatted for a dutch user of the application, but
> > not correctly formatted for the US user of the program. Hence: context.
>
> > This thus means that if the db stores '1234AA', it can do so, and the
> > dutch user will happily use it. The US user can't because for the US
> > user it's just data, 1234 and 2 characters, it's not information
> > (zipcode).
>
> If you think about it, neither zip code means anything to the database.
> They're simply characters in a field.

With a proper type system or check contraints, a RDBMS could guarantee
that no invalid zip codes are stored into the database. If we want to
store zip codes for multiple countries, one solution is to have
different relations for different countries with different domains/
types for the zip code attributes.

address(id, country, street, ...)
dutch_address(id, country, dutch_zipcode)
us_address(id, country, us_zipcode)

The country columns in duch_address and us_address should be constants
(enforced by a check constraint), and there should be two foreign keys
between dutch_address/us_address (id, country) referencing address(id,
country). Doing this, it would not be possible to have both a dutch
and us address. The format of dutch_zipcode and us_zipcode could be
enforced by a check constraint using regular expressions, or by
defining a custom type.

To make it simpler to use, we would create a view like this:

create view address_all
select id, country, street, dutch_zipcode as zip
from address a join dutch_address da on da.id=a.id
union
select id, country, street, us_zipcode
from address a join us_address ua on ua.id=a.id

> So that takes us back to whether or not the database is correct beyond
> its integrity checks. If it stores exactly what it should store then it
> is correct. An incorrect or improperly formatted zip code isn't the
> database's responsibility if that's what a human or application told it
> to store.

A RDBMS is capable of enforcing such constraints. We want all data in
the database to be valid.

> This reinforces the DB can be responsible for structural, type, and
> referential integrity, but it can not give meaning to its data.

A RDBMS can give meaning to its data in the same way as an application
can give "meaning" to its data.

> So this takes us back to our responsibility as programmers and designers
> to guard our database's integrity.

I am afraid that you say that "programmers" should guard the database
because you want to use stored procedures as proxies to the database.
The data integrity is enforced by primary and foreign key constraints,
check contraints and triggers, in that preferred order. Hiding the
database behind procedures is almost never necessary.

/Fredrik

From: Thomas Gagne on
frebe wrote:
>> <snip>
>> If you think about it, neither zip code means anything to the database.
>> They're simply characters in a field.
>>
>
> With a proper type system or check contraints, a RDBMS could guarantee
> that no invalid zip codes are stored into the database. If we want to
> store zip codes for multiple countries, one solution is to have
> different relations for different countries with different domains/
> types for the zip code attributes.
>
But you quickly approach the DB designer's version of Heisenberg's
uncertainty principle. How does the DB know it has all zip code formats
to test against? How does it know what all the valid zip codes are? As
soon as it's measured and implemented how does the DB know a new zip
code wasn't created in some growing suburb of Zurich?

As the DB more rigorously guards what it can not be certain of, it
creates obstacles, bugs, or simply starts getting in the way of people
who are certain of what they're doing.

Is there some definite point beyond which a DB designer should stop
creating constraints? Is there a point of diminishing returns? How
much effort should be asserted to know the exact value, set of values,
and potential values of every field in the database? At some point
mightn't even the post office shrug its shoulders and deliver a letter
to a destination country and let them figure-out its address if they
don't recognize anything but the country?


--
Visit <http://blogs.instreamfinancial.com/anything.php>
to read my rants on technology and the finance industry.
From: Dmitry A. Kazakov on
On 4 Apr 2007 10:47:47 -0700, frebe wrote:

>>>> But to the point, if a program was able to store improperly-formatted
>>>> zipcode inside the DB then whose fault is that?
>>
>>> what's an improperly formatted zipcode? In the US, you have 5 digits,
>>> in the netherlands you have 4 digits and 2 characters. A zipcode of
>>> 1234 AA is properly formatted for a dutch user of the application, but
>>> not correctly formatted for the US user of the program. Hence: context.
>>
>>> This thus means that if the db stores '1234AA', it can do so, and the
>>> dutch user will happily use it. The US user can't because for the US
>>> user it's just data, 1234 and 2 characters, it's not information
>>> (zipcode).
>>
>> If you think about it, neither zip code means anything to the database.
>> They're simply characters in a field.
>
> With a proper type system or check contraints, a RDBMS could guarantee
> that no invalid zip codes are stored into the database.

Where the types are stored?
[...]

>> This reinforces the DB can be responsible for structural, type, and
>> referential integrity, but it can not give meaning to its data.
>
> A RDBMS can give meaning to its data in the same way as an application
> can give "meaning" to its data.

No. RDBMS to application is like CPU.

CPU cannot give any meaning to anything. Neither RDBMS can.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: topmind on

frebe wrote:
> > >> But to the point, if a program was able to store improperly-formatted
> > >> zipcode inside the DB then whose fault is that?
> >
> > > what's an improperly formatted zipcode? In the US, you have 5 digits,
> > > in the netherlands you have 4 digits and 2 characters. A zipcode of
> > > 1234 AA is properly formatted for a dutch user of the application, but
> > > not correctly formatted for the US user of the program. Hence: context.
> >
> > > This thus means that if the db stores '1234AA', it can do so, and the
> > > dutch user will happily use it. The US user can't because for the US
> > > user it's just data, 1234 and 2 characters, it's not information
> > > (zipcode).
> >
> > If you think about it, neither zip code means anything to the database.
> > They're simply characters in a field.
>
> With a proper type system or check contraints, a RDBMS could guarantee
> that no invalid zip codes are stored into the database. If we want to
> store zip codes for multiple countries, one solution is to have
> different relations for different countries with different domains/
> types for the zip code attributes.
>
> address(id, country, street, ...)
> dutch_address(id, country, dutch_zipcode)
> us_address(id, country, us_zipcode)

I personally would take a different approach, partly because I have
more of a dynamic-typing (or type-free) preference than frebe.

I would have a general Address or Contact table, rather than one per
country, with all columns that any country would need. Validation
could be done by a trigger. A validation table could look something
like this:

table: contactColumnValidation
-------
countryRef // foreign key to country code
columnRef // column name (such as "postal_code")
isRequired // Boolean
regEx // regular expression used to validate field
function // function name or reference for any further validation
formatNote // optional note or hint to help data entry clerk

The function name (or even a code snippet stored in the DB) points to
an optional function that can be used for validation when regular
expressions are not capable of it by themselves. (Regular expressions
hopefully handle most cases.)

Whether existing trigger implementations can handle this or not, I
don't know. But this would be the ideal IMO. If not, then at least
these tables can be used on the app side to perform validation. In
practice there is usually only one Address Entry Screen anyhow, since
maintaining multiple that do the same thing is usually not worth it.
Thus, there is usually only one "entry point" for address changes/
additions anyhow.

>
> The country columns in duch_address and us_address should be constants
> (enforced by a check constraint), and there should be two foreign keys
> between dutch_address/us_address (id, country) referencing address(id,
> country). Doing this, it would not be possible to have both a dutch
> and us address. The format of dutch_zipcode and us_zipcode could be
> enforced by a check constraint using regular expressions, or by
> defining a custom type.
>
> To make it simpler to use, we would create a view like this:
>
> create view address_all
> select id, country, street, dutch_zipcode as zip
> from address a join dutch_address da on da.id=a.id
> union
> select id, country, street, us_zipcode
> from address a join us_address ua on ua.id=a.id
>
> > So that takes us back to whether or not the database is correct beyond
> > its integrity checks. If it stores exactly what it should store then it
> > is correct. An incorrect or improperly formatted zip code isn't the
> > database's responsibility if that's what a human or application told it
> > to store.
>
> A RDBMS is capable of enforcing such constraints. We want all data in
> the database to be valid.
>
> > This reinforces the DB can be responsible for structural, type, and
> > referential integrity, but it can not give meaning to its data.
>
> A RDBMS can give meaning to its data in the same way as an application
> can give "meaning" to its data.
>
> > So this takes us back to our responsibility as programmers and designers
> > to guard our database's integrity.
>
> I am afraid that you say that "programmers" should guard the database
> because you want to use stored procedures as proxies to the database.
> The data integrity is enforced by primary and foreign key constraints,
> check contraints and triggers, in that preferred order. Hiding the
> database behind procedures is almost never necessary.
>
> /Fredrik

-T-

 |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8 9
Prev: Have you heard of SOME?
Next: self referencing UML