From: H. S. Lahman on
Responding to Frebe...

>> invoiceSet = this -> R1 WHERE (date=20080113)
>> FOREACH invoice IN invoiceSet
>> // process invoice objects
>
> or
>
> select invoiceid from invoice
> where customerid=? and date=20080113
>
>> paymentSet = this -> R1 -> R2 WHERE (date=20080113)
>> FOREACH payment IN paymentSet
>> // process payment objects
>
> or
>
> select paymentid from invoice i join payment p on
> i.invoiceid=p.invoiceid
> where customerid=? and p.date=20080113

Yes, they are quite similar. But apropos of your other message, that
just reflects that both are based on the relational model. However, the
models are quite different because the collection sets are object-based
in the AAL but they are table-based for the SQL. The set of invoices and
the set of payments examined in the AAL case will usually be much
smaller that the corresponding invoice and payment sets in the RDB.

>> when the user wants payments. [Note that the WHERE clause filters the
>> date attribute in the target set. When a multistage path is navigated it
>> is assumed that the membership of the intermediate collections already
>> limits the set adequately. If not, one needs to do the second fragment
>> in two stages.]
>
> Using an SQL database we don't have to bother about limiting the
> volume of data. In
> this case we can assume the indexes to do a pretty good job, O(log n).

There is nothing you can do with an RDB index that can't be done in the
implementation of a relationship's collection class. But the 'n' in
O(log n) will usually be much smaller in the OO application because the
collections are object-based rather than class-based.

>> This is fine so long as the filter is simple, such as a date. Each
>> collection is optimized for efficient access via a date. As you point
>> out, it gets somewhat messy if there are multiple filters (e.g., one
>> also may want payments where the amount was less than some percentage of
>> the outstanding balance).
>
> Using relational algebra (SQL) it wouldn't be messy.
>
> select paymentid
> from invoice i join payment p on i.invoiceid=p.invoiceid join customer
> c on c.custid=i.custid
> where customerid=? and p.date=20080113 and p.amount < c.balance*0.10

What is messy is the number of different indices into the same set. When
one adds/removes a member of the set, every index must be updated and
when a member is modified all the relevant indices must be updated.

>> Then the collection class needs an interface
>> for each distinct selection criteria. Each accessor may need a
>> specialized implementation as well.
>
> Yes, the OO (network) solution is indeed a mess.

Let me see if I understand this. A collection class with multiple
interfaces for different selection criteria is a mess but different
database queries for different selection criteria is not a mess. A
collection class with different implementations for those criteria is a
mess but multiple indices on the same table is not a mess.

>> However, that is pretty much why collection classes exist; they manage
>> collections.
>
> That's pretty much why databases exists, they manage data.

True, but I'm talking about collection classes, whose database analogue
is a table index, not the database itself.

>
>> More to the point, that is /all/ they do; they encapsulate
>> the collection management in one place. IOW, the collection class
>> isolates and encapsulates complexity of a highly focused nature. In
>> general that is a Good Thing.
>
> Yes, that's why we uses databases.

For data storage that is independent of applications but this context is
about solving specific customer problems.

>> However, at OOP time one addresses other issues like performance and
>> maintainability. Suppose Invoices and Payments are very rarely added
>> compared to how often the various selection criteria are accessed. In
>> that case it may well be more efficient to have multiple 1:*
>> relationships in parallel, each with its own optimized collection
>> mechanics. Thus one has R2A optimized to access by <arbitrary> date and
>> R2B optimized to access by some amount criteria. For example, one
>> collection is sorted by date while the other is sorted by amount. The
>> overhead of adding the same entry to each collection is <hopefully>
>> small compared to accessing the same elements many times.
>
> What about integrity? If a bug causes an exception after adding the
> payment to the collection sorted by date, but before adding to the
> collection sorted by amount?
>
> What if, you modify the amount and forget to update the collection?
> What if you add another search cirteria, and forget one of the add/
> modify/delete methods?

And your point is...? Data integrity issues are orthogonal to this
context (performance, maintainability).

--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
From: frebe on
> >> invoiceSet = this -> R1 WHERE (date=20080113)
> >> FOREACH invoice IN invoiceSet
> >>     // process invoice objects
>
> > or
>
> > select invoiceid from invoice
> > where customerid=? and date=20080113
>
> >> paymentSet = this -> R1 -> R2 WHERE (date=20080113)
> >> FOREACH payment IN paymentSet
> >>     // process payment objects
>
> > or
>
> > select paymentid from invoice i join payment p on
> > i.invoiceid=p.invoiceid
> > where customerid=? and p.date=20080113
>
> Yes, they are quite similar. But apropos of your other message, that
> just reflects that both are based on the relational model. However, the
> models are quite different because the collection sets are object-based
> in the AAL but they are table-based for the SQL. The set of invoices and
> the set of payments examined in the AAL case will usually be much
> smaller that the corresponding invoice and payment sets in the RDB.

That is a limitation you have to do, because your solution will
perform too bad otherwise. A relational database on the contrary isn't
limited to only operate on small amounts of data.

> >> when the user wants payments. [Note that the WHERE clause filters the
> >> date attribute in the target set. When a multistage path is navigated it
> >> is assumed that the membership of the intermediate collections already
> >> limits the set adequately. If not, one needs to do the second fragment
> >> in two stages.]
>
> > Using an SQL database we don't have to bother about limiting the
> > volume of data. In
> > this case we can assume the indexes to do a pretty good job, O(log n).
>
> There is nothing you can do with an RDB index that can't be done in the
> implementation of a relationship's collection class.

Indeed. The only difference is that using a SQL database, the
implementation already exists. Your solution is to create the
implementation by yourself.

> But the 'n' in
> O(log n) will usually be much smaller in the OO application because the
> collections are object-based rather than class-based.

Lets say you want to find all unpaid invoices. Why would the n be much
smaller in a OO solution?

> > select paymentid
> > from invoice i join payment p on i.invoiceid=p.invoiceid join customer
> > c on c.custid=i.custid
> > where customerid=? and p.date=20080113 and p.amount < c.balance*0.10
>
> What is messy is the number of different indices into the same set. When
> one adds/removes a member of the set, every index must be updated and
> when a member is modified all the relevant indices must be updated.

These operations are performed by the database. The application
developer doesn't have to care. Everything is already implemented.
Using your solution, you have to implement this by yourself.

> >> Then the collection class needs an interface
> >> for each distinct selection criteria. Each accessor may need a
> >> specialized implementation as well.
>
> > Yes, the OO (network) solution is indeed a mess.
>
> Let me see if I understand this. A collection class with multiple
> interfaces for different selection criteria is a mess but different
> database queries for different selection criteria is not a mess. A
> collection class with different implementations for those criteria is a
> mess but multiple indices on the same table is not a mess.

The difference is the time the application developer has to spend to
create the application. A SQL database enables him to create the
application much faster, by resuing existing tools. Quality is also a
major benifit, since implementing such features by yourself, you will
inevitable introduce some bugs on the way.

> >> More to the point, that is /all/ they do; they encapsulate
> >> the collection management in one place. IOW, the collection class
> >> isolates and encapsulates complexity of a highly focused nature. In
> >> general that is a Good Thing.
>
> > Yes, that's why we uses databases.
>
> For data storage that is independent of applications but this context is
> about solving specific customer problems.

Your claim that a relational schema is designed for "data storage that
is independent of the applications", is debatable. When I design
schemas, I design them to fit the specific customer problem. Why
wouldn't I? If you look at the schemas for three different invocing
applications, you will find three different schemas. There doesn't
exists any application independent schema for invoices. It all depends
on the specific customer problem.

> >> However, at OOP time one addresses other issues like performance and
> >> maintainability. Suppose Invoices and Payments are very rarely added
> >> compared to how often the various selection criteria are accessed. In
> >> that case it may well be more efficient to have multiple 1:*
> >> relationships in parallel, each with its own optimized collection
> >> mechanics. Thus one has R2A optimized to access by <arbitrary> date and
> >> R2B optimized to access by some amount criteria. For example, one
> >> collection is sorted by date while the other is sorted by amount. The
> >> overhead of adding the same entry to each collection is <hopefully>
> >> small compared to accessing the same elements many times.
>
> > What about integrity? If a bug causes an exception after adding the
> > payment to the collection sorted by date, but before adding to the
> > collection sorted by amount?
>
> > What if, you modify the amount and forget to update the collection?
> > What if you add another search cirteria, and forget one of the add/
> > modify/delete methods?
>
> And your point is...? Data integrity issues are orthogonal to this
> context (performance, maintainability).

The point is, that you have clearly demonstrated the implications with
having to implement these features by yourself. Data management may be
low-level, but it isn't trivial. That is why using existing products,
like a SQL database, is such a benefit.

//frebe
From: frebe on
> > Isn't views a good tool for providing mappings othan than 1:1?
>
> Depends upon which flavor of views you are talking about.

I am talking about database or SQL views.

> >> But mixing paradigms is usually a very bad idea from an OOA/D
> >> perspective once one is outside the realm of CRUD/USER processing.
> >> That's because the data structures needed to optimize the customer's
> >> problem solution are usually not the same as those needed to optimize
> >> data storage or display and the 1:1 mapping breaks down.
>
> > The database schema should be optimized for the customer's problem
> > solution.
>
> Such optimization utterly defeats the entire relational database model,
> which is designed to provide data storage and access that in independent
> of the way the data is used.

1. Do you have any references to support the claim that the
"relational database model" is designed to provide data storage and
access that in independent of the way the data is used?
2. If (1) would be true, how does that fact utterly defeats the entire
model?

> >> I point all this out because it reflects a major disconnect between
> >> OOA/D and the RAD infrastructures. So far your application sounds like
> >> classic CRUD/USER processing. In that case using the RAD paradigm and
> >> infrastructures will probably save you a ton of keystrokes.
>
> > Indeed. One could also argue 80-90% of all business applications seem
> > to be CRUD/USER according to your definition.
>
> That was true in the '60s and CRUD/USER processing is still a major part
> of IT. But today the proportion has shrunk substantially, probably down
> to the 20-30% range.

Do you have references to a definition of CRUD/USER processing? If we
don't have a working definition, we use of the term seem to be rather
pointless.

> >> But that
> >> will necessarily result in different models than I would use as an OOA/D
> >> guy when solving non-CRUD/USER problems.
>
> > Yes, because an OO model would be a network model (with pointers
> > between
> > object/records), and not a relational model.
>
> We've been here before, yet you persist in making statements like this.
> The reality is that an OOA/D Class Model is normalized using exactly the
> same relational rules as an RDB schema.

How do you apply 2NF to your classes?

> The differences between an OO
> Class Model and a Data Model lie in the way their relational model is
> constructed, not in whether they are relational models.

From Wikipedia ("Relational model"): "The fundamental assumption of
the relational model is that all data is represented as mathematical n-
ary relations". Does the apply to an OO class model?

//frebe

From: H. S. Lahman on

Responding to Frebe...

>>> Isn't views a good tool for providing mappings othan than 1:1?
>> Depends upon which flavor of views you are talking about.
>
> I am talking about database or SQL views.

IOW, join datasets. How are they not a 1:1 mapping of the schema?

>>>> But mixing paradigms is usually a very bad idea from an OOA/D
>>>> perspective once one is outside the realm of CRUD/USER processing.
>>>> That's because the data structures needed to optimize the customer's
>>>> problem solution are usually not the same as those needed to optimize
>>>> data storage or display and the 1:1 mapping breaks down.
>>> The database schema should be optimized for the customer's problem
>>> solution.
>> Such optimization utterly defeats the entire relational database model,
>> which is designed to provide data storage and access that in independent
>> of the way the data is used.
>
> 1. Do you have any references to support the claim that the
> "relational database model" is designed to provide data storage and
> access that in independent of the way the data is used?

Pretty much the entire literature of the field. "Data Modeling
Essentials" by Simsion was the first book I took off my shelf. Chapter 1
has sections on "data reusability", "stability and flexibility", and
"simplicity and elegance" -- all of which address the notion that the
RDB provides data storage that is independent of the use of the data.

> 2. If (1) would be true, how does that fact utterly defeats the entire
> model?

If the database is optimized for a particular problem solution then it
will not be optimized properly when different problem solutions need the
same data. IOW, it will not provide generic data storage and access.

>
>>>> I point all this out because it reflects a major disconnect between
>>>> OOA/D and the RAD infrastructures. So far your application sounds like
>>>> classic CRUD/USER processing. In that case using the RAD paradigm and
>>>> infrastructures will probably save you a ton of keystrokes.
>>> Indeed. One could also argue 80-90% of all business applications seem
>>> to be CRUD/USER according to your definition.
>> That was true in the '60s and CRUD/USER processing is still a major part
>> of IT. But today the proportion has shrunk substantially, probably down
>> to the 20-30% range.
>
> Do you have references to a definition of CRUD/USER processing? If we
> don't have a working definition, we use of the term seem to be rather
> pointless.

Google it. You'll find plenty references to CRUD and USER acronyms.

CRUD => Create, Report, Update, Delete
USER => Update, Sort, Extract, Report

The acronyms pretty much say it all.

>>>> But that
>>>> will necessarily result in different models than I would use as an OOA/D
>>>> guy when solving non-CRUD/USER problems.
>>> Yes, because an OO model would be a network model (with pointers
>>> between
>>> object/records), and not a relational model.
>> We've been here before, yet you persist in making statements like this.
>> The reality is that an OOA/D Class Model is normalized using exactly the
>> same relational rules as an RDB schema.
>
> How do you apply 2NF to your classes?

The same as you do. Every non-identity responsibility must be a simple
domain (1NF) and must be fully dependent on the object identity.

I'm surprised you selected 2NF; the DBMS mavens usually follow the Chris
Date lead and attack 3NF applied to behavior responsibilities. However,
I have no intention of getting into Chris Date debate.

>
>> The differences between an OO
>> Class Model and a Data Model lie in the way their relational model is
>> constructed, not in whether they are relational models.
>
> From Wikipedia ("Relational model"): "The fundamental assumption of
> the relational model is that all data is represented as mathematical n-
> ary relations". Does the apply to an OO class model?

Of course.

--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
From: H. S. Lahman on
Responding to Frebe...

>>>> invoiceSet = this -> R1 WHERE (date=20080113)
>>>> FOREACH invoice IN invoiceSet
>>>> // process invoice objects
>>> or
>>> select invoiceid from invoice
>>> where customerid=? and date=20080113
>>>> paymentSet = this -> R1 -> R2 WHERE (date=20080113)
>>>> FOREACH payment IN paymentSet
>>>> // process payment objects
>>> or
>>> select paymentid from invoice i join payment p on
>>> i.invoiceid=p.invoiceid
>>> where customerid=? and p.date=20080113
>> Yes, they are quite similar. But apropos of your other message, that
>> just reflects that both are based on the relational model. However, the
>> models are quite different because the collection sets are object-based
>> in the AAL but they are table-based for the SQL. The set of invoices and
>> the set of payments examined in the AAL case will usually be much
>> smaller that the corresponding invoice and payment sets in the RDB.
>
> That is a limitation you have to do, because your solution will
> perform too bad otherwise. A relational database on the contrary isn't
> limited to only operate on small amounts of data.

We do this because using the RDB query model with table-based indices
would be very inefficient for memory-based computing when solving
particular problems. That paradigm is fine for generic data storage and
access but searching large sets sucks for algorithmic processing.

The emphasis on avoiding large searches goes back to before Nicolas
Wirth's "Programs = Algorithms + Data Structures". When solving complex
algorithm problems one /always/ tailors data structures to minimize such
searches based on unique features of the problem in hand. That tailoring
is what application developers /do/. The OO paradigm just provides a bit
of help in formalizing that with object-based relationship instantiation.

>>>> when the user wants payments. [Note that the WHERE clause filters the
>>>> date attribute in the target set. When a multistage path is navigated it
>>>> is assumed that the membership of the intermediate collections already
>>>> limits the set adequately. If not, one needs to do the second fragment
>>>> in two stages.]
>>> Using an SQL database we don't have to bother about limiting the
>>> volume of data. In
>>> this case we can assume the indexes to do a pretty good job, O(log n).
>> There is nothing you can do with an RDB index that can't be done in the
>> implementation of a relationship's collection class.
>
> Indeed. The only difference is that using a SQL database, the
> implementation already exists. Your solution is to create the
> implementation by yourself.

Did you miss the part in the first message where I said that if one is
doing CRUD/USER processing, one should use a RAD IDE rather than an OO
approach?

>> But the 'n' in
>> O(log n) will usually be much smaller in the OO application because the
>> collections are object-based rather than class-based.
>
> Lets say you want to find all unpaid invoices. Why would the n be much
> smaller in a OO solution?

I said, "usually'. You are postulating a class-based search as a problem
requirement.

[Even then there are ways to avoid it. For example, one can define a
relationship to [Invoice] that only includes unpaid invoices. When an
invoice is paid, it is removed from the collection. Whether that (or
other optimizations) is viable will depend on other, mostly
nonfunctional problem requirements. The point is that the application
developer would look for such solutions and would use them if they were
viable.]

>
>>> select paymentid
>>> from invoice i join payment p on i.invoiceid=p.invoiceid join customer
>>> c on c.custid=i.custid
>>> where customerid=? and p.date=20080113 and p.amount < c.balance*0.10
>> What is messy is the number of different indices into the same set. When
>> one adds/removes a member of the set, every index must be updated and
>> when a member is modified all the relevant indices must be updated.
>
> These operations are performed by the database. The application
> developer doesn't have to care. Everything is already implemented.
> Using your solution, you have to implement this by yourself.

Again, did you miss the part where I said the OP might be able to save a
bunch of keystrokes by not doing OO? Where I said that RAD
infrastructures provide a lot of automation that would have to be done
from scratch in an OO approach?

In a context where the OO paradigm would be useful, then such a solution
would be messy and one would look for a better one. The database engine
doesn't have that option, which is pretty much the point of providing
generic facilities vs. highly tailored problem solutions.

[Note that I also never said one shouldn't take advantage of the RAD
facilities. One should do that in a client-side data access subsystem
where the RDB schema is relevant. But when customizing the problem
solution data structures one needs to provide the access to those data
structures anyway and one needs to do it efficiently through custom
tailoring.]

>>>> Then the collection class needs an interface
>>>> for each distinct selection criteria. Each accessor may need a
>>>> specialized implementation as well.
>>> Yes, the OO (network) solution is indeed a mess.
>> Let me see if I understand this. A collection class with multiple
>> interfaces for different selection criteria is a mess but different
>> database queries for different selection criteria is not a mess. A
>> collection class with different implementations for those criteria is a
>> mess but multiple indices on the same table is not a mess.
>
> The difference is the time the application developer has to spend to
> create the application. A SQL database enables him to create the
> application much faster, by resuing existing tools. Quality is also a
> major benifit, since implementing such features by yourself, you will
> inevitable introduce some bugs on the way.

A SQL database allows him to construct CRUD/USER applications quickly
because the only problem being solved is data view conversion. That is
exactly what the Form/Query/Table RAD paradigm is designed to largely
automate and it does a good job of it.

But when you have a problem to solve that requires complex processing of
the data, you have to optimize to the problem and provide custom data
structures. IOW, you are going to have to provide unique access because
the mappings are not the same as in the RDB schema. That's what
application design is all about outside of CRUD/USER contexts.

>>>> More to the point, that is /all/ they do; they encapsulate
>>>> the collection management in one place. IOW, the collection class
>>>> isolates and encapsulates complexity of a highly focused nature. In
>>>> general that is a Good Thing.
>>> Yes, that's why we uses databases.
>> For data storage that is independent of applications but this context is
>> about solving specific customer problems.
>
> Your claim that a relational schema is designed for "data storage that
> is independent of the applications", is debatable. When I design
> schemas, I design them to fit the specific customer problem. Why
> wouldn't I? If you look at the schemas for three different invocing
> applications, you will find three different schemas. There doesn't
> exists any application independent schema for invoices. It all depends
> on the specific customer problem.

IMO, that is very bad practice for a DBA because it precludes reuse of
the data to solve other, different problems.

When one creates a Data Model, it should be based on the overall problem
domain rather than specific problems within the domain. If you do that,
then the same schema should be reusable by all the core accounting
applications.


--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH