|
From: H. S. Lahman on 16 Jan 2008 15:29 Responding to Frebe... >> invoiceSet = this -> R1 WHERE (date=20080113) >> FOREACH invoice IN invoiceSet >> // process invoice objects > > or > > select invoiceid from invoice > where customerid=? and date=20080113 > >> paymentSet = this -> R1 -> R2 WHERE (date=20080113) >> FOREACH payment IN paymentSet >> // process payment objects > > or > > select paymentid from invoice i join payment p on > i.invoiceid=p.invoiceid > where customerid=? and p.date=20080113 Yes, they are quite similar. But apropos of your other message, that just reflects that both are based on the relational model. However, the models are quite different because the collection sets are object-based in the AAL but they are table-based for the SQL. The set of invoices and the set of payments examined in the AAL case will usually be much smaller that the corresponding invoice and payment sets in the RDB. >> when the user wants payments. [Note that the WHERE clause filters the >> date attribute in the target set. When a multistage path is navigated it >> is assumed that the membership of the intermediate collections already >> limits the set adequately. If not, one needs to do the second fragment >> in two stages.] > > Using an SQL database we don't have to bother about limiting the > volume of data. In > this case we can assume the indexes to do a pretty good job, O(log n). There is nothing you can do with an RDB index that can't be done in the implementation of a relationship's collection class. But the 'n' in O(log n) will usually be much smaller in the OO application because the collections are object-based rather than class-based. >> This is fine so long as the filter is simple, such as a date. Each >> collection is optimized for efficient access via a date. As you point >> out, it gets somewhat messy if there are multiple filters (e.g., one >> also may want payments where the amount was less than some percentage of >> the outstanding balance). > > Using relational algebra (SQL) it wouldn't be messy. > > select paymentid > from invoice i join payment p on i.invoiceid=p.invoiceid join customer > c on c.custid=i.custid > where customerid=? and p.date=20080113 and p.amount < c.balance*0.10 What is messy is the number of different indices into the same set. When one adds/removes a member of the set, every index must be updated and when a member is modified all the relevant indices must be updated. >> Then the collection class needs an interface >> for each distinct selection criteria. Each accessor may need a >> specialized implementation as well. > > Yes, the OO (network) solution is indeed a mess. Let me see if I understand this. A collection class with multiple interfaces for different selection criteria is a mess but different database queries for different selection criteria is not a mess. A collection class with different implementations for those criteria is a mess but multiple indices on the same table is not a mess. >> However, that is pretty much why collection classes exist; they manage >> collections. > > That's pretty much why databases exists, they manage data. True, but I'm talking about collection classes, whose database analogue is a table index, not the database itself. > >> More to the point, that is /all/ they do; they encapsulate >> the collection management in one place. IOW, the collection class >> isolates and encapsulates complexity of a highly focused nature. In >> general that is a Good Thing. > > Yes, that's why we uses databases. For data storage that is independent of applications but this context is about solving specific customer problems. >> However, at OOP time one addresses other issues like performance and >> maintainability. Suppose Invoices and Payments are very rarely added >> compared to how often the various selection criteria are accessed. In >> that case it may well be more efficient to have multiple 1:* >> relationships in parallel, each with its own optimized collection >> mechanics. Thus one has R2A optimized to access by <arbitrary> date and >> R2B optimized to access by some amount criteria. For example, one >> collection is sorted by date while the other is sorted by amount. The >> overhead of adding the same entry to each collection is <hopefully> >> small compared to accessing the same elements many times. > > What about integrity? If a bug causes an exception after adding the > payment to the collection sorted by date, but before adding to the > collection sorted by amount? > > What if, you modify the amount and forget to update the collection? > What if you add another search cirteria, and forget one of the add/ > modify/delete methods? And your point is...? Data integrity issues are orthogonal to this context (performance, maintainability). -- There is nothing wrong with me that could not be cured by a capful of Drano. H. S. Lahman hsl(a)pathfindermda.com Pathfinder Solutions http://www.pathfindermda.com blog: http://pathfinderpeople.blogs.com/hslahman "Model-Based Translation: The Next Step in Agile Development". Email info(a)pathfindermda.com for your copy. Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php. (888)OOA-PATH
From: frebe on 16 Jan 2008 23:08 > >> invoiceSet = this -> R1 WHERE (date=20080113) > >> FOREACH invoice IN invoiceSet > >> // process invoice objects > > > or > > > select invoiceid from invoice > > where customerid=? and date=20080113 > > >> paymentSet = this -> R1 -> R2 WHERE (date=20080113) > >> FOREACH payment IN paymentSet > >> // process payment objects > > > or > > > select paymentid from invoice i join payment p on > > i.invoiceid=p.invoiceid > > where customerid=? and p.date=20080113 > > Yes, they are quite similar. But apropos of your other message, that > just reflects that both are based on the relational model. However, the > models are quite different because the collection sets are object-based > in the AAL but they are table-based for the SQL. The set of invoices and > the set of payments examined in the AAL case will usually be much > smaller that the corresponding invoice and payment sets in the RDB. That is a limitation you have to do, because your solution will perform too bad otherwise. A relational database on the contrary isn't limited to only operate on small amounts of data. > >> when the user wants payments. [Note that the WHERE clause filters the > >> date attribute in the target set. When a multistage path is navigated it > >> is assumed that the membership of the intermediate collections already > >> limits the set adequately. If not, one needs to do the second fragment > >> in two stages.] > > > Using an SQL database we don't have to bother about limiting the > > volume of data. In > > this case we can assume the indexes to do a pretty good job, O(log n). > > There is nothing you can do with an RDB index that can't be done in the > implementation of a relationship's collection class. Indeed. The only difference is that using a SQL database, the implementation already exists. Your solution is to create the implementation by yourself. > But the 'n' in > O(log n) will usually be much smaller in the OO application because the > collections are object-based rather than class-based. Lets say you want to find all unpaid invoices. Why would the n be much smaller in a OO solution? > > select paymentid > > from invoice i join payment p on i.invoiceid=p.invoiceid join customer > > c on c.custid=i.custid > > where customerid=? and p.date=20080113 and p.amount < c.balance*0.10 > > What is messy is the number of different indices into the same set. When > one adds/removes a member of the set, every index must be updated and > when a member is modified all the relevant indices must be updated. These operations are performed by the database. The application developer doesn't have to care. Everything is already implemented. Using your solution, you have to implement this by yourself. > >> Then the collection class needs an interface > >> for each distinct selection criteria. Each accessor may need a > >> specialized implementation as well. > > > Yes, the OO (network) solution is indeed a mess. > > Let me see if I understand this. A collection class with multiple > interfaces for different selection criteria is a mess but different > database queries for different selection criteria is not a mess. A > collection class with different implementations for those criteria is a > mess but multiple indices on the same table is not a mess. The difference is the time the application developer has to spend to create the application. A SQL database enables him to create the application much faster, by resuing existing tools. Quality is also a major benifit, since implementing such features by yourself, you will inevitable introduce some bugs on the way. > >> More to the point, that is /all/ they do; they encapsulate > >> the collection management in one place. IOW, the collection class > >> isolates and encapsulates complexity of a highly focused nature. In > >> general that is a Good Thing. > > > Yes, that's why we uses databases. > > For data storage that is independent of applications but this context is > about solving specific customer problems. Your claim that a relational schema is designed for "data storage that is independent of the applications", is debatable. When I design schemas, I design them to fit the specific customer problem. Why wouldn't I? If you look at the schemas for three different invocing applications, you will find three different schemas. There doesn't exists any application independent schema for invoices. It all depends on the specific customer problem. > >> However, at OOP time one addresses other issues like performance and > >> maintainability. Suppose Invoices and Payments are very rarely added > >> compared to how often the various selection criteria are accessed. In > >> that case it may well be more efficient to have multiple 1:* > >> relationships in parallel, each with its own optimized collection > >> mechanics. Thus one has R2A optimized to access by <arbitrary> date and > >> R2B optimized to access by some amount criteria. For example, one > >> collection is sorted by date while the other is sorted by amount. The > >> overhead of adding the same entry to each collection is <hopefully> > >> small compared to accessing the same elements many times. > > > What about integrity? If a bug causes an exception after adding the > > payment to the collection sorted by date, but before adding to the > > collection sorted by amount? > > > What if, you modify the amount and forget to update the collection? > > What if you add another search cirteria, and forget one of the add/ > > modify/delete methods? > > And your point is...? Data integrity issues are orthogonal to this > context (performance, maintainability). The point is, that you have clearly demonstrated the implications with having to implement these features by yourself. Data management may be low-level, but it isn't trivial. That is why using existing products, like a SQL database, is such a benefit. //frebe
From: frebe on 17 Jan 2008 02:38 > > Isn't views a good tool for providing mappings othan than 1:1? > > Depends upon which flavor of views you are talking about. I am talking about database or SQL views. > >> But mixing paradigms is usually a very bad idea from an OOA/D > >> perspective once one is outside the realm of CRUD/USER processing. > >> That's because the data structures needed to optimize the customer's > >> problem solution are usually not the same as those needed to optimize > >> data storage or display and the 1:1 mapping breaks down. > > > The database schema should be optimized for the customer's problem > > solution. > > Such optimization utterly defeats the entire relational database model, > which is designed to provide data storage and access that in independent > of the way the data is used. 1. Do you have any references to support the claim that the "relational database model" is designed to provide data storage and access that in independent of the way the data is used? 2. If (1) would be true, how does that fact utterly defeats the entire model? > >> I point all this out because it reflects a major disconnect between > >> OOA/D and the RAD infrastructures. So far your application sounds like > >> classic CRUD/USER processing. In that case using the RAD paradigm and > >> infrastructures will probably save you a ton of keystrokes. > > > Indeed. One could also argue 80-90% of all business applications seem > > to be CRUD/USER according to your definition. > > That was true in the '60s and CRUD/USER processing is still a major part > of IT. But today the proportion has shrunk substantially, probably down > to the 20-30% range. Do you have references to a definition of CRUD/USER processing? If we don't have a working definition, we use of the term seem to be rather pointless. > >> But that > >> will necessarily result in different models than I would use as an OOA/D > >> guy when solving non-CRUD/USER problems. > > > Yes, because an OO model would be a network model (with pointers > > between > > object/records), and not a relational model. > > We've been here before, yet you persist in making statements like this. > The reality is that an OOA/D Class Model is normalized using exactly the > same relational rules as an RDB schema. How do you apply 2NF to your classes? > The differences between an OO > Class Model and a Data Model lie in the way their relational model is > constructed, not in whether they are relational models. From Wikipedia ("Relational model"): "The fundamental assumption of the relational model is that all data is represented as mathematical n- ary relations". Does the apply to an OO class model? //frebe
From: H. S. Lahman on 17 Jan 2008 11:04 Responding to Frebe... >>> Isn't views a good tool for providing mappings othan than 1:1? >> Depends upon which flavor of views you are talking about. > > I am talking about database or SQL views. IOW, join datasets. How are they not a 1:1 mapping of the schema? >>>> But mixing paradigms is usually a very bad idea from an OOA/D >>>> perspective once one is outside the realm of CRUD/USER processing. >>>> That's because the data structures needed to optimize the customer's >>>> problem solution are usually not the same as those needed to optimize >>>> data storage or display and the 1:1 mapping breaks down. >>> The database schema should be optimized for the customer's problem >>> solution. >> Such optimization utterly defeats the entire relational database model, >> which is designed to provide data storage and access that in independent >> of the way the data is used. > > 1. Do you have any references to support the claim that the > "relational database model" is designed to provide data storage and > access that in independent of the way the data is used? Pretty much the entire literature of the field. "Data Modeling Essentials" by Simsion was the first book I took off my shelf. Chapter 1 has sections on "data reusability", "stability and flexibility", and "simplicity and elegance" -- all of which address the notion that the RDB provides data storage that is independent of the use of the data. > 2. If (1) would be true, how does that fact utterly defeats the entire > model? If the database is optimized for a particular problem solution then it will not be optimized properly when different problem solutions need the same data. IOW, it will not provide generic data storage and access. > >>>> I point all this out because it reflects a major disconnect between >>>> OOA/D and the RAD infrastructures. So far your application sounds like >>>> classic CRUD/USER processing. In that case using the RAD paradigm and >>>> infrastructures will probably save you a ton of keystrokes. >>> Indeed. One could also argue 80-90% of all business applications seem >>> to be CRUD/USER according to your definition. >> That was true in the '60s and CRUD/USER processing is still a major part >> of IT. But today the proportion has shrunk substantially, probably down >> to the 20-30% range. > > Do you have references to a definition of CRUD/USER processing? If we > don't have a working definition, we use of the term seem to be rather > pointless. Google it. You'll find plenty references to CRUD and USER acronyms. CRUD => Create, Report, Update, Delete USER => Update, Sort, Extract, Report The acronyms pretty much say it all. >>>> But that >>>> will necessarily result in different models than I would use as an OOA/D >>>> guy when solving non-CRUD/USER problems. >>> Yes, because an OO model would be a network model (with pointers >>> between >>> object/records), and not a relational model. >> We've been here before, yet you persist in making statements like this. >> The reality is that an OOA/D Class Model is normalized using exactly the >> same relational rules as an RDB schema. > > How do you apply 2NF to your classes? The same as you do. Every non-identity responsibility must be a simple domain (1NF) and must be fully dependent on the object identity. I'm surprised you selected 2NF; the DBMS mavens usually follow the Chris Date lead and attack 3NF applied to behavior responsibilities. However, I have no intention of getting into Chris Date debate. > >> The differences between an OO >> Class Model and a Data Model lie in the way their relational model is >> constructed, not in whether they are relational models. > > From Wikipedia ("Relational model"): "The fundamental assumption of > the relational model is that all data is represented as mathematical n- > ary relations". Does the apply to an OO class model? Of course. -- There is nothing wrong with me that could not be cured by a capful of Drano. H. S. Lahman hsl(a)pathfindermda.com Pathfinder Solutions http://www.pathfindermda.com blog: http://pathfinderpeople.blogs.com/hslahman "Model-Based Translation: The Next Step in Agile Development". Email info(a)pathfindermda.com for your copy. Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php. (888)OOA-PATH
From: H. S. Lahman on 17 Jan 2008 11:42
Responding to Frebe... >>>> invoiceSet = this -> R1 WHERE (date=20080113) >>>> FOREACH invoice IN invoiceSet >>>> // process invoice objects >>> or >>> select invoiceid from invoice >>> where customerid=? and date=20080113 >>>> paymentSet = this -> R1 -> R2 WHERE (date=20080113) >>>> FOREACH payment IN paymentSet >>>> // process payment objects >>> or >>> select paymentid from invoice i join payment p on >>> i.invoiceid=p.invoiceid >>> where customerid=? and p.date=20080113 >> Yes, they are quite similar. But apropos of your other message, that >> just reflects that both are based on the relational model. However, the >> models are quite different because the collection sets are object-based >> in the AAL but they are table-based for the SQL. The set of invoices and >> the set of payments examined in the AAL case will usually be much >> smaller that the corresponding invoice and payment sets in the RDB. > > That is a limitation you have to do, because your solution will > perform too bad otherwise. A relational database on the contrary isn't > limited to only operate on small amounts of data. We do this because using the RDB query model with table-based indices would be very inefficient for memory-based computing when solving particular problems. That paradigm is fine for generic data storage and access but searching large sets sucks for algorithmic processing. The emphasis on avoiding large searches goes back to before Nicolas Wirth's "Programs = Algorithms + Data Structures". When solving complex algorithm problems one /always/ tailors data structures to minimize such searches based on unique features of the problem in hand. That tailoring is what application developers /do/. The OO paradigm just provides a bit of help in formalizing that with object-based relationship instantiation. >>>> when the user wants payments. [Note that the WHERE clause filters the >>>> date attribute in the target set. When a multistage path is navigated it >>>> is assumed that the membership of the intermediate collections already >>>> limits the set adequately. If not, one needs to do the second fragment >>>> in two stages.] >>> Using an SQL database we don't have to bother about limiting the >>> volume of data. In >>> this case we can assume the indexes to do a pretty good job, O(log n). >> There is nothing you can do with an RDB index that can't be done in the >> implementation of a relationship's collection class. > > Indeed. The only difference is that using a SQL database, the > implementation already exists. Your solution is to create the > implementation by yourself. Did you miss the part in the first message where I said that if one is doing CRUD/USER processing, one should use a RAD IDE rather than an OO approach? >> But the 'n' in >> O(log n) will usually be much smaller in the OO application because the >> collections are object-based rather than class-based. > > Lets say you want to find all unpaid invoices. Why would the n be much > smaller in a OO solution? I said, "usually'. You are postulating a class-based search as a problem requirement. [Even then there are ways to avoid it. For example, one can define a relationship to [Invoice] that only includes unpaid invoices. When an invoice is paid, it is removed from the collection. Whether that (or other optimizations) is viable will depend on other, mostly nonfunctional problem requirements. The point is that the application developer would look for such solutions and would use them if they were viable.] > >>> select paymentid >>> from invoice i join payment p on i.invoiceid=p.invoiceid join customer >>> c on c.custid=i.custid >>> where customerid=? and p.date=20080113 and p.amount < c.balance*0.10 >> What is messy is the number of different indices into the same set. When >> one adds/removes a member of the set, every index must be updated and >> when a member is modified all the relevant indices must be updated. > > These operations are performed by the database. The application > developer doesn't have to care. Everything is already implemented. > Using your solution, you have to implement this by yourself. Again, did you miss the part where I said the OP might be able to save a bunch of keystrokes by not doing OO? Where I said that RAD infrastructures provide a lot of automation that would have to be done from scratch in an OO approach? In a context where the OO paradigm would be useful, then such a solution would be messy and one would look for a better one. The database engine doesn't have that option, which is pretty much the point of providing generic facilities vs. highly tailored problem solutions. [Note that I also never said one shouldn't take advantage of the RAD facilities. One should do that in a client-side data access subsystem where the RDB schema is relevant. But when customizing the problem solution data structures one needs to provide the access to those data structures anyway and one needs to do it efficiently through custom tailoring.] >>>> Then the collection class needs an interface >>>> for each distinct selection criteria. Each accessor may need a >>>> specialized implementation as well. >>> Yes, the OO (network) solution is indeed a mess. >> Let me see if I understand this. A collection class with multiple >> interfaces for different selection criteria is a mess but different >> database queries for different selection criteria is not a mess. A >> collection class with different implementations for those criteria is a >> mess but multiple indices on the same table is not a mess. > > The difference is the time the application developer has to spend to > create the application. A SQL database enables him to create the > application much faster, by resuing existing tools. Quality is also a > major benifit, since implementing such features by yourself, you will > inevitable introduce some bugs on the way. A SQL database allows him to construct CRUD/USER applications quickly because the only problem being solved is data view conversion. That is exactly what the Form/Query/Table RAD paradigm is designed to largely automate and it does a good job of it. But when you have a problem to solve that requires complex processing of the data, you have to optimize to the problem and provide custom data structures. IOW, you are going to have to provide unique access because the mappings are not the same as in the RDB schema. That's what application design is all about outside of CRUD/USER contexts. >>>> More to the point, that is /all/ they do; they encapsulate >>>> the collection management in one place. IOW, the collection class >>>> isolates and encapsulates complexity of a highly focused nature. In >>>> general that is a Good Thing. >>> Yes, that's why we uses databases. >> For data storage that is independent of applications but this context is >> about solving specific customer problems. > > Your claim that a relational schema is designed for "data storage that > is independent of the applications", is debatable. When I design > schemas, I design them to fit the specific customer problem. Why > wouldn't I? If you look at the schemas for three different invocing > applications, you will find three different schemas. There doesn't > exists any application independent schema for invoices. It all depends > on the specific customer problem. IMO, that is very bad practice for a DBA because it precludes reuse of the data to solve other, different problems. When one creates a Data Model, it should be based on the overall problem domain rather than specific problems within the domain. If you do that, then the same schema should be reusable by all the core accounting applications. -- There is nothing wrong with me that could not be cured by a capful of Drano. H. S. Lahman hsl(a)pathfindermda.com Pathfinder Solutions http://www.pathfindermda.com blog: http://pathfinderpeople.blogs.com/hslahman "Model-Based Translation: The Next Step in Agile Development". Email info(a)pathfindermda.com for your copy. Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php. (888)OOA-PATH |