|
From: jimbalo22 on 18 Jan 2008 15:19 > > The app is for scheduling payments of invoices and has three grids > > (regular DataGridViews): > > A Vendor grid showing a summary view of each vendor (with total Due, > > etc.) > > An Invoice grid listing all invoices for the selected vendor > > (including a total amount of payments scheduled for each invoice) > > A Payment grid listing all payments scheduled for the selected > > Invoice. > > > Any payments added / removed or modifed in the payment grid (or mass- > > updated via code) needs to be reflected instantly in the other two > > grids. > > > Note that the vendor grid (parent) is based on the same DB table as > > the invoice grid (the children), but the vendor grid is a summary view > > and the invoice grid is a detail view. > > > The DB tables are: > > Invoice (VendNo, InvNo, DueDate, Amount) > > Payment (VendNo, InvNo, SeqNo, PayDate, PayAmount) > > > The Vendor Grid should contain the equivalent of this simplified SQL > > statement: > > SELECT Invoice.VendNo, SUM(Invoice.BalDue) as TotalDue, > > SUM(Payment.PayAmount) as TotalPay > > FROM Invoice LEFT OUTER JOIN Payment ON > > Invoice.VendNo = Payment.VendNo AND > > Invoice.InvNo = Payment.InvNo > > GROUP BY Invoice.VendNo > > > I could of course do this as a view on the DB server and use that for > > the vendor grid and use the Invoice table for the Invoice grid, but > > how do you keep them in sync when payments are added / deleted or > > modified from the invoices? Would I have to persist the data to the > > DB and rebind the grids? (too slow) > > After adding a payment, you obviously need to refresh the grids. How > many millis does this take? How many millis does it take to execute > the SQL? What is your time constraint? There are several issues with this. There are thousands of invoices and the vendor grid is bound to an aggregate SQL statement of all these invoices. Anytime the user changes a payment amount or date, something like this would have to happen: 1) The payment record is persisted to the DB 2) The invoice grid (which is also bound to an aggregate SQL statement or view [since it contains payment totals] would have to be rebound and the SQL statement thus re-executed) 3) The rebind above will cause the grid to lose its current row, so you have to reposition using Find or similar. 4) The vendor grid (which is bound to an aggregate SQL statement / view has to be rebound and the SQL thus re-executed) 5) The vendor grid loses it place and have to be repositioned 6) The invoice grid will lose it place again due to the vendor grid's current record changing The above are not the exact steps and there is probably a better sequence, but I think you get the point. I have a working application already using business objects. It was a bit tricky and a bit messy with the filter related methods, but any change to a payment is instantly reflected in all the other grids without the need to persist & reload from DB, etc. (using the standard databinding events of the .NET framework). I have not dealt much with DataSets and I got the apparently errenous impression that they would be a better tool for the job, but it seems OOA/D is still the best way to go. Jim
From: H. S. Lahman on 18 Jan 2008 16:46 Respondng to Frebe... >>>>>> invoiceSet = this -> R1 WHERE (date=20080113) >>>>>> FOREACH invoice IN invoiceSet >>>>>> // process invoice objects >>>>> or >>>>> select invoiceid from invoice >>>>> where customerid=? and date=20080113 >>>>>> paymentSet = this -> R1 -> R2 WHERE (date=20080113) >>>>>> FOREACH payment IN paymentSet >>>>>> // process payment objects >>>>> or >>>>> select paymentid from invoice i join payment p on >>>>> i.invoiceid=p.invoiceid >>>>> where customerid=? and p.date=20080113 >>>> Yes, they are quite similar. But apropos of your other message, that >>>> just reflects that both are based on the relational model. However, the >>>> models are quite different because the collection sets are object-based >>>> in the AAL but they are table-based for the SQL. The set of invoices and >>>> the set of payments examined in the AAL case will usually be much >>>> smaller that the corresponding invoice and payment sets in the RDB. >>> That is a limitation you have to do, because your solution will >>> perform too bad otherwise. A relational database on the contrary isn't >>> limited to only operate on small amounts of data. >> We do this because using the RDB query model with table-based indices >> would be very inefficient for memory-based computing when solving >> particular problems. > > I solve particular problems using "the RDB query model" every day, and > I have not noticed it being "very inefficient". That's because you are either doing CRUD/USER applications or you haven't benchmarked your solutions... >> That paradigm is fine for generic data storage and >> access but searching large sets sucks for algorithmic processing. > > SQL databases sucks for searching large data sets, come on... By your own admission, queries rely on the DBMS providing O(log N) search performance. You also have not denied that RDB indices are instantiated at the n-ary relation (table) level. So N is <almost> always the number of tuples in the table. You don't deny my assertion that I can perform the same O(log N) optimization in the implementation of a <reusable> collection class. But if OO relationships are instantiated at the object (tuple) level, then the N in OO searches will usually be much smaller than the N in table level searches. So one /must/ be able to achieve more efficient searches given OO's object-level instantiation. The price one pays for that efficiency is that the object-level instantiation has to be hand-crafted based on the particular problem context. OTOH, the DBMS provides storage and generic access that is reusable across different problem solutions no matter who "owns" the data. The benefit of that is that all client solutions face the same access paradigm, which is very general. But the price is that it will be uniformly less efficient than if if were tailored on a case-by-case basis. Why do you think that designers of large, complex applications spend so much effort trying to minimize persistence access, regardless of whether they are doing OO or not? They introduce elaborate caching schemes, deliberately present themselves with major data integrity problems, and whatnot because persistence access is almost always the performance bottleneck of such applications. When solving complex problems the same data is quite often accessed and processed many times during the course of the solution. If one used the same accessing paradigm internally in the solution as the RDB uses the application would brought to its knees. (More precisely, it would not be competitive in the marketplace with other applications that optimized for the problem in hand.) So the cardinal rule of complex application development is to read the data once and write it once, no matter how many times it must be accessed in the solution. There's a reason for that rule and seek time is just part of the problem. > >>>> But the 'n' in >>>> O(log n) will usually be much smaller in the OO application because the >>>> collections are object-based rather than class-based. >>> Lets say you want to find all unpaid invoices. Why would the n be much >>> smaller in a OO solution? >> I said, "usually'. You are postulating a class-based search as a problem >> requirement. > > You might think that my example is too extreme, but isn't it good to > use a method/tools that doesn't limit you to work on small amounts of > data? Who is limited to working on small amounts of data?!? In an OO application a single relationship between classes in a Class Model is just implemented as many small relationships between class members rather that as a single relationship for all members. The number of members (tuples) that are related remains exactly the same. >> [Note that I also never said one shouldn't take advantage of the RAD >> facilities. One should do that in a client-side data access subsystem >> where the RDB schema is relevant. But when customizing the problem >> solution data structures one needs to provide the access to those data >> structures anyway and one needs to do it efficiently through custom >> tailoring.] > > Your main argument seem to be that the performance of a relational > databases should be insufficient, compared to your pointer based > solution. There are probably many scenarios there a pointer-based > solution would perform O(1) and a relational O(log n). But history > (and your previous posts in this thread) has showed numerous > disadvantages with data management using pointers (network databases), > and the benefits with the relational model are obvious. These days, > nobody are proposing pointer-based databases, only when it comes to > data in RAM, some people are still proposing this. What if RAM isn't > big enough for the data needed in the processing? Recently I worked > with an (OO) application loading data from database at startup. I > guess I don't have to tell you the problems the applications faced. > One was a startup-time of +6 hours. Another was dirty reads. Pointers are a red herring. They just provide a more efficient mechanism for object (tuple) identity than embedded identifiers. The situation is analogous to using consecutive integers for identity rather than, say, and ASCII name; it just enables more efficient access like arrays (i.e., a simple offset computation does not depend on the number of elements while an element-by-element search/compare does). My main problem is with table-based searches. Table-based searches are a very general access mechanism that provides uniform access regardless of usage context (exactly what one wants in enterprise data storage). But much of the way the OO paradigm manages relationships is focused on eliminating searches by tailoring the solution structure to the specific problem. [The WHERE clause I used in an example is actually very rarely seen in OO applications; it is a kind of last resort when the requirements allow no more efficient alternative. And using a WHERE is very likely to get close scrutiny by OOA/D reviewers.] > >> A SQL database allows him to construct CRUD/USER applications quickly >> because the only problem being solved is data view conversion. That is >> exactly what the Form/Query/Table RAD paradigm is designed to largely >> automate and it does a good job of it. >> >> But when you have a problem to solve that requires complex processing of >> the data, you have to optimize to the problem and provide custom data >> structures. IOW, you are going to have to provide unique access because >> the mappings are not the same as in the RDB schema. That's what >> application design is all about outside of CRUD/USER contexts. > > Are producing invoices, "complex processing"? There are scenarios of > data processing that are not supported very well by existing index > types in current mainstream (SQL) databases. But the major part of all > processing done in common business applications, perform very well > using SQL databases, without having any custom data structures. What > about all COBOL applications out there with extremly simple data > structures, relying heavily on SQL statements to do the job. Doesn't > they perform well enough? In most situations producing invoices is pretty straight forward. Order + Lading Bill = Invoice. IOW, I would normally expect that to be CRUD/USER processing. That's why companies buy OTS Accounts Receivables packages rather than reinventing the wheel. >>> Your claim that a relational schema is designed for "data storage that >>> is independent of the applications", is debatable. When I design >>> schemas, I design them to fit the specific customer problem. Why >>> wouldn't I? If you look at the schemas for three different invocing >>> applications, you will find three different schemas. There doesn't >>> exists any application independent schema for invoices. It all depends >>> on the specific customer problem. >> IMO, that is very bad practice for a DBA because it precludes reuse of >> the data to solve other, different problems. > > My databases solves problems for my applications. Between > applications, data are exported and imported. So you live with the reuse problems and the need to keep revising schemas as new applications come online. > >> When one creates a Data Model, it should be based on the overall problem >> domain rather than specific problems within the domain. If you do that, >> then the same schema should be reusable by all the core accounting >> applications. > > One could also say that it only need to be one core accounting > application. The question is: How do you define the "overall problem", > and the "specific problem". You base almost all your argumentation on > such fuzzy terms, which makes everything derived from it rather fuzzy > too. I didn't say overall problem; my use of 'problem' was as an adjective. Problem domains are quite different things from problems. THe point was that Data models should be constructed based on how the overall business enviroment works, not how individual problems are solved. -- There is nothing wrong with me that could not be cured by a capful of Drano. H. S. Lahman hsl(a)pathfindermda.com Pathfinder Solutions http://www.pathfindermda.com blog: http://pathfinderpeople.blogs.com/hslahman "Model-Based Translation: The Next Step in Agile Development". Email info(a)pathfindermda.com for your copy. Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php. (888)OOA-PATH
From: frebe on 18 Jan 2008 23:53 On 18 Jan, 21:19, jimbal...(a)yahoo.com wrote: > > > The app is for scheduling payments of invoices and has three grids > > > (regular DataGridViews): > > > A Vendor grid showing a summary view of each vendor (with total Due, > > > etc.) > > > An Invoice grid listing all invoices for the selected vendor > > > (including a total amount of payments scheduled for each invoice) > > > A Payment grid listing all payments scheduled for the selected > > > Invoice. > > > > Any payments added / removed or modifed in the payment grid (or mass- > > > updated via code) needs to be reflected instantly in the other two > > > grids. > > > > Note that the vendor grid (parent) is based on the same DB table as > > > the invoice grid (the children), but the vendor grid is a summary view > > > and the invoice grid is a detail view. > > > > The DB tables are: > > > Invoice (VendNo, InvNo, DueDate, Amount) > > > Payment (VendNo, InvNo, SeqNo, PayDate, PayAmount) > > > > The Vendor Grid should contain the equivalent of this simplified SQL > > > statement: > > > SELECT Invoice.VendNo, SUM(Invoice.BalDue) as TotalDue, > > > SUM(Payment.PayAmount) as TotalPay > > > FROM Invoice LEFT OUTER JOIN Payment ON > > > Invoice.VendNo = Payment.VendNo AND > > > Invoice.InvNo = Payment.InvNo > > > GROUP BY Invoice.VendNo > > > > I could of course do this as a view on the DB server and use that for > > > the vendor grid and use the Invoice table for the Invoice grid, but > > > how do you keep them in sync when payments are added / deleted or > > > modified from the invoices? Would I have to persist the data to the > > > DB and rebind the grids? (too slow) > > > After adding a payment, you obviously need to refresh the grids. How > > many millis does this take? How many millis does it take to execute > > the SQL? What is your time constraint? > > There are several issues with this. There are thousands of invoices > and the vendor grid is bound to an aggregate SQL statement of all > these invoices. Anytime the user changes a payment amount or date, > something like this would have to happen: How long time does it take to execute your aggregate SQL? > 1) The payment record is persisted to the DB > 2) The invoice grid (which is also bound to an aggregate SQL statement > or view [since it contains payment totals] would have to be rebound > and the SQL statement thus re-executed) > 3) The rebind above will cause the grid to lose its current row, so > you have to reposition using Find or similar. I don't think you should have a grid showing all invoices for one vendor. It is not practical in the GUI. > 4) The vendor grid (which is bound to an aggregate SQL statement / > view has to be rebound and the SQL thus re-executed) > 5) The vendor grid loses it place and have to be repositioned If the numbers of vendors isn't too high, I think this is acceptable. Otherwise you should not show all vendors in one grid. > 6) The invoice grid will lose it place again due to the vendor grid's > current record changing > > The above are not the exact steps and there is probably a better > sequence, but I think you get the point. > > I have a working application already using business objects. It was a > bit tricky and a bit messy with the filter related methods, but any > change to a payment is instantly reflected in all the other grids > without the need to persist & reload from DB, etc. (using the standard > databinding events of the .NET framework). What about payments made from other client computers? Are they reflected instantly too? > I have not dealt much with DataSets and I got the apparently errenous > impression that they would be a better tool for the job, but it seems > OOA/D is still the best way to go. Unless you tell the performance metrics for the SQL queries, it is hard to tell. //frebe
From: frebe on 19 Jan 2008 00:48 > > I solve particular problems using "the RDB query model" every day, and > > I have not noticed it being "very inefficient". > > That's because you are either doing CRUD/USER applications or you > haven't benchmarked your solutions... Once again, is the statemet below CRUD? select i.invoiceid from invoice i join payment p on i.invoiceid=p.invoiceid group by i.invoiceid having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10 This is the kind of code I write every. Even though the number of invoices and payments are very high, the queries perform within a number of millis. The customer is happy, I am happy. Maybe you can show your code performing the same task. > >> That paradigm is fine for generic data storage and > >> access but searching large sets sucks for algorithmic processing. > > > SQL databases sucks for searching large data sets, come on... > > You don't deny my assertion that I can perform the same O(log N) > optimization in the implementation of a <reusable> collection class. Of course not. The difference is that you have to do it by yourself. I can reuse existing tools instead. That is the main difference between using a database and not using a database. > But > if OO relationships are instantiated at the object (tuple) level, then > the N in OO searches will usually be much smaller than the N in table > level searches. Since you qualify with "usually", this statement is pretty pointless. > So one /must/ be able to achieve more efficient searches > given OO's object-level instantiation. The price one pays for that > efficiency is that the object-level instantiation has to be hand-crafted > based on the particular problem context. Yes, I agree that by writing the corresponing code by your self, you might get faster applications than you would if you had used a SQL database. But "usually" development time, cost more than CPU time. And the only thing you have showed is that databases has complexity O(log A) and a tailored solution O(log B), there B < A. This is not a very big deal, compared to the "extra tons of keystrokes", you would have to do. > Why do you think that designers of large, complex applications spend so > much effort trying to minimize persistence access, regardless of whether > they are doing OO or not? They introduce elaborate caching schemes, > deliberately present themselves with major data integrity problems, and > whatnot because persistence access is almost always the performance > bottleneck of such applications. Does "insert into payment (invoiceid, date, amount) values (5, '2008-01-19', 500)" has anything to do with persitance? Caching is managed by the database. Caching is not the concern for the application designer. You are absolutely right that your solution introduces major data integrity problem. That is what I have been trying to show. That is why we should use a database, if possible. Persistence is problably the bottleneck for many kind of applications. That is why databases using caching. In a well-tuned database 98% of all calls, doesn't cause disk access. > When solving complex problems the same data is quite often accessed and > processed many times during the course of the solution. Is this the criteria for a "complex problem"? > If one used the > same accessing paradigm internally in the solution as the RDB uses the > application would brought to its knees. There are scenarios, there mainstream SQL databases wouldn't perform well. But the do excel in many application areas. The major problem with current SQL databases, is the limited set of index types that is used. B-trees is used as a one-size-fits-all solution. Obviously better support for other index types is wanted. I can imagine scenarios there foreign keys implemented as pointers would be a good thing too. As a matter of fact, I think such databases exists, even if I don't have time to find references to support this claim. > So the cardinal rule of complex application > development is to read the data once and write it once, no matter how > many times it must be accessed in the solution. Read and write once from what, disk? Or RAM? > >>>> But the 'n' in > >>>> O(log n) will usually be much smaller in the OO application because the > >>>> collections are object-based rather than class-based. > >>> Lets say you want to find all unpaid invoices. Why would the n be much > >>> smaller in a OO solution? > >> I said, "usually'. You are postulating a class-based search as a problem > >> requirement. > > > You might think that my example is too extreme, but isn't it good to > > use a method/tools that doesn't limit you to work on small amounts of > > data? > > Who is limited to working on small amounts of data?!? You complained about my example, since it was a class-based search, and not a search which could be solved with a limited number of objects in a collection. So why don't show how you would find all unpaind invoices? > Pointers are a red herring. They just provide a more efficient mechanism > for object (tuple) identity than embedded identifiers. This in generally not true for data mangement. Pointer-based databases hardly exists any more. Modern databases uses emedded identifiers. > My main problem is with table-based searches. Table-based searches are a > very general access mechanism that provides uniform access regardless of > usage context (exactly what one wants in enterprise data storage). You seem to contine to claim, that SQL database doesn't perform well enough. I think it is up to the reader to judge, if this is true or not, considering all existing applications using SQL in many different areas. > [The WHERE clause I used in an example is actually very rarely > seen in OO applications; it is a kind of last resort when the > requirements allow no more efficient alternative. And using a WHERE is > very likely to get close scrutiny by OOA/D reviewers.] Since WHERE is not supported by OOPLs, this is obviously true. > > Are producing invoices, "complex processing"? There are scenarios of > > data processing that are not supported very well by existing index > > types in current mainstream (SQL) databases. But the major part of all > > processing done in common business applications, perform very well > > using SQL databases, without having any custom data structures. What > > about all COBOL applications out there with extremly simple data > > structures, relying heavily on SQL statements to do the job. Doesn't > > they perform well enough? > > In most situations producing invoices is pretty straight forward. Order > + Lading Bill = Invoice. IOW, I would normally expect that to be > CRUD/USER processing. That's why companies buy OTS Accounts Receivables > packages rather than reinventing the wheel. I could continue to give you examples of common tasks from business applications, and you would classify them as CRUD as well. I think it is honest by you to make your disclaimer about OO solutions not being appropriate in all scenarios. You are almost the only one at comp.object, doing so. The problem is that knowing when something is CRUD or not. You claim that 20% of business applications out there are CRUD, I claim 80%. > > My databases solves problems for my applications. Between > > applications, data are exported and imported. > > So you live with the reuse problems and the need to keep revising > schemas as new applications come online. The last decade, I have been working as an indepent software provider. Usually I don't allow other applications to access databases for my applications. Most software companies like SAP etc, doesn't allow other applications to access their database either. If you allow access from other applications, you usually create tailored view for the access. > >> When one creates a Data Model, it should be based on the overall problem > >> domain rather than specific problems within the domain. If you do that, > >> then the same schema should be reusable by all the core accounting > >> applications. > > > One could also say that it only need to be one core accounting > > application. The question is: How do you define the "overall problem", > > and the "specific problem". You base almost all your argumentation on > > such fuzzy terms, which makes everything derived from it rather fuzzy > > too. > > I didn't say overall problem; my use of 'problem' was as an adjective. > Problem domains are quite different things from problems. THe point was > that Data models should be constructed based on how the overall business > enviroment works, not how individual problems are solved. The words "overall" and "individual" are such fuzzy in this context, that any argument based on them are rather pointless. //frebe
From: jimbalo22 on 19 Jan 2008 01:37
> I don't think you should have a grid showing all invoices for one > vendor. It is not practical in the GUI. The invoices shown are for the selected vendor AND day. This works well for the GUI. The user requirements are a bit complex, so I am trying to keep it simplified and focused to the main issues. But here is a brief description of data in each grid: Vendor grid: Contains various levels of summary info for ALL invoices and scheduled payments for each vendor. Invoice grid: Contains all invoices that are either due or have payments scheduled for the selected day, plus summary info of all payments for each invoice. Payment grid: All payments for the selected invoice. > What about payments made from other client computers? Are they > reflected instantly too? Payments are only scheduled by one person. In a multi-user application, there would obviously have to be some persistance to a shared medium (RDB or similar). Jim |