From: jimbalo22 on
> > The app is for scheduling payments of invoices and has three grids
> > (regular DataGridViews):
> > A Vendor grid showing a summary view of each vendor (with total Due,
> > etc.)
> > An Invoice grid listing all invoices for the selected vendor
> > (including a total amount of payments scheduled for each invoice)
> > A Payment grid listing all payments scheduled for the selected
> > Invoice.
>
> > Any payments added / removed or modifed in the payment grid (or mass-
> > updated via code) needs to be reflected instantly in the other two
> > grids.
>
> > Note that the vendor grid (parent) is based on the same DB table as
> > the invoice grid (the children), but the vendor grid is a summary view
> > and the invoice grid is a detail view.
>
> > The DB tables are:
> > Invoice (VendNo, InvNo, DueDate, Amount)
> > Payment (VendNo, InvNo, SeqNo, PayDate, PayAmount)
>
> > The Vendor Grid should contain the equivalent of this simplified SQL
> > statement:
> > SELECT Invoice.VendNo, SUM(Invoice.BalDue) as TotalDue,
> > SUM(Payment.PayAmount) as TotalPay
> > FROM Invoice LEFT OUTER JOIN Payment ON
> > Invoice.VendNo = Payment.VendNo AND
> > Invoice.InvNo = Payment.InvNo
> > GROUP BY Invoice.VendNo
>
> > I could of course do this as a view on the DB server and use that for
> > the vendor grid and use the Invoice table for the Invoice grid, but
> > how do you keep them in sync when payments are added / deleted or
> > modified from the invoices? Would I have to persist the data to the
> > DB and rebind the grids? (too slow)
>
> After adding a payment, you obviously need to refresh the grids. How
> many millis does this take? How many millis does it take to execute
> the SQL? What is your time constraint?
There are several issues with this. There are thousands of invoices
and the vendor grid is bound to an aggregate SQL statement of all
these invoices. Anytime the user changes a payment amount or date,
something like this would have to happen:
1) The payment record is persisted to the DB
2) The invoice grid (which is also bound to an aggregate SQL statement
or view [since it contains payment totals] would have to be rebound
and the SQL statement thus re-executed)
3) The rebind above will cause the grid to lose its current row, so
you have to reposition using Find or similar.
4) The vendor grid (which is bound to an aggregate SQL statement /
view has to be rebound and the SQL thus re-executed)
5) The vendor grid loses it place and have to be repositioned
6) The invoice grid will lose it place again due to the vendor grid's
current record changing

The above are not the exact steps and there is probably a better
sequence, but I think you get the point.

I have a working application already using business objects. It was a
bit tricky and a bit messy with the filter related methods, but any
change to a payment is instantly reflected in all the other grids
without the need to persist & reload from DB, etc. (using the standard
databinding events of the .NET framework).

I have not dealt much with DataSets and I got the apparently errenous
impression that they would be a better tool for the job, but it seems
OOA/D is still the best way to go.

Jim
From: H. S. Lahman on
Respondng to Frebe...

>>>>>> invoiceSet = this -> R1 WHERE (date=20080113)
>>>>>> FOREACH invoice IN invoiceSet
>>>>>> // process invoice objects
>>>>> or
>>>>> select invoiceid from invoice
>>>>> where customerid=? and date=20080113
>>>>>> paymentSet = this -> R1 -> R2 WHERE (date=20080113)
>>>>>> FOREACH payment IN paymentSet
>>>>>> // process payment objects
>>>>> or
>>>>> select paymentid from invoice i join payment p on
>>>>> i.invoiceid=p.invoiceid
>>>>> where customerid=? and p.date=20080113
>>>> Yes, they are quite similar. But apropos of your other message, that
>>>> just reflects that both are based on the relational model. However, the
>>>> models are quite different because the collection sets are object-based
>>>> in the AAL but they are table-based for the SQL. The set of invoices and
>>>> the set of payments examined in the AAL case will usually be much
>>>> smaller that the corresponding invoice and payment sets in the RDB.
>>> That is a limitation you have to do, because your solution will
>>> perform too bad otherwise. A relational database on the contrary isn't
>>> limited to only operate on small amounts of data.
>> We do this because using the RDB query model with table-based indices
>> would be very inefficient for memory-based computing when solving
>> particular problems.
>
> I solve particular problems using "the RDB query model" every day, and
> I have not noticed it being "very inefficient".

That's because you are either doing CRUD/USER applications or you
haven't benchmarked your solutions...

>> That paradigm is fine for generic data storage and
>> access but searching large sets sucks for algorithmic processing.
>
> SQL databases sucks for searching large data sets, come on...

By your own admission, queries rely on the DBMS providing O(log N)
search performance. You also have not denied that RDB indices are
instantiated at the n-ary relation (table) level. So N is <almost>
always the number of tuples in the table.

You don't deny my assertion that I can perform the same O(log N)
optimization in the implementation of a <reusable> collection class. But
if OO relationships are instantiated at the object (tuple) level, then
the N in OO searches will usually be much smaller than the N in table
level searches. So one /must/ be able to achieve more efficient searches
given OO's object-level instantiation. The price one pays for that
efficiency is that the object-level instantiation has to be hand-crafted
based on the particular problem context.

OTOH, the DBMS provides storage and generic access that is reusable
across different problem solutions no matter who "owns" the data. The
benefit of that is that all client solutions face the same access
paradigm, which is very general. But the price is that it will be
uniformly less efficient than if if were tailored on a case-by-case basis.

Why do you think that designers of large, complex applications spend so
much effort trying to minimize persistence access, regardless of whether
they are doing OO or not? They introduce elaborate caching schemes,
deliberately present themselves with major data integrity problems, and
whatnot because persistence access is almost always the performance
bottleneck of such applications.

When solving complex problems the same data is quite often accessed and
processed many times during the course of the solution. If one used the
same accessing paradigm internally in the solution as the RDB uses the
application would brought to its knees. (More precisely, it would not be
competitive in the marketplace with other applications that optimized
for the problem in hand.) So the cardinal rule of complex application
development is to read the data once and write it once, no matter how
many times it must be accessed in the solution. There's a reason for
that rule and seek time is just part of the problem.

>
>>>> But the 'n' in
>>>> O(log n) will usually be much smaller in the OO application because the
>>>> collections are object-based rather than class-based.
>>> Lets say you want to find all unpaid invoices. Why would the n be much
>>> smaller in a OO solution?
>> I said, "usually'. You are postulating a class-based search as a problem
>> requirement.
>
> You might think that my example is too extreme, but isn't it good to
> use a method/tools that doesn't limit you to work on small amounts of
> data?

Who is limited to working on small amounts of data?!? In an OO
application a single relationship between classes in a Class Model is
just implemented as many small relationships between class members
rather that as a single relationship for all members. The number of
members (tuples) that are related remains exactly the same.

>> [Note that I also never said one shouldn't take advantage of the RAD
>> facilities. One should do that in a client-side data access subsystem
>> where the RDB schema is relevant. But when customizing the problem
>> solution data structures one needs to provide the access to those data
>> structures anyway and one needs to do it efficiently through custom
>> tailoring.]
>
> Your main argument seem to be that the performance of a relational
> databases should be insufficient, compared to your pointer based
> solution. There are probably many scenarios there a pointer-based
> solution would perform O(1) and a relational O(log n). But history
> (and your previous posts in this thread) has showed numerous
> disadvantages with data management using pointers (network databases),
> and the benefits with the relational model are obvious. These days,
> nobody are proposing pointer-based databases, only when it comes to
> data in RAM, some people are still proposing this. What if RAM isn't
> big enough for the data needed in the processing? Recently I worked
> with an (OO) application loading data from database at startup. I
> guess I don't have to tell you the problems the applications faced.
> One was a startup-time of +6 hours. Another was dirty reads.

Pointers are a red herring. They just provide a more efficient mechanism
for object (tuple) identity than embedded identifiers. The situation is
analogous to using consecutive integers for identity rather than, say,
and ASCII name; it just enables more efficient access like arrays (i.e.,
a simple offset computation does not depend on the number of elements
while an element-by-element search/compare does).

My main problem is with table-based searches. Table-based searches are a
very general access mechanism that provides uniform access regardless of
usage context (exactly what one wants in enterprise data storage). But
much of the way the OO paradigm manages relationships is focused on
eliminating searches by tailoring the solution structure to the specific
problem. [The WHERE clause I used in an example is actually very rarely
seen in OO applications; it is a kind of last resort when the
requirements allow no more efficient alternative. And using a WHERE is
very likely to get close scrutiny by OOA/D reviewers.]

>
>> A SQL database allows him to construct CRUD/USER applications quickly
>> because the only problem being solved is data view conversion. That is
>> exactly what the Form/Query/Table RAD paradigm is designed to largely
>> automate and it does a good job of it.
>>
>> But when you have a problem to solve that requires complex processing of
>> the data, you have to optimize to the problem and provide custom data
>> structures. IOW, you are going to have to provide unique access because
>> the mappings are not the same as in the RDB schema. That's what
>> application design is all about outside of CRUD/USER contexts.
>
> Are producing invoices, "complex processing"? There are scenarios of
> data processing that are not supported very well by existing index
> types in current mainstream (SQL) databases. But the major part of all
> processing done in common business applications, perform very well
> using SQL databases, without having any custom data structures. What
> about all COBOL applications out there with extremly simple data
> structures, relying heavily on SQL statements to do the job. Doesn't
> they perform well enough?

In most situations producing invoices is pretty straight forward. Order
+ Lading Bill = Invoice. IOW, I would normally expect that to be
CRUD/USER processing. That's why companies buy OTS Accounts Receivables
packages rather than reinventing the wheel.

>>> Your claim that a relational schema is designed for "data storage that
>>> is independent of the applications", is debatable. When I design
>>> schemas, I design them to fit the specific customer problem. Why
>>> wouldn't I? If you look at the schemas for three different invocing
>>> applications, you will find three different schemas. There doesn't
>>> exists any application independent schema for invoices. It all depends
>>> on the specific customer problem.
>> IMO, that is very bad practice for a DBA because it precludes reuse of
>> the data to solve other, different problems.
>
> My databases solves problems for my applications. Between
> applications, data are exported and imported.

So you live with the reuse problems and the need to keep revising
schemas as new applications come online.

>
>> When one creates a Data Model, it should be based on the overall problem
>> domain rather than specific problems within the domain. If you do that,
>> then the same schema should be reusable by all the core accounting
>> applications.
>
> One could also say that it only need to be one core accounting
> application. The question is: How do you define the "overall problem",
> and the "specific problem". You base almost all your argumentation on
> such fuzzy terms, which makes everything derived from it rather fuzzy
> too.

I didn't say overall problem; my use of 'problem' was as an adjective.
Problem domains are quite different things from problems. THe point was
that Data models should be constructed based on how the overall business
enviroment works, not how individual problems are solved.



--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
From: frebe on
On 18 Jan, 21:19, jimbal...(a)yahoo.com wrote:
> > > The app is for scheduling payments of invoices and has three grids
> > > (regular DataGridViews):
> > > A Vendor grid showing a summary view of each vendor (with total Due,
> > > etc.)
> > > An Invoice grid listing all invoices for the selected vendor
> > > (including a total amount of payments scheduled for each invoice)
> > > A Payment grid listing all payments scheduled for the selected
> > > Invoice.
>
> > > Any payments added / removed or modifed in the payment grid (or mass-
> > > updated via code) needs to be reflected instantly in the other two
> > > grids.
>
> > > Note that the vendor grid (parent) is based on the same DB table as
> > > the invoice grid (the children), but the vendor grid is a summary view
> > > and the invoice grid is a detail view.
>
> > > The DB tables are:
> > > Invoice (VendNo, InvNo, DueDate, Amount)
> > > Payment (VendNo, InvNo, SeqNo, PayDate, PayAmount)
>
> > > The Vendor Grid should contain the equivalent of this simplified SQL
> > > statement:
> > > SELECT Invoice.VendNo, SUM(Invoice.BalDue) as TotalDue,
> > > SUM(Payment.PayAmount) as TotalPay
> > > FROM Invoice LEFT OUTER JOIN Payment ON
> > >  Invoice.VendNo = Payment.VendNo AND
> > >  Invoice.InvNo = Payment.InvNo
> > > GROUP BY Invoice.VendNo
>
> > > I could of course do this as a view on the DB server and use that for
> > > the vendor grid and use the Invoice table for the Invoice grid, but
> > > how do you keep them in sync when payments are added / deleted or
> > > modified from the invoices?  Would I have to persist the data to the
> > > DB and rebind the grids? (too slow)
>
> > After adding a payment, you obviously need to refresh the grids. How
> > many millis does this take? How many millis does it take to execute
> > the SQL? What is your time constraint?
>
> There are several issues with this.  There are thousands of invoices
> and the vendor grid is bound to an aggregate SQL statement of all
> these invoices.  Anytime the user changes a payment amount or date,
> something like this would have to happen:

How long time does it take to execute your aggregate SQL?

> 1) The payment record is persisted to the DB
> 2) The invoice grid (which is also bound to an aggregate SQL statement
> or view [since it contains payment totals] would have to be rebound
> and the SQL statement thus re-executed)
> 3) The rebind above will cause the grid to lose its current row, so
> you have to reposition using Find or similar.

I don't think you should have a grid showing all invoices for one
vendor. It is not practical in the GUI.

> 4) The vendor grid (which is bound to an aggregate SQL statement /
> view has to be rebound and the SQL thus re-executed)
> 5) The vendor grid loses it place and have to be repositioned

If the numbers of vendors isn't too high, I think this is acceptable.
Otherwise you should not show all vendors in one grid.

> 6) The invoice grid will lose it place again due to the vendor grid's
> current record changing
>
> The above are not the exact steps and there is probably a better
> sequence, but I think you get the point.
>
> I have a working application already using business objects.  It was a
> bit tricky and a bit messy with the filter related methods, but any
> change to a payment is instantly reflected in all the other grids
> without the need to persist & reload from DB, etc. (using the standard
> databinding events of the .NET framework).

What about payments made from other client computers? Are they
reflected instantly too?

> I have not dealt much with DataSets and I got the apparently errenous
> impression that they would be a better tool for the job, but it seems
> OOA/D is still the best way to go.

Unless you tell the performance metrics for the SQL queries, it is
hard to tell.

//frebe
From: frebe on
> > I solve particular problems using "the RDB query model" every day, and
> > I have not noticed it being "very inefficient".
>
> That's because you are either doing CRUD/USER applications or you
> haven't benchmarked your solutions...

Once again, is the statemet below CRUD?

select i.invoiceid
from invoice i join payment p on i.invoiceid=p.invoiceid
group by i.invoiceid
having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10

This is the kind of code I write every. Even though the number of
invoices and payments are very high, the queries perform within a
number of millis. The customer is happy, I am happy.

Maybe you can show your code performing the same task.

> >> That paradigm is fine for generic data storage and
> >> access but searching large sets sucks for algorithmic processing.
>
> > SQL databases sucks for searching large data sets, come on...
>
> You don't deny my assertion that I can perform the same O(log N)
> optimization in the implementation of a <reusable> collection class.

Of course not. The difference is that you have to do it by yourself. I
can reuse existing tools instead. That is the main difference between
using a database and not using a database.

> But
> if OO relationships are instantiated at the object (tuple) level, then
> the N in OO searches will usually be much smaller than the N in table
> level searches.

Since you qualify with "usually", this statement is pretty pointless.

> So one /must/ be able to achieve more efficient searches
> given OO's object-level instantiation. The price one pays for that
> efficiency is that the object-level instantiation has to be hand-crafted
> based on the particular problem context.

Yes, I agree that by writing the corresponing code by your self, you
might get faster applications than you would if you had used a SQL
database.

But "usually" development time, cost more than CPU time. And the only
thing you have showed is that databases has complexity O(log A) and a
tailored solution O(log B), there B < A. This is not a very big deal,
compared to the "extra tons of keystrokes", you would have to do.

> Why do you think that designers of large, complex applications spend so
> much effort trying to minimize persistence access, regardless of whether
> they are doing OO or not? They introduce elaborate caching schemes,
> deliberately present themselves with major data integrity problems, and
> whatnot because persistence access is almost always the performance
> bottleneck of such applications.

Does
"insert into payment (invoiceid, date, amount) values (5,
'2008-01-19', 500)"
has anything to do with persitance?

Caching is managed by the database. Caching is not the concern for the
application designer.

You are absolutely right that your solution introduces major data
integrity problem. That is what I have been trying to show. That is
why we should use a database, if possible.

Persistence is problably the bottleneck for many kind of applications.
That is why databases using caching. In a well-tuned database 98% of
all calls, doesn't cause disk access.

> When solving complex problems the same data is quite often accessed and
> processed many times during the course of the solution.

Is this the criteria for a "complex problem"?

> If one used the
> same accessing paradigm internally in the solution as the RDB uses the
> application would brought to its knees.

There are scenarios, there mainstream SQL databases wouldn't perform
well. But the do excel in many application areas. The major problem
with current SQL databases, is the limited set of index types that is
used. B-trees is used as a one-size-fits-all solution. Obviously
better support for other index types is wanted. I can imagine
scenarios there foreign keys implemented as pointers would be a good
thing too. As a matter of fact, I think such databases exists, even if
I don't have time to find references to support this claim.

> So the cardinal rule of complex application
> development is to read the data once and write it once, no matter how
> many times it must be accessed in the solution.

Read and write once from what, disk? Or RAM?

> >>>> But the 'n' in
> >>>> O(log n) will usually be much smaller in the OO application because the
> >>>> collections are object-based rather than class-based.
> >>> Lets say you want to find all unpaid invoices. Why would the n be much
> >>> smaller in a OO solution?
> >> I said, "usually'. You are postulating a class-based search as a problem
> >> requirement.
>
> > You might think that my example is too extreme, but isn't it good to
> > use a method/tools that doesn't limit you to work on small amounts of
> > data?
>
> Who is limited to working on small amounts of data?!?

You complained about my example, since it was a class-based search,
and not a search which could be solved with a limited number of
objects in a collection. So why don't show how you would find all
unpaind invoices?

> Pointers are a red herring. They just provide a more efficient mechanism
> for object (tuple) identity than embedded identifiers.

This in generally not true for data mangement. Pointer-based databases
hardly exists any more. Modern databases uses emedded identifiers.

> My main problem is with table-based searches. Table-based searches are a
> very general access mechanism that provides uniform access regardless of
> usage context (exactly what one wants in enterprise data storage).

You seem to contine to claim, that SQL database doesn't perform well
enough. I think it is up to the reader to judge, if this is true or
not, considering all existing applications using SQL in many different
areas.

> [The WHERE clause I used in an example is actually very rarely
> seen in OO applications; it is a kind of last resort when the
> requirements allow no more efficient alternative. And using a WHERE is
> very likely to get close scrutiny by OOA/D reviewers.]

Since WHERE is not supported by OOPLs, this is obviously true.

> > Are producing invoices, "complex processing"? There are scenarios of
> > data processing that are not supported very well by existing index
> > types in current mainstream (SQL) databases. But the major part of all
> > processing done in common business applications, perform very well
> > using SQL databases, without having any custom data structures. What
> > about all COBOL applications out there with extremly simple data
> > structures, relying heavily on SQL statements to do the job. Doesn't
> > they perform well enough?
>
> In most situations producing invoices is pretty straight forward. Order
> + Lading Bill = Invoice. IOW, I would normally expect that to be
> CRUD/USER processing. That's why companies buy OTS Accounts Receivables
> packages rather than reinventing the wheel.

I could continue to give you examples of common tasks from business
applications, and you would classify them as CRUD as well. I think it
is honest by you to make your disclaimer about OO solutions not being
appropriate in all scenarios. You are almost the only one at
comp.object, doing so. The problem is that knowing when something is
CRUD or not. You claim that 20% of business applications out there are
CRUD, I claim 80%.

> > My databases solves problems for my applications. Between
> > applications, data are exported and imported.
>
> So you live with the reuse problems and the need to keep revising
> schemas as new applications come online.

The last decade, I have been working as an indepent software provider.
Usually I don't allow other applications to access databases for my
applications. Most software companies like SAP etc, doesn't allow
other applications to access their database either.

If you allow access from other applications, you usually create
tailored view for the access.

> >> When one creates a Data Model, it should be based on the overall problem
> >> domain rather than specific problems within the domain. If you do that,
> >> then the same schema should be reusable by all the core accounting
> >> applications.
>
> > One could also say that it only need to be one core accounting
> > application. The question is: How do you define the "overall problem",
> > and the "specific problem". You base almost all your argumentation on
> > such fuzzy terms, which makes everything derived from it rather fuzzy
> > too.
>
> I didn't say overall problem; my use of 'problem' was as an adjective.
> Problem domains are quite different things from problems. THe point was
> that Data models should be constructed based on how the overall business
> enviroment works, not how individual problems are solved.

The words "overall" and "individual" are such fuzzy in this context,
that any argument based on them are rather pointless.

//frebe
From: jimbalo22 on
> I don't think you should have a grid showing all invoices for one
> vendor. It is not practical in the GUI.
The invoices shown are for the selected vendor AND day. This works
well for the GUI. The user requirements are a bit complex, so I am
trying to keep it simplified and focused to the main issues. But here
is a brief description of data in each grid:
Vendor grid: Contains various levels of summary info for ALL invoices
and scheduled payments for each vendor.
Invoice grid: Contains all invoices that are either due or have
payments scheduled for the selected day, plus summary info of all
payments for each invoice.
Payment grid: All payments for the selected invoice.

> What about payments made from other client computers? Are they
> reflected instantly too?
Payments are only scheduled by one person. In a multi-user
application, there would obviously have to be some persistance to a
shared medium (RDB or similar).

Jim