From: frebe on
> > The problem is to create reminders for all invoices are not payed and
> > due date is passed since 10 days. The solution in SQL
> > could look like.
>
> > insert into reminder (invoiceid, remind_date)
> > select i.invoiceid, now()
> > from invoice i join payment p on i.invoiceid=p.invoiceid
> > group by i.invoiceid
> > having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
>
> > If you want to solve the problem another way, you are welcome. There
> > are of course other alternatives. The question is if they are better,
> > or not.
>
> >> IOW, you are providing a common generic data access query problem and
> >> asking for an OO equivalent. But I'm talking about solving customer
> >> specific problems, not generic data storage access problems.
>
> > This is a customer specific problem. The problem has very little to do
> > with storage.
>
> But you are not solving some problem for the customer.

Of course I am. Reminders are created. There are a lot of application
out there, not having a UI.

> All you are doing
> is presenting data that already exists in the data store in a convenient
> format for the user (i.e., the notification).

Did you notice the "now()". Data is created. Reminders may be created
during a night batch. How they are presented are a completely
different story. The application might send an email to the customer,
print the reminder on a paper, or display on the screen. Data is one
thing, presentation is another.

> The techniques used for view conversion are specialized to that
> particular <software> problem. In particular, they are specialized to
> the generic data storage access provided by an RDB. IOW, you query is
> about /extracting/ the data and /presenting/ it, but it doesn't /do/
> anything with the data.

It does. The reminders reflect the time that the reminder batch was
executed. If I run the batch at different times, I will have different
data. It has nothing to do with presentation.

> Now suppose we need to perform an analysis of accounts payables that
> forecasts how many accounts will be paid late, what the total late
> payments will be, and what the probability is that each individual
> customer will be late. To do that one would need credit information
> about the customers and a historical track record of past payments.

This would probably be possible to do with a number of views.

> To obtain the track record one needs to extract the historical payment
> data from the data store. One might use a query that looks very much
> like yours in the subsystem dedicated to talking to the DBMS. But once
> that data is extracted it will be stored in objects that are tailored to
> the regression and probabilistic analysis. IOW, the real problem
> solution requires different data structures and will use different
> techniques to solve the customer's problem.

I don't know if SQL databases would perform good enough for the
calculations you have in mind. But you must not forget that there are
a lot of data mining out there, that relies heavily on SQL.

> The extraction of the data
> from the DBMS is quite peripheral to that and would be isolated from the
> rest of the problem solution by encapsulation in a subsystem that hides
> the DBMS mechanisms.

This is only the case if SQL database really can't do the job.

> To put it another way, the non-CRUD/USER problem starts after the data
> is extracted from the database using a query like yours.

Your definition of non-CRUD/USER seem to be: Anything that can't be
solved with SQL. I could agree to this definition, but I just want to
point out that pretty much can be solved with SQL.

> The analogous
> problem solved compared to your example is what the customer does with
> the notifications after they have been produced by your query.

That is presentation and communication. In most cases, SQL is about
business logic, and the host language do the presentation and
communication.

> >> The OO
> >> solution would be tailored to a specific problem so it would very rarely
> >> have need of that approach.
>
> > The solution (SQL code) is tailored to the specific problem.
>
> >> That's kind of the point. The "algorithms" for CRUD/USER processing are
> >> very similar because they all solve the same problem: converting RDB
> >> views of the data to UI views and vice versa.
>
> > My example aboive has nothing to do with UI.
>
> Those notifications /are/ the UI; they are the interface through which
> the application communicates with the user.

In that case a lot of other things like invoices, bus tickets,
receipts are UI too. Most data is supposed to be read by someone, so
most data will end up at the UI sooner or later.

> > 1. You need a user-defined function in your SQL statements. (select
> > age(birthdate) from employee)
>
> You will have to put more words around that. Any selection one does in
> SQL is based on values for schema fields.

Or derived values, like "select * from employee where age(birthdate) >
60".

> > 2. Performance reasons. By putting "complex processing" inside the
> > DBMS, you can eliminate the inter-process overhead and have major
> > performance improvments. Browse the source code for one of the TPC
> > implementations, and you will see how they do it.
>
> But the only thing being optimized is the data access _given the RDB
> storage paradigm_. The DBMS has no capability that can optimize for
> particular problem solutions.

My example showed the opposite.

> Try this question: can you identify any piece of information in the
> notification that is not a field in the RDB or is not directly derived
> from fields in the RDB as a dependent variable?

Yes, the reminder date.

//frebe
From: H. S. Lahman on
Responding to Frebe...

>>> The problem is to create reminders for all invoices are not payed and
>>> due date is passed since 10 days. The solution in SQL
>>> could look like.
>>> insert into reminder (invoiceid, remind_date)
>>> select i.invoiceid, now()
>>> from invoice i join payment p on i.invoiceid=p.invoiceid
>>> group by i.invoiceid
>>> having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
>>> If you want to solve the problem another way, you are welcome. There
>>> are of course other alternatives. The question is if they are better,
>>> or not.
>>>> IOW, you are providing a common generic data access query problem and
>>>> asking for an OO equivalent. But I'm talking about solving customer
>>>> specific problems, not generic data storage access problems.
>>> This is a customer specific problem. The problem has very little to do
>>> with storage.
>> But you are not solving some problem for the customer.
>
> Of course I am. Reminders are created. There are a lot of application
> out there, not having a UI.
>
>> All you are doing
>> is presenting data that already exists in the data store in a convenient
>> format for the user (i.e., the notification).
>
> Did you notice the "now()". Data is created. Reminders may be created
> during a night batch. How they are presented are a completely
> different story. The application might send an email to the customer,
> print the reminder on a paper, or display on the screen. Data is one
> thing, presentation is another.

And where does now() get its value? One way would be to read an RDB
field that is updated by the DBMS in the background. Whether the value
is in an RDB data store or in the OS' memory data store doesn't matter.
Your query is extracting data from a data store and presenting it.

Nor does the fact that the reminder might be placed in intermediate
storage temporarily before direct presentation matter. That is just part
of the view conversion mechanics. (Note that you could also sort the
reminders before presentation and that would still be a basic USER
activity.) What is ultimately presented to the user is simply the
extracted data without significant manipulation other that view conversion.

>> Now suppose we need to perform an analysis of accounts payables that
>> forecasts how many accounts will be paid late, what the total late
>> payments will be, and what the probability is that each individual
>> customer will be late. To do that one would need credit information
>> about the customers and a historical track record of past payments.
>
> This would probably be possible to do with a number of views.

Possible but highly inefficient, which is <one of> my point.

>> To obtain the track record one needs to extract the historical payment
>> data from the data store. One might use a query that looks very much
>> like yours in the subsystem dedicated to talking to the DBMS. But once
>> that data is extracted it will be stored in objects that are tailored to
>> the regression and probabilistic analysis. IOW, the real problem
>> solution requires different data structures and will use different
>> techniques to solve the customer's problem.
>
> I don't know if SQL databases would perform good enough for the
> calculations you have in mind. But you must not forget that there are
> a lot of data mining out there, that relies heavily on SQL.

The subject matter of data mining is the RDB view. The goal of data
mining is to sift through great gobs of data to extract that which is of
interest. Since the data of interest may be quite arbitrary, one needs a
very generic mechanism -- exactly what complex table joins and searches
provide.

But there is a big difference between designing a join and executing it.
The algorithms used to design those joins for data mining can be quite
sophisticated. In that case one leaves the realm of CRUD/USER and the
resulting queries are simply a uniquely formatted text output of a more
complex process. IOW, the execution of the queries becomes a peripheral
activity like printing a report.

>> The extraction of the data
>> from the DBMS is quite peripheral to that and would be isolated from the
>> rest of the problem solution by encapsulation in a subsystem that hides
>> the DBMS mechanisms.
>
> This is only the case if SQL database really can't do the job.

It's not a matter of 'can't'; it is a matter of 'should'.

>
>> To put it another way, the non-CRUD/USER problem starts after the data
>> is extracted from the database using a query like yours.
>
> Your definition of non-CRUD/USER seem to be: Anything that can't be
> solved with SQL. I could agree to this definition, but I just want to
> point out that pretty much can be solved with SQL.

No, that's not my definition. As I have already stated non-CRUD/USER
processing is any processing that requires complex data manipulations
beyond a simple sequence of creating, sorting, updating, deleting,
extracting, or reporting data.

>> The analogous
>> problem solved compared to your example is what the customer does with
>> the notifications after they have been produced by your query.
>
> That is presentation and communication. In most cases, SQL is about
> business logic, and the host language do the presentation and
> communication.

Say what?!? How does SQL know anything about the rules and policies that
govern a business? SQL is about creating, sorting, updating, deleting,
and extracting data in a persistent data store. More important, it is
about doing those things generically so that they don't depend on the
particular problem context where the data is used (i.e., the business
rules and policies for manipulating the data to solve specific problems).

Show me a large application with a lot of business logic being buried in
SQL queries and I will show you an application that is difficult to
maintain because it is bleeding cohesion across subject matter boundaries.

>>>> The OO
>>>> solution would be tailored to a specific problem so it would very rarely
>>>> have need of that approach.
>>> The solution (SQL code) is tailored to the specific problem.
>>>> That's kind of the point. The "algorithms" for CRUD/USER processing are
>>>> very similar because they all solve the same problem: converting RDB
>>>> views of the data to UI views and vice versa.
>>> My example aboive has nothing to do with UI.
>> Those notifications /are/ the UI; they are the interface through which
>> the application communicates with the user.
>
> In that case a lot of other things like invoices, bus tickets,
> receipts are UI too. Most data is supposed to be read by someone, so
> most data will end up at the UI sooner or later.

That last sentence is a real tell. If your view of applications is to
enable data in a database to be read by someone, that is quintessential
CRUD/USER. The primary differentiator for non-CRUD/USER is that the
software processes the data in the data store in complex ways so that
the user can be presented with different data that is not in the data
store (or trivially derived from it).

>>> 1. You need a user-defined function in your SQL statements. (select
>>> age(birthdate) from employee)
>> You will have to put more words around that. Any selection one does in
>> SQL is based on values for schema fields.
>
> Or derived values, like "select * from employee where age(birthdate) >
> 60".

Pretty much my point. Birthdate is a schema field and 60 is the
parametric value for the selection. The age macro is just a red herring
to derive an age (dependent variable) based on birthdate. One could have
encoded "(currentDate - birthdate) > 60".

>>> 2. Performance reasons. By putting "complex processing" inside the
>>> DBMS, you can eliminate the inter-process overhead and have major
>>> performance improvments. Browse the source code for one of the TPC
>>> implementations, and you will see how they do it.
>> But the only thing being optimized is the data access _given the RDB
>> storage paradigm_. The DBMS has no capability that can optimize for
>> particular problem solutions.
>
> My example showed the opposite.

I don't think it did for the reasons above. Quite the opposite, in fact.

>> Try this question: can you identify any piece of information in the
>> notification that is not a field in the RDB or is not directly derived
>> from fields in the RDB as a dependent variable?
>
> Yes, the reminder date.

If that is the best you can come up to demonstrate complex processing
beyond view conversions, you are in deep trouble.

However, I am signing off here. I am just repeating the same things I
have already said.



--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
From: frebe on
> >>> The problem is to create reminders for all invoices are not payed and
> >>> due date is passed since 10 days. The solution in SQL
> >>> could look like.
> >>> insert into reminder (invoiceid, remind_date)
> >>> select i.invoiceid, now()
> >>> from invoice i join payment p on i.invoiceid=p.invoiceid
> >>> group by i.invoiceid
> >>> having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
> >>> If you want to solve the problem another way, you are welcome. There
> >>> are of course other alternatives. The question is if they are better,
> >>> or not.
> >>>> IOW, you are providing a common generic data access query problem and
> >>>> asking for an OO equivalent. But I'm talking about solving customer
> >>>> specific problems, not generic data storage access problems.
> >>> This is a customer specific problem. The problem has very little to do
> >>> with storage.
> >> But you are not solving some problem for the customer.
>
> > Of course I am. Reminders are created. There are a lot of application
> > out there, not having a UI.
>
> >> All you are doing
> >> is presenting data that already exists in the data store in a convenient
> >> format for the user (i.e., the notification).
>
> > Did you notice the "now()". Data is created. Reminders may be created
> > during a night batch. How they are presented are a completely
> > different story. The application might send an email to the customer,
> > print the reminder on a paper, or display on the screen. Data is one
> > thing, presentation is another.
>
> And where does now() get its value?

The system clock, I guess.

> One way would be to read an RDB
> field that is updated by the DBMS in the background. Whether the value
> is in an RDB data store or in the OS' memory data store doesn't matter.
> Your query is extracting data from a data store and presenting it.

You are trying to twist the universe to fit your weird definitions.
now() is a function as any other. It is a simple function, but any
function can be used in relational algebra.

> Nor does the fact that the reminder might be placed in intermediate
> storage temporarily before direct presentation matter. That is just part
> of the view conversion mechanics. (Note that you could also sort the
> reminders before presentation and that would still be a basic USER
> activity.) What is ultimately presented to the user is simply the
> extracted data without significant manipulation other that view conversion.

My example doesn't have "significant manipulation" because I want to
giving a simple example still showing the point. You base almost all
your argumentation on fuzzy definitions like "significant
manipulation", "complex processing", "CRUD/USER realm", which makes
anything derived from such definitions pretty fuzzy and useless too.

> >> Now suppose we need to perform an analysis of accounts payables that
> >> forecasts how many accounts will be paid late, what the total late
> >> payments will be, and what the probability is that each individual
> >> customer will be late. To do that one would need credit information
> >> about the customers and a historical track record of past payments.
>
> > This would probably be possible to do with a number of views.
>
> Possible but highly inefficient, which is <one of> my point.

If it would be inefficient or not, is still not showed.

> >> To obtain the track record one needs to extract the historical payment
> >> data from the data store. One might use a query that looks very much
> >> like yours in the subsystem dedicated to talking to the DBMS. But once
> >> that data is extracted it will be stored in objects that are tailored to
> >> the regression and probabilistic analysis. IOW, the real problem
> >> solution requires different data structures and will use different
> >> techniques to solve the customer's problem.
>
> > I don't know if SQL databases would perform good enough for the
> > calculations you have in mind. But you must not forget that there are
> > a lot of data mining out there, that relies heavily on SQL.
>
> The subject matter of data mining is the RDB view. The goal of data
> mining is to sift through great gobs of data to extract that which is of
> interest.

Data mining is also about data derivation, not only about extraction.
Relational algebra can be used for data derivation almost beyond
imagination.

> But there is a big difference between designing a join and executing it.

Yes, executing joins is what a DBMS do.

> >> To put it another way, the non-CRUD/USER problem starts after the data
> >> is extracted from the database using a query like yours.
>
> > Your definition of non-CRUD/USER seem to be: Anything that can't be
> > solved with SQL. I could agree to this definition, but I just want to
> > point out that pretty much can be solved with SQL.
>
> No, that's not my definition. As I have already stated non-CRUD/USER
> processing is any processing that requires complex data manipulations
> beyond a simple sequence of creating, sorting, updating, deleting,
> extracting, or reporting data.

Since "select sum(amount) from invoice" is not about sorting nor
extracting, I guess it is "complex processing".

> >> The analogous
> >> problem solved compared to your example is what the customer does with
> >> the notifications after they have been produced by your query.
>
> > That is presentation and communication. In most cases, SQL is about
> > business logic, and the host language do the presentation and
> > communication.
>
> Say what?!? How does SQL know anything about the rules and policies that
> govern a business?

In the same way as C++ or Java knows about it.

> SQL is about creating, sorting, updating, deleting,
> and extracting data in a persistent data store.

SQL is a relational language which can also do a lot of data
derivation.

> > In that case a lot of other things like invoices, bus tickets,
> > receipts are UI too. Most data is supposed to be read by someone, so
> > most data will end up at the UI sooner or later.
>
> That last sentence is a real tell. If your view of applications is to
> enable data in a database to be read by someone, that is quintessential
> CRUD/USER. The primary differentiator for non-CRUD/USER is that the
> software processes the data in the data store in complex ways so that
> the user can be presented with different data that is not in the data
> store (or trivially derived from it).

SQL is able to much more than trivial derivations.

> >>> 1. You need a user-defined function in your SQL statements. (select
> >>> age(birthdate) from employee)
> >> You will have to put more words around that. Any selection one does in
> >> SQL is based on values for schema fields.
>
> > Or derived values, like "select * from employee where age(birthdate) >
> > 60".
>
> Pretty much my point. Birthdate is a schema field and 60 is the
> parametric value for the selection. The age macro is just a red herring
> to derive an age (dependent variable) based on birthdate. One could have
> encoded "(currentDate - birthdate) > 60".

But the function could also be much more complex.

> >> Try this question: can you identify any piece of information in the
> >> notification that is not a field in the RDB or is not directly derived
> >> from fields in the RDB as a dependent variable?
>
> > Yes, the reminder date.
>
> If that is the best you can come up to demonstrate complex processing
> beyond view conversions, you are in deep trouble.

No, but you very asking for a piece of information that is not a field
in the RDB, in an attempt to defend your position. I gave you an
example which contradicted your claim. The fact that the example is
simple, doesn't make any difference.

// frebe
From: AndyW on
On Mon, 21 Jan 2008 23:48:09 -0800 (PST), frebe <frebe73(a)gmail.com>
wrote:

>> > select i.invoiceid
>> > from invoice i join payment p on i.invoiceid=p.invoiceid
>> > group by i.invoiceid
>> > having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
>>
>> > This is the kind of code I write every. Even though the number of
>> > invoices and payments are very high, the queries perform within a
>> > number of millis. The customer is happy, I am happy.
>>
>> Once again, if that is all you are doing, that is CRUD/USER processing;
>> you are just moving piles of data back and forth between the UI and the RDB.
>>
>> > Maybe you can show your code performing the same task.
>>
>> You want code for inventory forecasts? A Linear programs to allocate
>> advertising budget to various markets' media? A simulation model of
>> atmospheric diffusion?
>
>No. I was asking for the OO equivalence for the SQL statement above.

There isnt one. But there is a way to achieve the end result -
locating an invoice.

One has to remember that OO isnt about lists of related data as
relational DB are. Its about collections of unique objects.

To me in OO every instance of an object is unique and that I feel
means if one knows enough of the attributes one can go directly to the
item without searching for it (in the rare case there are duplicate
items they are reference counted). In close appromimation to
relational, I think its like knowing the primary key but in OO its a
composite key based on the sum of all of the attributes and methods at
a given point in time (this means that an objects identity can
change).

In OO, I would suggest that we also dont have collections of related
data, we have schemas of objects (a set of rules that define the
collection). Schema can also be composite. In the SQL statement above
as I interpreted it one has a collection of invoices each with
possibly a collection of payments. The most ideal way of searching
is to go for the one that is easier to find - so if one knows the
payment find that, then transverse to its invoice (like using a
foreign key). In other words, sometime its better to reverse the
search order given the information known.

I think that things get slightly wacky tho in OO when one starts
dealing bulk operations (such as updating lots of data) because of
several techniques.

The first one is that since objects know who they are, one doesn't
need to find them, one can just ask them to do things (such as updates
) for themselves. Think of it like Bob going into a crowded hall and
shouting out, can Wilma of 43 someplace street please go to the
information kiosk fill in the appropriate form. Then a short time
later, a faint voice shouts back, ok I am done what next. (this is the
observer pattern)

So to me in reference to the select statment above, the request is
simply - can everyone who has a sum of money that is now due for
payment, please now execute their request payment routines (remember,
objects contain methods based on their classification as well). If
one were being smart one could actually provide them a method
invocation to execute - which is called dynamic method invocation.

So I think the techniques in OO are that objects can find themselves,
objects can manipulate themselves and it can all work in parallel.


b.t.w I like to think that CRUD in OO does not involve moving data.
The data is quite capable of moving itself :)





----------------
AndyW,
Mercenary Software Developer
From: frebe on
> >> > select i.invoiceid
> >> > from invoice i join payment p on i.invoiceid=p.invoiceid
> >> > group by i.invoiceid
> >> > having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
>
> >> > This is the kind of code I write every. Even though the number of
> >> > invoices and payments are very high, the queries perform within a
> >> > number of millis. The customer is happy, I am happy.
>
> >> Once again, if that is all you are doing, that is CRUD/USER processing;
> >> you are just moving piles of data back and forth between the UI and the RDB.
>
> >> > Maybe you can show your code performing the same task.
>
> >> You want code for inventory forecasts? A Linear programs to allocate
> >> advertising budget to various markets' media? A simulation model of
> >> atmospheric diffusion?
>
> >No. I was asking for the OO equivalence for the SQL statement above.
>
> There isnt one.   But there is a way to achieve the end result -
> locating an invoice.

The end result in my query is to find invoices, so please show the OO
equivalence.

> One has to remember that OO isnt about lists of related data as
> relational DB are.   Its about collections of unique objects.

Is that an advantage or a disadvantage?

> To me in OO every instance of an object is unique and that I feel
> means if one knows enough of the attributes one can go directly to the
> item without searching for it

By pointer traversal? Doesn't you need some kind of hashmaps etc, in
order to "go directtly". Isn't this also a kind of "search`"?

> (in the rare case there are duplicate
> items they are reference counted).  In close appromimation to
> relational, I think its like knowing the primary key but in OO its a
> composite key based on the sum of all of the attributes and methods at
> a given point in time (this means that an objects identity can
> change).

Object identity is a memory adress.

> In OO, I would suggest that we also dont have collections of related
> data, we have schemas of objects (a set of rules that define the
> collection). Schema can also be composite.  In the SQL statement above
> as I interpreted it one has a collection of invoices each with
> possibly a collection of payments.   The most ideal way of searching
> is to go for the one that is easier to find - so if one knows the
> payment find that, then transverse to its invoice (like using a
> foreign key).   In other words, sometime its better to reverse the
> search order given the information known.

A SQL database is very good at doing thing kind of decisions.
Depending on the query and current statistics, it will choose the best
set of indexes to use.

> Think of it like Bob going into a crowded hall and
> shouting out, can Wilma of 43 someplace street please go to the
> information kiosk fill in the appropriate form.  Then a short time
> later, a faint voice shouts back, ok I am done what next. (this is the
> observer pattern)

I order to "shout out" to every object, you need to traverse every
object. The complexity of the algorithm will be O(N). In a SQL
database, with appropriate indexes, most searches will be O(log N).

> So to me in reference to the select statment above, the request is
> simply - can everyone who has a sum of money that is now due for
> payment, please now execute their request payment routines (remember,
> objects contain methods based on their classification as well).  If
> one were being smart one could actually provide them a method
> invocation to execute - which is called dynamic method invocation.

That means that every object need to calculate the sums and decide if
calling the callback or not. The complexity will be O(N). Using SQL,
you could create a materialized view with the sum columns, create
indexes for them, and you will have O(log N).

> So I think the techniques in OO are that objects can find themselves,
> objects can manipulate themselves and it can all work in parallel.

That is why so many OO applications have such bad performance.

> b.t.w I like to think that CRUD in OO does not involve moving data.
> The data is quite capable of moving itself :)

Relational algebra (or SQL) has nothing to do with moving data.

//frebe