SQL [OOP]


From: topmind on 21 Jan 2006 14:06

> >>>>The point is that there are alternative /implementations/ for
> >>>>persistence to RDBs in the computing space. SQL has already made that
> >>>>implementation choice.
> >>>
> >>>
> >>>SQL is not an implementation. What is the difference between locking
> >>>yourself to SQL instead of locking yourself to Java? If you want
> >>>open-source, then go with PostgreSQL. What is the diff? Java ain't no
> >>>universal language either.
> >>
> >>Of course it's an implementation! It implements access to physical
> >>storage.
> >
> >
> > Just as Java implements access to physical RAM etc.
>
> Exactly. Java is a specific implementation of a 3GL. 3GL is the
> abstraction, Java is an implementation. Persistence access is the
> abstraction, SQL is an implementation.

Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.

>
> >> More important to the context here, that implementation is
> >>quite specific to one single paradigm for stored data.
> >
> >
> > Any language or API is pretty much going to target a specific paradigm
> > or two. I don't see any magic way around this, at least not that you
> > offer. UML is no different.
>
> 4GLs get around it because they are independent of /all/ computing space
> implementations.

I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.

>
> However, that's not the point. SQL is a 3GL but comparing it to Java is
> specious because Java is a general purpose 3GL.

Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.

A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.

> SQL represents a
> solution to persistence access that is designed around a particular
> model of persistence itself. So one can't even use it for general
> purpose access to persistence, much less general computing.

Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.

>
> >>Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
> >>
> >>Everything on the left is a specification for what is immediately to its
> >>right. Similarly, everything to the right is a solution implementation
> >>for the specification on its immediate left.
> >
> >
> > Well that is a bit outdated. For one, the distinction between 4GL and
> > 3GL is fuzzy, and many compilers/interpreters don't use assembler.
>
> My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
> OOA models have no difficulty recognizing implementation pollution.

Argument by authority.

>
> All compilers generate object code (relocatable Assembly). Most modern
> interpreters can produce storable bytecodes that are equivalent to
> Assembly from the VM's viewpoint. At run time one can view an
> interpreter as simply combining link and load functions that transform
> the bytecode to a machine instruction. But at some level the
> interpreter still has to understand that MUL,R1,R2 maps into bits.
>
> But you are reverting to ploys again by deflecting. The context is
> specification vs. implementation, not how machine instructions are encoded.

You have not finished your analogy on the 3G and 4G side. Besides,
analogies often make poor evidence, being better for teaching or
illuminating.

>
> >>Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
> >>express data store access at a high level of abstraction that is
> >>independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
> >>or any other access mechanism, is then an implementation of that generic
> >>access specification.
> >
> >
> > SQL is independent of the "actual storage mechanism". It is an
> > interface. You may not like the interface, but that is another matter.
> > Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
> > interface"....
>
> Try using SQL vs. flat files if you think it is independent of the
> actual storage mechanism. (Actually, you probably could if the flat
> files happened to be normalized to the RDM, but the SQL engine would be
> a doozy and would have to be tailored locally to the files.) SQL
> implements the RDB view of persistence and only the RDB view.

How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.

And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
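
A rough sketch of the idea, not an ODBC driver: here Python's standard csv and sqlite3 modules stand in for such a driver by copying a delimited text file into a SQL table and querying it. The file contents, the table name "stock", and the helper load_flat_file are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: comma-separated text with a header row.
FLAT_FILE = """sku,qty,region
A100,5,east
B200,2,west
A100,7,west
"""

def load_flat_file(text, table="stock"):
    """Copy the rows of a delimited text file into an in-memory SQL table."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(header)
    marks = ", ".join("?" for _ in header)
    # Table and column names come from our own sample data, not user input.
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", data)
    return conn

conn = load_flat_file(FLAT_FILE)
# Values arrive as text, so cast before summing.
totals = conn.execute(
    "SELECT sku, SUM(CAST(qty AS INTEGER)) FROM stock GROUP BY sku ORDER BY sku"
).fetchall()
```

A real text driver would read the file in place instead of copying it, but the query the application writes looks the same either way.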

> >>Java is certainly a general purpose 3GL. Like most 3GLs there are
> >>situations where there are better choices (e.g., lack of BCD arithmetic
> >>support makes it a poor choice for a General Ledger), but one could
> >>still use it in those situations. SQL, in contrast, is a niche language
> >>that just doesn't work for many situations outside its niche.
> >
> >
> > You could be right, but I have yet to see a good case outside of
> > split-second timing issues where there is a limit to the max allowed
> > response time. (This does not mean that rdbms are "slow", just less
> > predictable WRT response time.)
> >
> > If you can give an example outside of timing, please do. (I don't doubt
> > they exist, but I bet they are rarer than you imply. Some scientific
> > applications that use imaginary numbers and lots of calculus may also
> > fall outside.)
>
> Compute a logarithm. You can't hedge by dismissing "scientific"
> computations.

I didn't. Nothing is ideal for everything under the sun. Nothing. See
above about general-purpose tools.

> Try doing forecasting in an inventory control system w/o
> "scientific" computations.

I am not sure what you are implying here. I did not claim that
scientific computation was not necessary.

> Or try encoding the pattern recognition that
> the user of a CRUD/USER application applies to the presented data. The
> reality is that IT is now solving a bunch of problems that are
> computationally intensive.

As usual, "it depends". Problems where there is a lot of "chomping" on
a small set of data are probably not something DB's are good at (at
this time). An example might be the Traveling Salesman puzzle. However,
problems where the input is large and from multiple entities are more
up the DB's alley.

(It may be possible to use a DB to solve Traveling Salesman quickly, but few
bother to research that area.)

>
> >>BTW, remember that I am a translationist. When I do a UML model, I
> >>don't care what language the transformation engine targets in the model
> >>implementation. (In fact, transformation engines for R-T/E typically
> >>target straight C from the OOA models for performance reasons.) Thus
> >>every 3GL (or Assembly) represents a viable alternative implementation
> >>of the notion of '3GL'.
> >
> >
> >
> > Well, UML *is* language. It is a visual language just like LabView is.
>
> Exactly. But solutions at the OOA level are 4GLs because they can be
> unambiguously implemented without change on any platform with any
> computing technologies.

So can any Turing Complete language.

>
> >>>>>>UML with a compliant AAL is an example of a 4GL. If I build an OOA
> >>>>>>model for, say, a POS Order Entry System, that model can be
> >>>>>>unambiguously implemented without change either manually as a print mail
> >>>>>>order catalogue or as software for a browser-based 'net application.
> >>>>>>The fundamental processing logic of catalogue organization and order
> >>>>>>entry is expressed the same way regardless of the implementation context.
> >>>>>
> >>>>>
> >>>>>And if other people/vendors made their own flavor of this tool with
> >>>>>differences between the implementation, then it would be in the same
> >>>>>boat. Why should implementation A1 and A2 demote the "generation"
> >>>>>ranking of A?
> >>>>
> >>>>It is not the same thing at all. The 4GL solution does not care if
> >>>>persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
> >>>>or clay tablets.
> >>>
> >>>
> >>>For the zillionth time, RDBMS are far more than just "persistence".
> >>
> >>It is only if one refuses to manage complexity by separating logical
> >>concerns.
> >
> >
> >
> > "Separation" is generally irrelevant in cyber-land. It is a physical
> > concept, not a logical one. Perhaps you mean "isolatable", which can be
> > made to be dynamic, based on needs. "Isolatable" means that there is
> > enough info to produce a separated *view* if and when needed. This is
> > the nice thing about DB's: you don't have to have One-and-only-one
> > separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
> > and tries to find the One True Taxonomy, which is the fast train to
> > Messland. Use the virtual power of computers to compute as-needed
> > groupings based on metadata.
>
> You know very well what I mean by 'separation of concerns' in a software
> context, so don't waste our time recasting it. Modularity has been a
> Good Practice since the late '50s.

If there is only one concern set where each concern is mutually
exclusive, then we have no disagreement. In practice there are usually
multiple "partitioning" candidates, and that is where the disagreements
usually arise. File and text systems don't make it easy to have
partitioning in all dimensions, so compromises must be made. It is "my
factor is more important than your factor, neener neener". If there is
only one way to slice the pizza, then there is no problem. But if there
are multiple ways, then a fight breaks out.

This is one reason why DB's are useful: the more info you put into the
DB instead of code, the more ad-hoc, situational partitionings you can
view. You are not forced to pick the One Right Taxonomy of
partitioning. Categorizational philosophers came to the consensus that
there is no One Right Taxonomy for most real-world things.
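
As a small illustration of ad-hoc partitioning (a sketch only; the "orders" table, its columns, and the helper totals_by are invented for the example), the same rows can be sliced along whichever dimension the question of the moment needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (region TEXT, product TEXT, channel TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("east", "widget", "web", 10),
        ("east", "gadget", "store", 20),
        ("west", "widget", "store", 30),
        ("west", "gadget", "web", 40),
    ],
)

ALLOWED = {"region", "product", "channel"}

def totals_by(column):
    """Total the amounts grouped by whichever dimension is asked for."""
    if column not in ALLOWED:
        raise ValueError(f"unknown grouping column: {column}")
    # The column name is checked against a fixed whitelist above.
    sql = (
        f"SELECT {column}, SUM(amount) FROM orders "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()

by_region = totals_by("region")
by_product = totals_by("product")
by_channel = totals_by("channel")
```

No one grouping is built into the stored data; each is computed when asked for.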

>
> >>Render unto the Disk generic static storage and render unto
> >>the Application context-dependent dynamics.
> >>
> >> * 1
> >>[Context] ----------------- [Data]
> >>
> >>
> >> 1 *
> >>[Problem Solution] -------- [Data]
> >>
> >>The first view is the basis of the RDB paradigm -- generic storage of
> >>the same data for access by many different contexts. The second view is
> >>the one that is relevant for solving large problems -- access of data
> >>that is carefully tailored to the problem in hand. Storing and
> >>accessing data for many different contexts is a quite different problem
> >>than formatting and manipulating data to solve a specific problem.
> >
> >
> >
> > Again, DB's are not JUST for "storage". There are RAM-only RDBMS's.
>
> I agree they are used for more than that, but it is not my problem if
> developers are determined to shoot themselves in the foot by bleeding
> cohesion all over the place. It is plain bad software practice to
> ignore logical modularity.

Again, in practice there are multiple incompatible modularity
candidates in non-trivial software. Life is multi-dimensional, and the
more complex the software the more factors there are.

Change impact analysis often does not help either because I found out
that people perceive change and change probabilities differently. It is
hard to plan for change when people don't perceive the future the same way.

>
> As far as RAM RDBs go, for any large non-CRUD/USER problem I can
> formulate a solution (which doesn't have to be OO) that will beat your
> RAM RDB for performance, and often by integer factors.

Claims claims claims. Yaawwwn.

> The RDB paradigm
> is not designed for context-dependent problem solving; it is designed
> for generic static data storage and access in a context-independent manner.

I think what you view as context-dependent is not really context
dependent after all. It is just your pet way of viewing the world
because of all the OOP anti-DB hype.

>
> Before you argue that the RAM RDB saves developer effort because it is
> largely reusable and that may be worth more than performance, I agree.
> But IME for /large/ non-CRUD/USER problems the computer is usually too
> small and performance is critical.

Please clarify. Ideally the RDBMS would determine what goes into RAM
and what to disk such that the app developer doesn't have to give a
rat's rear. Cache management generally does this, but a both-way system
is probably not as fast as a dedicated RAM DB. Even if the two-way
ideal is not fully reached, one will soon have the *option* to switch
some or all of an app to a full-RAM DB as needed without rewriting the
app. The query language abstracts/encapsulates/hides that detail away.
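
A minimal sketch of that point, using SQLite from Python purely as a stand-in (the "part" table and the function build_and_query are hypothetical): the SQL is identical whether the database lives in RAM or in a file, and only the connection target changes.

```python
import os
import sqlite3
import tempfile

def build_and_query(database):
    """Run the same SQL against whatever storage `database` names."""
    conn = sqlite3.connect(database)
    try:
        conn.execute("CREATE TABLE part (name TEXT, weight INTEGER)")
        conn.executemany(
            "INSERT INTO part VALUES (?, ?)", [("bolt", 3), ("nut", 1)]
        )
        conn.commit()
        return conn.execute(
            "SELECT name FROM part WHERE weight > 2"
        ).fetchall()
    finally:
        conn.close()

# RAM-only database.
in_ram = build_and_query(":memory:")

# Disk-backed database in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    on_disk = build_and_query(os.path.join(tmp, "parts.db"))
```

This says nothing about relative speed; it only shows that the application code does not need rewriting when the storage choice changes.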

>
> [I could also argue that an OO solution will provide one with optimum
> performance "for free" because it falls out of basic OOA/D for the
> solution logic. IOW, one doesn't need that sort of reuse. But I won't
> argue that because that would be going down the rabbit hole. B-)]

No, it often hard-wires in the early usage paths such that future usage
paths that go against those early paths turn into a mess. OO tends to
be really lousy at many-to-many relationships, for example.
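
For comparison, here is how a many-to-many relationship is usually expressed relationally, as a sketch with invented tables (author, book, author_book). The junction table favors neither side, so the relationship can be navigated in either direction with the same kind of query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    -- The junction table holds one row per (author, book) pairing.
    CREATE TABLE author_book (
        author_id INTEGER REFERENCES author(id),
        book_id INTEGER REFERENCES book(id),
        PRIMARY KEY (author_id, book_id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (10, 'Sets'), (20, 'Keys');
    INSERT INTO author_book VALUES (1, 10), (1, 20), (2, 20);
""")

def books_by(author_name):
    """Navigate author -> books."""
    rows = conn.execute(
        "SELECT b.title FROM book b "
        "JOIN author_book ab ON ab.book_id = b.id "
        "JOIN author a ON a.id = ab.author_id "
        "WHERE a.name = ? ORDER BY b.title",
        (author_name,),
    )
    return [title for (title,) in rows]

def authors_of(book_title):
    """Navigate book -> authors over the same junction table."""
    rows = conn.execute(
        "SELECT a.name FROM author a "
        "JOIN author_book ab ON ab.author_id = a.id "
        "JOIN book b ON b.id = ab.book_id "
        "WHERE b.title = ? ORDER BY a.name",
        (book_title,),
    )
    return [name for (name,) in rows]
```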

>
> >>For a non-CRUD/USER application, SQL and the DBMS provide the first
> >>relationship while a persistence access subsystem provides the
> >>reformatting for the second relationship.
> >
> >
> > Reformatting? Please clarify.
>
> The solution needs a different view of the data that is tailored to the
> problem in hand. So the RDB view needs to be converted to the solution
> view (and vice versa). IOW, one needs to reformat the RDB data
> structures to the solution data structures.

This is called a "result set" or "view". Most queries customize the
data to a particular task. Thus, it *is* a solution view.
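
A small sketch of what is meant, with a made-up "employee" table and a made-up view name "dept_payroll": the stored rows are generic, while the view presents them in the shape one particular task wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 130), ('Cy', 'ops', 80);
    -- A task-specific shape derived from the generic rows.
    CREATE VIEW dept_payroll AS
        SELECT dept, COUNT(*) AS headcount, SUM(salary) AS total
        FROM employee
        GROUP BY dept;
""")

payroll = conn.execute(
    "SELECT dept, headcount, total FROM dept_payroll ORDER BY dept"
).fetchall()
```

Whether this counts as the "reformatting" described above is the point under dispute; the sketch only shows what a view does.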

>
> >>>>I am talking about the abstracting the domain where the original problem
> >>>>exists rather than the computing domain where a software solution will
> >>>>be executed. SQL only abstracts a very narrow part of the computing domain.
> >>>
> >>>
> >>>I disagree. A large part of *most* apps I have seen involves
> >>>database-oriented stuff. P. May mentioned security. Security can be
> >>>viewed as a dealing with large ACL tables. Most algorithms can be
> >>>reduced to mostly DB-oriented operations. I had to build a 3D graphics
> >>>system in college, and most of it could be reduced to DB-operations:
> >>>having "parts" reference each other in many-to-many tables,
> >>>transformation steps tracking, looking up polygons, cross-referencing
> >>>those polygons with their "parent part", storing scan-lines for later
> >>>inspection, etc. I will agree that DB's are not (currently) fast at
> >>>such, but still from a logical perspective the operations were
> >>>essentially DB-oriented. (Because I couldn't use a DB, I ended up
> >>>reinventing a lot of DB idioms and it was not very fun.)
> >>
> >>When the only tool you have is a Hammer, then everything looks like a
> >>Nail.
> >
> >
> >
> > No, out of necessity I started my career without DB usage, and I never
> > want to return there.
>
> That's because you are in a CRUD/USER environment where P/R works quite
> well. Try a problem like allocating a fixed marketing budget to various
> national, state, and local media outlets in an optimal fashion for a
> Fortune 500.

Again, I never said that DB's are good for every problem. I don't know
enough about that particular scenario to propose a DB-centric solution
and to know whether it is an exception or not.

Unless you provide some specific use-case or detailed scenario, it is
anecdote against anecdote here.

RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
gigantic.

> >> >><moved>What I implied was that CRUD/USER applications tend to be not very
> >> >>complex. Report generation was never very taxing even back in the COBOL
> >> >>days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
> >> >>browser UI for a printed report doesn't change the fundamental nature of
> >> >>the processing.
> >> >
> >> >
> >> > Please clarify. If a process was "not taxing", then you are simply
> >> > given more duties and projects to work on. Management loads you up
> >> > based on your productivity and work-load.
> >>
> >>Back in the '60s and early '70s writing COBOL code to extract data and
> >>format reports was a task given to the entry level programmers. That's
> >>where the USER acronym (Update, Sort, Extract, Report) came from. The
> >>stars went on to coding Payroll and Inventory Control where one had to
> >>encode business rules and policies to solve specific problems.
> >
> >
> >
> > Fine, show how OO better solves business rule management. (Many if not
> > most biz rules can be encoded as data, BTW, if you know how.)
>
> Why? That has nothing to do with whether a DBMS should execute dynamic
> business rules and policies. This isn't an OO vs. P/R discussion, much
> as you would like to make it so.

Are you saying it is a UML-versus-RDB debate?

>
> >>It is only when the
> >>problem solution gets drawn into the software that one leaves the realm
> >>of CRUD/USER processing and things start to get tricky.
> >>
> >>
> >>>>Unfortunately, I agree with May that the rest of the paragraph makes no
> >>>>sense; it just seems to be your personal jargon and mantras.
> >>>
> >>>

>
> >>They are part of
> >>the predictable collection of forensic ploys you use when debating OO
> >>people. It's all designed to have an emotional effect to put the
> >>opponent on tilt.
> >>
> >>You seem to get your amusement out of having OO people go bonkers.
> >
> >
> >
> > I will admit there is a certain satisfaction of using other people's
> > own logic against themselves, especially if they have insulted me
> > prior.
>
> That doesn't answer why you went to the trouble of creating an
> inflammatory website and have been here for years. A simple dislike of
> OO? I don't think so. How many converts have you made to justify your
> crusade?

I get enough "amen brother's" to provide all the social satisfaction I
need from it.

> It just wouldn't be worth the effort of beating your head
> against the wall all these years. So you have to have some other
> reason. The only plausible reasons I see are Quixotic masochism or you
> enjoy pulling people's chains.

Perhaps an Asperger's Syndrome: obsession with a specific narrow topic.
Whatever, if you want to sit around and speculate on my motivation, be
my guest. Frankly, I am not that important to waste time on.

>
> As far as insulting you is concerned, what do you expect? You throw out
> inflammatory statements, especially misconceptions about what OO
> development is about, that are designed to drive anyone who understands
> OO up a tree. If I used my knowledge of OO and tried to design a
> website that would drive OO people to outrage, it would be your
> geocities website. It pushes all the buttons in admirable fashion.
> (That you can push all the right buttons is what makes me believe you
> actually understand a lot more about OO than you pretend; it would be
> difficult to be so inflammatory without that knowledge.) So I have to
> conclude it is intentional. When you jump up and down on the bellows
> long enough, you will get burned.

Whatever. If OO was truly great you could demonstrate it with a coded
business example that many if not most OO proponents would agree is
good OO. You can't. BilliOOns of dOOllars spent based on anecdotes,
bragging, and brochure-talk.

[....]

> >>>You are so cute when you paint me as bad, manipulative, and evil.
> >>
> >>Not bad or evil, but definitely manipulative. You just find it amusing
> >>to pull people's chains and the OO community is providing plenty of soft
> >>targets. As I've said before, I think you actually know a lot more
> >>about OO development than you let on and you are pretty clever about the
> >>way you tweak the OO people who engage with you.
> >
> >
> >
> > You are spreading falsehoods about RDBMS. They are NOT low-level. You
> > only treat them as low level.
>
> Where did I say that? I said that once one is out of the realm of
> CRUD/USER processing, /talking/ to persistence is a low level service
> _within the application_. How persistence is implemented outside the
> application is a whole other story.
>

Well, we both agree that DB's are not for everything. However, we
disagree widely on where the limits lie.

RDBMS tend to not be the right tool where performance, hardware
packaging, or timing is more critical than change management. If
something changes often, then a RDBMS is a more general-purpose
solution. This is not to say that RDBMS are slow, they just will not be
competitive with a critical system designed for a very specific,
slow-changing purpose. But for the budget-minded who don't want to
build low-level tools from scratch and want flexibility, DB's are the
way to go.

I believe most cases where DB's are not appropriate for the application
will fall into the above category.

>
> *************
> There is nothing wrong with me that could
> not be cured by a capful of Drano.
>
> H. S. Lahman

-T-

From: frebe on 22 Jan 2006 05:33

> SQL is not limited to persistence.
Finally we can agree about something. Does this mean that you will stop
making this claim?

> However, that is probably where 99.99% of the usage lies.
I suppose that you are talking about your usage of SQL. In an average
enterprise application, non-persistence features like queries,
transactions, referential integrity, caching, etc, are heavily used.

> SQL is specifically designed for the RDB implementation paradigm of the
> RDM.
Because you have created a new definition of the term RDM, that is
different from Codd's definition, the distinction between RDB and RDM
is your own invention.

> You could develop a SQL driver to use file names as table identity and
> read lines via an implied line number as a key, but good luck on
> correctly dealing with line insertions and deletions without an embedded
> key.
Why would line number be the key?

> The RDM is basic set theory.
Are you saying that the RDM is based on basic set theory or that the
RDM is nothing more than basic set theory?

> Codd was explicitly dealing with
> persistence in a computing environment so he expressed the rules in
> terms of embedded identity attributes (keys). However the set theory
> only requires that each tuple have unique identity.

In what way do embedded identity attributes limit Codd's RDM to
persistence? The alternative appears to be pointers, which were used in
network databases. Embedded keys vs pointers is orthogonal to persistent
vs in-memory.

> Thus you will see a discussion of normalization of
> Class Models in most standard OOA/D books
I was asking for a definition of the second definition of the RDM,
broader than Codd's definition. I was not asking for discussions about
class model normalization.

Fredrik Bertilsson
http://butler.sourceforge.net

From: frebe on 22 Jan 2006 08:09

> There
> is a reason why the CRUD work is typically given to new hires and
> junior developers.

The reason why CRUD work is given to junior developers is the fact that
using OO design, CRUD applications are very bloated. If RAD tools were
used instead, the same work would be done in a few minutes, saving a
lot of money instead of hiring an army of (junior) developers.

Fredrik Bertilsson
http://butler.sourceforge.net

From: H. S. Lahman on 22 Jan 2006 12:13

Responding to Frebe...

>>SQL is not limited to persistence.
>
> Finally we can agree about something. Does this mean that you will stop
> making this claim?

I never made that claim.

>>However, that is probably where 99.99% of the usage lies.
>
> I suppose that you are talking about your usage of SQL. In an average
> enterprise application, non-persistence features like queries,
> transactions, referential integrity, caching, etc, are heavily used.

If they are using SQL for that in a non-CRUD/USER application for
anything other than persistence access, then they are misusing SQL.
Even in a CRUD/USER application it doesn't make much sense from a
performance viewpoint if the data is in memory.

>>SQL is specifically designed for the RDB implementation paradigm of the
>>RDM.
>
> Because you have created a new definition of the term RDM, that is
> different from Codd's definition, the distinction between RDB and RDM
> is your own invention.

Codd's definition /is/ the RDB view; it is a specialized application of
more general set theory...

>>You could develop a SQL driver to use file names as table identity and
>>read lines via an implied line number as a key, but good luck on
>>correctly dealing with line insertions and deletions without an embedded
>>key.
>
> Why would line number be the key?

How else would you uniquely identify each line for individual access in
a text file?
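
The difficulty with positional identity can be sketched in a few lines of Python. The record layout (an identifier as the first comma-separated field) and both helper functions are assumptions made up for the illustration, not part of any driver.

```python
def by_line_number(lines):
    """Key each record by its position in the file (1-based)."""
    return {n: line for n, line in enumerate(lines, start=1)}

def by_embedded_key(lines):
    """Key each record by an identifier stored in the line itself."""
    return dict(line.split(",", 1) for line in lines)

before = ["k1,apple", "k2,pear"]
after = ["k0,fig"] + before  # one line inserted at the top of the file

# Positional key 1 now refers to a different record.
moved = by_line_number(before)[1] != by_line_number(after)[1]

# The embedded key still finds the same record after the insertion.
stable = by_embedded_key(before)["k1"] == by_embedded_key(after)["k1"]
```

With line numbers as identity, every insertion or deletion silently re-keys the lines below it; an embedded key avoids that at the cost of storing the key in each line.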

>>The RDM is basic set theory.
>
> Are you saying that the RDM is based on basic set theory or that the
> RDM is nothing more than basic set theory?

The RDM is a combination of basic set theory and predicate logic that
deals with relational calculus using terminology like relation, tuple,
and attribute. Codd's data model is an application of the RDM that
deals with relational algebra using terminology like table, row, and
field (see his original 1970 paper, "A Relational Model of Data for Large
Shared Data Banks", Communications of the ACM, 13, pgs. 377-387 where he
introduced the notion of representing data in tables).

While Codd was the first to provide a formal and consistent view of the
RDM, the RDM itself has been greatly expanded over the years beyond the
RDB view. Today it can be applied to such disparate arenas as OO
development and OODBs...

>>Codd was explicitly dealing with
>>persistence in a computing environment so he expressed the rules in
>>terms of embedded identity attributes (keys). However the set theory
>>only requires that each tuple have unique identity.
>
>
> In what way do embedded identity attributes limit Codd's RDM to
> persistence?

It doesn't. But Codd's goal was to describe persisted data and he
developed the initial view of the RDM around the notion of RDB storage.
Just look at the titles of Codd's early books and papers and try to
convince me that his research wasn't /focused/ on RDBs and persistence:

A Relational Model of Data for Large Shared Data Banks, 1970

Normalized Data Base Structure: A Tutorial, 1971

A Data Base Sublanguage Founded on the Relational Calculus, 1971

Further Normalization of the Data Base Relational Model, 1972

Relational Completeness of Data Base Languages, 1972

The Gamma-0 n-ary Relational Data Base Interface Specifications of
Objects and Operations, 1973

Recent Investigations in Relational Data Base Systems, 1974

Implementation of Relational Data Base Management Systems, 1975

He was a researcher in IBM's hard disk division, for Pete's sake!

>>Thus you will see a discussion of normalization of
>>Class Models in most standard OOA/D books
>
> I was asking for a definition of the second definition of the RDM,
> broader than Codd's definition. I was not asking for discussions about
> class model normalization.

For starters, try set theory. Though I am not a fan, you might also
look at Chris Date's work for descriptions of the RDM well beyond the
RDB view.

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH

From: H. S. Lahman on 22 Jan 2006 14:57

Responding to Jacobs...

>>>>>SQL is not an implementation. What is the difference between locking
>>>>>yourself to SQL instead of locking yourself to Java? If you want
>>>>>open-source, then go with PostgreSQL. What is the diff? Java ain't no
>>>>>universal language either.
>>>>
>>>>Of course it's an implementation! It implements access to physical
>>>>storage.
>>>
>>>
>>>Just as Java implements access to physical RAM etc.
>>
>>Exactly. Java is a specific implementation of a 3GL. 3GL is the
>>abstraction, Java is an implementation. Persistence access is the
>>abstraction, SQL is an implementation.
>
>
>
> Why do you keep saying "persistence"? I don't think you get the idea of
> RDBMS and query languages. Like I said, think of a RDBMS as an
> "attribute management system". Forget about disk drives for now. Saying
> it is only about "persistence" is simply misleading.

Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.

My point in this subthread is that such responsibilities are complicated
enough in practice that one does not want the DBMS to also manage and
execute dynamic business rules and policies. IOW, the DBMS should just
mind its own store. [This thread has been a veritable hotbed of puns.
I've probably made more in this thread than I've done in the last
decade. B-)]

>>>>More important to the context here, that implementation is
>>>>quite specific to one single paradigm for stored data.
>>>
>>>
>>>Any language or API is pretty much going to target a specific paradigm
>>>or two. I don't see any magic way around this, at least not that you
>>>offer. UML is no different.
>>
>>4GLs get around it because they are independent of /all/ computing space
>>implementations.
>
>
> I am not sure UML qualifies as 4th Gen. Just because it can be
> translated into multiple languages does not mean anything beyond Turing
> Equivalency. C can be translated into Java and vice versa.

A UML OOA model can be implemented unambiguously and without change in a
manual system. In fact, that is a test reviewers use to detect
implementation pollution. The OOA model for, say, a catalogue-driven
order entry system will look exactly the same whether it is implemented
as a 19th century mail-in Sears catalogue or a modern browser-based web
application. That is not true for any 3GL.

>>However, that's not the point. SQL is a 3GL but comparing it to Java is
>>specious because Java is a general purpose 3GL.
>
>
> Again, this gets into the definition of "general purpose". I agree that
> query languages are not meant to do the *entire* application, but that
> does not mean it is not general purpose. File systems are "general
> purpose", but that does not mean that one writes an entire application
> in *only* a file system. It is a general purpose *tool*, NOT intended
> to be the whole enchilada.

Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.

> A hammer is a general purpose tool, but that does not mean one is
> supposed to ONLY use a hammer. You need to clarify your working
> definition of "general purpose", and then show it the consensus
> definition for 4GL.

huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.

>>SQL represents a
>>solution to persistence access that is designed around a particular
>>model of persistence itself. So one can't even use it for general
>>purpose access to persistence, much less general computing.
>
>
> Please clarify. Something can still be within a paradigm and be general
> purpose. Further GP does not necessarily mean "all purpose", for
> nothing is practically all purpose.

SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.

>>>>Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
>>>>
>>>>Everything on the left is a specification for what is immediately to its
>>>>right. Similarly, everything to the right is a solution implementation
>>>>for the specification on its immediate left.
>>>
>>>
>>>Well that is a bit outdated. For one, the distinction between 4GL and
>>>3GL is fuzzy, and many compilers/interpreters don't use assembler.
>>
>>My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
>>OOA models have no difficulty recognizing implementation pollution.
>
>
> Argument by authority.

I prefer to think of it as argument by rational practicality.

>>>>Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
>>>>express data store access at a high level of abstraction that is
>>>>independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
>>>>or any other access mechanism, is then an implementation of that generic
>>>>access specification.
>>>
>>>
>>>SQL is independent of the "actual storage mechanism". It is an
>>>interface. You may not like the interface, but that is another matter.
>>>Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
>>>interface"....
>>
>>Try using SQL vs. flat files if you think it is independent of the
>>actual storage mechanism. (Actually, you probably could if the flat
>>files happened to be normalized to the RDM, but the SQL engine would be
>>a doozy and would have to be tailored locally to the files.) SQL
>>implements the RDB view of persistence and only the RDB view.
>
>
>
> How is that different than ANY other interface? You are claiming magic
> powers of UML that it simply does not have.

There is a distinction between describing an interface and designing its
semantics. UML is quite capable of describing the semantics of any
interface. Deciding what the semantics should be is quite another thing
that the developer owns.

When I have a subsystem in my application to access persistent data,
that subsystem has an interface that the rest of the application talks
to. That interface is designed around the rest of the application's
data needs, not the persistence mechanisms. It is the job of the
persistence access subsystem to convert the problem solution's data
needs into the access mechanisms du jour.

If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.

IOW, the semantics of the interface to the subsystem is /designed/ at a
different level of abstraction than that of the subsystem
implementation. UML doesn't care about the design process; it just
represents the results.
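
A minimal sketch of that arrangement, under stated assumptions: the
OrderStore interface, both implementations, and place_order are
hypothetical names invented for illustration, and sqlite3 stands in for
whatever RDB is actually in use. The point is only that the application
code talks to one interface while the mechanism behind it is swapped.

```python
import sqlite3
from abc import ABC, abstractmethod


class OrderStore(ABC):
    """Hypothetical interface, shaped by what the application needs."""

    @abstractmethod
    def save_order(self, order_id: int, item: str) -> None: ...

    @abstractmethod
    def items_for(self, order_id: int) -> list[str]: ...


class SqlOrderStore(OrderStore):
    """RDB-backed implementation; the SQL stays inside the subsystem."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, item TEXT)"
        )

    def save_order(self, order_id: int, item: str) -> None:
        self.conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))

    def items_for(self, order_id: int) -> list[str]:
        rows = self.conn.execute(
            "SELECT item FROM orders WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
        return [row[0] for row in rows]


class FlatFileOrderStore(OrderStore):
    """Flat-text implementation using ordinary file streaming."""

    def __init__(self, path: str) -> None:
        self.path = path
        open(path, "a", encoding="utf-8").close()  # make sure the file exists

    def save_order(self, order_id: int, item: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{order_id}\t{item}\n")

    def items_for(self, order_id: int) -> list[str]:
        with open(self.path, encoding="utf-8") as f:
            pairs = (line.rstrip("\n").split("\t", 1) for line in f)
            return [item for oid, item in pairs if int(oid) == order_id]


def place_order(store: OrderStore) -> list[str]:
    """Application code: unaware of which mechanism is plugged in."""
    store.save_order(1, "anvil")
    store.save_order(1, "rope")
    store.save_order(2, "paint")
    return store.items_for(1)
```

Calling place_order with either store should give the same answer; only
the constructor call at the edge of the application changes.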

> And as somebody pointed out, one can use SQL on flat files too. ODBC
> drivers can be created to hook SQL to spreadsheets, flat files, etc.

Only if the data is organized around embedded identity and normalized.
Even then such drivers carry substantial overhead and tend to be highly
tailored to specific applications. IOW, you need a different driver for
every context (e.g., a spreadsheet) and then it won't be as efficient as
an access paradigm designed specifically for the storage paradigm.

>>>>Java is certainly a general purpose 3GL. Like most 3GLs there are
>>>>situations where there are better choices (e.g., lack of BCD arithmetic
>>>>support makes it a poor choice for a General Ledger), but one could
>>>>still use it in those situations. SQL, in contrast, is a niche language
>>>>that just doesn't work for many situations outside its niche.
>>>
>>>
>>>You could be right, but I have yet to see a good case outside of
>>>split-second timing issues where there is a limit to the max allowed
>>>response time. (This does not mean that rdbms are "slow", just less
>>>predictable WRT response time.)
>>>
>>>If you can give an example outside of timing, please do. (I don't doubt
>>>they exist, but I bet they are rarer than you imply. Some scientific
>>>applications that use imaginary numbers and lots of calculus may also
>>>fall outside.)
>>
>>Compute a logarithm. You can't hedge by dismissing "scientific"
>>computations.
>
>
>
> I didn't. Nothing is ideal for everything under the sun. Nothing. See
> above about general-purpose tools.
>
>
>
>>Try doing forecasting in an inventory control system w/o
>>"scientific" computations.
>
>
> I am not sure what you are implying here. I did not claim that
> scientific computation was not necessary.

I was just anticipating your deflection; you've been using the
give-me-an-example ploy for years. B-) When the example is provided
you deflect by attacking it on grounds unrelated to the original point.
That's usually easy to do because examples are deliberately kept
simple to focus on the point in hand. That allows you to bring in
unstated requirements, programming practices designed for other
contexts, and whatnot to attack the example on grounds unrelated to the
original point. In this case, though, you screwed up by setting up a
basis for the deflection ahead of time.

You asked for an example outside of "timing". The main reason SQL isn't
a general purpose 3GL is that it can't handle dynamics (algorithmic
processing) very well. So the obvious examples are going to tend to be
algorithmic, such as computing a logarithm. But your parenthetical
hedge set up a basis for dismissing any obvious example as "scientific"
when you subsequently deflect. Then later you can argue the point was
never demonstrated.
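
To make "algorithmic" concrete, here is a rough sketch of a logarithm
computed step by step in a general purpose 3GL. The function names are
made up for illustration, and the method (range reduction plus the
atanh series) is just one textbook choice, not a claim about how any
particular library or DBMS does it.

```python
import math


def _atanh_series(y: float, terms: int = 40) -> float:
    """Sum y + y^3/3 + y^5/5 + ...; converges to atanh(y) for |y| < 1."""
    total, power = 0.0, y
    for k in range(terms):
        total += power / (2 * k + 1)
        power *= y * y
    return total


def natural_log(x: float) -> float:
    """Natural log via ln(m * 2**e) = ln(m) + e * ln(2)."""
    if x <= 0:
        raise ValueError("log is undefined for x <= 0")
    # frexp gives x == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(x)
    # ln(z) = 2 * atanh((z - 1) / (z + 1)); for z = 2 the argument is 1/3
    ln2 = 2 * _atanh_series(1 / 3)
    ln_m = 2 * _atanh_series((mantissa - 1) / (mantissa + 1))
    return ln_m + exponent * ln2
```

The loop, the running state, and the range reduction are the sort of
dynamics in question: sequential steps over a handful of values rather
than set operations over stored tuples.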

>>Or try encoding the pattern recognition that
>>the user of a CRUD/USER application applies to the presented data. The
>>reality is that IT is now solving a bunch of problems that are
>>computationally intensive.
>
>
> As usual, "it depends". Problems where there is a lot of "chomping" on
> a small set of data are probably not something DB's are good at (at
> this time). An example might be the Traveling Salesman puzzle. However,
> problems where the input is large and from multiple entities are more
> up the DB's alley.

The Traveling Salesman problem can be arbitrarily large and the RDB
model will still probably not be useful because...

<aside>
FYI, most of the Operations Research algorithms are actually pretty
simple when written out in equations and the core processing doesn't
require a lot of code. Typically most of the code is involved in
getting the data into the application, setting up data structures, and
reporting the results. In addition, the interesting problems are huge
and involve vast amounts of data.

For example, the logistics for the '44 D-Day invasion of Normandy held
the record as the largest linear programming problem ever solved well
into the '70s. The equations for the Simplex solution were written in a
few lines but the pile of data processed was humongous and the actual
execution took months. (It had to be split up into many chunks because
of the MTTF of the computer hardware and a lot of preprocessing was done
by acres of clerks with hand-cranked calculators.)
</aside>

> (It may be possible to use a DB to solve Salesmen quickly, but few
> bother to research that area.)

Unlikely. It's an NP-complete problem, so the worst case effectively
involves an exhaustive search of all possible tours (i.e., O(N!)). The
exotic algorithms just provide /average/ performance that approaches
O(NlogN). But those algorithms require data structures that are highly
tailored to the solution. And because of the crunching one wants
identity in the form of array indices, not embedded in tables, or the
problem doesn't get solved in a lifetime.
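
A small sketch of the exhaustive case, assuming the cities are numbered
0..N-1 and the distances sit in a plain N x N matrix (shortest_tour and
the sample data are hypothetical). It is only the brute-force baseline,
not one of the exotic algorithms, but it shows identity carried as array
indices in the inner loop.

```python
from itertools import permutations


def shortest_tour(dist: list[list[float]]) -> tuple[float, tuple[int, ...]]:
    """Try every tour that starts and ends at city 0; keep the shortest."""
    n = len(dist)
    best_len, best_tour = float("inf"), ()
    for perm in permutations(range(1, n)):  # (N-1)! candidate orderings
        tour = (0, *perm)
        # Each hop is a direct array lookup by index, no key matching
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Symmetric distances between four hypothetical cities
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
```

For the four cities above, shortest_tour(DIST) returns a tour of length
18. At N=4 that is six candidates; the factorial growth is what makes
every per-hop lookup cost matter.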

>>>>BTW, remember that I am a translationist. When I do a UML model, I
>>>>don't care what language the transformation engine targets in the model
>>>>implementation. (In fact, transformation engines for R-T/E typically
>>>>target straight C from the OOA models for performance reasons.) Thus
>>>>every 3GL (or Assembly) represents a viable alternative implementation
>>>>of the notion of '3GL'.
>>>
>>>
>>>
>>>Well, UML *is* language. It is a visual language just like LabView is.
>>
>>Exactly. But solutions at the OOA level are 4GLs because they can be
>>unambiguously implemented without change on any platform with any
>>computing technologies.
>
>
> So can any Turing Complete language.

And your point is...?

On separation of concerns of problem solving dynamics vs. data
persistence and access:

>>>"Separation" is generally irrelevant in cyber-land. It is a physical
>>>concept, not a logical one. Perhaps you mean "isolatable", which can be
>>>made to be dynamic, based on needs. "Isolatable" means that there is
>>>enough info to produce a separated *view* if and when needed. This is
>>>the nice thing about DB's: you don't have to have One-and-only-one
>>>separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
>>>and tries to find the One True Taxonomy, which is the fast train to
>>>Messland. Use the virtual power of computers to compute as-needed
>>>groupings based on metadata.
>>
>>You know very well what I mean by 'separation of concerns' in a software
>>context, so don't waste our time recasting it. Modularity has been a
>>Good Practice since the late '50s.
>
>
>
> If there is only one concern set where each concern is mutually
> exclusive, then we have no disagreement. In practice there are usually
> multiple "partitioning" candidates, and that is where the disagreements
> usually arise. File and text systems don't make it easy to have
> partitioning in all dimensions, so compromises must be made. It is "my
> factor is more important than your factor, neener neener". If there is
> only one way to slice the pizza, then there is no problem. But if there
> are multiple ways, then a fight breaks out.
>
> This is one reason why DB's are useful: the more info you put into the
> DB instead of code, the more ad-hoc, situational partitionings you can
> view. You are not forced to pick the One Right Taxonomy of
> partitioning. Categorizational philosophers came to the consensus that
> there is no One Right Taxonomy for most real-world things.

There are three accepted criteria for application partitioning (i.e.,
separating concerns at the scale of subsystems): Subject matter, level
of abstraction, and requirements allocation via client/service
relationships. (BTW, this has nothing to do with OO; it is basic
Systems Engineering.)

Subject matter: Clearly static data storage and providing generic access
to it is a different subject matter than solving Problem X.

Level of abstraction: Outside CRUD/USER processing the detailed
manipulation of data storage (e.g., two-phase commit) is clearly at a
much lower level of abstraction than the algorithmic processing that
solves a particular problem. IOW, the application solution is
completely indifferent to where and how data is stored. One should be
able to solve the problem the same way regardless of what the
persistence mechanisms are. That substitutability means that the
problem solution is at a higher level of abstraction than the
persistence mechanisms.

Requirements Allocation: Clearly the requirements for persistence
implementation and access are quite different than the requirements on
the specific solution of Problem X.

So under all three of these criteria it makes sense to separate the
concerns of persistence from individual problem solutions. That's
exactly what DBMSes do. The problems only come into play when one
violates that separation of concerns and starts bleeding specific
problem solutions into the DBMS itself.

>>The RDB paradigm
>>is not designed for context-dependent problem solving; it is designed
>>for generic static data storage and access in a context-independent manner.
>
>
> I think what you view as context-dependent is not really context
> dependent after all. It is just your pet way of viewing the world
> because of all the OOP anti-DB hype.

My view of context-dependence is the solution to a /particular/ problem.
Each application solves a unique problem. IOW, the problem is the
context. RDBs provide persistence that allows all the applications to
access the data in a uniform way regardless of what specific problem
they are solving.

Whether one can solve the problem in a reasonable fashion with the data
structures mapped to the RDB structure depends on the nature of the
problem. For CRUD/USER processing one can. For problems outside that
realm one can't so one needs to convert data into structures tailored to
the problem in hand.

[Note that this is relevant to the point above about providing SQL
drivers for different storage paradigms. That makes sense for CRUD/USER
environments because one is already employing SQL as the norm. So long
as the exceptions requiring a special driver are fairly rare, one can
justify the single access paradigm. However, it makes no sense at all
for non-CRUD/USER environments because one has to reformat the data to
the problem solution anyway. So rather than reformatting twice, one
should just reformat once from a driver that optimizes for the storage
paradigm.]

>>Before you argue that the RAM RDB saves developer effort because it is
>>largely reusable and that may be worth more than performance, I agree.
>>But IME for /large/ non-CRUD/USER problems the computer is usually too
>>small and performance is critical.
>
>
> Please clarify. Ideally the RDBMS would determine what goes into RAM
> and what to disk such that the app developer doesn't have to give a
> rat's rear. Cache management generally does this, but a both-way system
> is probably not as fast as a dedicated RAM DB. Even if the two-way
> ideal is not fully reached, one will soon have the *option* to switch
> some or all of an app to a full-RAM DB as needed without rewriting the
> app. The query language abstracts/encapsulates/hides that detail away.

This is another non sequitur deflection. Caching and whatnot is not
relevant to the point I was making. There is a business trade-off
between run-time performance and developer development time that every
shop must make. Sometimes greater developer productivity can justify
reusing the RDB paradigm when more efficient specific solutions are
available.

However, my point was that those situations tend to map to CRUD/USER
processing. Once problems become more complex than format conversions
in UI/DB pipeline applications, performance becomes the dominant
consideration. I spent years solving large NP-complete problems on
machines like PDP-11s and there was no contest on that issue; customers
simply would not spring for Crays in their systems but they would spring
for a marginal extra developer cost prorated across all systems.

>>>>For a non-CRUD/USER application, SQL and the DBMS provide the first
>>>>relationship while a persistence access subsystem provides the
>>>>reformatting for the second relationship.
>>>
>>>
>>>Reformatting? Please clarify.
>>
>>The solution needs a different view of the data that is tailored to the
>>problem in hand. So the RDB view needs to be converted to the solution
>>view (and vice versa). IOW, one needs to reformat the RDB data
>>structures to the solution data structures.
>
>
> This is called a "result set" or "view". Most queries customize the
> data to a particular task. Thus, it *is* a solution view.

That formatting is cosmetic. The most sophisticated reformatting is
combining data from multiple tables in a join into a single table dataset.
I am talking about data structures whose semantics are different,
whose access paradigms are different, whose relationships are different,
and/or whose structure is different. IOW, there isn't a 1:1 mapping to
the RDB. For example, if my solution requires the data to be organized
hierarchically, SQL queries can't do that.
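
As a rough illustration of that kind of reformatting, assume a query has
handed back flat (id, parent_id, name) rows; the column layout and the
build_tree helper are hypothetical. The persistence access subsystem
then rebuilds the nesting that the solution actually wants to navigate.

```python
from collections import defaultdict


def build_tree(rows):
    """rows: iterable of (id, parent_id, name); parent_id None marks a root."""
    children = defaultdict(list)  # parent id -> child ids, in row order
    names = {}
    for node_id, parent_id, name in rows:
        names[node_id] = name
        children[parent_id].append(node_id)

    def subtree(node_id):
        # Recursively nest each node's children beneath it
        return {
            "name": names[node_id],
            "children": [subtree(child) for child in children[node_id]],
        }

    return [subtree(root) for root in children[None]]


# Flat rows as a table dataset might deliver them
ROWS = [
    (1, None, "assembly"),
    (2, 1, "frame"),
    (3, 1, "wheel"),
    (4, 3, "spoke"),
]
```

build_tree(ROWS) yields one root ("assembly") with "frame" and "wheel"
nested under it and "spoke" under "wheel". After that the solution walks
parent/child references directly instead of matching keys.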

>>>>When the only tool you have is a Hammer, then everything looks like a
>>>>Nail.
>>>
>>>
>>>
>>>No, out of necessity I started my career without DB usage, and I never
>>>want to return there.
>>
>>That's because you are in a CRUD/USER environment where P/R works quite
>>well. Try a problem like allocating a fixed marketing budget to various
>>national, state, and local media outlets in an optimal fashion for a
>>Fortune 500.
>
>
> Again, I never said that DB's are good for every problem. I don't know
> enough about that particular scenario to propose a DB-centric solution
> and to know whether it is an exception or not.
>
> Unless you provide some specific use-case or detailed scenario, it is
> anecdote against anecdote here.
>
> RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
> gigantic.

Of course they are. They provide a generic, context-independent access
to stored data that any application can use. That's why they exist.
But that is beside the point.

The issue here is where individual business problems should get solved.
My assertion is that is an application responsibility. For CRUD/USER
processing one can use the same data structures in the solution as in
the RDB so P/R as a software development paradigm works well.
Generally, though, one can't use the same data structures once one is
out of the CRUD/USER realm so P/R doesn't work very well.

>>>>>><moved>What I implied was that CRUD/USER applications tend to be not very
>>>>>>complex. Report generation was never very taxing even back in the COBOL
>>>>>>days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
>>>>>>browser UI for a printed report doesn't change the fundamental nature of
>>>>>>the processing.
>>>>>
>>>>>
>>>>>Please clarify. If a process was "not taxing", then you are simply
>>>>>given more duties and projects to work on. Management loads you up
>>>>>based on your productivity and work-load.
>>>>
>>>>Back in the '60s and early '70s writing COBOL code to extract data and
>>>>format reports was a task given to the entry level programmers. That's
>>>>where the USER acronym (Update, Sort, Extract, Report) came from. The
>>>>stars went on to coding Payroll and Inventory Control where one had to
>>>>encode business rules and policies to solve specific problems.
>>>
>>>
>>>
>>>Fine, show how OO better solves business rule management. (Many if not
>>>most biz rules can be encoded as data, BTW, if you know how.)
>>
>>Why? That has nothing to do with whether a DBMS should execute dynamic
>>business rules and policies. This isn't an OO vs. P/R discussion, much
>>as you would like to make it so.
>
>
> Are you saying it is a UML-versus-RDB debate?

Another deflection. How do you get from how complex report generation
software is to UML vs. RDB? The topic here has nothing to do with OO,
P/R, or UML. It is about the complexity of processing for CRUD/USER
applications vs. other applications.

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
