From: Panu on
H. S. Lahman wrote:
....
> When in doubt, think: encapsulation. The mechanisms for accessing stored
> data in flat files, an RDB, an OODB, or on clay tablets will be quite
> different. Do the applications that need to access stored data care
> about those details?

I find it interesting ... wouldn't it be
better to encapsulate further, and only
access 'objects' instead of flat data which
needs to be parsed somehow according to
some agreed-upon syntax?

If yes, it would seem that object-databases
are the better solution. In practice they
are not around much these days. But if we
say "relational data better", then this would
seem to be against the advice of encapsulation,
no?

I'm truly interested in this question - not
trying to argue either way:

*If encapsulation is good, why are relational
databases so prevalent?*


-Panu Viljamaa
From: H. S. Lahman on
Responding to Panu...

>> When in doubt, think: encapsulation. The mechanisms for accessing
>> stored data in flat files, an RDB, an OODB, or on clay tablets will be
>> quite different. Do the applications that need to access stored data
>> care about those details?
>
>
> I find it interesting ... wouldn't it be
> better to encapsulate further, and only
> access 'objects' instead of flat data which
> needs to be parsed somehow according to
> some agreed-upon syntax?

First, let me qualify that I am not talking about CRUD/USER processing
where the primary problem being solved by an application is conversion
between the DB and UI views of data.

Databases of any flavor exist to provide storage of data that is
independent of how the data are used. IOW, the database provides
persistence in a fashion that is reusable across broad classes of data
usage. It also provides optimum efficiency of access when usage is
arbitrary. To do that it must provide standardized access mechanisms
like SQL. But the database's access mechanisms are optimized for the
particular database storage paradigm. Thus SQL isn't very useful for
accessing an OODB.

OTOH, applications solve very specific and unique problems for the
customer. To do that efficiently the application often needs a
customized view of that data that is different than the view in a
particular persistence paradigm. In addition, the problem the
application is solving really doesn't care which particular storage
paradigm is used for storage. So the application needs to encapsulate
the persistence paradigm and provide an interface to it that suits its
specific solution needs.

Thus the application is going to abstract data objects that it needs to
store/recover just as you suggest. But those data objects will be
tailored to the particular application problem context. Similarly, the
database is going to deal with data objects that are tailored to its
particular paradigm. So a mapping is needed to the storage view du jour.
Thus it really doesn't matter what the storage paradigm is; a mapping
still needs to be provided between the application and storage views.
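
To make the mapping idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. Everything in it is assumed for illustration: the customer/invoice tables, the Customer class, and the load_customers function are hypothetical names, not anything from a real system. The storage view is two normalized tables; the application view is a single object carrying just what the problem needs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Application view (hypothetical): only what this problem needs."""
    name: str
    total_owed: float


def make_demo_db():
    """Storage view (hypothetical schema): two normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoice (id INTEGER PRIMARY KEY,
                              customer_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt');
        INSERT INTO invoice VALUES (1, 1, 10.0), (2, 1, 5.5);
    """)
    return conn


def load_customers(conn):
    """The mapping: convert the storage view into the application view."""
    rows = conn.execute(
        "SELECT c.name, COALESCE(SUM(i.amount), 0) "
        "FROM customer c LEFT JOIN invoice i ON i.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name"
    ).fetchall()
    return [Customer(name, float(total)) for name, total in rows]


customers = load_customers(make_demo_db())
```

If the storage paradigm changed, only load_customers would need rewriting; code that consumes Customer objects would not.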

As an obvious example, consider how relationships are managed in an OO
application vs. an RDB. In the RDB they are instantiated at the table
(class) level and one needs explicit embedded identity in the tuple. In
an OO application relationships are instantiated at the object (tuple)
level rather than the class level and identity is usually implicit in a
memory address. Thus one /constructs/ objects and their relationships
differently in an OO application than in an RDB. The result is that
query-based processing and joins are very rare in OO applications.
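
A small sketch of that contrast, in Python with the standard sqlite3 module. The owner/dog schema and the Owner and Dog classes are invented for illustration only. In the relational half the relationship lives in the table definition and each row carries an explicit key; in the OO half each object holds direct references and no query is needed to navigate.

```python
import sqlite3

# RDB style: the relationship is declared once at the table level and each
# dog row embeds an explicit identity (owner_id) that a join resolves.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT,
                      owner_id INTEGER REFERENCES owner(id));
    INSERT INTO owner VALUES (1, 'Ann');
    INSERT INTO dog VALUES (1, 'Rex', 1), (2, 'Fido', 1);
""")
rdb_dogs = [row[0] for row in conn.execute(
    "SELECT d.name FROM dog d JOIN owner o ON d.owner_id = o.id "
    "WHERE o.name = ? ORDER BY d.id", ("Ann",))]


# OO style: the relationship is set up per object when it is constructed,
# and identity is simply the object reference.
class Owner:
    def __init__(self, name):
        self.name = name
        self.dogs = []


class Dog:
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        owner.dogs.append(self)  # link this instance to that instance


ann = Owner("Ann")
Dog("Rex", ann)
Dog("Fido", ann)
oo_dogs = [dog.name for dog in ann.dogs]  # navigation, no join
```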

>
> If yes, it would seem that object-databases
> are the better solution. In practice they
> are not around much these days. But if we
> say "relational data better", then this would
> seem to be against the advice of encapsulation,
> no?

One needs a different accessing paradigm for OODBs. That's because OODBs
are optimized to deal with data where there are many complex
relationships among data elements and those relationships need to be
instantiated at the object level rather than the class (table) level.
[It is no accident that the memory-mapped OODBs provide a literal
mapping to the <OO> application structure. The price of that literal
mapping is that the OODB access is ubiquitous in the code so that one
cannot switch OODB vendors without massive shotgun refactoring.]

The real issue is that the database -- RDB or OODB -- provides an
interface for generic data access. But that interface is necessarily
optimized around the particular storage paradigm. Thus the database
provides a quite abstract access mechanism in its interface, but that
mechanism is limited to the particular storage paradigm (however common
it may be). Think of it this way:

+-------------+                         +-----------------+
| Application |                         |    Database     |
|         +------+                  +-------+             |
|         | Iin  |<-----------------| Iout  |             |
|         +------+                  +-------+             |
|             |                         |                 |
|         +------+                  +-------+             |
|         | Iout |----------------->| Iin   |             |
|         +------+                  +-------+             |
|             |                         |                 |
+-------------+                         +-----------------+

The application has an input interface, Iin, that the rest of the world
uses to talk to it. Similarly, the database has an input interface, Iin,
that the rest of the world talks to, such as a SQL driver interface.

However, because the application doesn't care about specific persistence
paradigms and has its own unique view of persistence, internally it
talks to its own output interface when it needs to communicate with
persistence. Similarly, the database will have its own internal view of
data that it needs to convert to the more generic view the rest of the
world wants to see (e.g., datasets), so it internally talks to an output
interface, Iout. [That is trivial for an RDB since Iin is query-based
and synchronous; in effect there is no Iout interface. But it can raise
all sorts of interesting issues for virtual memory in a memory-mapped OODB.]

What the Iin and Iout interfaces provide is decoupling. They allow the
implementations of the application and database engine to be completely
independent of their context. The interfaces presented by Iin and Iout
are always fixed. The "glue" that resolves syntactic mismatches between
paradigms resides in the implementation of the Iout interface. It
provides the view conversion from its interface to the relevant Iin
interface of the service. (For a complex application Iout will tend to
be encapsulated in a substitutable subsystem that is reusable across
applications, depending on the persistence paradigm currently in favor.)

Bottom line: Iin/Iout are at different levels of abstraction and serve
different masters for the application and the database.
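
One way to picture the application-side Iout is the sketch below, in Python. All names are hypothetical (SettingsStore, SqliteStore, JsonTextStore, switch_theme); the JSON string is only a stand-in for a flat file. The application function sees one fixed interface, and each implementation holds the glue that translates to its own storage paradigm.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class SettingsStore(ABC):
    """Hypothetical Iout: the fixed interface the application talks to."""

    @abstractmethod
    def save(self, key: str, value: str) -> None: ...

    @abstractmethod
    def load(self, key: str) -> Optional[str]: ...


class SqliteStore(SettingsStore):
    """Glue for an RDB: converts Iout calls into SQL (the database's Iin)."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE setting (key TEXT PRIMARY KEY, value TEXT)")

    def save(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO setting VALUES (?, ?)", (key, value))

    def load(self, key):
        row = self._conn.execute(
            "SELECT value FROM setting WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None


class JsonTextStore(SettingsStore):
    """Glue for a flat-file stand-in: the whole store is one JSON document."""

    def __init__(self):
        self._text = "{}"

    def save(self, key, value):
        data = json.loads(self._text)
        data[key] = value
        self._text = json.dumps(data)

    def load(self, key):
        return json.loads(self._text).get(key)


def switch_theme(store: SettingsStore, theme: str) -> str:
    """Application logic: knows only Iout, never the storage paradigm."""
    previous = store.load("theme") or "default"
    store.save("theme", theme)
    return previous
```

Swapping SqliteStore for JsonTextStore changes nothing in switch_theme, which is the decoupling being described.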

> I'm truly interested in this question - not
> trying to argue either way:
>
> *If encapsulation is good, why are relational
> databases so prevalent?*

But RDBs /are/ encapsulated. That is exactly what SQL provides; a quite
abstract interface to a particular storage paradigm. That allows RDBs to
be plug & play across applications. It is just at a different level of
abstraction than the application problem solution's view.

The answer to this question is more about /how/ the data is structured
and used in the intended market. RDBs are ideally suited to
read-many/write-few contexts where relationships among data are
relatively simple. CRUD/USER processing is the quintessential example of
where the RDB paradigm shines and it absolutely dominated IT through the
'70s when RDBs came on the scene. Those criteria still dominate IT
because the data is looked at a whole lot more than it is updated and
relationships just aren't that complicated in IT. Since IT still
dominates the software market, it is not a surprise that RDBs prevail.

[There is a chicken-and-egg issue. Are IT relationships simple because
the IT world is coerced by the presence of RDBs left over from when
CRUD/USER processing ruled? FWIW, I don't think so. I think it is because the
IT problem domain naturally works that way because it is the easiest way
to manage processes where a lot of people are involved. IOW, KISS rules
for business processes.]

OODBs are ideally suited to situations where relationships tend to be
quite complex at the tuple level and, to a lesser extent, where data is
constantly being updated (more precisely, where data changes need to be
synchronized among multiple clients simultaneously). Those sorts of
problems are much less common. In fact, the only obvious examples that
come quickly to mind are mapping software (e.g., MapQuest) and MMORPGs.

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com



From: topmind on

H. S. Lahman wrote:
> Responding to Panu...

[snip]

> OTOH, applications solve very specific and unique problems for the
> customer. To do that efficiently the application often needs a
> customized view of that data that is different than the view in a
> particular persistence paradigm.

Relational is perfectly capable of handling "local" app-specific views
and/or copies of data. I will agree that the current crop of tools
often does not make that very easy though. But I used to do it all the
time back in the days before OOP hype killed the market for nimble
table-oriented tools. Plus, on a small app or task level, OOP is
usually overkill anyhow.
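
One way an app-specific view might look with today's tools is a plain SQL view. This is a minimal sketch in Python with the standard sqlite3 module; the orders table and the north_orders view are made-up names, and a temporary view is used only as an assumption about what "local" could mean here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared data, as any application might see it (hypothetical schema).
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'north', 20.0), (2, 'south', 7.5), (3, 'north', 2.5);

    -- A "local" view for one app: visible only on this connection.
    CREATE TEMP VIEW north_orders AS
        SELECT id, amount FROM orders WHERE region = 'north';
""")

# The app queries its own view and never mentions the region column.
north = conn.execute(
    "SELECT id, amount FROM north_orders ORDER BY id").fetchall()
```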

(In a nearby message I already griped about calling RDBMS mere
"storage mechanisms", so I won't repeat that here.)


> But RDBs /are/ encapsulated. That is exactly what SQL provides; a quite
> abstract interface to a particular storage paradigm. That allows RDBs to
> be plug & play across applications. It is just at a different level of
> abstraction than the application problem solution's view.

I would not call it a "different level of abstraction". SQL is very
high-level, in some ways even more high-level than OOP. Many OO
proponents want to hide it away because they either don't like it or
don't want to bother to learn it, and would rather work in a 1960s-style
pointer-by-pointer approach instead.

>
> The answer to this question is more about /how/ the data is structured
> and used in the intended market. RDBs are ideally suited to
> read-many/write-few contexts where relationships among data are
> relatively simple.

I've seen a lot of RDBMS where the relationships were far from simple.
(True, they probably could have been cleaned up a bit, but the
businesses were inherently non-trivial.)


> OODBs are ideally suited to situations where relationships tend to be
> quite complex at the tuple level and, to a lesser extent, where data is
> constantly being updated (more precisely, where data changes need to be
> synchronized among multiple clients simultaneously). Those sorts of
> problems are much less common. In fact, the only obvious examples that
> come quickly to mind are mapping software (e.g., MapQuest) and MMORPGs.

GIS (mapping) applications often use RDBMS. ESRI comes to mind.


> H. S. Lahman

-T-

From: Phlip on
Panu wrote:

> *If encapsulation is good, why are relational
> databases so prevalent?*

Because one good way to process tables of data is with declarative
statements. SQL typically allows you to declare the results you want,
encapsulating their mechanism.

The DSLs (Domain Specific Languages) that wrap SQL are often also
declarative. And they are still encapsulating, and still based in an OO
language.
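
A tiny illustration of "declare the results you want", sketched in Python with the standard sqlite3 module. The person table and the age cutoff are invented for the example. The loop spells out the mechanism step by step; the SQL statement states only the result and leaves the mechanism to the engine.

```python
import sqlite3

people = [("Ann", 34), ("Bob", 19), ("Cy", 52)]

# Imperative: the code spells out how to scan, filter, and sort.
adults_loop = []
for name, age in people:
    if age >= 21:
        adults_loop.append(name)
adults_loop.sort()

# Declarative: the query says what is wanted; how it is found is hidden.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)", people)
adults_sql = [row[0] for row in conn.execute(
    "SELECT name FROM person WHERE age >= 21 ORDER BY name")]
```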

--
Phlip
http://www.oreilly.com/catalog/9780596510657/
^ assert_xpath