From: frebe73 on
> Just so I have things straight in the future: when you're talking about
> persistence, you're talking about the means by which data is moved to a
> persistent storage medium, like a hard disk?

Yes.

> I was talking about a way of treating object instances as if they exist
> independent of any particular running process. So they would be
> 'persistent' in the sense that they stick around (conceptually, at
> least) until they are explicitly destroyed (and terminating the process
> which created them does not count as explicit destruction).

The most common (or maybe the only) way to achieve that is to move the
data to a persistent storage medium.

> > Everybody with a solid background in RDBMSs knows that an RDBMS is
> > about much more than persistence, and would still use an RDBMS even if
> > persistence were not needed. Many people from OO-land implement a lot of
> > data management features themselves in every application, instead of
> > using the features already provided by the RDBMS, and use the RDBMS
> > only for persistence. If you want, I can give you real-world
> > examples of the various downsides of this approach.
>
> I would certainly agree that re-implementing RDBMS features in
> application-level code would be A Bad Thing. But it seems to me that if
> those of us in OO-land can avoid making that mistake and instead rely
> on those features (here I'm thinking particularly of ACID transactions
> and query capabilities), we can achieve useful results.
>
> In the interest of identifying things which OO people should /not/ do,
> can you post what you would consider to be a particularly pathological
> example of inappropriately re-implementing the data management features
> an RDBMS provides for free?

Queries (or predicate logic) are the first obvious example. When OO people
separate SQL statements into a dedicated layer, they also try to limit
the number of distinct SQL statements because of the burden of modifying
interfaces for every new statement. The consequence is that rather
simple select statements are used and additional filtering is done in
the application. This hurts performance and reduces the maintainability
of the application.
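
A minimal sketch of the difference, using Python's sqlite3 module with an
in-memory database. The orders table, its columns and the sample rows are
made up for illustration and do not come from any real application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, total, status) VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "open"),
        ("bob", 40.0, "open"),
        ("alice", 15.0, "closed"),
        ("carol", 300.0, "open"),
    ],
)

# Application-side filtering: one generic statement fetches every row,
# and the predicate and ordering are re-implemented in code.
all_rows = conn.execute("SELECT customer, total, status FROM orders").fetchall()
filtered_in_app = sorted(
    (customer, total)
    for customer, total, status in all_rows
    if status == "open" and total > 100
)

# Database-side filtering: the predicate is part of the statement,
# so only the matching rows leave the DBMS.
filtered_in_db = conn.execute(
    "SELECT customer, total FROM orders "
    "WHERE status = ? AND total > ? ORDER BY customer, total",
    ("open", 100),
).fetchall()
```

Both variants return the same rows here, but the first one transfers the
whole table and duplicates the predicate in application code.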

Caching is the second issue. Because OO people want to play with an
object graph instead of predicate logic, they need the graph or parts
of it to be virtually in memory all the time. This will very quickly
lead to huge RAM consumption, unless you use caching in your
application. The DBMS already does caching for you, synchronizing
the cache with transactions and handling all concurrency issues. But if
you try to do application caching, the reliability of the cached data
will be rather low.

Transactions and concurrency are the third issue. Because OO
applications like to have state that is shared between different
threads (client calls), you end up having to solve concurrency
issues in the application. In Java this is done using "synchronized". As
soon as you are locking resources, you have the risk of deadlock. But an
RDBMS is much better at detecting deadlocks than, for example, the JVM.
Emulating rollback in your application is also a very tricky task.
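
As a rough illustration of letting the DBMS do the rollback, here is a
sketch using Python's sqlite3 module. The accounts table, the CHECK
constraint and the transfer() helper are hypothetical names chosen for the
example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction (hypothetical helper)."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


try:
    transfer(conn, "a", "b", 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass

# The credit to "b" that already ran has been undone by the rollback.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The application never has to remember what the first UPDATE changed in
order to undo it.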

There is also a common misconception that databases should only be used
for "permanent" data, and that other data should be handled using low-level
collection features in the application. But temporary tables are
very useful if you need features like sorting and searching, but don't
need persistence.
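
A small sketch of that idea with Python's sqlite3 module. The session_hits
table and its data are invented for the example; the point is only that
sorting and grouping come for free, even though nothing is meant to
outlive the connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A TEMP table is private to this connection and is dropped when it closes.
conn.execute("CREATE TEMP TABLE session_hits (url TEXT, millis INTEGER)")
conn.executemany(
    "INSERT INTO session_hits VALUES (?, ?)",
    [("/home", 120), ("/search", 480), ("/home", 95), ("/cart", 210)],
)

# Sorting and searching without hand-written collection code.
slowest = conn.execute(
    "SELECT url, millis FROM session_hits ORDER BY millis DESC LIMIT 2"
).fetchall()

# Grouping and aggregation work the same way.
per_url = conn.execute(
    "SELECT url, COUNT(*), MIN(millis) FROM session_hits GROUP BY url ORDER BY url"
).fetchall()
```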

Views are also very underused. As soon as you have an identical select
statement that is called from multiple points in your application, a
view should be created. A view can also contain a considerable amount of
business logic that can be reused in an efficient way, and across
different programming languages.
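
For instance (a sketch only, with a made-up customers/invoices schema and
a hypothetical outstanding_balance view), the rule "what a customer owes
is the sum of the unpaid invoices" can live in one view and be queried
from several places:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER, paid INTEGER
    );
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO invoices VALUES
        (1, 1, 100, 0), (2, 1, 50, 1), (3, 2, 70, 0), (4, 2, 30, 1);

    -- The business rule is defined once, inside the database.
    CREATE VIEW outstanding_balance AS
        SELECT c.name AS name, SUM(i.amount) AS owed
        FROM customers c JOIN invoices i ON i.customer_id = c.id
        WHERE i.paid = 0
        GROUP BY c.id, c.name;
    """
)

# Two different call sites reuse the view instead of repeating the join.
report = conn.execute(
    "SELECT name, owed FROM outstanding_balance ORDER BY name"
).fetchall()
big_debtors = conn.execute(
    "SELECT name FROM outstanding_balance WHERE owed >= 100 ORDER BY name"
).fetchall()
```

Any other client of the same database, in whatever language, could select
from the same view.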

The main cause of all these problems is the fact that OO people like
to use objects as data structures and to create a domain model.
According to Ted Codd and Chris Date, the table (relation) is the only
(high-level) data structure. Using other data structures will cause an
impedance mismatch. But objects are still very useful for other
purposes. As a matter of fact, the relational model needs
classes/objects for defining data types other than the existing ones
like strings and dates.

Fredrik Bertilsson
http://frebe.php0h.com

From: AndyW on
On Sat, 4 Nov 2006 15:42:35 +0100, "Dmitry A. Kazakov"
<mailbox(a)dmitry-kazakov.de> wrote:

>On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:
>
>> Matt McGill wrote:
>>> Evidently
>>> 'persistence' evokes very different concepts in my mind and in yours,
>>> perhaps as a result of different development backgrounds? Anyway, I'm
>>> going to stop cluttering this thread.
>>
>> Persistence is a significant idea from a programmer perspective who is
>> unfamiliar with database management fundamentals. "You can save a
>> subgraph of your program's object spaghetti into a file and it can
>> outlive the program!" Wow, what a big deal.
>>
>> In relational world you deal with logical concepts, and lifetime
>> doesn't apply to logical entities.
>
>This is of course wrong. Persistence addresses not time, but scope. If the
>scope is bound to time, that makes the thing "real-time," which alone is
>unrelated to persistence.

Not sure where you get your definition from, but persistence refers to
something over time. An item is transient if it exists for a shorter
timeframe than its creator and persistent if it lasts longer.

You may need to explain what you mean by the word 'scope'.

From: Dmitry A. Kazakov on
On Mon, 06 Nov 2006 00:36:43 +1300, AndyW wrote:

> On Sat, 4 Nov 2006 15:42:35 +0100, "Dmitry A. Kazakov"
> <mailbox(a)dmitry-kazakov.de> wrote:
>
>>On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:
>>
>>> Matt McGill wrote:
>>>> Evidently
>>>> 'persistence' evokes very different concepts in my mind and in yours,
>>>> perhaps as a result of different development backgrounds? Anyway, I'm
>>>> going to stop cluttering this thread.
>>>
>>> Persistence is a significant idea from a programmer perspective who is
>>> unfamiliar with database management fundamentals. "You can save a
>>> subgraph of your program's object spaghetti into a file and it can
>>> outlive the program!" Wow, what a big deal.
>>>
>>> In relational world you deal with logical concepts, and lifetime
>>> doesn't apply to logical entities.
>>
>>This is of course wrong. Persistence addresses not time, but scope. If the
>>scope is bound to time, that makes the thing "real-time," which alone is
>>unrelated to persistence.
>
> Not sure where you get your definition from, but persistence refers to
> something over time. An item is transient if it exists for a shorter
> timeframe than its creator and persistent if it lasts longer.
>
> You may need to explain what you mean by the word 'scope'.

Scope = frame, {} brackets in C. It is not necessarily a time frame. When X
is persistent relative to the program A, that merely means that the scope
where X exists exceeds that of A. It does not mean that X exists outside of any
scope. There must always be a larger scope of some other program B, where X
does exist. After all, X is a program artefact; it can't live in a vacuum. In
the case of a database, B could be the DBMS. Within the scope of B, X is no
longer persistent. Obviously, persistence is relative to the scope of the
beholder. There is no absolutely persistent thing.

Now time is an orthogonal issue. You can associate some execution time with
scopes, but it would be an artificial association, because the same program
might run on different computers at different times and places. Time and
space are abstracted away. This is not the case for real-time programs,
which are called *real* time for exactly that reason.

------
There are two logical errors made in both camps. One is to bind things to
absolute time. Another is to claim that things themselves are absolute. In
reality contradictory things both co- and not exist. There is no
distinguished "set of facts" one could put into a DB and then deduce
everything else...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From: Matt McGill on
I'll preface my replies by saying that almost all of my experience with
persistent objects has involved Hibernate, one of a number of ORM
frameworks out there for Java (and now .NET, I believe) programmers to
use. My aim here is not so much to focus on the strengths and
weaknesses of various ORM implementations (I think Toplink was already
mentioned, in a none-too-appreciative way), but rather to explore how
well an ORM framework, when used properly, might be able to address the
examples you've brought up. Naturally, /mis/using an ORM framework, or
using it naively, will not end well.

> Queries (or predicate logic) are the first obvious example. When OO people
> separate SQL statements into a dedicated layer, they also try to limit
> the number of distinct SQL statements because of the burden of modifying
> interfaces for every new statement. The consequence is that rather
> simple select statements are used and additional filtering is done in
> the application. This hurts performance and reduces the maintainability
> of the application.

So the problem here would be that SQL statements are being separated
out into a layer or particular module, and that the task of maintaining
the interface for that layer or module causes OO programmers to
artificially limit themselves to simple queries. Being thus limited,
they start performing joins and selections in-memory on the query
results, when more complicated queries could have done those things
much more efficiently. Correct?

I can see how maintaining such a data access layer by hand might
predispose OO developers to use SQL in a naive way, or implement
functionality in-memory that a query could have given for free (doing
selections or joins this way could end up being particularly
disastrous). But when making use of an ORM framework, you don't need a
hand-created data access layer at all. Moreover, you get (at least with
Hibernate) an object-aware query language which, while limiting in
some ways (you can only join on mapped relationships, for example), is
actually more flexible than straight SQL in other ways (polymorphic
queries).

I'm not trying to claim HQL (Hibernate's query language) is the
greatest thing since sliced bread - it has to be used with care. But
when it /is/ used correctly I think there are some benefits over SQL. I
hope to post some code examples that I think can illustrate these
benefits as soon as I get a chance.

> Caching is the second issue. Because OO people want to play with an
> object graph instead of predicate logic, they need the graph or parts
> of it to be virtually in memory all the time. This will very quickly
> lead to huge RAM consumption, unless you use caching in your
> application. The DBMS already does caching for you, synchronizing
> the cache with transactions and handling all concurrency issues. But if
> you try to do application caching, the reliability of the cached data
> will be rather low.

I'm not sure I follow. My experience has been with multi-user web
applications, so the goal is typically to pull only those parts of the
object graph into memory that are actually necessary, and to keep them
there only during the processing of a request. If you let the instances
(which are /not/ shared between threads, as you indicate below) hang
around in memory for long periods of time, the data goes stale.

My understanding is that for many (most?) database-backed applications,
the database resides on a different server than the client of the
database (whether that client is a web/application server, or
individual client machines). The database's caching helps to mitigate
disk access times, but there's still the network overhead of
transferring the data to the client of the database. Unless I'm missing
something, you've got that problem to some extent regardless of whether
you're using OO/ORM on top of an RDBMS, or straight SQL. Sure, if most
of your business logic is implemented in stored procedures on the
database, then the only data going over the wire is the data to
display. But for the applications I've worked on thus far most of the
data requested ends up being displayed in some form.

Hibernate, on the other hand, has some sophisticated caching mechanisms
built in. The actual cache functionality is supplied by third-party
cache libraries, and you can take your pick (there are a bunch). I
agree that rolling your own would be a dumb idea, but there's no reason
to.

Hibernate's cache obviously can't always be made use of. Rich clients
all connecting to a shared database would not be able to make use of
the cache, for obvious reasons. But in the event that an application
server is hosting a multi-user application, and that application is the
only one which writes to a particular set of tables, Hibernate's cache
is simply a tool which can give significant performance improvement in
the form of saved round-trips to the database, and which is not to my
knowledge readily available to those using a SQL-only approach. Memory
consumption is obviously a concern, but the cache can be configured on
a per-object basis, with maximum sizes, object counts, and expiration
times. There is even support for caching the results of queries (this
only makes sense in a small set of situations, but for those situations
it's a huge help).
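
To make the "maximum sizes and expiration times" part concrete, here is a
toy sketch in Python. It is emphatically not Hibernate's API or any real
cache library's; the ExpiringCache class, its parameters and the
load_user() loader are all invented for illustration:

```python
import time
from collections import OrderedDict


class ExpiringCache:
    """Toy read-through cache with a size bound and a time-to-live (illustrative only)."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, loader):
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            self._data.move_to_end(key)  # mark as most recently used
            return hit[1]
        # Miss or expired entry: go back to the source (the database, say).
        value = loader(key)
        self._data[key] = (now + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used
        return value


# Usage with a fake clock so that expiry can be shown deterministically.
loads = []
now = [0.0]


def load_user(user_id):
    loads.append(user_id)  # stands in for a round-trip to the database
    return {"id": user_id}


cache = ExpiringCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.get(1, load_user)
cache.get(1, load_user)  # served from the cache, no second load
now[0] = 61.0
cache.get(1, load_user)  # entry has expired, so it is loaded again
```

Note that this toy does nothing about invalidation when another writer
changes the row, which is exactly why the single-writer condition above
matters.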

> Transactions and concurrency are the third issue. Because OO
> applications like to have state that is shared between different
> threads (client calls), you end up having to solve concurrency
> issues in the application. In Java this is done using "synchronized". As
> soon as you are locking resources, you have the risk of deadlock. But an
> RDBMS is much better at detecting deadlocks than, for example, the JVM.
> Emulating rollback in your application is also a very tricky task.

ACID transactions are my favorite RDBMS function, for this very reason
=) I'd prefer not to think about the complexity that rolling my own
transaction support would add to even the simplest of situations, but
this is not necessary. The transaction support provided by the
underlying RDBMS is more than sufficient.

Even better, transaction demarcation can be done declaratively rather
than programmatically, with a little AOP. If I want some class's
saveChanges() method (which will modify 'persistent' objects) to be
transactional, I don't even have to begin/commit the transaction in the
code, and remembe
From: Matt McGill on

topmind wrote:
> > The answer depends a great deal on the application itself. There is
> > nothing intrinsically wrong with dealing directly with table data;
> > indeed, that is often the simplest solution. On the other hand,
> > complex applications often need to have complex behavior associated
> > with the data, and so objects become important. The objects are
> > sometimes related to the relational tables, but often they are not.
>
> This is an OOP myth. Often much or most of the behavior CAN be
> converted into a declarative form (data).
>
> For an extreme example, a brain could be more or less modeled with a
> schema such as:
>
> table: Links
> =================
> sourceNode_ID
> destinationNode_ID
> weight // weighting factor, can be negative in some models
>
> table: Node
> ===============
> node_ID
> activationFuncIndicator // see note
> activationWeight // the "volume" given to activation function
>
> There are about 5 activation functions in common use: unit_step,
> sigmoid, piecewise_linear, gaussian, and identity. (I haven't reviewed
> my schema model closely, so buyer beware. This model allows "Y splits",
> which real neurons don't directly allow IIRC, but can be modeled with
> explicit neurons such that they are still interchangeable.)

All you've done is represent the relationships between the neurons and
a couple of weights. Suppose you had inserted the data for every neuron
in your brain, and each interconnection, into your tables. Would the
RDBMS suddenly start establishing network connections and posting
inflammatory comments to usenet? Or would it just sit there, waiting to
be queried?

The data about connections and weights doesn't represent behavior, it
assumes its existence elsewhere. If you don't write some code to
activate some neurons, run queries to find connected neurons, apply the
activation functions (which would themselves need to be implemented in
code), and repeat, then nothing happens. You've not made any behavior
declarative. In fact, you've effectively provided an example for
Martin's argument - the logic which would be necessary to drive a brain
simulation off of the data in these tables could be encapsulated in
Neuron and NeuronLink objects, which get their data (weights and
function indicator) from the tables and then do the right things with
them. Those objects could then be used as part of a larger simulation
involving sensory organs, nerves, muscle tissue, etc. You don't /have/
to use objects, naturally. You could use structured programming just as
well.
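
To show how much code sits outside the tables, here is a rough sketch in
Python with sqlite3. The table and column names follow the schema quoted
above, but the sample rows, the step() function and the way
activationWeight is applied (as a plain multiplier on the function's
output) are my own assumptions, and only three of the five activation
functions are filled in:

```python
import math
import sqlite3

# The behavior lives here, in code; the tables only name which one to use.
ACTIVATIONS = {
    "unit_step": lambda x: 1.0 if x >= 0 else 0.0,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "identity": lambda x: x,
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Node (
        node_ID INTEGER PRIMARY KEY,
        activationFuncIndicator TEXT,
        activationWeight REAL
    );
    CREATE TABLE Links (sourceNode_ID INTEGER, destinationNode_ID INTEGER, weight REAL);
    INSERT INTO Node VALUES
        (1, 'identity', 1.0), (2, 'identity', 1.0), (3, 'unit_step', 1.0);
    INSERT INTO Links VALUES (1, 3, 0.5), (2, 3, -1.0);
    """
)


def step(conn, outputs):
    """Propagate outputs one hop along Links; returns outputs of the receiving nodes."""
    rows = conn.execute(
        "SELECT n.node_ID, n.activationFuncIndicator, n.activationWeight, "
        "l.sourceNode_ID, l.weight "
        "FROM Node n JOIN Links l ON l.destinationNode_ID = n.node_ID"
    ).fetchall()
    sums, funcs = {}, {}
    for node_id, func, act_weight, src, weight in rows:
        # Weighted sum of the inputs arriving at each destination node.
        sums[node_id] = sums.get(node_id, 0.0) + weight * outputs.get(src, 0.0)
        funcs[node_id] = (func, act_weight)
    return {
        node_id: funcs[node_id][1] * ACTIVATIONS[funcs[node_id][0]](total)
        for node_id, total in sums.items()
    }


fired = step(conn, {1: 1.0, 2: 0.2})   # 0.5*1.0 - 1.0*0.2 = 0.3, so node 3 fires
silent = step(conn, {1: 1.0, 2: 1.0})  # 0.5*1.0 - 1.0*1.0 = -0.5, so it does not
```

Without step() and the ACTIVATIONS table, the rows do nothing. Whether
that code is wrapped in Neuron objects or left as plain functions like
this is a separate question.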

-Matt McGill