From: frebe73 on 5 Nov 2006 01:44

> Just so I have things straight in the future: when you're talking about persistence, you're talking about the means by which data is moved to a persistent storage medium, like a hard disk?

Yes.

> I was talking about a way of treating object instances as if they exist independent of any particular running process. So they would be 'persistent' in the sense that they stick around (conceptually, at least) until they are explicitly destroyed (and terminating the process which created them does not count as explicit destruction).

The most common (or maybe the only) way to achieve that is to move the data to a persistent storage medium.

> > Everybody with a solid background using RDBMS knows that an RDBMS is about much more than persistence, and would still use an RDBMS even if persistence is not needed. Many people from OO-land implement a lot of data management features themselves in every application, instead of using the features already provided by the RDBMS, and use the RDBMS only for persistence. If you want, I can give you real-world examples of the various downsides of this approach.
>
> I would certainly agree that re-implementing RDBMS features in application-level code would be A Bad Thing. But it seems to me that if those of us in OO-land can avoid making that mistake and instead rely on those features (here I'm thinking particularly of ACID transactions and query capabilities), we can achieve useful results.
>
> In the interest of identifying things which OO people should /not/ do, can you post what you would consider to be a particularly pathological example of inappropriately re-implementing the data management features an RDBMS provides for free?

Queries (or predicate logic) are the first obvious one.
When OO people separate SQL statements into a dedicated layer, they also try to limit the number of distinct SQL statements because of the burden of modifying interfaces for every new statement. The consequence is that rather simple select statements are used and additional filtering is done in the application. This hurts performance and reduces the maintainability of the application.

Caching is the second issue. Because OO people want to work with an object graph instead of predicate logic, they need the graph, or parts of it, to be virtually in memory all the time. This very quickly leads to huge RAM consumption unless you use caching in your application. The DBMS already does caching for you; it synchronizes the cache with transactions and handles all concurrency issues. But if you try to do application caching, the reliability of the cached data will be rather low.

Transactions and concurrency are the third issue. Because OO applications like to have state that is shared between different threads (client calls), you end up having to solve concurrency issues in the application. In Java this is done using "synchronized". As soon as you are locking resources, you run the risk of deadlock, and an RDBMS is much better at detecting deadlocks than, for example, the JVM. Emulating rollback in your application is also a very tricky task.

There is also a common misconception that databases should only be used for "permanent" data, and that other data should be handled using low-level collection features in the application. But temporary tables are very useful if you need features like sorting and searching but don't need persistence.

Views are also very underused. As soon as you have an identical select statement that is called from multiple points in your application, a view should be created. A view can also contain a considerable amount of business logic that can be reused efficiently, and across different programming languages.
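The deadlock risk that comes with "synchronized" can be made concrete. This is a minimal sketch, not code from the thread: `Account`, `Transfers`, `transfer` and `stress` are hypothetical names. Two threads transferring between the same pair of accounts in opposite directions can deadlock if each locks its source account first; acquiring the two monitors in one fixed order (here, by a hypothetical unique `id`) avoids that particular deadlock. Note that this is exactly the kind of hand-rolled discipline Fredrik is warning about, and it does nothing for rollback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared, mutable state of the kind described above.
class Account {
    final long id;   // assumed unique; used only to define a global lock order
    long balance;

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class Transfers {
    // Naively locking "from" and then "to" can deadlock when another thread
    // transfers in the opposite direction. Locking the lower id first gives
    // every thread the same acquisition order.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Runs opposite-direction transfers on two threads and returns the final balances.
    static long[] stress(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); }));
        threads.add(new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); }));
        threads.forEach(Thread::start);
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {a.balance, b.balance};
    }
}
```

With equal numbers of transfers in each direction, both balances should end where they started; a lost update or a hang would show up immediately.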
The main cause of all these problems is that OO people like to use objects as data structures and to create a domain model. According to Ted Codd and Chris Date, the table (relation) is the only (high-level) data structure; using other data structures will cause an impedance mismatch. But objects are still very useful for other purposes. As a matter of fact, the relational model needs classes/objects for defining data types other than the existing ones like strings and dates.

Fredrik Bertilsson
http://frebe.php0h.com
From: AndyW on 5 Nov 2006 06:36

On Sat, 4 Nov 2006 15:42:35 +0100, "Dmitry A. Kazakov" <mailbox(a)dmitry-kazakov.de> wrote:

>On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:
>
>> Matt McGill wrote:
>>> Evidently 'persistence' evokes very different concepts in my mind and in yours, perhaps as a result of different development backgrounds? Anyway, I'm going to stop cluttering this thread.
>>
>> Persistence is a significant idea from the perspective of a programmer who is unfamiliar with database management fundamentals. "You can save a subgraph of your program's object spaghetti into a file and it can outlive the program!" Wow, what a big deal.
>>
>> In the relational world you deal with logical concepts, and lifetime doesn't apply to logical entities.
>
>This is of course wrong. Persistence addresses not time, but scope. If the scope is bound to time, that makes the thing "real-time," which alone is unrelated to persistence.

Not sure where you get your definition from, but persistence refers to something over time. An item is transient if it exists for a shorter timeframe than its creator, and persistent if it lasts longer.

You may need to explain what you mean by the word 'scope'.
From: Dmitry A. Kazakov on 5 Nov 2006 09:15

On Mon, 06 Nov 2006 00:36:43 +1300, AndyW wrote:

> On Sat, 4 Nov 2006 15:42:35 +0100, "Dmitry A. Kazakov"
> <mailbox(a)dmitry-kazakov.de> wrote:
>
>>On 3 Nov 2006 14:14:29 -0800, aloha.kakuikanu wrote:
>>
>>> Matt McGill wrote:
>>>> Evidently 'persistence' evokes very different concepts in my mind and in yours, perhaps as a result of different development backgrounds? Anyway, I'm going to stop cluttering this thread.
>>>
>>> Persistence is a significant idea from the perspective of a programmer who is unfamiliar with database management fundamentals. "You can save a subgraph of your program's object spaghetti into a file and it can outlive the program!" Wow, what a big deal.
>>>
>>> In the relational world you deal with logical concepts, and lifetime doesn't apply to logical entities.
>>
>>This is of course wrong. Persistence addresses not time, but scope. If the scope is bound to time, that makes the thing "real-time," which alone is unrelated to persistence.
>
> Not sure where you get your definition from, but persistence refers to something over time. An item is transient if it exists for a shorter timeframe than its creator, and persistent if it lasts longer.
>
> You may need to explain what you mean by the word 'scope'.

Scope = frame, {} brackets in C. It is not necessarily a time frame. When X is persistent relative to program A, that merely means that the scope where X exists exceeds that of A. It does not mean that X exists outside of any scope. There must always be a larger scope of some other program B where X does exist. After all, X is a program artefact; it can't live in a vacuum. In the case of a database, B could be the DBMS. Within the scope of B, X is no longer persistent. Obviously, persistence is relative to the scope of the beholder. There is no absolutely persistent thing.

Now time is an orthogonal issue.
You can associate some execution time with scopes, but it would be an artificial association, because the same program might run on different computers at different times and places. Time and space are abstracted away. This is not the case for real-time programs, which are called *real* time for exactly that reason.

------

There are two logical errors made in both camps. One is to bind things to absolute time. The other is to claim that things themselves are absolute. In reality, contradictory things both co-exist and do not exist. There is no distinguished "set of facts" one could put into a DB and then deduce everything else...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Matt McGill on 5 Nov 2006 23:23

I'll preface my replies by saying that almost all of my experience with persistent objects has involved Hibernate, one of a number of ORM frameworks available to Java (and now .NET, I believe) programmers. My aim here is not so much to focus on the strengths and weaknesses of various ORM implementations (I think TopLink was already mentioned, in a none-too-appreciative way), but rather to explore how well an ORM framework, when used properly, might be able to address the examples you've brought up. Naturally, /mis/using an ORM framework, or using it naively, will not end well.

> Queries (or predicate logic) are the first obvious one. When OO people separate SQL statements into a dedicated layer, they also try to limit the number of distinct SQL statements because of the burden of modifying interfaces for every new statement. The consequence is that rather simple select statements are used and additional filtering is done in the application. This hurts performance and reduces the maintainability of the application.

So the problem here would be that SQL statements are being separated out into a layer or particular module, and that the task of maintaining the interface for that layer or module causes OO programmers to artificially limit themselves to simple queries. Being thus limited, they start performing joins and selections in memory on the query results, when more complicated queries could have done those things much more efficiently. Correct?

I can see how maintaining such a data access layer by hand might predispose OO developers to use SQL in a naive way, or to implement in memory functionality that a query could have given for free (doing selections or joins this way could end up being particularly disastrous). But when making use of an ORM framework, you don't need a hand-created data access layer at all.
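The cost of filtering in the application instead of in the query can be sketched without a real database. Everything here is a hypothetical stand-in: `FakeTable` simulates a remote table and counts how many rows cross the "wire" in `rowsShipped`; `selectAll()` mimics the generic "select everything" call a narrow data access layer tends to expose, and `selectWhere()` mimics sending the predicate to the DBMS as a `WHERE` clause would. In a real system that predicate would be SQL (or an ORM query) evaluated by the DBMS, not a Java lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical row type.
record Order(int id, int amount) {}

// Stand-in for a remote table; counts rows transferred to the application.
class FakeTable {
    private final List<Order> rows;
    int rowsShipped = 0;

    FakeTable(List<Order> rows) { this.rows = List.copyOf(rows); }

    // Mimics "SELECT * FROM orders": every row is shipped.
    List<Order> selectAll() {
        rowsShipped += rows.size();
        return new ArrayList<>(rows);
    }

    // Mimics "SELECT * FROM orders WHERE ...": the DBMS filters, only matches are shipped.
    List<Order> selectWhere(Predicate<Order> where) {
        List<Order> result = rows.stream().filter(where).collect(Collectors.toList());
        rowsShipped += result.size();
        return result;
    }
}

class FilteringDemo {
    // Anti-pattern: fetch everything, then filter in the application.
    static List<Order> bigOrdersInApp(FakeTable t) {
        return t.selectAll().stream().filter(o -> o.amount() > 900).collect(Collectors.toList());
    }

    // Predicate pushed down to the (simulated) DBMS.
    static List<Order> bigOrdersInDb(FakeTable t) {
        return t.selectWhere(o -> o.amount() > 900);
    }

    // 1000 sample rows with amounts 0..999.
    static List<Order> sampleRows() {
        List<Order> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(new Order(i, i));
        return rows;
    }
}
```

Both methods return the same 99 orders, but the first moves all 1000 rows into the application to get them.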
Moreover, you get (at least with Hibernate) an object-aware query language which, while limiting in some ways (you can only join on mapped relationships, for example), is actually more flexible than straight SQL in other ways (polymorphic queries). I'm not trying to claim HQL (Hibernate's query language) is the greatest thing since sliced bread; it has to be used with care. But when it /is/ used correctly, I think there are some benefits over SQL. I hope to post some code examples that illustrate these benefits as soon as I get a chance.

> Caching is the second issue. Because OO people want to work with an object graph instead of predicate logic, they need the graph, or parts of it, to be virtually in memory all the time. This very quickly leads to huge RAM consumption unless you use caching in your application. The DBMS already does caching for you; it synchronizes the cache with transactions and handles all concurrency issues. But if you try to do application caching, the reliability of the cached data will be rather low.

I'm not sure I follow. My experience has been with multi-user web applications, so the goal is typically to pull into memory only those parts of the object graph that are actually necessary, and to keep them there only during the processing of a request. If you let the instances (which are /not/ shared between threads, as you indicate below) hang around in memory for long periods of time, the data goes stale.

My understanding is that for many (most?) database-backed applications, the database resides on a different server than the client of the database (whether that client is a web/application server or individual client machines). The database's caching helps to mitigate disk access times, but there's still the network overhead of transferring the data to the client of the database.
Unless I'm missing something, you've got that problem to some extent regardless of whether you're using OO/ORM on top of an RDBMS or straight SQL. Sure, if most of your business logic is implemented in stored procedures on the database, then the only data going over the wire is the data to display. But for the applications I've worked on thus far, most of the data requested ends up being displayed in some form.

Hibernate, on the other hand, has some sophisticated caching mechanisms built in. The actual cache functionality is supplied by third-party cache libraries, and you can take your pick (there are a bunch). I agree that rolling your own would be a dumb idea, but there's no reason to. Of course, Hibernate's cache can't always be used. Rich clients all connecting to a shared database would not be able to make use of it, since each client's cache would miss writes made by the others. But when an application server is hosting a multi-user application, and that application is the only one that writes to a particular set of tables, Hibernate's cache is simply a tool that can give a significant performance improvement in the form of saved round-trips to the database, and which is not, to my knowledge, readily available to those using a SQL-only approach. Memory consumption is obviously a concern, but the cache can be configured on a per-object basis, with maximum sizes, object counts, and expiration times. There is even support for caching the results of queries (this only makes sense in a small set of situations, but for those situations it's a huge help).

> Transactions and concurrency are the third issue. Because OO applications like to have state that is shared between different threads (client calls), you end up having to solve concurrency issues in the application. In Java this is done using "synchronized". As soon as you are locking resources, you run the risk of deadlock, and an RDBMS is much better at detecting deadlocks than, for example, the JVM.
> Emulating rollback in your application is also a very tricky task.

ACID transactions are my favorite RDBMS feature, for this very reason =) I'd prefer not to think about the complexity that rolling my own transaction support would add to even the simplest of situations, but that isn't necessary: the transaction support provided by the underlying RDBMS is more than sufficient. Even better, transaction demarcation can be done declaratively rather than programmatically, with a little AOP. If I want some class's saveChanges() method (which will modify 'persistent' objects) to be transactional, I don't even have to begin/commit the transaction in the code, and remembe
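Declarative demarcation of this kind is normally configured through a framework, and the post does not say which AOP tool is involved. As a framework-free sketch of the idea only, a JDK dynamic proxy (`java.lang.reflect.Proxy`) can wrap every interface call in begin/commit and roll back when the call throws. `TxManager`, `LoggingTxManager`, `ChangeSaver` and `Transactional.wrap` are hypothetical names; a real setup would delegate to the RDBMS's own transactions (for example through a JDBC `Connection`'s `commit()` and `rollback()`) rather than to the in-memory log used here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical transaction manager; a real one would call into the RDBMS.
interface TxManager {
    void begin();
    void commit();
    void rollback();
}

// Business interface: contains no transaction code at all.
interface ChangeSaver {
    void saveChanges(boolean fail);
}

// Records calls so the sketch can be checked without a database.
class LoggingTxManager implements TxManager {
    final List<String> log = new ArrayList<>();
    @Override public void begin()    { log.add("begin"); }
    @Override public void commit()   { log.add("commit"); }
    @Override public void rollback() { log.add("rollback"); }
}

final class Transactional {
    private Transactional() {}

    // Returns a proxy that runs each interface method inside begin/commit,
    // rolling back and rethrowing the original exception if the method fails.
    static <T> T wrap(Class<T> iface, T target, TxManager tx) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // toString/equals/hashCode: no transaction
            }
            tx.begin();
            try {
                Object result = method.invoke(target, args);
                tx.commit();
                return result;
            } catch (InvocationTargetException e) {
                tx.rollback();
                throw e.getCause();
            }
        };
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Calling `saveChanges(false)` through the proxy logs begin then commit; a call that throws logs begin then rollback and still propagates the exception to the caller. The business code itself never mentions transactions.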
From: Matt McGill on 5 Nov 2006 23:40
topmind wrote:
> > The answer depends a great deal on the application itself. There is nothing intrinsically wrong with dealing directly with table data; indeed, that is often the simplest solution. On the other hand, complex applications often need to have complex behavior associated with the data, and so objects become important. The objects are sometimes related to the relational tables, but often they are not.
>
> This is an OOP myth. Often much or most of the behavior CAN be converted into a declarative form (data).
>
> For an extreme example, a brain could be more or less modeled with a schema such as:
>
> table: Links
> =================
> sourceNode_ID
> destinationNode_ID
> weight // weighting factor, can be negative in some models
>
> table: Node
> ===============
> node_ID
> activationFuncIndicator // see note
> activationWeight // the "volume" given to activation function
>
> There are about 5 activation functions in common use: unit_step, sigmoid, piecewise_linear, gaussian, and identity. (I haven't reviewed my schema model closely, so buyer beware. This model allows "Y splits", which real neurons don't directly allow IIRC, but can be modeled with explicit neurons such that they are still interchangeable.)

All you've done is represent the relationships between the neurons and a couple of weights. Suppose you had inserted the data for every neuron in your brain, and each interconnection, into your tables. Would the RDBMS suddenly start establishing network connections and posting inflammatory comments to Usenet? Or would it just sit there, waiting to be queried? The data about connections and weights doesn't represent behavior; it assumes its existence elsewhere. If you don't write some code to activate some neurons, run queries to find connected neurons, apply the activation functions (which would themselves need to be implemented in code), and repeat, then nothing happens. You've not made any behavior declarative.
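Here is a sketch of the code the tables would still need. It is entirely hypothetical and not from the thread. The `Activation` enum names the five functions topmind lists, but the formulas are common textbook choices rather than anything he specified. For example, the piecewise-linear function is clamped to [0, 1] and the Gaussian is taken as exp(-x²). The `Node` and `Link` records mirror the two tables. Treating `activationWeight` as a scale on the function's output is one guess at what "volume" means. One `step` call sums each node's weighted inputs over the links and applies that node's activation function.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// The five activation functions from the quoted post; formulas are assumed common choices.
enum Activation {
    UNIT_STEP(x -> x >= 0 ? 1.0 : 0.0),
    SIGMOID(x -> 1.0 / (1.0 + Math.exp(-x))),
    PIECEWISE_LINEAR(x -> Math.max(0.0, Math.min(1.0, x))),
    GAUSSIAN(x -> Math.exp(-x * x)),
    IDENTITY(x -> x);

    private final DoubleUnaryOperator f;
    Activation(DoubleUnaryOperator f) { this.f = f; }
    double apply(double x) { return f.applyAsDouble(x); }
}

// One row of the Node table and one row of the Links table.
record Node(int id, Activation activationFunc, double activationWeight) {}
record Link(int sourceId, int destinationId, double weight) {}

class BrainSim {
    // One synchronous update: a node's new value is
    // activationWeight * f(sum over incoming links of weight * source value).
    static Map<Integer, Double> step(List<Node> nodes, List<Link> links, Map<Integer, Double> values) {
        Map<Integer, Double> input = new HashMap<>();
        for (Link l : links) {
            double src = values.getOrDefault(l.sourceId(), 0.0);
            input.merge(l.destinationId(), l.weight() * src, Double::sum);
        }
        Map<Integer, Double> next = new HashMap<>();
        for (Node n : nodes) {
            double x = input.getOrDefault(n.id(), 0.0);
            next.put(n.id(), n.activationWeight() * n.activationFunc().apply(x));
        }
        return next;
    }
}
```

Nothing in the two tables runs on its own. A loop that repeatedly calls `step`, plus the code that loads the rows, is where the behavior actually lives.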
In fact, you've effectively provided an example for Martin's argument: the logic needed to drive a brain simulation off the data in these tables could be encapsulated in Neuron and NeuronLink objects, which get their data (weights and function indicator) from the tables and then do the right things with them. Those objects could then be used as part of a larger simulation involving sensory organs, nerves, muscle tissue, etc. Naturally, you don't /have/ to use objects. You could use structured programming just as well.

-Matt McGill