From: Matt McGill on 10 Nov 2006 17:44

frebe73(a)gmail.com wrote:
> > Worse, the application processing was often structured around the
> > preferred organization for the paradigm.
>
> How could we do it differently now, if we don't know anything about the
> future paradigm?
>
> > Encapsulate the persistence mechanism behind a single subsystem
> > interface (an API in the Procedural Days). Design the subsystem
> > interface in terms of what the problem solution's needs for data are,
> > which will be independent of how the data is stored.
>
> What is your definition of "store"? Store to a persistent medium, store
> into a variable, store into another process, or?
>
> > Then let the
> > subsystem provide the mapping of that interface into the persistence
> > paradigm du jour.
>
> If we use an RDBMS, the persistence part is already separated. The
> application has no idea about if, when or how data is persisted.
>
> > Thus the application solution always requests, "Save this pile of data I
> > call X" and "Give me the pile of data I saved before as X." The
> > persistence access subsystem maps the X identity and the pile of data
> > into records in ISAM files, RDB tables, clay tablets, or whatever.
>
> Is X always an identifier? Should you be allowed to use any predicate
> logic in this interface ("give me the pile of data I saved before
> having X=5 or Y=6")?
>
> Fredrik Bertilsson

I think you might be misunderstanding. As you've already pointed out to
me, 'store' is a rather overloaded word, which can be applied at many
different conceptual levels. In this case, I think there are at least
three distinct conceptual levels:

1. What the data /is/

This would be the highest level of abstraction - information systems
deal with /information/, which is data interpreted in a particular
context. The application is typically dealing with things at this
level. IOW, the application is presenting the user with a timesheet
(information), as opposed to just a collection of numbers (data).

2. How the data is /represented/

We can represent time sheet data in a number of different ways - the
relational model is a particularly good one, but other models were used
before it existed.

3. How the data is /saved to a persistent storage medium/

This is, as you pointed out, entirely encapsulated by an RDBMS, or by a
file system for example.

I think Lahman is talking about abstracting an interface (making an
API, if you will) between levels one and two. So your application would
have a generic way of saying "give me time sheet X" or "give me all
time sheets for user Y". If flat files were being used at the time the
application was written, some implementation of the API would be
written which uses flat files. Perhaps one file is named after each
user, and stores a series of time sheets. When the RDBMS came along, it
would then be far easier to convert the legacy data, and very little
(no?) application code would need to be rewritten.

Note that I'm not trying to imply that the above API has to have
anything to do with objects. It could just as easily have been
procedural in nature and achieved the same result.

-Matt McGill
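A minimal sketch of the kind of interface described in the post above, in
Python. Everything named here is an assumption made for illustration: the
TimeSheetStore class, its method names, the one-JSON-file-per-user layout
and the single "entry" table are hypothetical and do not come from the
thread. It only shows that application code written against "give me time
sheet X" need not change when the flat files are swapped for an RDBMS.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from pathlib import Path


class TimeSheetStore(ABC):
    """Hypothetical boundary between level 1 (time sheets) and level 2."""

    @abstractmethod
    def save(self, user, sheet_id, hours):
        """Store a time sheet given as {ISO day: hours worked}."""

    @abstractmethod
    def all_for_user(self, user):
        """Return {sheet_id: {ISO day: hours}} for one user."""

    def get(self, user, sheet_id):
        """'Give me time sheet X', expressed via the call above."""
        return self.all_for_user(user)[sheet_id]


class FlatFileStore(TimeSheetStore):
    """One JSON file named after each user, holding that user's sheets."""

    def __init__(self, directory):
        self._dir = Path(directory)

    def all_for_user(self, user):
        path = self._dir / f"{user}.json"
        return json.loads(path.read_text()) if path.exists() else {}

    def save(self, user, sheet_id, hours):
        sheets = self.all_for_user(user)
        sheets[sheet_id] = hours
        (self._dir / f"{user}.json").write_text(json.dumps(sheets))


class SqliteStore(TimeSheetStore):
    """Same interface, mapped onto a single relational table."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entry ("
            "username TEXT, sheet_id TEXT, day TEXT, hours REAL, "
            "PRIMARY KEY (username, sheet_id, day))"
        )

    def all_for_user(self, user):
        sheets = {}
        rows = self._conn.execute(
            "SELECT sheet_id, day, hours FROM entry WHERE username = ?",
            (user,),
        )
        for sheet_id, day, hours in rows:
            sheets.setdefault(sheet_id, {})[day] = hours
        return sheets

    def save(self, user, sheet_id, hours):
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "DELETE FROM entry WHERE username = ? AND sheet_id = ?",
                (user, sheet_id),
            )
            self._conn.executemany(
                "INSERT INTO entry VALUES (?, ?, ?, ?)",
                [(user, sheet_id, day, h) for day, h in hours.items()],
            )


def weekly_total(store, user, sheet_id):
    """Application code: written once, unaware of the storage choice."""
    return sum(store.get(user, sheet_id).values())
```

Calling weekly_total with either store gives the same answer for the same
saved sheet, which is the property the post is arguing for. Nothing in the
sketch requires objects; two sets of plain functions with the same
signatures would do equally well.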
From: frebe73 on 11 Nov 2006 00:37

> Will you explain what you mean by "Transactions and persistence are two
> orthogonal features?"

Persistence is about storing something to a persistent medium.
Transactions are about letting either all operations complete or none of
them: a subset of the operations in a transaction can never complete
unless all of them complete. In many cases a transaction ends with
writing the changelog to a persistent medium, but that is not necessary.
For all-in-RAM databases, data is never written to a persistent medium,
but you might still benefit from transactions.

Fredrik Bertilsson
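To make the orthogonality concrete, here is a small Python sketch of an
all-in-RAM store that offers all-or-nothing transactions without ever
touching a persistent medium. The InMemoryStore class and the transfer
function are hypothetical names invented for this illustration, and the
snapshot-and-restore approach is just one simple way to get rollback, not
how any particular database does it.

```python
import copy
from contextlib import contextmanager


class InMemoryStore:
    """Key/value data held only in RAM; nothing is written to disk."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.data)  # state to restore on failure
        try:
            yield self.data
        except Exception:
            # Undo every operation performed inside the transaction.
            self.data.clear()
            self.data.update(snapshot)
            raise


def transfer(store, src, dst, amount):
    """Two operations that must both complete, or neither."""
    with store.transaction() as accounts:
        accounts[dst] += amount  # first operation is applied...
        if accounts[src] < amount:
            raise ValueError("insufficient funds")  # ...then rolled back
        accounts[src] -= amount
```

A failed transfer leaves the balances exactly as they were, even though
the credit had already been applied in memory when the error was raised.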
From: frebe73 on 11 Nov 2006 01:06

> 1. What the data /is/
>
> This would be the highest level of abstraction - information systems
> deal with /information/, which is data interpreted in a particular
> context. The application is typically dealing with things at this
> level. IOW, the application is presenting the user with a timesheet
> (information), as opposed to just a collection of numbers (data).

Can you give some examples of data at this level?

> 2. How the data is /represented/
>
> We can represent time sheet data in a number of different ways - the
> relational model is a particularly good one, but other models were used
> before it existed.

Classes are a very good tool for representing data. Numbers, strings and
dates are built into most databases; other custom data types might be
represented by custom classes registered with the database.

> 3. How the data is /saved to a persistent storage medium/
>
> This is, as you pointed out, entirely encapsulated by an RDBMS, or by a
> file system for example.

The strange thing is that you don't mention data structures at all. I
think we agree that classes are a good tool for representing data
(level 2), but the disagreement is about how to represent data
structures.

> I think Lahman is talking about abstracting an interface (making an
> API, if you will) between levels one and two. So your application would
> have a generic way of saying "give me time sheet X" or "give me all
> time sheets for user Y".

Does that mean that predicate logic should be included in the interface?

> If flat files were being used at the time the
> application was written, some implementation of the API would be
> written which uses flat files. Perhaps one file is named after each
> user, and stores a series of time sheets. When the RDBMS came along, it
> would then be far easier to convert the legacy data, and very little
> (no?) application code would need to be rewritten.

What if I want to know who is working on a particular day? Using flat
files, I would use the existing API function "give me all time sheets
for user Y", parse the time sheets and find out whether the person is
working on a particular day or not. All this processing would be done in
the layer on top of the "persistence layer".

Later, when the RDBMS came along, I would have to write a SQL query to
join all tables that form a time sheet for user Y. In the layer above I
would have to parse and process this verbose data in the same way as
before. But the best way using an RDBMS would be to write a new API
function returning the users working on a particular day, supported by a
SQL select statement. But before SQL was introduced, we would never have
realized that the API should have this function, because it contained
non-persistence processing.

The problem is that if you want to prepare for a future "paradigm", you
have to know something about it, which we don't. Otherwise we will use
the next-generation database through the previous generation's
interface. In every database generation shift, the border between what
we consider "persistence logic" and "business logic" has moved. That is
likely to happen again in the future, which is why it is impossible to
define an interface that will withstand future changes.

Fredrik Bertilsson
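The "who is working on a particular day" case can be sketched side by
side in Python. The data, the function names and the single-table layout
are all made up for this illustration. The first function answers the
question in the layer above the old API by fetching and parsing every
time sheet; the second is the new API function that only becomes an
obvious candidate once a SQL SELECT can do the filtering.

```python
import sqlite3

# Hypothetical sample data: user -> {sheet id -> {ISO day -> hours}}
SHEETS = {
    "ann": {"W45": {"2006-11-06": 8.0, "2006-11-07": 7.5}},
    "bob": {"W45": {"2006-11-07": 4.0}},
}


def all_time_sheets(user):
    """Stand-in for the old call 'give me all time sheets for user Y'."""
    return SHEETS[user]


def working_on_via_old_api(users, day):
    """Layer above persistence: fetch everything, then parse and filter."""
    return sorted(
        user
        for user in users
        if any(days.get(day, 0) > 0 for days in all_time_sheets(user).values())
    )


def load_into_rdb(conn):
    """Copy the sample data into one relational table."""
    conn.execute(
        "CREATE TABLE entry (username TEXT, sheet_id TEXT, day TEXT, hours REAL)"
    )
    rows = [
        (user, sheet_id, day, hours)
        for user, sheets in SHEETS.items()
        for sheet_id, days in sheets.items()
        for day, hours in days.items()
    ]
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", rows)


def working_on_via_sql(conn, day):
    """New API function: the filtering moves into a single SELECT."""
    cursor = conn.execute(
        "SELECT DISTINCT username FROM entry "
        "WHERE day = ? AND hours > 0 ORDER BY username",
        (day,),
    )
    return [row[0] for row in cursor]
```

Both functions return the same users for the same day. What differs is
where the filtering logic lives, which is the moving border between
"persistence logic" and "business logic" that the post describes.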
From: H. S. Lahman on 11 Nov 2006 15:40

Responding to Frebe73...

>>Worse, the application processing was often structured around the
>>preferred organization for the paradigm.
>
> How could we do it differently now, if we don't know anything about the
> future paradigm?

The rest of the message explains exactly that.

>>Encapsulate the persistence mechanism behind a single subsystem
>>interface (an API in the Procedural Days). Design the subsystem
>>interface in terms of what the problem solution's needs for data are,
>>which will be independent of how the data is stored.
>
> What is your definition of "store"? Store to a persistent medium, store
> into a variable, store into another process, or?

The application solution doesn't care if the data is stored in an RDB,
flat files, an OODB, shared memory, or on clay tablets.

>> Then let the
>>subsystem provide the mapping of that interface into the persistence
>>paradigm du jour.
>
> If we use an RDBMS, the persistence part is already separated. The
> application has no idea about if, when or how data is persisted.

And if you decide to use an OODBMS? Or flat files? An RDBMS is a very
particular, albeit currently very common, persistence mechanism. The
application solution needs to be decoupled from particular persistence
mechanisms.

>>Thus the application solution always requests, "Save this pile of data I
>>call X" and "Give me the pile of data I saved before as X." The
>>persistence access subsystem maps the X identity and the pile of data
>>into records in ISAM files, RDB tables, clay tablets, or whatever.
>
> Is X always an identifier? Should you be allowed to use any predicate
> logic in this interface ("give me the pile of data I saved before
> having X=5 or Y=6")?

X=5 is just another way of defining identity.

*************
There is nothing wrong with me that could not be
cured by a capful of Drano.

H. S. Lahman
hsl(a)pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development".
Email info(a)pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
From: H. S. Lahman on 11 Nov 2006 16:28

Responding to Frebe73...

>>>Additionally, relational data models can be more easily proven
>>>correct--or correct enough--before an investment is made in coding.
>>
>>I'm not sure I buy that. More easily than what? The RDM normalization
>>can be applied beyond the RDB's table/tuple paradigm.
>
> What is the "RDB's table/tuple paradigm"?

Say, what?!? Are you saying you don't know what an RDB table is or what
a tuple is within the table? Or that the tables, keyed tuples, and
relationships in an RDB represent a specific implementation of the
relational data model?

>>And OO Class Models are routinely normalized as
>>part of the basic paradigm methodology.
>
> Many class diagrams would break 1NF. I also see a problem with applying
> 2NF and 3NF because the id of the object is not a value itself, but a
> pointer. Because objects may be easily cloned, I suppose that would
> break 2NF.

Actually, 1NF is much more commonly broken in RDBs than in Class Models.
A classic example is a telephone number, which will almost always be
stored in the RDB as a single number, but if the elements of the number
(e.g., area code) are important to the problem in hand, they will always
be broken out as distinct attributes in a Class.

Objects abstract uniquely identifiable problem space entities. An
address in process memory is unique, so that satisfies the mapping. It
is actually more versatile than the RDB paradigm. Consider 6-32 screws
in an inventory. They are effectively clones without explicit identity
values but they are still uniquely identifiable in the problem space. So
long as the object corresponding to each screw has a unique address, it
is identifiable in the same sense that the physical screws in the
problem space are. The only way you can avoid 2/3NF problems for that
situation in an RDB is by adding an artificial explicit identity (e.g.,
autonumber) to the tuple itself.

>>However, I don't see that as being very relevant. My point is that the
>>application's problem solution doesn't care how the data is stored.
>
> Neither does the relational model or SQL.

The RDM, yes. But try using SQL on flat sequential files or an OODB.
Note that RDM and RDB are not synonyms. An RDB is a special case of the
RDM.

>>If it doesn't care how it is stored, it certainly doesn't care how the
>>storage mechanism is validated.
>
> I guess Thomas is talking about how the business rules are validated.

You are pulling sentences out of context and responding to them out of
context.

>>>Lastly, your database is language-neutral. It shouldn't matter what
>>>language the application sitting in front of the database is written in,
>>>or even what paradigm it's born from. Flexibility starts with a good
>>>database design and extends through the application--not the other way
>>>around.
>>
>>That's true enough but I would make it even stronger. RDBs are designed
>>to be problem-independent, not just language independent, which is
>>pretty much my point.
>
> The relational model is used for modelling data, problem-independent or
> not. Just because some data could be considered "problem-dependent", it
> may very well be modelled using the RM.

I said RDBs, not the RDM. An RDB is one of many possible implementations
of the RDM.

>>The data structures one needs to optimize the solution to a /particular/
>>problem in an application are often quite different than the structures
>>best suited to optimizing generic, ad hoc access of the same data.
>
> In some special scenarios, B-trees might not be the best solution and
> arrays or hashtables might be better choices. But I think that is
> low-level optimization without significant impact on average enterprise
> applications.

I guess you don't do a lot of high performance applications.

>>So
>>if one is solving a non-CRUD/USER problem where special optimization is
>>usually required, one wants to separate the views of the solution from
>>those of the RDB.
>
> Using low-level collection classes is not a good idea for modern
> enterprise applications. There are a lot of issues like concurrency or
> transactions, that you have to solve by yourself in that case.

By "enterprise applications" do you mean server-side layers? I am
talking about client-side applications solving a particular business
problem. Any concurrency relevant to client-side applications is
completely different than the concurrency related to processing parallel
transactions in the DBMS. The tuple-based relationships of the OO
paradigm work quite well in concurrent environments.

H. S. Lahman
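The telephone number and 6-32 screw examples from the post above can be
sketched in Python. This is only an illustration under stated
assumptions: the PhoneNumber field names, the Screw class and the "screw"
table are invented here, and SQLite's AUTOINCREMENT stands in for the
"autonumber" the post mentions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneNumber:
    """Elements broken out as distinct attributes (names are illustrative)."""
    area_code: str
    exchange: str
    subscriber: str


class Screw:
    """A 6-32 screw: no explicit identity attribute, only a shared size."""
    size = "6-32"


def distinct_objects(count):
    """Create interchangeable screws; each still has its own identity."""
    return [Screw() for _ in range(count)]


def distinct_rows(count):
    """RDB counterpart: identical tuples need an artificial autonumber key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE screw (id INTEGER PRIMARY KEY AUTOINCREMENT, size TEXT)"
    )
    conn.executemany("INSERT INTO screw (size) VALUES (?)", [("6-32",)] * count)
    return conn.execute("SELECT id, size FROM screw ORDER BY id").fetchall()
```

The objects returned by distinct_objects carry the same attribute value
yet remain distinguishable by object identity, while the rows are only
distinguishable because of the generated id column.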