Java Data Objects

 

Presentation to Adelaide Chapter of Australian Java Users Group Nov 2002

 

John Sharley

Cognitive Machines Pty Ltd

www.cognitivemachines.com

 

Approach

 

Rather than working through the APIs at this hour of the day in a completely eye-glazing manner, let’s focus more on concepts and an analysis of JDO.

 

There is a larger context for Java Data Objects, that being persistence research. Let’s have a look at that larger context before delving into JDO proper.

 

The Persistence Abstraction

 

An object that outlives the program that creates it is said to have long-term persistence. The view that an object’s persistence is an orthogonal property of the object is at odds with historical practice in which a custom persistence mechanism is dependent on a specific class, for example writing JDBC to persist objects of a particular class. In seeing persistence as an object property independent of the object class we are allowing the possibility that persistence can be factored out of application specific programming and placed in general purpose programming. This is of course the ideal of general reuse.

 

Having a data manipulation language separate from the programming language is quite some distance from having an integrated environment. Integrating the database view of data with the programming language view of data allows type correctness to be enforced and for there to be a coherent computational model. There is no need to visualize more than one model of a program’s data. Data is created and manipulated in a way independent of its lifetime, which has simpler semantics. There are no application code mappings between persistent and active data representations.

 

Persistence perfectly abstracted from object class is termed orthogonal persistence. The persistence property is orthogonal from all other properties of an object.

Research into orthogonal persistence has been underway for a quarter of a century. M. Atkinson identified the value of orthogonal persistence at the VLDB conference in 1978. A 1978 paper by F. King, “IBM Report on the Content of a Sample of Programs Surveyed”, found that 35% of the code size of the database programs surveyed was due to mapping between program representations of data and database representations of data. The first orthogonally persistent language, PS-algol, was built in 1980.

 

Being perfectly independent of class, orthogonal persistence applies to language system classes. A thread object can have long-term persistence, which is quite unusual. A computation in progress can be persisted to disk and resumed later. Class objects can be persisted to a store, so code definition is persisted in the same way as data objects. The significance of uniform rights to persistence is commented on by Atkinson: http://research.sun.com/forest/COM.Sun.Labs.Forest.doc.pjama_review.abs.html

“Where there are even a few classes whose instances are not given the same rights to persistence, application programmers have a remarkable propensity to trip over them. The most notorious problems arise with GUI classes (mainly in the java.awt class library) which prove popular and then entrap programmers into struggling with how to avoid making them persistent and how to build their own abstraction of the interface’s state that they can preserve and reconstitute.”

 

JDO is not orthogonally persistent, but goes a lot further toward independence of persistence than any other standardized Java approach. Persistence independence across application classes is a good start. This level of persistence independence has been termed transparent persistence. I think it a safe prediction that uptake of JDO will heighten interest in orthogonal persistence. The desire to move from a safe 20% reduction in project costs to a safe 30% reduction will see to it.

 

The Orthogonal Persistence for Java specification was proposed as Java Specification Request 20 http://www.jcp.org/en/jsr/detail?id=20 and rejected in August of 1999.

 

There was no compelling technology to support automated concurrency, but a direction for Automated Object Locking for Persistent Programming Languages has promise http://research.sun.com/features/tenyears/volcd/papers/intros/I23Czaj.pdf.

 

PJama is an OPJ implementation http://research.sun.com/forest/COM.Sun.Labs.Forest.doc.pjama_review.abs.html.

 

The Charm operating system offers efficiencies in implementation of persistence http://citeseer.nj.nec.com/dearle00operating.html

 

Automated concurrency is not with us yet and won’t be for quite a while. JDO’s explicit transactions will do for now. There is no attempt to reconcile database concurrency handling mechanisms with program concurrency handling mechanisms.

 

JDO

 

Discussions of object persistence assume use of domain objects and that the question is how best to persist them. If a procedural rather than domain object approach is used, as is too often the case in Java, the discussion is moot.

 

The JDO standard makes possible the direct expression of a domain object model to standard persistence services.

 

An application object is persisted by a statement such as

persistenceManager.makePersistent(applicationObject);

and such a statement is needed only where persistence must be stated explicitly.

Any scheme for persistence of objects must state which objects are to be persisted and which are to remain transient, and an invocation of makePersistent() is par for the course. This method does not have to be invoked on all objects that are to be persistent. Any object which is reachable from a persistent object via persistent references will also be made persistent. Reachability is a recursive proposition.
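The reachability rule can be pictured as a graph traversal. What follows is a minimal sketch in plain Java and not JDO code: the Node class and the reachableFrom() helper are hypothetical stand-ins for application objects and for the closure a JDO implementation works out when makePersistent() is invoked. It also assumes that every reference shown is a persistent reference.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical application object holding references to other objects.
class Node {
    final String name;
    final List<Node> references = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

// Illustration only: computes the set of objects that would become
// persistent if makePersistent() were invoked on the root.
class ReachabilitySketch {
    static Set<Node> reachableFrom(Node root) {
        // Identity-based set: distinct objects are never merged, and an
        // object that is part of a cycle is visited only once.
        Set<Node> seen =
                Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (seen.add(current)) {
                for (Node next : current.references) {
                    pending.push(next);
                }
            }
        }
        return seen;
    }
}
```

Only the root needs an explicit call; everything else in the returned set is persistent by reachability, and objects outside the set stay transient.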

 

When a reference to a persistent object is available, the persistent objects it has persistent references to are available for use without any application programming for database activity. It appears that the entire object graph is in the JVM, rather like a virtual address space giving the appearance that everything is in RAM.
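The virtual address space analogy can be sketched as a reference whose target is fetched the first time it is followed. This is an illustration in plain Java under stated assumptions: LazyReference is a hypothetical class, and the Supplier stands in for a datastore read. A real JDO implementation does this work inside the enhanced classes, so application code never sees such a wrapper.

```java
import java.util.function.Supplier;

// Hypothetical holder for a reference whose target is fetched on first use.
class LazyReference<T> {
    private final Supplier<T> loader; // stands in for a datastore read
    private T target;
    private boolean loaded;

    LazyReference(Supplier<T> loader) {
        this.loader = loader;
    }

    // Following the reference triggers the read once; later calls reuse
    // the object already in memory.
    T get() {
        if (!loaded) {
            target = loader.get();
            loaded = true;
        }
        return target;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```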

 

There is no application use of a data manipulation language.

 

The Java Data Objects specification was developed under JSR 12 http://www.jcp.org/en/jsr/detail?id=12

 

In the JDO API, there are six interfaces, two classes and nine exceptions. These counts are quite low for an API. You will rarely have to look up a method in the javadoc.

The interfaces are:

PersistenceManagerFactory;

PersistenceManager;

Transaction;

InstanceCallbacks;

Extent;

Query.

The classes are:

JDOHelper;

I18NHelper.

 

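Here is sample code for standalone JDO, where properties is a java.util.Properties object naming the implementation’s PersistenceManagerFactory class and its connection settings: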

 

PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(properties);

 

PersistenceManager pm = pmf.getPersistenceManager();

Transaction transaction = pm.currentTransaction();

transaction.begin();

 

pm.makePersistent(appObj1);

pm.makePersistent(appObj2);

pm.makePersistent(appObj3);

 

transaction.commit();

pm.close();

 

 

A PersistenceManager has at most one active transaction. A PersistenceManager’s transaction can be reused sequentially: after commit() or rollback(), begin() may be invoked again on the same Transaction object.

 

Persistence descriptor

 

Application objects are not coded differently to be persistence capable. Any changes needed for JDO are implemented by bytecode enhancement of the compiled application classes. A persistence descriptor provides JDO definitions that direct the enhancement. The convention is that the persistence descriptor has the name of the class or package followed by “.jdo”. The package form is neater.

 

The enhancer adds the PersistenceCapable interface and JDO implementation methods to classes mentioned in the persistence descriptor. While it is possible to do this manually it is not a profitable activity. Application code must not use the methods of the PersistenceCapable interface. Perhaps checks that this is so will be introduced. JDO implementations work through the PersistenceCapable interface.

 

Unlike RDBMSs, ODBMSs do not require the generation of DDL during enhancement.

 

One attribute of particular interest in the class element is identity-type, which may have a value of application, datastore or nondurable. An application identity uses values of object state to form a unique identity. This is like a primary key of a relational table. A datastore identity is assigned by the datastore and is not based on field values in the object. This is something like a generated sequence field. Nondurable specifies that persistent objects are not uniquely identifiable, which is a performance option as the maintenance of JDO identity has a cost. Such a persistent object may be represented by multiple instances in the JVM as a result of multiple retrievals of the same object. Updating through only one of those representations is permitted in the transactional context.
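Application identity is normally expressed through a key class whose equality is defined on the key field values. A minimal sketch follows, assuming a hypothetical Employee class keyed by a single int field called employeeNumber; the class and field names are illustrative and are not part of the JDO API.

```java
import java.io.Serializable;

// Hypothetical key class for an application-identity class with a single
// key field "employeeNumber". A real JDO key class must be declared public;
// it is package-private here only so the sketch fits in one source file.
class EmployeeKey implements Serializable {
    // Same name and type as the key field in the persistent class.
    public int employeeNumber;

    public EmployeeKey() {
    }

    // Rebuilds the key from the form produced by toString().
    public EmployeeKey(String text) {
        this.employeeNumber = Integer.parseInt(text);
    }

    // Two keys are equal exactly when their key field values are equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof EmployeeKey
                && ((EmployeeKey) other).employeeNumber == employeeNumber;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(employeeNumber);
    }

    @Override
    public String toString() {
        return Integer.toString(employeeNumber);
    }
}
```

With datastore identity no such class is written, as the datastore assigns the identity.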

 

The field element has three attributes of particular interest, those being persistence-modifier, default-fetch-group and embedded.

 

Persistence-modifier defaults to persistent for all fields in a persistent class that are not final, static or transient. The default for transient fields is none; by overriding it to persistent, the objects persisted by JDO may be varied independently of the objects subject to standard Java serialization. A value of transactional makes the field subject to rollback without being persistent, and a value of none means that the field is not managed by JDO.

 

Default-fetch-group takes a boolean value and defines whether the object(s) referred to by the field are retrieved when the object having the field is retrieved.

 

Embedded takes a boolean value and defines whether objects referred to by the field behave as second class objects (SCOs), that is, objects that do not have a JDO identity but make the first class object (FCO) in which they are stored aware of a change to their state. If the SCO is changed, the FCO is flagged dirty. This attribute is a hint to the implementation, which may ignore it by storing the field as an FCO where that is possible.
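The relationship between a second class object and its owner can be sketched as a change notification. This is plain Java for illustration only: CustomerSketch and AddressSketch are hypothetical classes, and in JDO the notification is arranged by the implementation rather than written by the application programmer.

```java
// Hypothetical first class object: has its own identity and a dirty flag.
class CustomerSketch {
    private boolean dirty;
    private final AddressSketch address = new AddressSketch(this);

    AddressSketch getAddress() {
        return address;
    }

    void markDirty() {
        dirty = true;
    }

    boolean isDirty() {
        return dirty;
    }
}

// Hypothetical second class object: no identity of its own; it reports
// every state change to the first class object that owns it.
class AddressSketch {
    private final CustomerSketch owner;
    private String city;

    AddressSketch(CustomerSketch owner) {
        this.owner = owner;
    }

    void setCity(String city) {
        this.city = city;
        owner.markDirty(); // a change to the SCO flags the FCO dirty
    }

    String getCity() {
        return city;
    }
}
```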

 

The embedded attribute of the collection, map and array elements defines whether the objects contained in the collection, map or array are also second class. Note that a class may be enhanced as PersistenceCapable but objects of that class may function as first or second class depending on the context specified by the embedded parameter.

 

The default values for default-fetch-group and embedded for collections are false. Had this not been the case for default-fetch-group, large parts of the persistent object graph could have been recursively retrieved without that being fully anticipated, and had it not been the case for embedded large parts of the persistent object graph might have been stored as one object.

 

Life cycle states

 

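There are 10 possible life cycle states. Seven are required:

Transient;

Persistent-new;

Persistent-dirty;

Hollow;

Persistent-clean;

Persistent-deleted;

Persistent-new-deleted.

This one is only possible if the implementation supports management of nontransactional instances:

Persistent-nontransactional.

These two are only possible if the implementation supports transient transactional instances:

Transient-clean;

Transient-dirty.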

 

Note that OPJ does not have states beyond those of the Java Language Specification.

 

Callbacks

 

The InstanceCallbacks interface can be implemented by persistence capable classes.

 

jdoPostLoad() – this is invoked after the default fetch group has been retrieved. This is the place to store values in non-persistent fields that are derived from persistent fields in the default fetch group. Other persistent instances cannot be accessed from here.

jdoPreStore() – this is invoked before values in the instance are stored to the datastore. This is the place to modify persistent fields based on changed transient values.

jdoPreClear() – invoked before values are cleared in the transition to hollow. Non-transactional values should be set to defaults here.

jdoPreDelete() – last opportunity for processing before the instance is deleted.
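The four callbacks can be sketched on a small class. So that the sketch compiles without a JDO library, it declares a stand-in interface with the same four method names as javax.jdo.InstanceCallbacks; a real class would implement that interface instead. PersonSketch and its fields are hypothetical.

```java
// Stand-in for javax.jdo.InstanceCallbacks so the sketch compiles without a
// JDO library on the classpath; the four method names mirror that interface.
interface InstanceCallbacksStandIn {
    void jdoPostLoad();
    void jdoPreStore();
    void jdoPreClear();
    void jdoPreDelete();
}

// Hypothetical persistence capable class.
class PersonSketch implements InstanceCallbacksStandIn {
    // Persistent fields, assumed to be in the default fetch group.
    String firstName;
    String lastName;

    // Non-persistent field derived from the persistent ones.
    transient String displayName;

    // Derive the non-persistent value once the persistent fields are loaded.
    public void jdoPostLoad() {
        displayName = firstName + " " + lastName;
    }

    // Push an edited display name back into the persistent fields.
    public void jdoPreStore() {
        if (displayName != null) {
            int space = displayName.indexOf(' ');
            if (space > 0) {
                firstName = displayName.substring(0, space);
                lastName = displayName.substring(space + 1);
            }
        }
    }

    // Reset the non-persistent value before the instance becomes hollow.
    public void jdoPreClear() {
        displayName = null;
    }

    // Nothing to release in this sketch.
    public void jdoPreDelete() {
    }
}
```

In a running application the JDO implementation invokes these methods; application code does not call them directly.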

 

Also, a javax.jdo.Transaction object may be associated with an object which implements the javax.transaction.Synchronization interface, through which the implementing object is given the opportunity to do processing before and after the transaction completes, successfully or otherwise.

 

Cache management

 

Methods exist to influence a persistence manager’s cache, such as evict() and refresh() and their evictAll() and refreshAll() variants.

A single object, collection, array or the entire cache may be the subject of these methods.

 

Query Language

 

Direct program navigation of object references and identities could be used for all retrieval but to facilitate usage of retrieval patterns JDOQL is introduced. Also, JDOQL permits greater retrieval performance than direct program reference navigation.

 

It is vital for performance that JDO implementations do not instantiate application objects during query execution. There may be an enormous number of objects which have their state examined during query filter evaluation, and instantiation of these objects from their datastore state is a very significant performance cost. Invocation of application methods is not supported in standard JDOQL (it is a “non-standard extension”), and it is this restriction that permits application objects not to be instantiated during query execution. Where an object is already in the cache there is no instantiation penalty to pay. The standard JDOQL filtering methods are:

isEmpty();       // null singleton or collection or empty collection.     

contains();      // collection

startsWith();   // String matching

endsWith();    // String matching

More string matching JDOQL methods may be expected in later versions of the standard. Prohibition of application method invocation by standard JDOQL implementations means that JDOQL cannot be regarded as a full Object Query Language, but the optimisation possible because of this restriction is too good to pass up. The Object DBMSs which had full Object Query Languages were unable to get near Relational DBMSs in query performance, and this was a major reason for the lack of market penetration by Object DBMSs. Merely translating JDOQL to OQL won’t do.

 

The JDOQL filtering methods supplement field value equality and ordering operators. The methods and operators are combined using boolean operators. Navigation is specified by the dereference operator “.”. Where the field being dereferenced is null, boolean operations return false.
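To make the semantics concrete, here is an in-memory reading of a filter such as salary > 50000 && manager.name.startsWith("A"). It is a sketch in plain Java under stated assumptions: EmployeeSketch and FilterSketch are hypothetical, and a JDO implementation is expected to evaluate the filter against the datastore rather than loop over instantiated objects like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical candidate class for the query.
class EmployeeSketch {
    final String name;
    final double salary;
    final EmployeeSketch manager; // may be null

    EmployeeSketch(String name, double salary, EmployeeSketch manager) {
        this.name = name;
        this.salary = salary;
        this.manager = manager;
    }
}

class FilterSketch {
    // In-memory reading of the filter
    //   salary > 50000 && manager.name.startsWith("A")
    static boolean matches(EmployeeSketch e) {
        // Navigation through a null field yields false, not an exception.
        boolean managerStartsWithA = e.manager != null
                && e.manager.name != null
                && e.manager.name.startsWith("A");
        return e.salary > 50000 && managerStartsWithA;
    }

    // Applies the filter to a candidate collection.
    static List<EmployeeSketch> select(List<EmployeeSketch> candidates) {
        List<EmployeeSketch> result = new ArrayList<>();
        for (EmployeeSketch candidate : candidates) {
            if (matches(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```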

 

There is a flag on a Query, set with setIgnoreCache(), that indicates whether the cache is to be searched in fulfilment of the query. That is, whether uncommitted changes of a transaction are visible to queries issued by that transaction. Searching the cache does reference already instantiated objects during query execution, which is quite a different matter from instantiating objects for the query execution. Updating transactions should select the option that causes the cache to be searched. Read-only transactions need not elect to have the cache searched as there are no changes to be found there.

 

Alternative query languages (and query interfaces) are allowed by the spec, which is rather permissive. Some vendors are exposing SQL as a query language and others are compiling JDOQL to SQL which seems a more satisfactory approach. 

 

How should we assess whether JDOQL is a good query language?

Generality of selection operations over a graph of objects seems like a reasonable measure. It is likely that the restriction that only objects of the given collection may be selected will be lifted in the next version of JDOQL. A proposal would allow selection of objects that are related to the given collection by what is termed projection, which is basically selection operations over a graph. Graph walking would then be used for selection and not just filtering.

 

Extents refer to all persistent instances of a given class and optionally to its subclasses. Inclusion of subclass instances is made optional for the sake of RDBMS performance. However, it doesn’t make sense in terms of the object model not to retrieve subclass instances as well. There is no filtering available with extents, so unless all persistent instances of a given class or class hierarchy must be referenced, performance can be bettered by using queries.

 

J2EE

 

JDO and J2EE have different transaction services. The transaction object in JDO is javax.jdo.Transaction and in J2EE it is javax.transaction.UserTransaction. It is the javax.jdo.Transaction object which is returned by the persistenceManager.currentTransaction() method.

 

Where Container Managed Transaction is defined for an EJB component, transaction management methods (begin(), commit(), rollback()) cannot be issued by the component. Where Bean Managed Transaction is defined for a session or message-driven EJB, either JDO or J2EE transactions can be used. A JDO transaction in an EJB container has an associated J2EE transaction. When BMT is in effect, whether a J2EE transaction is established before a persistence manager is obtained determines which transaction object can be used and whether JDO synchronizes on J2EE or J2EE on JDO. If no J2EE transaction is established before the persistence manager is obtained, the JDO transaction object can be used and otherwise the J2EE transaction object must be used.

Each EJB business method should obtain its persistence manager from the persistence manager factory and close the persistence manager before returning. This is not strictly necessary for stateful session beans under BMT, but the practice it implies, keeping transactions open across user interaction, is dubious.

 

A PersistenceManagerFactory can be obtained via JNDI when EJB context is set. For contrast, in servlets the persistence manager factory is obtained in the init method. A PersistenceManagerFactory may pool PersistenceManagers.

 

There is some talk of making Java Data Objects responsive to the reception of asynchronous messages via JMS. It will be interesting to see how much of EJB is usurped by JDO in time.    

 

Datastores

 

The underlying persistence technology can be any transactional store:

 

Creation or update of large numbers of persistent objects in a transaction is a good first test of an RDBMS JDO implementation. RDBMS JDO implementations that use JDBC statement batching will demonstrate better insert and update performance.

 

Marketing of ODBMSs against RDBMSs during the nineties largely failed. The situation now is that the idea (if not the practice) of object orientation is prevalent, Java is entrenched, there is a Java standard for transparent object persistence (JDO) and crucially RDBMSs may be used as JDO datastores at some performance level. To the extent that JDO has been taken up at a given time, ODBMSs can compete against RDBMSs for market share on performance rather than against inertia. For an indication of performance improvement due to ODBMS use in an application server environment see http://www.versant.com/resources/white_papers/WP_Benchmark-011022r4-W.pdf .

 

Independence from underlying technology is a boon for application design and programming. However, the performance characteristics of these different persistence technologies vary and do so to the extent that acceptable performance for a given application may not be had from all persistence technologies in the list. An ODBMS that does not instantiate objects to process queries has far more performance potential than other technologies. To avoid instantiating objects during JDOQL processing, an ODBMS vendor must not simply translate JDOQL to its OQL.

 

Implementations

 

JDO is aimed at J2ME as well as J2SE and J2EE, and this is the reason for so much being optional. However, JDO implementations for other than the J2ME environment are rapidly moving toward supporting all options. 

 

JDO applications are binary compatible. A vendor might supply an application binary bundled with or built for a given JDO implementation, but without recompilation a different JDO implementation can be used. The trend toward implementing all JDO options assists portability.

 

That different JDO implementations can be used in the same transaction allows flexibility of technical architecture. Not only can different object classes be supported by different JDO implementations and accessed within the same transaction, different objects within the same class can be supported by different JDO implementations and accessed within the same transaction. However, there is no standardisation yet of persistence reference from persistent objects backed by one JDO implementation to persistent objects backed by other JDO implementations.

 

It may be a couple of years before there are robust free JDO implementations. Don’t wait: the prices for commercial implementations for non-J2EE environments are quite okay. To keep costs down when developing for the J2EE environment, you might consider schemes in which all but one person (the “Integrator”) in the project uses non-J2EE versions to develop modules of JDO code that the Integrator then builds and tests for the J2EE environment. Scripts might automate the code changes between non-J2EE and J2EE environments.

 

A duly diligent procedure for evaluation of JDO implementations: 

  1. Model domain objects.
  2. Collect domain object relationship frequencies.
  3. Design major algorithms.
  4. Write a JDO benchmark specific to your application on one JDO implementation. The current version (1.0) of the reference implementation is not usable.
  5. Run benchmark on evaluation copies of all JDO implementations.

 

Next version of standard

 

Unresolved issues were deferred to the next version of the JDO standard. A list of the more major issues:

Other suggestions:

 

No JSR has been created for work on the next version of JDO so don’t wait for the next version. You can have that 20% development cost reduction now.

 

Non-transparent persistence technologies for Java

 

Don’t use JDO as a persistence mechanism for entity EJBs. Entity EJBs will still not be transparently persistent after being implemented using JDO. Also, the Java Data Objects are no longer representative of the application domain.

 

It needs to be made plain that there are also message driven and session EJBs in the EJB standard which have nothing to do with persistence, and that the EJB standard is but one part of the J2EE standard.

 

Robin Roos sums up entity EJBs rather well in his book “Java Data Objects”:

Entity EJBs have “capability for gross inefficiency when manipulating large datasets” and “entity beans have regularly failed to meet applications requirements for object persistence”.

 

Expect performance of home-grown object-relational mapping via JDBC to be bettered by a good RDBMS JDO implementation even when the home-grown code includes an object cache. The procedural relational approach is likely to be faster as there is no object-relational mismatch. However, the assumption is that object orientation is a good thing.

 

It is time to lay JDBC aside when the programming is object oriented. JDO take-up will be huge. It gives significant development speed improvement without reducing execution performance when the programming is object oriented.

 

Standard transparent persistence for Java is the biggest productivity improvement for enterprise applications since Java itself.

 

Costs

 

Enterprise applications are anticipated to take 20% less time to develop. A software development house that was just breaking even prior to JDO could, at unchanged prices, then keep 20% of its revenue as profit. JDO is of fairly general applicability to enterprise systems or other systems requiring persistence of complex object models.

 

Some estimates put coding time spent writing non-functional persistence infrastructure at 50%. With JDO, there is no coding for non-functional persistence infrastructure.

 

My prediction of average reduction of software development costs for large long-lived systems (enterprise applications):

This progression to orthogonal persistence with automated concurrency could easily take 20 years.

 

Domain Object Modeling

 

JDO in no way reduces the need for domain object modeling. JDO makes possible direct expression of a domain object model.

 

Object models that were devised pre-JDO are likely to have persistence infrastructure designed into them. Such object models should be revised so that persistence infrastructure is factored out.

 

Cognitive Machines Pty Ltd does domain object modeling.

www.cognitivemachines.com

 

john.sharley@cognitivemachines.com