Early Persistence Models

The simplest model for supporting persistent objects is to let all data persist: save all data when the program terminates, and reload the data when the program runs again. This model, called core dumping [2.24], is adopted by Smalltalk [2.39] and some Lisp systems [2.40].

 

A more efficient solution is to introduce semantics that designate which objects are persistent. E [2.32], an extension of C++, designates persistence by storage class: it introduces a new persistent storage class, and an object is persistent if it is declared with that storage class. Objectivity/C++ [2.26] designates persistence by data type: an object is persistent if it belongs to a persistent class, i.e., a class inheriting from the predefined class ooObj. The Objectivity/C++ model is sometimes called Persistence By Inheritance (PBI). In both models, persistence is a property dependent on the storage class or the type of an object, which results in a limited form of persistence. In E, objects without the persistent storage class cannot persist. In Objectivity/C++, objects not belonging to persistent classes cannot persist.
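
The type-based restriction of PBI can be illustrated with a minimal C++ sketch. The names PersistentBase, Account, Scratch, and can_persist are hypothetical and chosen only for illustration; they are not the Objectivity/C++ API, where the predefined base class is ooObj.

```cpp
#include <type_traits>

// Hypothetical stand-in for a predefined persistent base class.
class PersistentBase {
public:
    virtual ~PersistentBase() = default;
};

// A persistent class: it inherits from the persistent base class.
class Account : public PersistentBase {
public:
    int balance = 0;
};

// A transient class: no persistent base, so its objects cannot persist.
class Scratch {
public:
    int counter = 0;
};

// Under PBI, whether an object can persist is decided by its type alone.
template <typename T>
constexpr bool can_persist() {
    return std::is_base_of<PersistentBase, T>::value;
}
```

In this sketch, can_persist<Account>() is true, while can_persist<Scratch>() and can_persist<int>() are false, which shows why built-in types and existing library classes fall outside such a model.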

 

Besides the models above, Shore [2.28] introduces a naming mechanism to designate persistence: an object is persistent if it is registered in a namespace.

 

Orthogonal Persistence

To support unlimited persistence, the concept of orthogonal persistence emerged, which treats persistence as a property independent of object type. This concept rests on the following three principles [2.6]:

 

•  Persistence Independence: a persistence model should provide a uniform way of creating and manipulating objects, irrespective of their lifetimes.

•  Data Type Orthogonality: this uniform way should be independent of data type, i.e., it applies to all data types.

•  Persistence Identification: the mechanism for identifying persistent objects should not be related to the type system, the storage allocation, or any naming method.

 

These principles are well supported by a persistence model named Persistence By Reachability (PBR), which is the most popular model. In PBR, an object is persistent if and only if it is reachable from a persistent root. A persistent root is a distinguished, named persistent object from which a garbage collector can discover all reachable objects. By relying on the garbage collection mechanism, a PBR system obtains all reachability information and uses it to determine and maintain the set of persistent objects. In PBR, every object can potentially be persistent. The drawback is that PBR depends tightly on garbage collection and requires special compiler cooperation. Therefore, PBR is not well suited to C/C++.
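
The reachability rule can be sketched as a plain graph traversal. The sketch below assumes a hypothetical Object layout in which every object lists its outgoing references explicitly; a real PBR system would obtain this information from the garbage collector and the compiler, which is exactly the cooperation that C/C++ lacks.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical object layout: each object lists the objects it references.
struct Object {
    int id;
    std::vector<Object*> refs;
};

// Returns every object reachable from `root`, i.e., the set of objects
// that a PBR system would treat as persistent.
std::unordered_set<const Object*> reachable_from(const Object* root) {
    std::unordered_set<const Object*> seen;
    std::vector<const Object*> work;  // objects whose references are unexplored
    if (root != nullptr) {
        seen.insert(root);
        work.push_back(root);
    }
    while (!work.empty()) {
        const Object* cur = work.back();
        work.pop_back();
        assert(cur != nullptr);  // null references are never pushed
        for (const Object* next : cur->refs) {
            // insert() reports whether `next` was newly discovered.
            if (next != nullptr && seen.insert(next).second) {
                work.push_back(next);
            }
        }
    }
    return seen;
}
```

Objects that are not in the returned set are transient, regardless of their type or of how they were allocated.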

 

Besides PBR, some other persistence models offer orthogonal persistence to some extent. For example, ObjectStore, Mneme, and Texas [2.13,2.35,2.36] introduce persistent variants of the storage allocator that allocate persistent objects on a special heap. This model is sometimes called Persistence By Storage (PBS). It designates persistence by storage location: an object is persistent if it is created in a persistent heap using the persistent storage allocator.
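
As a rough illustration of PBS in C++, the following sketch uses a placement form of operator new to create objects in a persistent heap. PersistentHeap and Point are hypothetical names, and the heap is only an in-memory arena; a real system would back it with a file or database, and the sketch does not reproduce the allocators of ObjectStore, Mneme, or Texas.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical persistent heap: a fixed-size arena standing in for storage
// that a real system would save and reload.
class PersistentHeap {
public:
    // Bump-allocates `size` bytes at the given alignment; nullptr when full.
    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::size_t start = (used_ + align - 1) / align * align;
        if (start > sizeof(buffer_) || size > sizeof(buffer_) - start) return nullptr;
        used_ = start + size;
        return buffer_ + start;
    }
    // Under PBS, an object is persistent exactly when it lives in the heap.
    bool contains(const void* p) const {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;  // total order over pointers
        return !before(c, buffer_) && before(c, buffer_ + sizeof(buffer_));
    }
private:
    alignas(std::max_align_t) unsigned char buffer_[4096];
    std::size_t used_ = 0;
};

// Persistent variant of the storage allocator: `new (heap) T{...}`.
void* operator new(std::size_t size, PersistentHeap& heap) {
    void* p = heap.allocate(size, alignof(std::max_align_t));
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
// Matching placement delete, called only if a constructor throws.
void operator delete(void*, PersistentHeap&) noexcept {}

// Example type; nothing in its definition marks it as persistent.
struct Point { int x; int y; };
```

With these definitions, `Point* p = new (heap) Point{1, 2};` creates a persistent object, while an ordinary local Point stays transient. The type is the same in both cases; only the allocation site differs.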

 

Compared with PBR, PBS weakens the principle of Persistence Independence, since persistence must be declared explicitly. The method of explicitly declaring persistence is sometimes called Persistence By Declaration (PBD), and PBS can be regarded as a kind of PBD. Because persistence is declared explicitly, PBS need not discover reachability, which leads to a lightweight implementation that is easy to build.

 

Persistence Transparency

To access a persistent object, the programmer must reference an object identifier (OID), which plays the role of a pointer. An OID can have the same size as a pointer [2.8,], a size larger than a pointer, or even an arbitrary size [2.41]. The core work of a persistent system is to translate a reference to an OID into a reference to the memory address of the object that the OID identifies. The translation can be done eagerly, by translating all OID references once a program runs [], or on demand [], i.e., when an OID is dereferenced.

 

Multiple approaches are used to translate OID dereferences at run-time.

 

For an interpreted language like Smalltalk, the interpreter can detect an OID dereference and use a mechanism such as a hash table [] to translate the OID. Some systems perform the translation on every dereference. To improve performance, other systems overwrite the OID with the obtained memory address the first time the OID is translated, so that all subsequent dereferences are ordinary pointer dereferences. This approach is called pointer swizzling [].
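
The following C++ sketch illustrates swizzling on first dereference with a hash table. All names (Oid, ObjectTable, Ref, deref) are hypothetical, and the lookups counter exists only to make the effect observable; it is a sketch of the general technique, not the mechanism of any particular system.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Oid = std::uint64_t;  // hypothetical OID representation

struct Object { int value; };

// Hypothetical table of resident objects: maps an OID to a memory address.
using ObjectTable = std::unordered_map<Oid, Object*>;

// A reference slot holds either an unswizzled OID or a swizzled pointer.
struct Ref {
    bool swizzled = false;
    union {
        Oid oid;
        Object* ptr;
    };
    explicit Ref(Oid id) : oid(id) {}
};

// Dereferences `ref`. The first successful call looks the OID up in the
// hash table and overwrites it with the address; later calls skip the lookup.
// Returns nullptr (and leaves the OID in place) if the object is not resident.
Object* deref(Ref& ref, const ObjectTable& table, int& lookups) {
    if (!ref.swizzled) {
        ++lookups;
        auto it = table.find(ref.oid);
        if (it == table.end()) return nullptr;
        assert(it->second != nullptr);  // the table holds only valid addresses
        ref.ptr = it->second;           // swizzle: the OID is overwritten
        ref.swizzled = true;
    }
    return ref.ptr;
}
```

Dereferencing the same Ref twice increments lookups only once, which is the saving that swizzling aims for. The cost is that the slot no longer contains the OID.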

 

For a compiled language, the persistent system can detect an OID dereference and perform pointer swizzling using page faults [] or inline code [].

 

However, pointer swizzling is a heavyweight approach, since it requires a deswizzling translation table [] to keep track of all swizzled pointers. When resident objects are saved, the translation table ensures that swizzled pointers are deswizzled, i.e., translated back into the corresponding OIDs. The translation table can be very large [] and requires complicated algorithms to ensure correct deswizzling, especially when some swizzled pointers are changed []. Moreover, fault-based pointer swizzling relies on special hardware mechanisms and operating system (OS) support [], which low-cost applications cannot afford.
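
To make the bookkeeping concrete, the sketch below shows one possible shape of a deswizzling step, assuming a hypothetical table that maps each resident address back to its OID. Slot, DeswizzleTable, and deswizzle_all are illustrative names; real systems must additionally find every swizzled slot and cope with pointers that were reassigned, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Oid = std::uint64_t;  // hypothetical OID representation

struct Object { int value; };

// A reference slot: `ptr` is meaningful when swizzled, `oid` otherwise.
struct Slot {
    bool swizzled;
    Oid oid;
    Object* ptr;
};

// Hypothetical deswizzling table: one entry per resident object.
using DeswizzleTable = std::unordered_map<const Object*, Oid>;

// Translates every swizzled slot back to its OID before objects are saved.
// Returns false at the first pointer with no table entry; slots visited
// before that point have already been deswizzled.
bool deswizzle_all(std::vector<Slot>& slots, const DeswizzleTable& table) {
    for (Slot& s : slots) {
        if (!s.swizzled) continue;
        assert(s.ptr != nullptr);  // a swizzled slot holds a real address
        auto it = table.find(s.ptr);
        if (it == table.end()) return false;
        s.oid = it->second;
        s.ptr = nullptr;
        s.swizzled = false;
    }
    return true;
}
```

Even in this simplified form, the table needs one entry per resident object and must stay consistent with every pointer update, which hints at why the approach is considered heavyweight.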

 

To support lightweight persistence, some systems [] employ smart pointers, i.e., objects of C++ template classes that overload operator->(). All OIDs are encapsulated in smart pointers, and the programmer references OIDs through operator->(), which performs the translation on each dereference. To some extent, this approach weakens the principle of Persistence Independence, but it is easy to implement.
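
A minimal sketch of such a smart pointer is given below. The names PRef, resident_table, and Account are hypothetical, and the per-type hash table is only one assumed way to locate resident objects; the sketch does not load objects from storage and simply throws when the object is not resident.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

using Oid = std::uint64_t;  // hypothetical OID representation

// Hypothetical per-type table of resident objects, keyed by OID.
template <typename T>
std::unordered_map<Oid, T*>& resident_table() {
    static std::unordered_map<Oid, T*> table;
    return table;
}

// Smart pointer that encapsulates an OID. The OID is never overwritten,
// so no deswizzling is needed when objects are saved.
template <typename T>
class PRef {
public:
    explicit PRef(Oid oid) : oid_(oid) {}

    // Translates the OID into a memory address on every dereference.
    T* operator->() const {
        auto& table = resident_table<T>();
        auto it = table.find(oid_);
        if (it == table.end()) throw std::runtime_error("object not resident");
        assert(it->second != nullptr);  // the table holds only valid addresses
        return it->second;
    }

    Oid oid() const { return oid_; }

private:
    Oid oid_;
};

// Example persistent type used for illustration.
struct Account { int balance; };
```

Given a resident Account registered under OID 5, `PRef<Account> ref(5); ref->balance += 50;` reads like ordinary pointer code, yet the type PRef<Account> differs from Account*, which is where Persistence Independence is weakened.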