Early Persistence Models
The simplest model for supporting persistent objects is to let all data persist: save all data when the program terminates, and reload it when the program runs again. This model, called core dumping [2.24], is adopted by Smalltalk [2.39] and some Lisp systems [2.40].
A more efficient solution is to introduce semantics that designate which objects are persistent. E [2.32], an extension of C++, introduces a new storage class, the persistent storage class, to declare objects as persistent: E designates persistence by storage class, so an object is persistent if it has the persistent storage class. Objectivity/C++ [2.26] designates persistence by data type: an object is persistent if it belongs to a persistent class, i.e., a class inheriting from the predefined class ooObj. The persistence model of Objectivity/C++ is therefore sometimes called Persistence By Inheritance (PBI). In both models, persistence is a property that depends on the data or object type, which results in a limited form of persistence: in E, objects without the persistent storage class cannot persist, and in Objectivity/C++, objects not belonging to persistent classes cannot persist.
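To make the type dependence concrete, the following C++ sketch models persistence by inheritance. The base class `PersistentObject`, the `Store` class, and the `serialize()` hook are hypothetical stand-ins invented for illustration (they are not Objectivity/C++'s `ooObj` interface); the point is only that the type system, via `static_assert`, rejects any object whose class does not derive from the persistent base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a predefined persistent base class such as
// ooObj; this is NOT the real Objectivity/C++ API.
class PersistentObject {
public:
    virtual ~PersistentObject() = default;
    // Each persistent class says how to flatten itself for storage.
    virtual std::string serialize() const = 0;
};

// A persistent class: it can persist only because of its type (PBI).
class Account : public PersistentObject {
public:
    explicit Account(int balance) : balance_(balance) {}
    std::string serialize() const override {
        return "Account:" + std::to_string(balance_);
    }
private:
    int balance_;
};

// A transient class: it does not inherit from PersistentObject.
struct Point { int x, y; };

// Hypothetical database that accepts only persistence-capable types.
class Store {
public:
    template <typename T>
    void save(const T& obj) {
        static_assert(std::is_base_of<PersistentObject, T>::value,
                      "only classes derived from PersistentObject can persist");
        records_.push_back(obj.serialize());
    }
    std::size_t size() const { return records_.size(); }
    const std::string& at(std::size_t i) const { return records_.at(i); }
private:
    std::vector<std::string> records_;
};
```

With this sketch, `store.save(Account{42})` compiles, while `store.save(Point{1, 2})` is rejected at compile time, which is the type-dependent limitation described above.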
Besides the models above, Shore [2.28] introduces a naming mechanism to designate persistence: an object is persistent if it is registered in a namespace.
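A minimal sketch of persistence by naming, assuming a hypothetical `Namespace` class (not Shore's actual interface): at commit time an object is written out only if it is currently bound to a name, regardless of its type or where it was allocated.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Object {
    std::string payload;
};

// Hypothetical persistent namespace: an object persists iff it is
// registered under some name.
class Namespace {
public:
    void bind(const std::string& name, std::shared_ptr<Object> obj) {
        entries_[name] = std::move(obj);
    }
    void unbind(const std::string& name) { entries_.erase(name); }

    // Only named objects are saved; std::map iterates in name order.
    std::vector<std::string> commit() const {
        std::vector<std::string> saved;
        for (const auto& entry : entries_)
            saved.push_back(entry.first + "=" + entry.second->payload);
        return saved;
    }
private:
    std::map<std::string, std::shared_ptr<Object>> entries_;
};
```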
Orthogonal Persistence
To support unlimited persistence, the concept of orthogonal persistence emerged, which treats persistence as a property independent of object type. This concept is based on three principles [2.6]:
• Persistence Independence: a persistence model should provide a uniform manner of creating and manipulating objects irrespective of their lifetimes.
• Data Type Orthogonality: this uniform manner should be independent of data type, i.e., it applies to all data types.
• Persistence Identification: the mechanism for identifying persistent objects should not be related to the type system, storage allocation, or any naming method.
These principles are well supported by a persistence model named Persistence By Reachability (PBR), which is the most popular model. In PBR, an object is persistent if and only if it is reachable from a persistent root. A persistent root is a distinguished, named persistent object from which a garbage collector can discover all reachable objects. Relying tightly on the garbage collection mechanism, a PBR system obtains complete reachability information and uses it to decide which objects are persistent and to maintain them. In PBR, any object can potentially become persistent. The drawback is that PBR depends on garbage collection and requires special compiler cooperation; therefore, PBR is not well suited to C/C++, which lack built-in garbage collection.
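The reachability rule can be sketched as the mark phase of a tracing collector started from the persistent roots. The toy `Heap` below, in which objects are indices and references are index lists, is an assumption made for illustration; a real PBR system traverses actual object graphs with help from the compiler and the garbage collector.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy heap: refs[i] lists the objects that object i points to.
struct Heap {
    std::vector<std::vector<std::size_t>> refs;
};

// An object is persistent iff it is reachable from a persistent root.
// This is a depth-first mark phase run from the roots only.
std::unordered_set<std::size_t> persistentSet(const Heap& heap,
                                              const std::vector<std::size_t>& roots) {
    std::unordered_set<std::size_t> marked;
    std::vector<std::size_t> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        std::size_t obj = stack.back();
        stack.pop_back();
        if (!marked.insert(obj).second) continue;  // already visited
        for (std::size_t child : heap.refs.at(obj)) stack.push_back(child);
    }
    return marked;
}
```

For example, with edges 0 → 1 → 2 and 3 → 0 and root 0, objects 0, 1, and 2 persist, while object 3 (which points into the persistent graph but is not reachable from the root) stays transient.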
Besides PBR, some other persistence models offer orthogonal persistence to some extent. For example, ObjectStore, Mneme, and Texas [2.13,2.35,2.36] provide persistent variants of the storage allocator that allocate persistent objects on a special heap. This model, sometimes called Persistence By Storage (PBS), designates persistence by storage location: an object is persistent if it is created in a persistent heap by the persistent storage allocator.
Compared with PBR, PBS weakens the principle of Persistence Independence, since persistence must be declared explicitly. Explicitly declaring persistence is sometimes called Persistence By Declaration (PBD), and PBS can be regarded as a kind of PBD. Because persistence is declared explicitly, PBS need not discover reachability, which makes its implementation lightweight and easy.
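A minimal sketch of PBS, assuming a hypothetical `PersistentHeap` that is only an in-memory buffer (a real system would map it from, or flush it to, a file): `pnew` is an illustrative persistent variant of `new` that places objects in that heap, and persistence is decided solely by address.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <utility>
#include <vector>

// Hypothetical persistent heap; a real one would be backed by a file.
class PersistentHeap {
public:
    explicit PersistentHeap(std::size_t bytes) : region_(bytes) {}

    // Persistent storage allocator: a bump allocator over the region.
    // Assumes region_.data() is aligned for fundamental types.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;
        if (start + size > region_.size()) throw std::bad_alloc();
        used_ = start + size;
        return region_.data() + start;
    }

    // PBS: an object is persistent iff it lives inside the region.
    bool isPersistent(const void* p) const {
        const unsigned char* byte = static_cast<const unsigned char*>(p);
        const unsigned char* begin = region_.data();
        const unsigned char* end = begin + region_.size();
        std::less<const unsigned char*> before;  // total order on pointers
        return !before(byte, begin) && before(byte, end);
    }

    std::size_t bytesUsed() const { return used_; }

private:
    std::vector<unsigned char> region_;
    std::size_t used_ = 0;
};

// Hypothetical "persistent new": construct a T inside the persistent heap.
template <typename T, typename... Args>
T* pnew(PersistentHeap& heap, Args&&... args) {
    void* mem = heap.allocate(sizeof(T), alignof(T));
    return new (mem) T(std::forward<Args>(args)...);
}
```

Objects created with plain `new` stay transient even when they have the same type, so no reachability analysis is needed; the cost is that the programmer must pick the allocator explicitly.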
Persistence Transparency
To access a persistent object, the programmer must reference an object identifier (OID), which plays the role of a pointer. An OID can have the same size as a pointer [2.8,], or be larger than a pointer, even of arbitrary size [2.41]. The core work of a persistent system is to translate a reference through an OID into a reference to the memory address of the object the OID identifies. This can be done eagerly, by translating all OID references once the program starts [], or on demand [], i.e., when an OID is dereferenced.
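The two translation strategies can be sketched with a resident object table keyed by OID. Everything here is an illustrative assumption: `Oid` is taken to be a 64-bit integer, and `Store` stands in for the persistent store, with a counter that exposes how many objects were fetched. `deref` translates on demand, while `eagerLoad` translates a list of OIDs up front.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Oid = std::uint64_t;  // hypothetical: OIDs as 64-bit integers

struct Object {
    std::string data;
};

// Hypothetical persistent store: OID -> stored contents.
struct Store {
    std::unordered_map<Oid, std::string> disk;
    int loads = 0;  // how many objects have been fetched so far
};

// Translates OIDs into memory addresses through a resident object table.
class Translator {
public:
    explicit Translator(Store& store) : store_(store) {}

    // On-demand translation: fetch the object the first time its OID is
    // dereferenced, then return the cached address on later calls.
    Object* deref(Oid oid) {
        auto it = resident_.find(oid);
        if (it != resident_.end()) return it->second.get();
        ++store_.loads;
        auto obj = std::make_unique<Object>(Object{store_.disk.at(oid)});
        Object* addr = obj.get();
        resident_.emplace(oid, std::move(obj));
        return addr;
    }

    // Eager translation: resolve a set of OIDs up front.
    void eagerLoad(const std::vector<Oid>& oids) {
        for (Oid oid : oids) deref(oid);
    }

private:
    Store& store_;
    std::unordered_map<Oid, std::unique_ptr<Object>> resident_;
};
```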
Several approaches are used to translate OID dereferences at run time.
For an interpreted language like Smalltalk, the interpreter can detect an OID dereference and translate the OID using a structure such as a hash table []. Some systems perform this translation on every dereference. To improve performance, other systems overwrite the OID with the obtained memory address the first time the OID is translated, so that all subsequent dereferences are plain pointer dereferences. This approach is called pointer swizzling [].
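A minimal sketch of swizzling on first dereference, with hypothetical types: `ObjectTable::lookup` plays the role of the hash-table translation and fabricates a placeholder object for any OID instead of reading from a store, while `Ref` is a reference field that starts out holding an OID. After the first `deref`, the field holds the address and the OID is cleared, so later dereferences never consult the table.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

using Oid = std::uint64_t;

struct Object {
    std::string data;
};

// Hypothetical resident object table: the hash-table translation step.
// For illustration it creates a placeholder object for unseen OIDs.
class ObjectTable {
public:
    Object* lookup(Oid oid) {
        ++lookups;
        std::unique_ptr<Object>& slot = objects_[oid];
        if (!slot) slot.reset(new Object{"object#" + std::to_string(oid)});
        return slot.get();
    }
    int lookups = 0;  // how many OID translations were performed

private:
    std::unordered_map<Oid, std::unique_ptr<Object>> objects_;
};

// A reference field holding either an OID or, once swizzled, an address.
struct Ref {
    bool swizzled = false;
    Oid oid = 0;
    Object* ptr = nullptr;
};

// Swizzle on first dereference: translate the OID once and replace it
// with the address, so later dereferences are plain pointer accesses.
Object* deref(Ref& ref, ObjectTable& table) {
    if (!ref.swizzled) {
        ref.ptr = table.lookup(ref.oid);
        ref.oid = 0;  // the OID is overwritten by the address
        ref.swizzled = true;
    }
    return ref.ptr;
}
```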
For a compiled language, the persistent system can detect an OID dereference and perform pointer swizzling using page faults [] or inline code [].
However, pointer swizzling is a heavyweight approach, since it requires a deswizzling translation table [] to keep track of all swizzled pointers. When resident objects are saved, the translation table ensures that swizzled pointers are deswizzled, i.e., translated back into the corresponding OIDs. The translation table can become very large [] and requires complicated algorithms to ensure correct deswizzling, especially when some swizzled pointers are changed []. Moreover, fault-based pointer swizzling relies on special hardware mechanisms and operating system (OS) support [], which low-cost applications cannot afford.
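The bookkeeping behind deswizzling can be sketched as follows. `SwizzleTable` is a hypothetical structure: it records every swizzled slot plus an address-to-OID map, and `deswizzleAll` turns each slot's current address back into an OID before objects are saved. Reading the slot's current pointer, rather than the address it was first given, is one simple way to stay correct when the program reassigns a swizzled pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using Oid = std::uint64_t;

struct Object {
    int value;
};

// A reference slot holding an OID or, once swizzled, an address.
struct Ref {
    bool swizzled = false;
    Oid oid = 0;
    Object* ptr = nullptr;
};

// Hypothetical deswizzling table: tracks every swizzled slot and the
// OID of every resident object so addresses can be turned back into OIDs.
class SwizzleTable {
public:
    void addResident(Oid oid, Object* obj) {
        oidOf_[obj] = oid;
        addrOf_[oid] = obj;
    }

    void swizzle(Ref& ref) {
        ref.ptr = addrOf_.at(ref.oid);
        ref.swizzled = true;
        slots_.insert(&ref);  // every swizzled slot must be tracked
    }

    // Before saving, map each slot's *current* address back to an OID.
    void deswizzleAll() {
        for (Ref* ref : slots_) {
            ref->oid = oidOf_.at(ref->ptr);
            ref->ptr = nullptr;
            ref->swizzled = false;
        }
        slots_.clear();
    }

    std::size_t trackedSlots() const { return slots_.size(); }

private:
    std::unordered_map<Object*, Oid> oidOf_;
    std::unordered_map<Oid, Object*> addrOf_;
    std::unordered_set<Ref*> slots_;
};
```

Even this toy version shows where the weight comes from: the slot set grows with every swizzled reference, and a real system must also drop entries for slots that are destroyed, which this sketch does not handle.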
To support lightweight persistence, some systems [] employ smart pointers, i.e., objects of C++ template classes that overload operator->(). All OIDs are encapsulated in smart pointers, and the programmer references OIDs through operator->(), which performs the translation on each dereference. To some extent, this approach weakens the principle of Persistence Independence, since persistent objects must be accessed through a different reference type than transient ones. However, this approach is easy to implement.
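A minimal sketch of such a smart pointer, assuming a hypothetical `ObjectTable` that maps OIDs to resident objects: `PRef<T>` stores only the OID, and its `operator->()` repeats the table lookup on every dereference, so neither swizzled pointers nor a deswizzling table are needed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using Oid = std::uint64_t;

// Hypothetical resident object table shared by all smart pointers.
class ObjectTable {
public:
    template <typename T>
    void put(Oid oid, T* obj) { objects_[oid] = obj; }

    void* lookup(Oid oid) const {
        ++translations;  // count every OID-to-address translation
        return objects_.at(oid);
    }
    mutable int translations = 0;

private:
    std::unordered_map<Oid, void*> objects_;
};

// Smart pointer holding only an OID; operator-> translates it on every
// dereference instead of caching the address.
template <typename T>
class PRef {
public:
    PRef(Oid oid, const ObjectTable& table) : oid_(oid), table_(&table) {}
    T* operator->() const { return static_cast<T*>(table_->lookup(oid_)); }
    T& operator*() const { return *operator->(); }
    Oid oid() const { return oid_; }

private:
    Oid oid_;
    const ObjectTable* table_;
};
```

The price of this simplicity is one table lookup per `->`; for example, `ref->balance += 50` followed by a read of `ref->owner` performs two translations.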