DMA Content Model

Introduction

The DMA content model defines the properties, object classes, and interfaces that enable a DMA-compliant application to access the content of a document, associate new content with it, modify its content, and delete content from it.

In this description, we refer to content as the information that can be used to produce one or more renderings of a document. For example, word processor and other office document files, image files, and HTML files are all considered content in DMA.

The DMA content model has been designed with the following characteristics:

In addition to the primary document class (DocVersion), two classes of objects are used to represent document content in DMA: Renditions and Content Elements.

DocVersion

A DocVersion object captures all the properties and subordinate objects (renditions and content elements) that represent a single version of a document.

Rendition

A Rendition object is used to capture the properties and group the components required to produce one rendering of the document. In essence, renditions encapsulate the variations in content format. For example, one rendition could capture the information and file components for editing a document in Word, while another rendition could hold a PostScript file of the document for printing purposes.

Content Element

A Content Element object is used to capture the properties and provide access to a single content component. In essence, content elements encapsulate variations in storage and access to content data of a document. A content element would typically represent a file or a reference to a file for a particular component. Two sub-classes of Content Element, ContentTransfer and ContentReference, are defined to represent content that is captured by a document space or maintained as a reference, respectively.

Modeling Document Content in DMA

The following example illustrates how a complex document consisting of several multi-component renditions can be represented in DMA.

Assume the following scenario:

A DMA-compliant application could create the document described above as follows:

  1. Create a document object, either DocVersion or an appropriate sub-class of DocVersion.
  2. Create a Rendition object for the Word format of the document.
  3. Create a ContentTransfer object for each ".doc" file in the document. Assuming a document space that manages content, these files will be copied to the document space. ContentReference objects can be created for any external references that need to be maintained.
  4. Make the document and its content persistent in the document space.

The above document can now be made available for different individuals to update during the revision cycle.

The following could be done when it is time to publish the completed document.

  1. Create a Rendition object for the HTML format of the document.
  2. Generate the HTML files from the ".doc" files.
  3. Create a ContentTransfer object for each file generated. ContentReference objects may need to be created to maintain any external references to content present in the Word rendition of the document.
  4. Create a Rendition object for the PDF format of the document.
  5. Create the PDF file from the ".doc" files (assume one PDF for the whole document).
  6. Create a ContentTransfer object for the PDF file.
  7. Add the PDF and HTML renditions to the document and make them persistent in the document space.

The diagram below shows the DMA objects that might make up such a complex document.
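The object structure built by the two step lists above can also be sketched in code. The following is a minimal illustration in Python, not the DMA interfaces: the class and attribute names are stand-ins for the DMA classes and properties, and the file names, the external URL, and the component type values ("chapter", "external", "whole-document") are made up for the example, since DMA does not define component type values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentElement:
    component_type: str  # stands in for dmaProp_ComponentType


@dataclass
class ContentTransfer(ContentElement):
    retrieval_name: str = ""  # stands in for dmaProp_RetrievalName


@dataclass
class ContentReference(ContentElement):
    content_location: str = ""  # stands in for dmaProp_ContentLocation


@dataclass
class Rendition:
    rendition_type: str  # stands in for dmaProp_RenditionType
    content_elements: List[ContentElement] = field(default_factory=list)


@dataclass
class DocVersion:
    renditions: List[Rendition] = field(default_factory=list)


def build_example_document() -> DocVersion:
    """Mirror the authoring and publishing steps with made-up file names."""
    doc = DocVersion()

    # Authoring: one Word rendition, one content element per ".doc" file,
    # plus one external reference that is only recorded, not captured.
    word = Rendition("MIME::application/msword")
    for name in ("chapter1.doc", "chapter2.doc"):
        word.content_elements.append(ContentTransfer("chapter", name))
    word.content_elements.append(
        ContentReference("external", "http://example.com/logo.gif"))
    doc.renditions.append(word)

    # Publishing: an HTML rendition with one file per chapter ...
    html = Rendition("MIME::text/html")
    for name in ("chapter1.html", "chapter2.html"):
        html.content_elements.append(ContentTransfer("chapter", name))
    doc.renditions.append(html)

    # ... and a PDF rendition with a single file for the whole document.
    pdf = Rendition("MIME::application/pdf")
    pdf.content_elements.append(ContentTransfer("whole-document", "manual.pdf"))
    doc.renditions.append(pdf)
    return doc
```

The sketch only models containment (document, renditions, content elements); it does not model persistence or content transfer.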

Content Creation

A DMA client application’s first step in making document content persistent in a document space is to create a DocVersion object.

DocVersion Creation

As stated in the object model section, independently persistent objects (such as DocVersions) must always be created from the document space in which the object will be persisted. Therefore, a DMA client can invoke one of the following two methods to instantiate a new DocVersion object from the document space.

  1. IdmaObjectFactory::CreateObject () method on the document space, or
  2. IdmaClassDescription::CreateInstance () method on the class description object for a DocVersion, obtained from the document space in which the object is to be created.

Successful invocation of one of the above methods will result in a scratchpad DocVersion object being made available to the client application.

At this stage, the client will most likely set various property values on the new document object using methods on the IdmaEditProperties interface. The client then needs to create the necessary Rendition and Content Element objects in order to associate the content data for the document.

A subsequent call to either ExecuteChange () or ExecuteChanges () will make the scratchpad DocVersion object and any of its sub-objects persistent in the document space. (The transfer of content to the document space occurs at this time.)
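The scratchpad lifecycle described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the DMA API: ToyDocSpace, create_object and execute_change are hypothetical names that only mimic the roles of the document space, CreateObject () and ExecuteChange (), and content sources are modeled as plain Python callables.

```python
class ToyDocSpace:
    """Toy stand-in for a document space; not the DMA interfaces."""

    def __init__(self):
        self.persistent = []      # objects that have been made persistent
        self.stored_content = []  # content bytes captured at persist time

    def create_object(self):
        # Returns a scratchpad object: nothing is stored in the space yet.
        return {"persistent": False, "properties": {}, "pending_content": []}

    def execute_change(self, scratchpad):
        # Persisting the object is the moment at which pending content
        # is read from its source and transferred to the document space.
        for read_content in scratchpad["pending_content"]:
            self.stored_content.append(read_content())
        scratchpad["pending_content"] = []
        scratchpad["persistent"] = True
        self.persistent.append(scratchpad)
```

Because the content source is only read inside execute_change, the sketch also shows why a supplied stream or resource has to stay available until the object has been made persistent.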

Note: In general, DocVersion objects are considered independently persistent. However, this is not mandatory and therefore DocVersion objects may be dependently persistent on another independently persistent object. It should be noted that Rendition and Content Element objects are always dependently persistent, typically on a DocVersion object. In this discussion, DocVersion objects are assumed independently persistent.

Rendition Creation

The DMA client application’s next step in creating content for a document is to create the Rendition objects of the appropriate format to hold the content data. Renditions are objects of class dmaClass_Rendition or one of its sub-classes.

If a document space provides support for document content, then it must support the two content related properties, dmaProp_RenditionsPresent and dmaProp_Renditions, on dmaClass_DocVersion and all its sub-classes.

The dmaProp_Renditions property on a DocVersion is a list of Rendition objects. The client application can access this list in the same way as any other DMA "list of object" property. When a client application initially accesses this property on a new DocVersion, it is supplied an empty list of objects. The client application would then construct as many Renditions as are required and insert them into this list using the IdmaEditListOfObject::InsertObject method.

Rendition objects can be created using one of the following methods.

  1. IdmaObjectFactory::CreateObject () method on the document space, or
  2. IdmaClassDescription::CreateInstance () method on the class description object for a Rendition, obtained from the document space in which the object is to be created.

Note: Since Rendition objects are dependently persistent objects, they could be created from a source other than the document space in which the document is being persisted. However, a document space is not required to accept such a Rendition object if it does not conform to the class description for Rendition objects in that particular document space.

Once a Rendition object is created, the DMA client application would set the dmaProp_RenditionType and other properties as appropriate on the Rendition. Typically, this would include creating the appropriate Content Element objects and inserting them into the Rendition object’s dmaProp_ContentElements list. After these operations are completed, the client application should have fully constructed Renditions with associated Content Elements; these Renditions can then be inserted into the DocVersion.

Content Element Creation

Before a DMA client application creates any content elements, it must determine what form of content capture is supported by the document space.

Determining the form of content capture

A document space supports content capture in at least one of two forms. Support for capture and management of content within the document space is provided via the class dmaClass_ContentTransfer, while support for maintaining references to content outside the control of the document space is provided via the class dmaClass_ContentReference.

A document space is allowed to support both forms of content capture. In this case, it is possible that a rendition may have a mixture of content data that is captured in the document space via a content transfer object, and some external references stored as content reference objects.

In order to determine what forms of content capture are supported by a document space, a client can either check the capabilities of the document space (via its dmaProp_DocSpaceCapabilities value) or detect the presence of the above mentioned classes in the metadata for the document space. Based on this information and the form of the content data, the client application can decide which type of content element to create.
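The class-presence check can be sketched as follows. This is a minimal illustration, assuming the client has already read the class identifiers from the document space metadata into an ordinary collection of strings; how that collection is obtained is not shown, and both function names are hypothetical. The fallback rule (capture external content when references are not supported) is one possible client policy, not something DMA prescribes.

```python
def supported_capture_forms(class_names):
    """Report which content element classes a document space exposes.

    `class_names` is a hypothetical iterable of class identifiers taken
    from the document space metadata.
    """
    names = set(class_names)
    return {
        "transfer": "dmaClass_ContentTransfer" in names,
        "reference": "dmaClass_ContentReference" in names,
    }


def choose_content_element_class(class_names, content_is_external):
    """Pick a content element class for one piece of content, or None."""
    forms = supported_capture_forms(class_names)
    if content_is_external and forms["reference"]:
        return "dmaClass_ContentReference"
    if forms["transfer"]:
        # Also used as a fallback for external content when the document
        # space cannot store references (a policy choice of this sketch).
        return "dmaClass_ContentTransfer"
    return None
```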

Creating ContentTransfer objects

The client application would use either the CreateObject or CreateInstance method to create a new dmaClass_ContentTransfer object. It would then set dmaProp_ComponentType to some appropriate value in order to indicate what type of content element this is. Valid values for this property are dependent on the Rendition to which the content element belongs; DMA does not define these values. The client may also set a value for the dmaProp_RetrievalName property. Typically, this value is a resource name in URL format identifying where the content data currently exists. On subsequent access to the document content, this property value could be used to determine the original content resource name and/or the file name suffix, for example to launch an appropriate application.

After creating the content transfer object and setting its properties, the client then needs to determine the mode in which content will be supplied to the document space. Content capture and access can be done via Streams or Resource names in URL format. The IdmaContentTransfer interface provides all these methods.

To supply content as a stream, the client application would create a stream on the content and invoke the method SetCaptureStream () on the content transfer object. All document spaces must support this method of content capture. Alternatively, if supported, the client could invoke the method SetCaptureResource () to identify content as a resource name in the form of a URL. Local file names, a common way of identifying document content, are made into URLs by prefixing the full path name with "file://".
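The file name rule is simple enough to show directly. The helper below is a hypothetical sketch that assumes POSIX-style absolute paths; it applies only the prefixing described above and does not attempt percent-encoding or drive-letter handling, which a real client would likely need.

```python
def local_file_to_url(full_path):
    """Prefix a full path name with "file://", as the text describes."""
    if not full_path.startswith("/"):
        # Assumption of this sketch: only POSIX-style absolute paths.
        raise ValueError("expected a full (absolute) path name")
    return "file://" + full_path
```

For example, local_file_to_url("/docs/report.doc") yields "file:///docs/report.doc"; the third slash comes from the leading slash of the path itself.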

Creating ContentReference objects

If content is supplied as a reference, the client application uses either the CreateObject or CreateInstance method to create a new dmaClass_ContentReference object. In addition to setting the dmaProp_ComponentType as discussed above, the client must set a value for the dmaProp_ContentLocation property. This value should be the resource name for the content data, in URL format. The document space may validate the URL, but does not need to provide any guarantees regarding the existence or accessibility of the content referenced by the resource name.

Making Content Persistent

Once the appropriate content elements are created, the client application would insert these objects into the Rendition object’s dmaProp_ContentElements list property as discussed in the previous section. The document space will capture and transfer the content data as appropriate when the parent DocVersion is made persistent.

Note: The client application must ensure that the resource name or stream supplied to a content transfer object exists until the DocVersion has been made persistent.

Content Access and Modification

The DMA content model provides two features that enable efficient content access and modification. First, Renditions and Content Element objects are bound to scratchpad objects only upon access to those specific objects due to the late binding rule. Second, actual document content is transferred to the client only upon invoking methods on the IdmaContentTransfer interface.

Locating and Accessing DocVersions

A DMA client can use either navigation (typically via containers), connection via OIID, or query to locate DocVersion objects in a document space.

The dmaProp_RenditionsPresent property is a system-generated list that reflects the rendition types that exist on a DocVersion at a given time. If a document space makes this property searchable, then it can be used to detect renditions that exist for a document via a DMA query. However, this property value and the dmaProp_Renditions list on the DocVersion are not kept synchronized in a scratchpad object. Therefore, it is not safe to assume that the Rendition object at a particular index in the dmaProp_Renditions list will have rendition type matching that of the corresponding index position in the dmaProp_RenditionsPresent list. The same rule applies to the dmaProp_ContentElementsPresent and dmaProp_ContentElements in the Rendition object.
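A practical consequence is that a client should match renditions by their own type value rather than by list position. The following sketch illustrates this with plain Python lists and dictionaries standing in for the two properties; the data and the find_renditions helper are hypothetical.

```python
def find_renditions(renditions, rendition_type):
    """Return the renditions whose own type matches, ignoring positions."""
    return [r for r in renditions if r["rendition_type"] == rendition_type]


# Stand-ins for dmaProp_RenditionsPresent and dmaProp_Renditions on a
# scratchpad object; note that the two lists are in different orders.
renditions_present = ["MIME::text/html", "MIME::application/pdf"]
renditions = [
    {"rendition_type": "MIME::application/pdf"},
    {"rendition_type": "MIME::text/html"},
]

html_renditions = find_renditions(renditions, "MIME::text/html")
```

Here renditions[0] is the PDF rendition even though the first entry of renditions_present names HTML, so indexing by position would pick the wrong object.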

Once a DocVersion object is located the client application would invoke the IdmaDocSpace::ConnectObject () method to connect to the document and obtain a scratchpad copy of the persistent document.

Accessing Renditions and Content Elements

To access specific rendition and content element objects on a document a DMA client would use methods on the IdmaListOfObject interface on the dmaProp_Renditions and dmaProp_ContentElements list properties on the DocVersion and Rendition objects respectively.

Accessing Content Data

Once the client application has obtained a scratchpad copy of a specific content element, it can use an IsOfClass () test to determine what type of content element (i.e. content transfer or content reference) was originally stored. If it is a content reference object, the client is responsible for accessing the content data directly by using the value of dmaProp_ContentLocation. Alternatively, if it is a content transfer object, the client can use one of a number of ways to access content data, as supported by the methods on the IdmaContentTransfer interface.

Delivering content data as a Stream must be supported by all document spaces that support content. The client would use the GetStream () method to obtain a stream to the content stored in the document space.
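The branching on content element class can be sketched as follows. This is an illustration only: the Python classes stand in for the DMA classes, isinstance stands in for the IsOfClass () test, get_stream stands in for GetStream (), and fetch_url is a hypothetical caller-supplied function, reflecting that the client, not the document space, reaches referenced content.

```python
import io


class ContentElement:
    """Hypothetical base class for this sketch."""


class ContentTransfer(ContentElement):
    def __init__(self, data):
        self._data = data

    def get_stream(self):
        # Stands in for GetStream (): a readable stream over stored content.
        return io.BytesIO(self._data)


class ContentReference(ContentElement):
    def __init__(self, content_location):
        # Stands in for the dmaProp_ContentLocation value (a URL).
        self.content_location = content_location


def read_content(element, fetch_url):
    """Return the bytes behind a content element."""
    if isinstance(element, ContentReference):
        # The client resolves the reference itself.
        return fetch_url(element.content_location)
    if isinstance(element, ContentTransfer):
        # The document space delivers the content as a stream.
        with element.get_stream() as stream:
            return stream.read()
    raise TypeError("not a content element known to this sketch")
```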

Optionally, a document space may choose to support one or both of the following means of delivering content.

Modifying Content

In order to modify content, a client first needs to access the document and the relevant Rendition and Content Element objects as described above. Then the client application would use methods on the IdmaEditListOfObject interface to insert, replace, or delete objects in the Renditions or ContentElements list properties. Document space policy would dictate the type of modifications allowed on content objects and data.

Some document spaces may support direct replacement of existing content element data, by allowing SetCaptureStream () or SetCaptureResource () on already persisted objects.

Deleting Content

To delete document content or the document itself, the client would follow steps similar to those for modifying content.

If Renditions or Content Elements are deleted, the document space will delete the relevant objects from the persistent store for the particular document. If the DocVersion is deleted, then the document space will delete all persistent information about the document, including all its renditions and content elements. The DMA content model does not provide for sharing of content components among multiple objects; therefore, if a document space has such a feature, it must not expose that fact to the DMA client application.

Note: The document space will not follow and delete the references supplied in ContentReference objects. In addition, a document space is not required to reclaim any storage at the time of deletion of content.

Content and the DMA Object Model

The following section discusses the classes, properties and interfaces defined for the content model. As stated previously, supporting content is optional for a document space. In addition to being indicated via the capabilities mechanism, the presence of the relevant classes, properties and interfaces indicates support for content by a document space.

The inheritance hierarchy of content classes is shown in the following diagram:

DocVersion

The document version object is the base class for all document objects in DMA. This object contains properties about a document, as well as other sub-objects that represent content for the document. A document version object can participate in versioning operations as defined by the DMA versioning model.

A document version object can have one or more objects that represent different renderings (Renditions) of the document (for example, word-processor document files and the PostScript form of the document). The content of a document version is not associated directly, but rather through its renditions. Note: It is not mandatory for document version objects to have content.

The DocVersion class inherits from dmaClass_Versionable and introduces two properties, dmaProp_RenditionsPresent and dmaProp_Renditions.

The dmaProp_Renditions property is a list of Rendition objects and can be accessed using DMA list access methods.

The dmaProp_RenditionsPresent property is a list of renditions present for the document. This list can be used to determine what renditions exist on a document either via a query or by inspecting the property on a specific document version.

Rendition

A Rendition manages all of the content elements associated with a particular rendering or representation of a document.

The class dmaClass_Rendition inherits from the base class dmaClass_DMA and introduces the dmaProp_RenditionType, dmaProp_ContentElements, and dmaProp_ContentElementsPresent properties.

The dmaProp_RenditionType is a string property that differentiates the renditions of a document. These string values have the following syntax:

<RenditionType_Space>::<Typename>

The syntax and semantics of the Typename are specific to the rendition type namespace, RenditionType_Space. DMA 1.0 defines one RenditionType_Space, ‘MIME’, whose values are legal MIME types as defined by the IANA.

Example values for RenditionType in the MIME namespace are ‘MIME::application/msword’ and ‘MIME::text/plain’.

Other rendition type namespaces may be defined in future versions of DMA.
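Splitting a rendition type value into its two parts is straightforward. The helper below is a hypothetical sketch: it separates the namespace from the type name at the first "::" and rejects values with a missing part, but it does not check that the type name is a registered MIME type.

```python
def parse_rendition_type(value):
    """Split '<RenditionType_Space>::<Typename>' into its two parts."""
    space, separator, typename = value.partition("::")
    if not separator or not space or not typename:
        raise ValueError("expected '<RenditionType_Space>::<Typename>'")
    return space, typename
```

For example, parse_rendition_type("MIME::application/msword") returns the pair ("MIME", "application/msword").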

The dmaProp_ContentElements property is a list of content elements on the rendition and can be accessed using DMA list of object methods.

The dmaProp_ContentElementsPresent property is a list of the component types of the content elements on the rendition and can be accessed using DMA list of string methods.

Because the DMA content model is neutral to content format, the Rendition class has a minimal set of properties. However, it is anticipated that DMA implementations would sub-class Renditions to provide content format specific properties and behavior. For example, renditions that support the specific characteristics of Word, HTML and TIFF may be defined.

Content Element

This abstract base class represents an elementary content component. This class introduces the dmaProp_ComponentType property inherited by the other content element sub-classes.

The dmaProp_ComponentType is a string property that differentiates the content elements of a rendition. The syntax and semantics of this property are specified by the class of rendition associated with this content element; therefore, they are not defined in the DMA specification.

Content Transfer

This is a sub-class of content element that provides capture and access to document content managed within a document space.

The string property dmaProp_RetrievalName introduced by this class captures either the full name or suffix that the client would like the content to have upon retrieval for viewing or editing. This can be useful when retrieving content to determine the location to place the content and/or to determine the application to launch to handle the content (based on file extension).

Objects of class Content Transfer must support the IdmaContentTransfer interface.

IdmaContentTransfer

This interface provides methods to supply and access content data in DMA. The only mandatory form of content transfer is via a stream. This means that DMA clients and document spaces must provide support for supplying and retrieving content by implementing the IdmaStream interface.

IdmaStream Interface

This utility interface provides methods to perform read-only access to content data as a continuous stream of bytes. This interface inherits from IUnknown and introduces the two methods, ReadStreamData () and SetStreamPosition (). Some implementations might not support seeking into a stream, thereby making it a sequential stream.

This interface must be presented through a COM object, which need not be a DMA object. When content is inserted (via SetCaptureStream ()), the client application is responsible for instantiating an appropriate COM object. When content is retrieved (via GetStream ()), the document space implementation will instantiate the object.
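The behavior of such a stream, including the sequential case, can be sketched as follows. This is a Python analogy rather than a COM implementation: the method names mirror ReadStreamData () and SetStreamPosition (), but their parameters and return values here are assumptions, since the actual signatures are not given in this section.

```python
import io


class ReadOnlyStream:
    """Read-only byte stream wrapper; a sketch, not the IdmaStream interface."""

    def __init__(self, raw, seekable=True):
        self._raw = raw
        # A stream is treated as sequential if the caller says so or if
        # the underlying object cannot seek.
        self._seekable = seekable and raw.seekable()

    def read_stream_data(self, count):
        # Returns up to `count` bytes; an empty result means end of stream.
        return self._raw.read(count)

    def set_stream_position(self, position):
        if not self._seekable:
            raise io.UnsupportedOperation("sequential stream: cannot seek")
        self._raw.seek(position)
```

A client that must work with every implementation would read the stream front to back and treat repositioning as an optional optimization.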

Content Reference

This is a sub-class of content element, which references content outside the control of a document space.

The string property dmaProp_ContentLocation introduced by this class contains the reference to content data managed outside the document space. This value is a resource name in the format of a URL.