Preprocessing DITA-OT Project Files

19 Feb 2022

Read time: 11 minute(s)

Project files were introduced in the DITA-OT 3.4 release. They provide a standardized XML way to define how input DITA files should be published to output content files, including details such as filtering, transformation parameters, and output directory locations.

Our basic publishing requirements are as follows:

We publish some books multiple times in multiple DITAVAL filtering conditions.
We publish to both PDF (using PDF Chemistry) and online help (using Oxygen WebHelp).
- PDFs are published individually per-book.
- WebHelp is published as a collection of books, with in-help links to the PDF files.
We have "review" and "final" versions of our output.
- These versions use different DITA-OT parameters and different DITAVAL flagging files.

As I attempted to create a DITA-OT project file to produce our deliverables, I encountered some limitations. This blog post describes how I created an XSLT-based preprocessing approach to work around these limitations.

Quick Overview of Project Files

A project file uses three primary building block elements:

<content> - an input DITA map to process
- Can include one or more associated DITAVAL files
<publication> - a transformation to apply
- Can include transformation parameters
<deliverable> - output content to create, by transforming a <context> with a <publication>
- Can include an output subdirectory path (relative to the overall output directory)

In its simplest form, a <deliverable> can provide its own <context> and <publication> information within itself:

For more complex output content situations, a <deliverable> can reference shared <context> and <publication> elements by @idref references to @id values:

This @idref mechanism allows many deliverables to share common context and publication definitions. If there is a change to a <context> (perhaps a different map or new DITAVAL condition) or a <publication> (perhaps an updated DITA-OT parameter), then all relevant deliverables inherit the change automatically.

In addition, DITA-OT project files can use <include> statements to structure their information across multiple files. This allows contexts to be organized by product writer teams, publications to be placed in files maintained by a DITA environment maintainer, and so on.

Limitation – Specify Per-Deliverable PDF File Names (#3682)

When I publish multiple PDFs from the same map using DITAVAL conditions, I needed to define the outputFile.base parameter on a per-<deliverable> basis to control the output PDF file name:

DITA-OT versions before 4.0 do not allow <param> elements to be controlled from a <publication> reference in a <deliverable>. I filed the following DITA-OT enhancement request for this:

#3682: In DITA-OT project files, allow a PDF <deliverable> to specify its output file name

It was implemented for DITA-OT 4.0 in the following pull request:

#3907: Support param in publication reference

Limitation – Consider DITAVAL in Both <context> and <publication> (#3690)

I needed to apply DITAVAL from both <context> (for @product filtering) and <publication> (for @audience/@deliveryTarget/@rev filtering/flagging of "review" and "final" deliverables):

DITA-OT versions before 4.0 do not properly combine <context> and <publication> DITAVAL filtering. I filed the following DITA-OT issue for this:

#3690: In DITA-OT project files, apply both <context> and <publication> DITAVAL filtering

It was implemented for DITA-OT 4.0 in the following pull request:

#3907: Add profiles to publication project file

Using Preprocessing to Work Around the Limitations

To work around these limitations in earlier DITA-OT versions before 4.0 is released, I created an XSLT file to do the following:

Read the input DITA-OT project file
- Resolve <include> statements to pull all content into a single file
Convert all DITAVAL file references to absolute paths (to work around #3873)
In <deliverable> elements, replace all @idref'ed <context> and <publication> elements with copies of the referenced elements (so we can modify them per-<deliverable>)
Find <param> elements in <deliverable>, move them to <publication> instead (to work around #3682)
Find <ditaval> elements in <publication>, move them to <context> instead (to work around #3690)

I then applied this XSLT file as a preprocessing step to translate the unsupported project file constructs into supported constructs in a temporary preprocessed project file, then ran DITA-OT publishing using that temporary file. For example,

#!/bin/bash
rm -rf ./out

export DITAOT=$(dirname $(dirname $(which dita)))
export SAXON_JAR=~/saxon/saxon-he-10.6.jar

echo "Creating preprocessed DITA-OT project file..."
java \
  -jar ${SAXON_JAR} \
  -xsl:frameworks/dita/preprocess_project_file.xsl \
  -s:project.xml \
  -o:project.xml-preprocessed.xml

echo "Publishing preprocessed DITA-OT project file..."
${DITAOT}/bin/dita --project project.xml-preprocessed.xml -t temp --verbose

rm project.xml-preprocessed.xml

This worked well from a linux command line, but we also needed our writers to be able to run it from Oxygen. To do this, I created a copy of Oxygen's project file build script at

<OXYGEN_INSTALL>/frameworks/dita/dita_project/build_dita_project.xml

and added similar XSLT preprocessing to it using Ant commands, then placed the modified version at

frameworks/dita/build_dita_project_preprocessed.xml

in our Oxygen project directory. Then I extended the DITA-OT project file framework and created an extended DITA-OT project file transformation pointing to the modified build script:

This new transformation allowed Oxygen to publish project files that used the preprocessing workaround. The unsupported constructs still result in schema violations when the original (non-preprocessed) project files are opened for editing in Oxygen, but at least the publishing aspect works.

Note:

The preprocessing XSLT stylesheet requires Saxon to run. To support, this, the preprocessing-based DITA-OT project transformations specify a list of additional .jar libraries to use. To see these libraries, click the Libraries button in the dialog box shown above.

The following Oxygen project demonstrates this preprocessing approach:

preprocessed_ditaot_project_files.zip

To run it,

Extract the archive and open OPENME.xpr in Oxygen.
In the Project view, expand the Main Files list, right click on deliverables-all.xml and choose Transform > Transform With, and choose the Publish Preprocessed DITA-OT Project (all deliverables) transformation.

This will build "review" and "final" versions of both the "Product A" and "Product B" online help collections in the out/ directory, complete with correctly-named PDF files integrated into each online help collection.

There are other deliverable files for specific subsets of deliverables, organized into logical folders in the Main Files list.

To view the XSLT stylesheet without downloading the archive, click on the following link:

preprocess_project_file.xsl

There are comments in the code to explain how it works.

Exploring How the XSLT Transformation Works

To help you explore how the XSLT transformation works, the Oxygen project also makes it available as a refactoring operation that you can manually preview on project files.

To do this,

In the Project view in the Main Files list in the project_files/ directory, right-click one of the deliverable*.xml files, then choose Refactoring > XML refactoring.
In the refactoring operation list, choose Synopsys > Preprocess DITA-OT project file refactoring operation.
Click the Preview button to see what the XSLT transformation would do.

For example,

Be sure not to actually apply the refactoring operation to the file. Otherwise, you will need to re-extract the archive to restore the original project file.