Friday, June 30, 2017

DITA Linking Strategies

Share to Facebook Share to Twitter Email This Share on Google Plus Share on Tumblr

This small tutorial is based on the "DITA Linking Strategies" presentations I made for the DITA Europe 2016 and DITA North America 2017 conferences. It's a general overview about DITA linking possibilities and best practices. Also, it's meant as a continuation of the DITA Reuse Strategies blog post.

According to Wikipedia:

"A link, is a reference to data that the reader can directly follow either by clicking, tapping, or hovering."

Basically, we should regard linking as yet another form of content reuse, except that instead of presenting the content in place, it re-directs the end user to some other resource.

I'll start with describing linking at DITA Map level.

Map-Level Linking

A DITA Map uses topic references to assemble the content of a publication.
 <topicref href="installation.dita">
  <topicref href="server_installation.dita"/>
  <topicref href="client_side_installation.dita"/>

Depending on the output format, the topic reference may be a link in the table of contents for the XHTML-based outputs or it may be interpreted as a content reference for the PDF-based output that generates a single monolith document. So the role of the topicref is dual, it may sometimes be regarded as a link to a topic and sometimes as a content reference.


DITA topic modules should be kept as small as possible, but sometimes the end user may need to read more than one topic to achieve a single task. So, when publishing to HTML-based outputs, you will end up asking yourself this question:

Should I prefer larger HTML files or more links in the TOC?

And you should always consider these two ideas:
  • Links are disruptive. Ideally, users would not need to jump around in content to read the entire story they are searching for.

  • Small topics that are usually read consecutively by the end user can probably get merged together.
For example, if the installation of your product requires installing both a server-side and a client-side component, by using DITA chunking you can choose to have separate DITA topic modules for each of the installation procedures but merge the topics together in the web-based outputs:
 <title>User Guide</title>
 <topicref href="installation.dita" chunk="to-content">
  <topicref href="server_installation.dita" toc="no"/>
  <topicref href="client_side_installation.dita" toc="no"/>

You can read more about chunking in the DITA 1.3 specification. The DITA Style Guide also has a good overview about why it is preferable to write small topics and then merge them together using the chunking mechanism.

Topic-Level Linking

Links that appear inside topics can be divided into various categories and I'll discuss each of these categories separately.

In-Content Links

In-content links are links added manually in the topic content:
<li>See: <xref href="http://www.../" format="html" scope="external"/></li>

You should keep in mind that this kind of link is disruptive to the reading experience because when end users encounter them, they need to decide weather to read further on or to follow the link. On the other hand, this may sometimes be a good thing. For example, one of the installation steps may require the end user to download a certain library from an external website before continuing.

You can read more about links in general in the DITA 1.3 specification. The DITA Style Guide, written by Tony Self, also discourages the use of in-content links.

Related Links

Related links are placed at the end of the DITA topic and they allow the end user to explore additional resources after the current topic has been read.
    <link href="" format="html" scope="external"/>

To minimize disruption when reading the content in general, the preferred place where to place links is at the end of the generated HTML page.

You can read more about related links in the DITA 1.3 specification.

Defining Related Links using Relationship Tables

Related links do not need to be manually added at the end of each topic. You can define relationship tables in the DITA Map:
    <topicref href="client_side_installation.dita"/>
    <topicref href="server_installation.dita"/>

These tables can define associations between two or more topics, associations that automatically contribute to the related links creation in the generated HTML output.

Here are some benefits of using relationship tables:
  • A topic should have as few links as possible defined directly within. This makes it easier to reuse the topic in various contexts and keeps it as separate as possible for other parts of the DITA project, decreasing the possibility of broken links.

  • By default, links defined in relationship tables are bi-directional, allowing users to land on any of the topics when searching for solutions and find their way to the related ones.

  • Using a relationship table separates the task of writing topics from the task of finding relationships between topics.

You can read more about relationship tables in the DITA 1.3 specification. The DITA Style Guide also recommends using relationship tables.

Indirect Links (Key References)

All the link samples we've look at so far have been direct links, links that point to the target using the @href attribute. Indirect links require two steps:
  1. Define a key in the DITA Map for the target.
    <keydef keys="client_installation" href="client_side_installation.dita"/>
  2. Use the defined key to reference the target resources.
    <xref keyref="client_installation"/>
Here are some of the benefits of indirect linking:
  • Offers the ability to reuse link target text and meta data. If you want to have custom text for a certain link, you can define it directly in the DITA Map:
    <keydef keys="dita_ot_website" href="" format="html"
       <linktext>DITA Open Toolkit Web Site</linktext>
    and then add key references in all other places:
    <xref keyref="dita_ot_website"/>
  • Easier conditional linking (including links to topics that sometimes may be missing). If you want your topic to link either to one target or to another depending on the filtering/profiling conditions, instead of adding profiling directly on the link, you can add the profiling conditions directly in the DITA Map:
     <topicref keys="slicing" href="slicing_vegetables_for_experts.dita" audience="expert"/>
     <topicref keys="slicing" href="slicing_vegetables_for_novices.dita" audience="novice"/>
     <keydef keys="slicing" audience="noLink"><topicmeta><keywords>
    and then link to the key from each topic:
    <xref keyref="slicing"/>
  • Easier link management. A good overview about all the outbound links in your project helps you maintain and control lists of allowed external web sites. With indirect references, you can define all references to external resources in a separate DITA Map. An example of a DITA project using indirect links to achieve separation of links by purpose can be found here:

  • Makes it easier to move/rename topics. When you move or rename a topic referenced via indirect links, only the link defined in the DITA Map will break, making it easier to fix broken links.

There is an overview about indirect addressing on the DITA XML Org website. The DITA 1.3 specification also has a chapter about indirect links.

Auto-Generated Links

Until now, I've talked about manually added links, either in the topic or in relationship tables. Using the DITA @collection-type attribute, you can define relationships between parent and child topic references in the DITA Map, relationships that result in automatic links added between them:
 <topicref href="installation.dita" collection-type="sequence">
  <topicref href="server_installation.dita"/>
  <topicref href="client_side_installation.dita"/>
There are 3 useful types of @collection-type values:
  • Unordered - Links are generated from parent to children, and from children to parent.

  • Family - Links are generated from parent to children, from children to parent, and from sibling to sibling.

  • Sequence - Links are generated from parent to children, from children to parent, and from child to previous sibling (if applicable) and next sibling (if applicable).

You can read more about auto-generated links in the DITA Style Guide.

Conditional Links in Distinct Publications

You may publish documentation for multiple products from the same DITA content. Also, you may want to have links point to various targets depending on the product for which you want to publish the documentation. Or, you may want to suppress links completely in certain publications.

When using direct linking, you will need to profile each link depending on the publication:
Find our more about slicing vegetables: <xref href="slicing_vegetables_for_experts.dita" audience="expert"/>
<xref href="slicing_vegetables_for_novices.dita" audience="novice"/>.
With indirect links, you can define the profiling attributes as DITA Map level:
 <topicref keys="slicing" href="slicing_vegetables_for_experts.dita" audience="expert"/>
 <topicref keys="slicing" href="slicing_vegetables_for_novices.dita" audience="novice"/>
and thus, simplify the reference made in the topic content:
Find our more about slicing vegetables: <xref keyref="slicing/>.

Conditional Links in the Same Publication

Using DITA 1.3 key scopes, you can reuse a topic multiple times in a DITA Map and have each referenced topic contain links to various target topics. For example, if my preparing_vegetables.dita topic has a link:
<link keyref="slicing"/>
you can define various key scopes in the DITA Map that bind the "slicing" key to various targets:
 <topichead navtitle="Cooking for Experts" keyscope="expert">
  <topicref href="preparing_vegetables.dita" keys="preparing"/>
  <topicref href="slicing_vegetables_for_experts.dita" keys="slicing"/>
 <topichead navtitle="Cooking for Novices" keyscope="novice">
  <topicref href="preparing_vegetables.dita" keys="preparing"/>
  <topicref href="slicing_vegetables_for_novices.dita" keys="slicing"/>

This previous blog post contains more details about key scopes.

Link Text

When linking to an external resource or to a DITA topic or element, the publishing engine will attempt to deduce the link text from the target context. For example, the link to a DITA topic or element that contains a <title> will use that title as the link text. The link to an external resource (for example to will, by default, use the HTTP location as the link text. You can also customize each link text individually. So, ask yourself this question:

Should I leave the link text to be automatically computed or should I set a more friendly text?

For internal links to elements that have a title, in general it is more flexible to not set a custom text and let the publishing engine decide one for you. For external links, you should usually specify your custom link text.

Should I Link or Should I Reuse?

Suppose you want to bring a certain paragraph, note, or section to the end user's attention. If that particular target element is not very large, you should always reuse it (using a content reference) instead of linking to it.


As with all large projects, managing links in a growing DITA project can be problematic, so you need to become organized. As an overview of what we've discussed so far, I suggest the following best practices:
  • Linking is a form of reuse so:

    • Reuse small pieces of content instead of linking to them
    • Avoid too much linking (linking is disruptive)
  • Use indirect links. It will allow you to reuse link text and make profiling/filtering easier while giving you a better overview of the outbound links for your project.

If you want to experiment with the various linking strategies I discussed above, you can find some samples here:

Friday, June 16, 2017

Checking terminology when editing in Oxygen XML Editor

Share to Facebook Share to Twitter Email This Share on Google Plus Share on Tumblr
In this blog post I will offer a general overview about the current possibilities you have available to impose your own language checking rules when working with Oxygen XML Editor.

Built-in support

Oxygen comes bundled with the popular Hunspell spell checker and along with the regular bundled dictionaries for English, German, Spanish, and French, it allows you to install new dictionaries, either for other languages or custom dictionaries (for example, dictionaries for medical terms) that you can build separately:

Besides the spell checker, Oxygen also has support for Auto-correct and you can add your own Auto-correct pairs to Oxygen.

Commercial alternatives

Acrolinx is a very popular commercial tool for checking content for consistency and terminology. The plugins that Acrolinx developed for Oxygen standalone, Oxygen Eclipse plugin and Oxygen Web Author allow you to run the Acrolinx checker directly from inside the application.

HyperSTE is another popular commercial tool for checking content and terminology. They also have a plugin for Oxygen standalone.

Open-source alternatives

LanguageTools is an open-source proof­reading program for English, French, German, Polish, and more than 20 other languages . There is an open-source plugin for Oxygen available on GitHub.

The DITA Open Toolkit terminology checker plugin from Doctales contains Schematron rules to check that various words adhere to the terminology dictionaries that are custom built using DITA.

Building your own terminology checker

The fastest and simplest way to build a simple terminology checker is by using Schematron rules. The Doctales plugin is a good example for this.

At some point, as the terminology dictionary keeps growing, you may encounter delays and slow downs when editing the documents and validating it using the custom Schematron rules. So an alternative to this is by using our Author SDK to build your own Oxygen plugin, which can use our API to check the content and then add highlights. The LanguageTools open-source plugin may be a good starting example for this.

Monday, June 12, 2017

Batch converting HTML to XHTML

Share to Facebook Share to Twitter Email This Share on Google Plus Share on Tumblr

Suppose you have a bunch of possibly "not-wellformed" HTML documents already created and you want to process them using XSLT. For example, you may want to migrate the HTML documents to DITA using the predefined XHTML to DITA Topic transformation scenario available in Oxygen. So you need to create valid XML wellformed XHTML documents from the existing HTML documents and you need to do this in a batch processing automated fashion.

There are lots of open source projects that deliver processors that can convert HTML to its wellformed XHTML equivalent. For this blog post, we'll use HTML Tidy. Here are some steps to automate this process:
  1. Create a new folder on your hard drive (for example, I created one on my Desktop: C:\Users\radu_coravu\Desktop\tidy).
  2. Download the HTML Tidy executable specific for your platform ( and place it in the folder you created in step 1.
  3. In that same folder, create an ANT build file called build.xml with the following content:
    <project basedir="." name="TidyUpHTMLtoXHTML" default="main">
        <basename property="filename" file="${filePath}"/>
      <target name="main">
          <exec command="tidy.exe -o ${output.dir}/${filename} ${filePath}"/>
  4. In the Oxygen Project view, link the entire folder where the original HTML documents are located.
  5. Right-click the folder, choose Transform->Configure Transformation Scenarios... and create a new transformation scenario of the type: ANT Scenario. Modify the following properties in the transformation scenario:
    1. Change the scenario name to something relevant, like HTML to XHTML.
    2. Change the Working Directory to point to the folder where the ANT build file is located (in my case: C:\Users\radu_coravu\Desktop\tidy).
    3. Change the Build file to point to your custom build.xml (in my case: C:\Users\radu_coravu\Desktop\tidy\build.xml).
    4. In the Parameters tab, add a parameter called filePath with the value ${cf} and a parameter called output.dir with the value of the path to the output folder where the equivalent XHTML files will be stored (in my case, I set it to: C:\Users\radu_coravu\Desktop\testOutputXHTML).
  6. Apply the new transformation scenario on the entire folder that contains the HTML documents. When it finishes, in the output folder you will find the XHTML equivalents of the original HTML files (XHTML documents that can later be processed using XML technologies such as XSLT or XQuery).