Friday, December 18, 2015

Migrating to a Structured Standards-based Documentation Solution

Potential clients come to this world of structured content authoring from two main sources:
  1. They are starting fresh, and after some comparison between structured and unstructured editing and between open and closed solutions, plus some soul searching, they come to regard structured authoring with a specific XML standard (usually DITA) as a possible solution for them.
  2. They are migrating from a previous unstructured or structured solution.
I think people in this second category start thinking about structured writing when they run into limitations with their current approach. These limitations could be:
  • The need to reuse more content.

    With structured XML authoring in general, and with DITA in particular, you have many ways of reusing content. In a previous blog post I tried to give an overview of all the ways in which you can reuse content using DITA.

  • Produce multiple outputs from the same content using complex profiling conditions which the current workflow does not support.
  • Stop thinking about how the content is styled.

    You may want to focus more on the actual content and on semantically tagging it than on the way in which it will be presented in a certain output format.

  • Publish to more output formats than the current editing solution allows.

    Using a widely adopted open source standard like DITA for documentation also means having access to a variety of commercial and open source tools to generate various output formats from it. For example, there are about 5-6 distinct solutions for obtaining PDF and 3-4 for obtaining mobile-friendly WebHelp.

  • Enforce certain internal rules on the documents.

    It's hard to impose best practices in unstructured documents. With structured XML content, you can use Schematron to cover this aspect easily and even to provide quick fixes for your authors.

  • Benefit from the advice and help of a larger community of writers and developers.

    When you are using a closed source solution, you may have only one forum and a couple of people willing to help. When you have a larger community you will be able to reach out with a single email to lots of people, and somebody may want to help you.

  • Share documentation between different companies.

    If a larger company which uses structured writing takes over a smaller one, the smaller company will need to adopt structured writing as well.

  • Own your content.

    Some editing solutions are closed source: you are forced to use a single tool because no other tool is able to read that format. Then you need to ask yourself the question: "Is this content actually mine?"

  • Problems with your current tool vendor.

    If the format is closed source and the tool vendor is not responsive to your needs, you need to somehow move your content over to a market with multiple tool vendors available, because competition also means lower prices and better customer support.
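
As a rough illustration of the Schematron point in the list above, here is a minimal sketch of a rule that could enforce an internal convention on DITA topics. The 60-character title limit and the message wording are invented for the example, and the class-based match assumes the document is parsed with its DTD or schema so that the default class attributes are present.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern id="title-rules">
    <!-- Matches any title element (and specializations of it) through the DITA class attribute -->
    <sch:rule context="*[contains(@class, ' topic/title ')]">
      <!-- Hypothetical house rule: keep titles short -->
      <sch:assert test="string-length(normalize-space(.)) &lt;= 60">
        The title is longer than 60 characters; consider shortening it.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A rule like this reports a problem on every title that breaks the convention, which is the kind of check that is hard to automate on unstructured documents.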

Switching to structured content writing also has its problems. And I think the main ones are these:
  • The people. We are all reluctant to change, and there is a learning curve. Writers might need to re-learn how to structure and write their documentation. Besides the technical aspects, they will need to learn to divide content into small modules and to reuse parts in multiple places. Writers may not be willing to do this. We are usually very reluctant to change tools if we do not see instant benefits from it.
  • Effort to convert the existing content to structured content. You can choose manual conversion, automated conversion or, in most cases, a mixture of the two. Conversion will never be perfect: you will still need to go through the entire content and restructure it with module-based editing in mind.
  • Customizing the obtained output format. You may get various outputs from your content out of the box, but you will always need to customize them to adhere to company standards. If you are using the DITA Open Toolkit for publishing you will need basic XSLT development skills to customize the PDF and CSS skills to customize the XHTML-based output.
  • Money. You need to spend more money to get new tools, possibly a new CMS. Still, I consider that for starters, for a pilot project, DITA does not need to be expensive. Here's how we're using DITA internally for our user's manual:
  • Sometimes you might need to control the styling of your output so closely that it would be impossible to separate the styling information from the content.

So can we draw a conclusion from all this?

Well, maybe not everybody interested in structured authoring will succeed in converting to it. But I think that one thing will hold true in most cases:

Once you convert to structured content, you will never go back.

Tuesday, December 15, 2015

Sharing New Custom File Templates for a Specific Vocabulary

The support Oxygen provides for editing DITA topics comes with quite an extensive set of new file templates used to create new DITA topic types. If you have a team of writers, you may want to filter out certain new file templates or add your custom new file templates, then share these custom templates with your team members.

This blog post will attempt to give you some clear steps for sharing a custom set of new file templates with your team.

All the original DITA new topic templates are located in the folder: OXYGEN_INSTALL_DIR\frameworks\dita\templates.

Instead of making changes directly in that folder, or copying the entire DITA framework configuration folder (like OXYGEN_INSTALL_DIR\frameworks\dita), modifying it and distributing it, you can choose to extend the DITA framework and distribute the extension. In this way, you will benefit from new functionality added to the base framework by newer Oxygen versions and still use your customizations.

The steps below describe how an extension of the DITA framework which adds a custom set of new file templates can be constructed and shared:
  1. Somewhere on your disk, in a place where you have full write access, create a folder structure like: custom_frameworks/dita-extension.
  2. In that new folder structure create another folder custom_frameworks/dita-extension/templates which will contain all your custom new topic templates.
  3. In the Document Type Association / Locations preferences page add in your Additional frameworks directories list the path to your custom_frameworks folder.
  4. In the Document Type Association preferences page select the DITA document type configuration and use the Extend button to create an extension for it.
  5. Give a custom name to the extension, for example DITA - Custom and then change its Storage to external, then save it to a path like: path/to/.../custom_frameworks/dita-extension/dita-extension.framework.
  6. Make changes to the extension, go to the Templates tab, remove all previous entries from it and add a new entry pointing to your custom templates folder: ${frameworkDir}/templates.
  7. Click OK to close the dialog and then either OK or Apply to save the preferences changes.

After you perform the steps above you will have in the dita-extension folder a fully functioning framework extension which can be shared with others.
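
For illustration, a file placed in the templates folder created in step 2 could look like the sketch below. The file name, the topic ID and the placeholder text are assumptions made for the example; any valid DITA topic can serve as a template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file: custom_frameworks/dita-extension/templates/Feature Description.dita -->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="feature_description">
  <title>Feature name</title>
  <shortdesc>One sentence describing what the feature does.</shortdesc>
  <conbody>
    <section>
      <title>When to use it</title>
      <p>Describe the typical use case.</p>
    </section>
  </conbody>
</concept>
```

Keeping a fixed section structure in the template is one way to nudge writers toward the same layout for every topic of that kind.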

The framework can then be shared with others in several ways:
  • Copy it to their [OXYGEN_DIR]/frameworks directory.
  • Create somewhere on disk a custom_frameworks folder, copy the framework there and then from the Document Type Association / Locations preferences page add in your Additional frameworks directories list the path to the custom_frameworks folder.
  • Distribute the framework along with a project.

    Follow these steps:
    1. On your local drive, create a directory with full write access, containing the project files and a custom_frameworks folder containing your dita-extension framework.
    2. Start the application, go to the Project view and create a project. Save it in the newly created directory.
    3. In the Document Type Association / Locations preferences page, select Project Options at the bottom of the page.
    4. Add in the additional framework directories list an entry like ${pd}/custom_frameworks.
    5. Add other resources to your project, for example you can have all your DITA content located inside the project folder.
    6. You can then share the new project directory with other users. For example you can commit it to your version control system and have them update their working copy. When they open the customized project file in the Project view, the new document type becomes available in the list of Document Types.
  • Deploy the framework/document type configuration as an add-on.

After your team members install the framework, they can check the list of Document Types in the Document Type Association preferences page to see if the framework is present and if it appears before the bundled DITA framework (meaning that it has higher priority).

You can use the framework extension mechanism to customize many aspects of the DITA functionality in Oxygen. For example, you can remove various elements from the content completion list.

Tuesday, December 08, 2015

DITA Map Validate and Check for Completeness Overview

Validate and Check for Completeness is an action available on the DITA Maps Manager view toolbar. It can be used to perform thorough checks on the entire DITA Map structure and the set of referenced topics. We made this action available a couple of years ago, and since then, based on your direct feedback, we have kept adding checks and functionality to it. We have always taken care to optimize the processing speed so that projects containing thousands of resources can be validated in 10-15 seconds.

In this blog post I will try to list all the checks that the action performs to assure you that your DITA content is valid:
  • Validate each DITA resource directly or indirectly referenced from your DITA Map with its associated DTD or XML Schema and report any errors which may arise.
  • Validate each DITA resource with an additional Schematron resource which you can provide. Schematron is quite handy when it comes to enforcing internal rules on the DITA content and we use it quite a lot for checking our user's manual.
  • Batch validate referenced DITA resources. This setting validates each DITA resource according to the validation scenario associated with it in Oxygen. This will decrease the validation speed quite a bit but if you have DITA 1.3 resources which are Relax NG based you should check it in order to validate each resource according to the Relax NG Schema.
  • Use specific DITAVAL or profiling condition filters when performing the validation. From a single published DITA Map you may get multiple publications based on the profiling filters applied. Because these filters are used to remove entire topics or parts of topics, you may have links and conrefs which become invalid when certain filters are applied on the map. So it makes sense to validate your DITA project by applying all profiling filters you would apply when publishing it in order to be aware of these potential broken references.
  • Report profiling attributes or values which are not valid according to the Subject Scheme Map associated with your project. You can read more about controlling profiling attributes and values here:
  • Identify possible conflicts in profile attribute values. When the profiling attributes of a topic contain values that are not found in parent topic profiling attributes, the content of the topic is overshadowed when generating profiled output.
  • Check the existence of non-DITA referenced resources. You will get reports if links to local images or other resources are broken. You can also decide to verify the existence of remote links. For example if you have links to various external web sites, you might be interested in seeing if those remote servers are still there.
  • Report links to topics not referenced in DITA maps. Checks that all referenced topics are linked in the DITA map. Otherwise you may get working links to topics which are not included in the table of contents.
  • Check for duplicate topic IDs within the DITA map context. By default the topic ID can be used in the WebHelp output for context sensitive help. Also, certain CMSs require each topic ID to be unique in the entire DITA Map.
  • Report elements with the same ID placed in the same DITA Topic according to the specification.
  • Report missing domains attribute which may indicate an improper DITA specialization.
  • Report invalid class attribute values according to the specification.
  • Report invalid key names according to the specification.
  • Report references to missing keys or links which refer to keys which have no target resource defined on them.
  • Report problems when elements referenced using DITA content reference range are not siblings or are not properly sequenced.
  • Report links which have no target set on them either via href or keyref.
  • Report non-portable absolute references to DITA resources.
  • Report when links contain invalid encoded characters or Windows-like path separators.
  • Report when resources are referenced with incorrect path capitalization.
  • Report a mismatch between the referenced resource type and its format attribute.
  • Report a mismatch between the referenced resource type and its type attribute.
  • Report topic references in a DITA Map pointing to non-topic elements in the target topics.
  • Report invalid content references and content key references, references to non-existing resources, to non-existing IDs, report when the source element is not a specialization of the target element.
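
To make the Subject Scheme check from the list above more concrete, here is a minimal sketch of a Subject Scheme Map that restricts the platform attribute to two values. The key names and the allowed values are invented for the example.

```xml
<!DOCTYPE subjectScheme PUBLIC "-//OASIS//DTD DITA Subject Scheme Map//EN" "subjectScheme.dtd">
<subjectScheme>
  <!-- The allowed values, grouped under a parent subject -->
  <subjectdef keys="os">
    <subjectdef keys="windows"/>
    <subjectdef keys="linux"/>
  </subjectdef>
  <!-- Bind the platform attribute to the subjects defined above -->
  <enumerationdef>
    <attributedef name="platform"/>
    <subjectdef keyref="os"/>
  </enumerationdef>
</subjectScheme>
```

With a scheme like this associated with the project, an element carrying a platform value other than windows or linux is the kind of problem the profiling check is meant to report.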

I think I covered most of the checks that this validation does.

Are there other checks you would like to see in a future version? Would you like to see this validation available as a separate process which could be run on a server?

Friday, November 27, 2015

DITA Reuse Strategies (Short Tutorial describing all DITA Reuse possibilities)

This small tutorial is based on a presentation called DITA Reuse Strategies that I gave at DITA Europe 2015. Its main purpose is to explore the numerous possibilities for reusing content within the DITA standard.

First of all, I think the main reasons we want to reuse content in technical documentation are these:
  • Consistent explanations for the same situations.
  • Less content to translate.
  • Decreased time spent writing content.
  • Obtain different publications from common content.
I would like to start by saying that technical documentation writers have two very important roles:
  • Record knowledge about using tools and processes.
  • Spread knowledge to reach large audiences.
As a software engineer, I save time when the product user's manual is rich in examples and answers to frequently asked questions. Instead of individually explaining various application behaviors to end users, I can give links to the manual or, better yet, our end users find that content by themselves. There are just not enough human resources in any company to help each end user individually.

We'll start with a top down approach to reuse. Complete small examples for most of the reuse situations described below can be found here:

Version Control and Reuse

Version Control allows you to reuse content tagged at a certain point in time in order to produce older versions of your publications. So whether you are using an open source version control system like SVN or Git or a commercial CMS, you should always be able to produce older bug-fix versions of your documentation. You can think of Version Control as content reuse on the time axis.

Converting XML content to various output formats

XML in itself is perfect for reuse because:
  • XML is an intermediary format. We don't do XML for the pleasure of it. We do it because we want to obtain multiple outputs from it and it has enough content and structure inside to allow for it. Some call this single source publishing but it can just as easily be called content reuse.
  • XML contains the necessary content.
  • XML contains the necessary structure.
  • XML is a standard. So you have a choice between open source and commercial tools.
  • XML is a standard for defining standards. Among these is DITA, the most versatile standard XML vocabulary when it comes to reuse.
Whatever output you obtain from the XML, there is one constant: the XML format which contains all your data will carry more semantic meaning than any of the published outputs.

You can read more about the selling points of using XML in this older blog post:

Create larger publications from existing ones

You can merge multiple existing DITA Maps in various new publications.

The only danger here is defining keys with the same name but different values in both publications. Fortunately, DITA 1.3 comes to the rescue with the new keyscopes support, which allows keys with the same name to resolve to different values in each scope:
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
    <title>Vegetables Soup</title>
    <topicref href="carrots/carrots.ditamap" format="ditamap" keyscope="ks1"/>
    <topicref href="potatoes/potatoes.ditamap" format="ditamap" keyscope="ks2"/>
</map>

Even if you have a single root map you can keep related sections or chapters in different DITA Maps. Besides adding more logical structure to your content you never know when you'll reuse those sub-maps in different publications.

Reuse content for similar products

This is the most common case of successful reuse: you have multiple similar products which share common functionality, so the technical documentation for each of those products will also share common content. This is usually done in two ways. In the following sections I will use the term root map to refer to the DITA Map which actually gets published.

1. Use multiple Root Maps.

Each root map is published to obtain the output for a certain product type. As major benefits you can:
  • Reuse entire topics.
  • Define variable product names.
  • Remap links and reused content using keys.

Publication maps for phone models X1000 and X2000 using almost identical content, except for the Bluetooth chapter, which appears in only one of them.
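
A minimal sketch of one such root map is shown below. The file names and the key name are assumptions made for the example; the root map for the other phone model would look the same apart from its own product name and the chapter that differs.

```xml
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>X1000 User Guide</title>
  <!-- Variable product name, resolved in topics through a keyref to "product" -->
  <keydef keys="product">
    <topicmeta>
      <keywords>
        <keyword>X1000</keyword>
      </keywords>
    </topicmeta>
  </keydef>
  <!-- Topics shared with the other root map (hypothetical file names) -->
  <topicref href="topics/getting_started.dita"/>
  <topicref href="topics/making_calls.dita"/>
  <!-- Only one of the two root maps would also reference topics/bluetooth.dita -->
</map>
```

Because each root map carries its own key definitions, the shared topics never need to mention a specific model by name.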

2. Use a single Root Map.

You have a single publication root map which gets published for various products using profiling filters applied on it. These filters can be applied either at topic or element levels. The product name is variable and depends on the applied filters.
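
A minimal sketch of this approach, assuming the product attribute is used for profiling and that X1000 and X2000 are the values chosen by the team, could be a DITAVAL filter file like the one below. The file name is hypothetical.

```xml
<!-- Hypothetical filter file X1000.ditaval, applied when publishing the root map for the X1000.
     In the map, a chapter meant for the other model would be profiled like this:
       <topicref href="topics/bluetooth.dita" product="X2000"/> -->
<val>
  <!-- Keep content marked for the X1000 -->
  <prop att="product" val="X1000" action="include"/>
  <!-- Drop content marked only for the X2000 -->
  <prop att="product" val="X2000" action="exclude"/>
</val>
```

A second filter file with the two actions swapped would produce the X2000 publication from the same root map.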

Reuse fragments of content

Until now we have regarded the topic as an indivisible unit in our project. But there are many times when it becomes useful to reuse smaller elements in various places throughout the publication.

Content References

Content references are the original and probably the most used reuse mechanism in the DITA specification. They allow reusing elements from a topic in various other topics throughout the publication.

Small example of content referencing

Reusable Component from topic reusables.dita:

  <dd id="CPU">
    <ul id="ul_lym_bqd_x4">
      <li>Minimum - <tm tmtype="tm">Intel Pentium III</tm>/<tm tmtype="tm">AMD Athlon</tm>
        class processor, 1 <term>GHz</term>.</li>
      <li>Recommended - Dual Core class processor.</li>
    </ul>
  </dd>

Content reference:

<dd conref="path/to/reusables.dita#topicID/CPU"/>

You can read more about how content references can be inserted in Oxygen here:

Content Key References

When compared to direct content references, content key references are done with indirect addressing. You first need to define a key for the topic which contains the reused content and make the content key reference using that key.

Small example of content key referencing

Reusable Component from topic reusables.dita:

  <dd id="CPU">
    <ul id="ul_lym_bqd_x4">
      <li>Minimum - <tm tmtype="tm">Intel Pentium III</tm>/<tm tmtype="tm">AMD Athlon</tm>
        class processor, 1 <term>GHz</term>.</li>
      <li>Recommended - Dual Core class processor.</li>
    </ul>
  </dd>

Key definition in DITA Map:

<keydef keys="reusable.install" href="reusables/reusables.dita"/>

Content key reference:

<dd conkeyref="reusable.install/CPU"/>

You can read more about how content key references can be inserted in Oxygen here:

Content Reference Ranges

Instead of reusing a series of consecutive elements (for example steps or list items) one by one, you can reuse an entire range of sibling elements. For this to work, both the initial and the final elements need to have IDs defined on them.

Small example of content key reference with ranges

Reusable steps from task reusable_steps.dita:

      <step id="washing">
        <cmd>Wash the vegetables thoroughly.</cmd>
      </step>
      <step id="peeling">
        <cmd>Pass the peeler gently over the vegetable.</cmd>
      </step>

Key definition in DITA Map:

 <keydef keys="reusable_steps" href="reusable_steps.dita"/>

Content key reference range:

      <step conkeyref="reusable_steps/washing" conrefend="default.dita#default/peeling"/>

The usual dialog from Oxygen used to insert reusable content can also be used to select the range of elements to insert:

Content Reuse Tips and Tricks

I tried to compile below a set of best practices to follow when reusing content:

  • Keep all your reused content in special topics located in special folders. Technical writers need to know that they are editing content which is potentially used in multiple contexts.
  • Keep a description for each reused element. You can have topics which act like dictionaries of reused content. A table of reused content can have two columns. On the first column each cell contains the reused element and on the second one you can have a small description for each reused element. The description acts as metadata, it may give the technical writer more details about how that content should be reused.
  • Use conkeyrefs instead of conrefs. Because they use relative paths, conrefs tend to break when you move topics around. More about conkeyrefs in the next item.
  • When using conkeyrefs you should create a special map with key definitions. This keeps the reused content and the keys for it separate from the live content.
  • A topic can have multiple reusable elements inside it. In this way it will act like a dictionary of reused components. In such a topic you can keep a table with two columns. On the first table column in each cell you can have a reused element. On the second table column you can keep a small description for each element. The description is metadata, it is not meant for the published output. It is just a good way to inform technical writers about how that particular element should be reused.

Pushing Content

Besides the techniques we've seen so far for pulling reused content in multiple places you can also push content to a certain specified place inside an existing topic.

So why push content?

Imagine you have an existing publication "Cooking Book" containing a task with a couple of steps for peeling vegetables. At some point you create the DITA Map for a larger publication called "Cooking Book for Pros" which reuses the entire original publication by referencing to the original publication DITA Map. But you somehow need to add extra steps in the original task when the larger publication gets printed.

Pushing Content to an existing sequence of steps

Sequence of steps from the original task:

      <step id="peeler_handling">
        <cmd>Pass the peeler gently over the vegetable.</cmd>
      </step>

Key definition in DITA Map for the task which will push the content:

<keydef href="stepsPusher.dita" keys="peeling"/>

Content key reference push done from the "stepsPusher.dita" task:

            <step conaction="mark" conkeyref="peeling/peeler_handling">
                <cmd/>
            </step>
            <step conaction="pushafter">
                <cmd>Read the instructions.</cmd>
            </step>

So the only purpose of the "stepsPusher.dita" task which is referenced with a resource-only processing role and thus does not appear at all in the output is to modify the content of the original task which gets published.

How do we push content in Oxygen? First you need to define an ID on the element which will be the target of the push. The conref push mechanism allows you to replace this target element or to insert an element before or after it. After this you can create the topic which pushes the content and create the step which will be pushed. You can then right click inside this step and choose Reuse->Push Current Element....

Key References (Variables)

You can reuse simple variables like product name, executable, and so on by defining keywords in the DITA Map and then using keyrefs in topics to reuse those text fragments.

Reusing keywords

Defining the reused keyword in the DITA Map:

<!-- product name -->
  <keydef keys="product" product="editor">
    <topicmeta>
      <keywords><keyword>Oxygen XML Editor</keyword></keywords>
    </topicmeta>
  </keydef>

Reusing the keyword in a topic:

<title>Installation Options for <ph keyref="product"/></title>

In Oxygen you can create key definitions in the DITA Map by right clicking in the DITA Maps Manager and choosing Append Child->Key definition with keyword.... After this, in the topic you can use Oxygen's regular Reuse Content action to insert the keyref.

DITA 1.3 Contributions to Reuse

DITA 1.3 takes content reuse to an entirely new level, allowing you to:
  • Reuse topics with variable content depending on context (keyscopes).
  • Reuse the same content profiled in various ways in the same publication (branch filtering).

Reuse with Key Scopes

Using DITA 1.3 key scopes you can reuse a topic in multiple places in the DITA Map with slightly different content.

Reuse using key scopes

Let's say you write a topic about Windows installation for your software product:
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="installation">
  <title><ph keyref="osName"/> Installation</title>
  <body>
      <ol id="ol_g5h_st4_zt">
        <li>Download the executable.</li>
        <li>Run the executable by double clicking it.</li>
        <li>Follow steps described in the installation wizard.</li>
      </ol>
  </body>
</topic>
and at some point you realise that exactly the same steps need to be followed for the Linux installation. The only difference is the name of the operating system. You use a keyref to refer to the operating system name, but with only DITA 1.2 support the key will resolve to a single value.

Using keyscopes in the DITA Map you can define multiple values for your key depending on the context:

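 <topicgroup keyscope="windows">
  <keydef keys="osName">
   <topicmeta><keywords><keyword>Windows</keyword></keywords></topicmeta>
  </keydef>
  <topicref href="installation.dita"/>
 </topicgroup>
 <topicgroup keyscope="linux">
  <keydef keys="osName">
   <topicmeta><keywords><keyword>Linux</keyword></keywords></topicmeta>
  </keydef>
  <topicref href="installation.dita"/>
 </topicgroup>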

You can find a more detailed example and download samples for reuse based on key scopes in this blog post:

Reuse with Branch Filtering

With branch filtering you can combine two profiles of the same DITA Map in a larger publication.

Creating a Phones Catalogues publication

If you already have a DITA Map from which you can obtain publications for various mobile phone versions based on the profiling filters applied to it, you can use branch filtering to create a larger publication which incorporates the publications for all mobile phone versions:

  <topicref href="phoneDetails.ditamap" format="ditamap">
   <ditavalref href="ditaval/X1000Branch.ditaval"/>
  </topicref>
  <topicref href="phoneDetails.ditamap" format="ditamap">
   <ditavalref href="ditaval/X2000Branch.ditaval"/>
  </topicref>

You can find a more detailed example and download samples for reuse based on branch filtering in this blog post:

Reuse non-DITA resources

Besides DITA topics you can reuse other resources in your DITA project:
  • Reuse images either referenced directly or via a key reference.
  • Reuse other linked resources (like videos, PDFs and so on).

As binary resources are not embedded in the DITA topics, they are naturally reused by being kept in separate files and linked when necessary.

You can reuse images and link to other resources either via direct references or via indirect key references. What to choose may depend on how many times you refer to a certain image or binary resource. If you refer to it only once or twice you can use direct referencing.
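
For the indirect variant, a minimal sketch could look like the fragments below. The key name and the image path are invented for the example.

```xml
<!-- In the DITA Map: bind a key to the image file (hypothetical key name and path) -->
<keydef keys="logo.image" href="images/logo.png" format="png"/>

<!-- In a topic: reference the image through the key instead of its path -->
<image keyref="logo.image"/>
```

If the image is later moved or replaced, only the key definition in the map needs to change, which is why the indirect form tends to pay off for resources referenced many times.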

If you have problems getting images to appear the same size when published to PDF and XHTML-based outputs you should make sure they do not have the dots-per-inch information saved inside them:


The DITA standard provides quite a large toolbox for reuse scenarios.

Besides the tips spread throughout this tutorial, here is some additional advice:
  • Know a little bit about all these possibilities (at least know that they exist), you never know when one of them might come in handy.
  • For any given potential reuse situation you may find out that you can use multiple reuse strategies. So at a given time you could reuse a piece of simple text either via direct conrefs, indirect conkeyrefs or keyword keyrefs. Choosing one of the strategies will depend on the situation. For example if you plan in the future to also have inline elements in the reused text, you should go with either conref or conkeyref. If you reuse that content only in one or two places you can go with conref. But if you reuse it extensively you can define a key and use conkeyref.
  • Try to keep the reused content separately, in special folders. Writers will know that when they are editing resources from these special folders they might modify content which is potentially used in multiple places.
  • If you plan to translate your content to other languages, try not to reuse inline elements (other than product names and constants which do not change when translated). Translators usually need to translate entire block-level elements in order to have a good flow of translated content. The DITA 1.3 specification contains quite a useful recommendation for this: