
DITA XML vs Markdown Syntax and Capabilities Comparison

16 Mar 2023
Read time: 35 minute(s)

The following article compares the DITA XML standard and Markdown, covering both syntax and features. I tried to write this comparison without bias toward either one; if I missed any DITA XML or Markdown features, that was out of ignorance and not out of malice. As usual, feedback is always welcome.

Table 1. DITA XML vs Markdown
DITA XML Markdown
Short description
DITA XML: DITA XML is a standard for designing, writing, managing, and publishing information. There are multiple versions of the DITA standard, the most popular being version 1.3.
Markdown: Markdown is a lightweight markup language that you can use to add formatting elements to plain text documents. There has been an effort to standardize Markdown through a specification named CommonMark, but in the wild there are many Markdown flavors and extensions, most of them sharing a common set of features. The most popular are probably CommonMark and GitHub Flavored Markdown.
Useful links for learning
DITA XML: Resources for learning DITA with Oxygen
Pros
DITA XML:
  • OASIS Open standard.
  • Advanced support for content reuse at the topic, block, or inline level.
  • Advanced support for filtering (generating multiple similar user guides from the same content).
  • An open-source publishing engine with many supported output formats, such as HTML5, Windows Help, PDF, Word, and EPUB (some through free plugins, some through commercial ones).

Doctales - Why use DITA

Doctales - Pros and Cons

Markdown:
  • Large user base. Familiar to software engineers, who use it to write issues.
  • Basic syntax is easy to learn.
  • Easier to read without specialized tools.
  • Free offline and online editing tools.
  • The base syntax is easy to edit in a plain text editor.
  • Many open-source static website generators, such as MkDocs or Jekyll.
Cons
DITA XML:
  • Smaller user base.
  • Harder to learn.
  • XML is more verbose than plain text.
  • Visual editing requires a commercial tool such as Oxygen.
  • Fewer open-source tools for generating professional-looking output.
Doctales - Pros and Cons
Markdown:
  • Not all language features are available in the base Markdown "specification". There are various flavors with syntax differences between them, so you probably need to pick one flavor and stick to it.
  • Advanced features such as content reuse are not in the base standard, but may be implemented with different syntaxes by various flavors.
  • Static website generators are not compatible with each other; each has its own configuration files and its own way of linking between files.
  • Few options for assembling multiple Markdown files into one publication and producing outputs such as PDF or Word.
  • Table cells can contain only inline content, so complex cell content (multiple paragraphs, for example) cannot be expressed without falling back to HTML.
Cross-compatibility
DITA XML: A DITA Map can refer to a GitHub Flavored Markdown file, and the publishing engine can dynamically convert the Markdown content to DITA.
Markdown: No equivalent.
Table of contents
DITA XML: Gathering multiple DITA topics into a larger publication and defining its table of contents is done with DITA Maps.

Working with DITA Maps

Markdown: CommonMark does not define a way to create a table of contents or to aggregate multiple Markdown files into larger publications.

Static website generators each have their own way of defining the table of contents, usually in a YAML configuration file, as in MkDocs.
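To make the link between a DITA Map and its table of contents concrete, here is a minimal Python sketch. It walks the nested <topicref> elements of a small map with the standard library's ElementTree and prints an indented table of contents. The map content, the file names, and the use of the navtitle attribute for labels are assumptions for illustration. Publishing engines such as the DITA Open Toolkit typically take titles from the referenced topics themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA map; hrefs and navtitles are illustrative only.
MAP_XML = """<map>
  <title>User Guide</title>
  <topicref href="intro.dita" navtitle="Introduction">
    <topicref href="install.dita" navtitle="Installation"/>
    <topicref href="config.dita" navtitle="Configuration"/>
  </topicref>
  <topicref href="faq.dita" navtitle="FAQ"/>
</map>"""


def toc_lines(element, depth=0):
    """Yield one indented line per <topicref>, walking the hierarchy depth-first."""
    for ref in element.findall("topicref"):  # direct children only
        label = ref.get("navtitle") or ref.get("href") or "(untitled)"
        yield "  " * depth + label
        yield from toc_lines(ref, depth + 1)


if __name__ == "__main__":
    for line in toc_lines(ET.fromstring(MAP_XML)):
        print(line)
```

Running it prints "Introduction", then "Installation" and "Configuration" indented beneath it, then "FAQ". The nesting of <topicref> elements is the only thing that determines the hierarchy.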

Validation
DITA XML:
  • Validation against the DITA specification DTDs or schemas is done when publishing or when editing.
  • Additional validation can be done with Schematron rules.
Markdown:
  • With Markdown, you usually check a live preview while typing to see that everything looks OK.
  • Various processors can validate Markdown, for example using a set of rules configured in JSON.
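The following hypothetical Python check illustrates the kind of rule a Markdown validator can apply. It reports ATX headings that skip a level, such as a level 3 heading directly after a level 1 heading. This is a simplified sketch, not any particular tool's implementation. It ignores Setext (underlined) headings and only tracks fenced code blocks that use backticks.

```python
import re

# ATX heading: up to 3 spaces, 1-6 '#' characters, then a space/tab or end of line.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]|$)")
FENCE = "`" * 3  # backtick fence marker


def heading_jumps(markdown_text):
    """Return (line_number, previous_level, level) for headings that skip a level."""
    problems = []
    previous = None
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # simplistic open/close tracking
            continue
        if in_fence:
            continue  # '#' lines inside code blocks are not headings
        match = ATX_HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if previous is not None and level > previous + 1:
            problems.append((number, previous, level))
        previous = level
    return problems


if __name__ == "__main__":
    print(heading_jumps("# Heading level 1\n### Heading level 3\n## Heading level 2"))
    # [(2, 1, 3)]
```

Only increases by more than one level are flagged. Going back to a lower level, for example from level 3 to level 2, is accepted.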
Publishing
DITA XML:
  • The DITA Open Toolkit publishing engine comes with default support for publishing DITA Maps to plain HTML5 and PDF, and these outputs can be customized.
  • There are additional open-source plugins for publishing to MS Word or EPUB.
  • Other curated open-source plugins are available in the DITA-OT plugin registry.
  • Commercial plugins for publishing WebHelp output, such as Oxygen WebHelp or Fluid Topics.
Markdown: Most publishing libraries rely on converting Markdown to HTML.
  • Many open-source static website generators.
  • Many libraries (JavaScript, Java, Python, etc.) for converting Markdown to HTML.
  • Other conversion types are available using Pandoc.
Translation
DITA XML: Some translation agencies accept DITA XML content directly, or you can convert DITA XML to XLIFF and use a translation system. Each DITA XML topic or map can have an @xml:lang attribute that specifies the language in which it is written.

Translating your DITA Project

Markdown: There are various tools, such as Simpleen, which seem to handle Markdown translation specifically.
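The @xml:lang attribute belongs to the predefined XML namespace. A tool that reads it, for example to route topics to the right translation workflow, therefore needs its namespace-qualified name. Below is a minimal Python sketch using ElementTree. The sample topic and the en-US fallback value are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# "xml" is a predefined prefix, so ElementTree reports xml:lang under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def topic_language(xml_text, default="en-US"):
    """Return the root element's xml:lang value, or `default` (an assumed fallback) if absent."""
    root = ET.fromstring(xml_text)
    return root.get(XML_LANG, default)


if __name__ == "__main__":
    sample = '<topic id="install" xml:lang="de-DE"><title>Installation</title></topic>'
    print(topic_language(sample))  # de-DE
```

A lookup such as root.get("xml:lang") would return nothing here, because the parser stores the attribute under its expanded name.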
Extensibility
DITA XML:
  • Define a new specialization of the DITA vocabulary with new element names.
  • Use the @outputclass attribute on elements to set custom values used when styling the output.
  • Use the DITA <data> element with custom names and values, and take them into account with publishing-time customizations.
  • Use the DITA <foreign> element, for example to embed HTML inside it, and process it with a custom publishing plugin.
Markdown:
  • Use HTML elements inside Markdown, for example for complex tables or when there is no Markdown equivalent.
  • YAML headers.
  • On certain Markdown flavors/extensions, define attributes for each element.
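The @outputclass approach can be sketched in Python as a tiny converter. It copies a DITA element tree into HTML and carries @outputclass over to the HTML class attribute, which a stylesheet can then target. This is a hypothetical illustration, not how the DITA Open Toolkit is implemented. The element mapping covers only three DITA elements and falls back to <div> for anything else.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few DITA element names to HTML element names.
TAG_MAP = {"p": "p", "codeph": "code", "uicontrol": "span"}


def to_html(dita_element):
    """Recursively copy a tiny DITA subset into HTML, turning @outputclass into @class."""
    html = ET.Element(TAG_MAP.get(dita_element.tag, "div"))
    html.text = dita_element.text
    html.tail = dita_element.tail  # keep text that follows the element
    outputclass = dita_element.get("outputclass")
    if outputclass:
        html.set("class", outputclass)
    for child in dita_element:
        html.append(to_html(child))
    return html


if __name__ == "__main__":
    source = ET.fromstring(
        '<p outputclass="important">Press <uicontrol outputclass="button">OK</uicontrol>.</p>'
    )
    print(ET.tostring(to_html(source), encoding="unicode"))
    # <p class="important">Press <span class="button">OK</span>.</p>
```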
Metadata
DITA XML:
  • The DITA <prolog> element can contain a lot of metadata, which is not visible in the published output. Example:
    <topic id="topic_wcj_tgy_5wb">
      <title>The Title</title>
      <prolog>
        <author>The Author</author>
        <metadata>
          <keywords>
            <keyword>one</keyword>
            <keyword>two</keyword>
          </keywords>
        </metadata>
      </prolog>
    </topic>
  • <indexterm> elements are also considered metadata, because they are used to generate an index.
Markdown:
  • Markdown files sometimes start with a YAML header that defines simple keys and values before the actual content. Example:
    ---
    title: The Title
    author: The Author
    keywords: [one, two, three, four]
    ---
    # A Heading
    Text body.
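Here is a Python sketch showing how both kinds of metadata can be read programmatically. One function collects the <keyword> values from a DITA prolog with ElementTree. The other splits a YAML header from the Markdown body. The header parser is deliberately simplified: it assumes flat key: value lines whose values are plain strings or inline [a, b] lists. A real project would use a full YAML parser such as PyYAML.

```python
import xml.etree.ElementTree as ET


def dita_keywords(topic_xml):
    """Collect the <keyword> values found under <prolog>/<metadata>/<keywords>."""
    root = ET.fromstring(topic_xml)
    return [kw.text for kw in root.findall("./prolog/metadata/keywords/keyword")]


def front_matter(markdown_text):
    """Split a '---' delimited YAML header from the Markdown body.

    Simplified on purpose: only flat 'key: value' lines are understood, and a
    value written as [a, b] becomes a list of strings.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown_text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, markdown_text  # unterminated header: treat everything as body
    header = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key: value line
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        header[key.strip()] = value
    return header, "\n".join(lines[end + 1:])


if __name__ == "__main__":
    doc = "---\ntitle: The Title\nkeywords: [one, two]\n---\n# A Heading\nText body."
    print(front_matter(doc))
```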
Content reuse
Markdown: There is no content reuse support in the base standard, but various extensions exist, for example:
  • Redocly uses HTML <embed> tags that reference Markdown files to reuse entire chunks of Markdown content stored in a file.
  • Hugo uses special notations named shortcodes.
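For comparison, block-level reuse in DITA typically relies on content references. An element with a @conref attribute of the form file.dita#topicid/elementid pulls in the content of the element it points to. The following Python sketch resolves such references against in-memory files. The file names, IDs, and note text are hypothetical. The sketch also ignores attribute merging, conref ranges, @conkeyref, and references nested inside the pulled content.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory files; a real processor would read them from disk.
FILES = {
    "shared.dita": '<topic id="shared"><title>Shared</title><body>'
                   '<note id="backup_note">Back up your files before upgrading.</note>'
                   '</body></topic>',
    "upgrade.dita": '<topic id="upgrade"><title>Upgrade</title><body>'
                    '<note conref="shared.dita#shared/backup_note"/>'
                    '</body></topic>',
}


def find_target(conref):
    """Locate the element addressed by 'file#topicid/elementid'."""
    path, _, fragment = conref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    root = ET.fromstring(FILES[path])
    # Simplification: only the root topic of the file is checked against topic_id.
    if root.get("id") != topic_id:
        raise KeyError(f"topic {topic_id!r} not found in {path}")
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    raise KeyError(f"element {element_id!r} not found in {path}")


def resolve_conrefs(root):
    """Replace the content of each element carrying @conref with a copy of its target."""
    referencing = [e for e in root.iter() if e.get("conref") is not None]
    for element in referencing:
        target = copy.deepcopy(find_target(element.get("conref")))
        for child in list(element):  # drop any existing children first
            element.remove(child)
        element.text = target.text
        element.extend(list(target))
        del element.attrib["conref"]
    return root


if __name__ == "__main__":
    resolved = resolve_conrefs(ET.fromstring(FILES["upgrade.dita"]))
    print(resolved.find("./body/note").text)  # Back up your files before upgrading.
```

The referencing elements are collected before any of them is modified, so the tree is not changed while it is being iterated.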
Filters
DITA XML: You can use profiling attributes in DITA XML topics or on topic references in a DITA Map. By filtering a single DITA Map in different ways, you can obtain multiple publications from it.

For example, for the Oxygen user's manual we obtain many distinct publications from the same DITA Map: "Oxygen XML Editor", "Oxygen XML Author", and "Oxygen XML Web Author".

Markdown: There may be such a feature, but I am not aware of one in Markdown.
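Profiling can be sketched in Python as follows. The filter conditions are written as a Python dict rather than a DITAVAL file. The product values (editor, author, webauthor) are hypothetical stand-ins for the Oxygen products mentioned above. The exclusion logic follows a simplified reading of the DITA rule: for each filtered attribute, an element is dropped only when all of its values are excluded.

```python
import xml.etree.ElementTree as ET

# Hypothetical filter conditions, similar in spirit to DITAVAL entries such as
# <prop att="product" val="author" action="exclude"/>.
EXCLUDE = {"product": {"author", "webauthor"}}


def is_excluded(element, exclude):
    """True if, for any filtered attribute, every value on the element is excluded."""
    for attribute, excluded_values in exclude.items():
        values = element.get(attribute, "").split()  # profiling values are space-separated
        if values and all(value in excluded_values for value in values):
            return True
    return False


def apply_filter(parent, exclude):
    """Remove excluded descendants in place, depth-first, and return `parent`."""
    for child in list(parent):
        if is_excluded(child, exclude):
            parent.remove(child)
        else:
            apply_filter(child, exclude)
    return parent


if __name__ == "__main__":
    topic = ET.fromstring(
        '<topic id="t"><title>Start</title><body>'
        '<p>Common text.</p>'
        '<p product="editor">Editor only.</p>'
        '<p product="author webauthor">Author products only.</p>'
        '<p product="editor author">Editor and Author.</p>'
        '</body></topic>'
    )
    for p in apply_filter(topic, EXCLUDE).iter("p"):
        print(p.text)
```

With this filter, only the paragraph marked "author webauthor" disappears, because all of its values are excluded. The paragraph marked "editor author" stays, because "editor" is still included.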
Headings
DITA XML:
  • DITA topics have a <title> element, which appears as a level 1 heading when published and is also used for the <title> element of the published HTML document.
  • You can nest topics inside one another, and the generated HTML output will use <h2>, <h3>, and so on for each nested topic, depending on the nesting depth.
  • You can have <section> elements with <title> elements inside a topic, but sections cannot be nested inside one another.
    <topic id="topic_wcj_tgy_5wb">
     <title>Title1</title>
     <body>
      <section>
        <title>Section 1</title>
        <p>paragraph</p>
      </section>
     </body>
     <topic id="inner">
      <title>Inner topic title</title>
     </topic>
    </topic>
Markdown: To define a heading, type one or more # characters followed by a space and the heading text. Headings do not need to be incremental; for example, you can start with a level 2 heading and then have a level 1 heading.
    # Heading level 1
    ### Heading level 3
    ## Heading level 2
    ....
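The DITA nesting rule described above can be sketched in Python. Each topic title gets an HTML heading level equal to its nesting depth, capped at <h6>. This illustrates the described behavior and is not the DITA Open Toolkit's code. Section titles are ignored here because their rendering depends on the publishing engine.

```python
import xml.etree.ElementTree as ET


def heading_outline(topic, depth=1):
    """Return (heading tag, title) pairs, one per topic, based on nesting depth."""
    level = min(depth, 6)  # HTML only defines <h1> to <h6>
    outline = [(f"h{level}", topic.findtext("title"))]
    for child in topic.findall("topic"):  # nested topics are direct children
        outline.extend(heading_outline(child, depth + 1))
    return outline


if __name__ == "__main__":
    sample = (
        '<topic id="topic_wcj_tgy_5wb"><title>Title1</title>'
        '<body><section><title>Section 1</title><p>paragraph</p></section></body>'
        '<topic id="inner"><title>Inner topic title</title></topic></topic>'
    )
    print(heading_outline(ET.fromstring(sample)))
    # [('h1', 'Title1'), ('h2', 'Inner topic title')]
```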
Block elements
DITA XML: There are multiple topic types, such as <concept>, <task>, and <reference>, and extra topic types can be added using specialization. The basic block elements are <topic>, <title>, paragraph <p> elements, <codeblock>, lists (<ul>, <ol>), <table>, <section>, <fig>, <image>, and <note>. There are also other block-level elements depending on the topic type.
Markdown: Paragraphs, tables, lists, images, block quotes, etc.
Inline elements
DITA XML: <b>, <i>, <u>, <sup>, <sub>, and other inline elements with more semantic meaning, such as <codeph>, <uicontrol>, and <filepath>.
Markdown: Bold, italic, and inline code. Depending on the Markdown flavor, there are also other inlines such as underline, subscript, superscript, and strikethrough.
Audio/Video
DITA XML: The DITA <object> element can be used to refer to audio, video, or iframe content.
Markdown: No official support; you can use embedded HTML content or add a link to the audio or video instead.
Tables
DITA XML: The DITA <table> element is based on the CALS table specification. Cells can span multiple rows or columns and can contain block content such as lists and paragraphs. The table can have header and body rows.
Markdown: Markdown tables (an extension in flavors such as GitHub Flavored Markdown, not part of CommonMark) are usually written as an ASCII-art grid, and each column's content can be aligned left, right, or centered. Cells can contain only inline content. If more complex table structures are needed, HTML tables can be inserted directly into Markdown, provided the Markdown flavor in use supports HTML elements.
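In GitHub Flavored Markdown, column alignment comes from colons in the delimiter row under the header: :--- for left, ---: for right, and :---: for centered. Here is a minimal Python sketch that reads those markers. It does not validate the rest of the table or handle escaped pipe characters.

```python
def column_alignments(delimiter_row):
    """Read column alignment from a GFM-style delimiter row such as '| :--- | ---: |'."""
    cells = delimiter_row.strip().strip("|").split("|")  # outer pipes are optional
    alignments = []
    for cell in cells:
        cell = cell.strip()
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append(None)  # no colon: the renderer's default alignment
    return alignments


if __name__ == "__main__":
    print(column_alignments("| :--- | ---: | :---: |"))  # ['left', 'right', 'center']
```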
Lists
DITA XML: Ordered <ol>, unordered <ul>, or definition lists <dl>. Other topic types, such as <task>, contain for example the <steps> element, which is an ordered list of steps. Each list item can contain block elements such as paragraphs, other lists, tables, etc.
Markdown: Ordered and unordered lists. A list item can contain nested lists and multiple paragraphs, provided the nested content is indented correctly.

The task list is an interesting extension (available in GitHub Flavored Markdown, for example) that shows a checkbox next to each list item.

Other types of lists (definition lists, for example), or list items with more complex block content, can be inserted directly as HTML if the Markdown flavor in use supports HTML elements.
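In GitHub Flavored Markdown, task list items start with [ ] (unchecked) or [x] (checked) right after the list marker. The hypothetical Python helper below extracts them with a regular expression. It only recognizes single-line items in bullet lists (-, +, *) and is not a full Markdown parser.

```python
import re

# Bullet marker, whitespace, "[ ]" / "[x]" / "[X]", whitespace, then the item text.
TASK_ITEM = re.compile(r"^\s*[-+*]\s+\[([ xX])\]\s+(.*)$")


def task_items(markdown_text):
    """Return (checked, text) pairs for each task list item found in the text."""
    items = []
    for line in markdown_text.splitlines():
        match = TASK_ITEM.match(line)
        if match:
            items.append((match.group(1) in "xX", match.group(2)))
    return items


if __name__ == "__main__":
    sample = "- [x] Write the topic\n- [ ] Review it\n- plain item"
    print(task_items(sample))  # [(True, 'Write the topic'), (False, 'Review it')]
```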

Links
DITA XML:
  • Internal links (cross-references):
    • Link to another topic.
    • Link to a particular element in another topic.
    • Links to web resources.
  • Related links (at the end of each topic):
    • Link to another topic.
    • Link to a particular element in another topic.
    • Links to web resources.
Markdown: Links to other web resources.
Conclusion
DITA XML:
  • Harder to type in a plain text area; requires DITA editing tools, most of which are not free.
  • Advanced support for structured validation.
  • Advanced support for content reuse and profiling (conditional text).
  • The publishing engine can publish to multiple output formats, such as HTML, PDF, and others, through installable plugins.
Markdown:
  • Easy to type manually in a plain text area, but a preview definitely helps.
  • More complex elements need to be inserted as HTML elements.
  • Various Markdown extensions add extra support, for example for content reuse.
  • Mostly targeted at producing web-based HTML content.
  • Does not seem intended for the heavy lifting of producing multiple deliverable formats from the same content.