Migrating Various Document Formats to DITA

Labels: Migrate
Contributed by: Radu Coravu

Most companies do not start new DITA-based projects from scratch. They already have content written in various other formats and somehow they need that content converted to DITA. In this blog post, I will offer some conversion advice depending on the format of your current project.

Migrating DocBook Content to DITA

You can migrate one or multiple DocBook documents to DITA using the Oxygen Resources Converter add-on: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.

The DocBook to DITA conversion includes an option named "Create DITA maps from DocBook documents containing multiple sections". When this option is selected, each section of your DocBook document is split into an individual DITA topic, and all the topics are referenced in a DITA map.
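To make the result of that option concrete, here is a minimal Python sketch of the same idea: one DITA topic per top-level DocBook section, plus a map that references them. It is not the add-on's implementation. The function name `split_sections`, the `topic_N.dita` file names, and the choice to copy only titles and plain paragraphs are assumptions made for illustration, and the sketch expects DocBook 5 (namespaced) input.

```python
import xml.etree.ElementTree as ET

# DocBook 5 namespace; DocBook 4 documents have no namespace and would need other paths.
DOCBOOK_NS = {"db": "http://docbook.org/ns/docbook"}


def split_sections(docbook_xml: str) -> tuple[str, dict[str, str]]:
    """Return (map_xml, {file_name: topic_xml}) for the top-level sections.

    Hypothetical helper: only section titles and plain <para> text are carried over.
    """
    root = ET.fromstring(docbook_xml)

    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = root.findtext(
        "db:title", default="", namespaces=DOCBOOK_NS
    )

    topics: dict[str, str] = {}
    for index, section in enumerate(root.findall("db:section", DOCBOOK_NS), start=1):
        file_name = f"topic_{index}.dita"  # assumed naming scheme

        topic = ET.Element("topic", id=f"topic_{index}")
        ET.SubElement(topic, "title").text = section.findtext(
            "db:title", default="", namespaces=DOCBOOK_NS
        )
        body = ET.SubElement(topic, "body")
        for para in section.findall("db:para", DOCBOOK_NS):
            # Flatten inline markup to plain text to keep the sketch short.
            ET.SubElement(body, "p").text = "".join(para.itertext())

        topics[file_name] = ET.tostring(topic, encoding="unicode")
        # Each topic is referenced from the map, in document order.
        ET.SubElement(ditamap, "topicref", href=file_name)

    return ET.tostring(ditamap, encoding="unicode"), topics
```

A real conversion also has to handle nested sections, inline markup, images, tables, and cross-references, which is why the add-on is the practical choice.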

Migrating Microsoft Word Content to DITA

The Oxygen XML User Manual has a detailed topic enumerating the possibilities to convert Microsoft Word content to DITA: https://www.oxygenxml.com/doc/ug-editor/topics/ooxml-to-dita.html.

Migrating Excel Content to DITA

You can use Oxygen's Smart Paste functionality to copy content from an Excel spreadsheet and paste it into an open DITA topic. Alternatively, the Oxygen Resources Converter add-on can batch convert Excel files to DITA: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.

Migrating LibreOffice Content to DITA

LibreOffice documents can be saved in Word format, and once you do that, you can convert the Word content to DITA as described above. Alternatively, you can save the LibreOffice documents to DocBook and then apply the DocBook to DITA conversion technique described above.

Migrating Google Docs to DITA

You have three possibilities to convert Google Docs to DITA using Oxygen:
  • Copy the content from Google Docs and paste it into a DITA topic opened in Oxygen's Author visual editing mode. The pasted content should be converted to DITA automatically.
  • Save the Google document as OpenDocument Format (ODF), save the ODF document as DocBook with LibreOffice, then apply the DocBook to DITA transformation scenario shipped with Oxygen.
  • Save the Google document as HTML then use the Oxygen batch converter add-on to convert it to DITA: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.

Migrating Markdown Content to DITA

The DITA Open Toolkit publishing engine bundled with Oxygen allows you to reference Markdown files directly in a DITA map and either publish them directly or export the Markdown files to DITA one by one: https://www.oxygenxml.com/doc/ug-editor/topics/markdown-dita-x-dita2.html. If you want to convert multiple Markdown documents at once, you can use the Oxygen Resources Converter add-on: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.
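If you have many Markdown files, the map that references them can be generated rather than typed by hand. The Python sketch below builds such a map. It assumes that Markdown topics are referenced with `format="markdown"` on the `topicref` (check the linked topic for the value your DITA-OT version expects); the function name `build_markdown_map` and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard public identifier for a DITA map.
DOCTYPE = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'


def build_markdown_map(title: str, markdown_files: list[str]) -> str:
    """Return the text of a DITA map that references each Markdown file."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for href in markdown_files:
        # Assumption: format="markdown" marks the target as a Markdown topic.
        ET.SubElement(ditamap, "topicref", href=href, format="markdown")
    body = ET.tostring(ditamap, encoding="unicode")
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{DOCTYPE}\n{body}\n'


if __name__ == "__main__":
    # Hypothetical file names, printed instead of written to disk.
    print(build_markdown_map("User Guide", ["intro.md", "setup.md"]))
```

Saving the returned text as a `.ditamap` file gives you a map that you can open in Oxygen and publish or convert as described above.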

Migrating HTML Content to DITA

Using Oxygen's Smart Paste functionality, you can open the HTML documents in a web browser, then copy the contents and paste them into a DITA topic opened in Oxygen's Author visual editing mode. If you want to convert multiple HTML files, you can use the Oxygen Resources Converter add-on: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.

Migrating Unstructured FrameMaker to DITA

There is a FrameMaker plugin that can be used for this type of conversion: http://leximation.com/tools/info/fm2dita.php.

Migrating MadCap Content to DITA

Some recent MadCap versions appear to be able to export content directly to DITA. Otherwise, you will need to convert the XHTML content to DITA with a custom XSLT stylesheet so that variable references are preserved.
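The core of that transformation is a single rule: turn each variable reference into a DITA key reference. The sketch below shows that rule in Python rather than XSLT, purely as an illustration. It assumes that variables appear as `MadCap:variable` elements with a `name` attribute of the form `Set.Name` in the namespace shown, and that each one should become `<ph keyref="Name"/>`; verify both assumptions against your own source files. The function name `variables_to_keyrefs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of MadCap-specific elements in the XHTML sources.
MADCAP_NS = "http://www.madcapsoftware.com/Schemas/MadCap.xsd"


def variables_to_keyrefs(xhtml: str) -> str:
    """Rewrite every MadCap variable element as <ph keyref="..."/>."""
    root = ET.fromstring(xhtml)
    # list() so the tree is not modified while it is still being walked.
    for elem in list(root.iter(f"{{{MADCAP_NS}}}variable")):
        name = elem.attrib.pop("name", "")
        elem.tag = "ph"
        # "General.ProductName" -> key "ProductName" (assumed key naming).
        elem.set("keyref", name.split(".")[-1])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        f'<p xmlns:MadCap="{MADCAP_NS}">Install '
        '<MadCap:variable name="General.ProductName" /> first.</p>'
    )
    print(variables_to_keyrefs(sample))
```

The keys then have to be defined in your DITA map so that each `keyref` resolves to the variable's value.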

Migrating Confluence Content to DITA

To convert Confluence content to DITA, you can use the Oxygen Resources Converter add-on: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.

You first need to export the content to HTML. To do this, log in to your Confluence account and navigate to the space that you want to export. Then go to Space Settings→Export space and choose to export it as HTML. Back in Oxygen, use the Confluence to DITA action (available once the add-on is installed) to convert the exported index.html file into a DITA map with topics.

Migrating LaTeX to DITA

You can use a third-party application such as Pandoc to convert LaTeX content to Word or HTML, then convert the result to DITA with the Oxygen Resources Converter add-on: https://www.oxygenxml.com/doc/ug-editor/topics/batch-converter-addon.html.
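For a folder of LaTeX files, the Pandoc step can be scripted. The following Python sketch runs Pandoc once per `.tex` file and writes standalone HTML files that can then be given to the converter. It assumes that `pandoc` is installed and on your PATH; the function names and the directory names in the usage example are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, output_dir: Path) -> list[str]:
    """Build the Pandoc command line that converts one LaTeX file to HTML."""
    target = output_dir / source.with_suffix(".html").name
    return [
        "pandoc",
        "--from", "latex",
        "--to", "html",
        "--standalone",  # emit a complete HTML document, not a fragment
        "--output", str(target),
        str(source),
    ]


def convert_all(input_dir: Path, output_dir: Path) -> list[Path]:
    """Convert every .tex file in input_dir and return the HTML paths."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    output_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for source in sorted(input_dir.glob("*.tex")):
        subprocess.run(pandoc_command(source, output_dir), check=True)
        converted.append(output_dir / source.with_suffix(".html").name)
    return converted


if __name__ == "__main__":
    # Hypothetical folder names.
    for html_file in convert_all(Path("latex-src"), Path("html-out")):
        print(html_file)
```

Complex LaTeX (custom macros, unusual packages) may not convert cleanly, so review the HTML before converting it to DITA.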

Migrating Other Formats to DITA

You may find third-party applications (like Pandoc) that can convert your content to HTML or to an XML format such as DocBook. Once you have HTML or DocBook content, you can convert it to DITA using the advice above.