Wednesday, May 25, 2016

How to Migrate from Word to DITA

Migrating Microsoft Office® Word documents to XML formats, and particularly to DITA, is a frequent need. Migration from a proprietary format to XML is never perfect, and the converted content usually needs some manual changes. However, the methods below should help you find the best approach for your particular case:

Method 1
  1. Open the Word document in MS Office, select all the content, and copy it.
  2. Open Oxygen and create a new DITA topic in the Author visual editing mode. 
  3. Paste the selected content. Oxygen's smart paste functionality will attempt to convert the content to DITA.
Method 2
  1. Save your MS Office Word document as HTML.
  2. Once you obtain that HTML, you have two possibilities:
    • In Oxygen, select File->Import->HTML File to import the HTML as XHTML. Then open the XHTML in Oxygen. The "Transformation Scenarios" view should list four pre-configured transformation scenarios that convert XHTML to DITA topics, tasks, references, or concepts.
    • Open the HTML file in any Web browser, select all of its content, and copy it. Then open Oxygen, create a new DITA topic in the Author visual editing mode, and paste the selected content. Oxygen's smart paste functionality will attempt to convert the HTML to DITA.
Method 3
  1. Open the Word document in the free LibreOffice application and save it as DocBook.
  2. Open the DocBook document in Oxygen.
  3. Run the predefined transformation scenario called DocBook to DITA.
Method 4
  1. If the Word document is in the DOCX format, you can open it in Oxygen's Archive Browser view and then open the document.xml file contained in the archive.
  2. Run the predefined transformation scenario called DOCX DITA. This Ant scenario runs the build file OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.word2dita/build-word2dita.xml on the DOCX archive and should produce a DITA project that contains a DITA map and multiple topics.
  3. You may need to do some reconfiguring to map DOCX styles to DITA content. 
Note: This method may also be helpful if you want to automate the conversion with scripts, since it is based on the DITA-OT and the DITA For Publishers plugins.
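
Before mapping DOCX styles to DITA content (step 3 of Method 4), it can help to know which paragraph styles the Word document actually uses. The following is a minimal Python sketch, not part of Oxygen or the DITA For Publishers plugins. It relies only on the fact that a DOCX file is a ZIP archive whose main part is word/document.xml. The function name paragraph_style_counts and the "(default)" label for unstyled paragraphs are illustrative assumptions.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_counts(docx_path):
    """Count the paragraph style IDs used in a DOCX archive.

    docx_path may be a file path or a binary file-like object.
    Paragraphs without an explicit style are counted as "(default)".
    """
    # A DOCX file is a ZIP archive; the main content is word/document.xml.
    with zipfile.ZipFile(docx_path) as archive:
        with archive.open("word/document.xml") as part:
            root = ET.parse(part).getroot()

    counts = Counter()
    for para in root.iter(f"{{{W_NS}}}p"):
        # The style reference lives in <w:pPr><w:pStyle w:val="..."/></w:pPr>.
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else "(default)"
        counts[style_id] += 1
    return counts
```

Calling paragraph_style_counts("manual.docx") on a hypothetical file returns a Counter of style IDs, which gives a rough checklist of the styles that need a DITA mapping.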

1 comment:

  1. Anonymous, 11:09 PM

    Wow--this is really helpful information. I've got a lot of Word docs I'm looking at converting to DITA. Started looking at your docs first (before I download a trial of your product)--I must say, you've got a lot of useful documentation and it looks really good at first glance.
