Wednesday, May 25, 2016

How to Migrate from Word to DITA

Migrating Microsoft Office® Word documents to XML formats, and to DITA in particular, is a frequent need. As usual, migration from a proprietary format to XML is never perfect, and the converted content will need some manual changes. However, the methods below should help you find the best approach for your particular case:

 Method 1
  1. Open the Word document in MS Office, select all the content, and copy it.
  2. Open Oxygen and create a new DITA topic in the Author visual editing mode. 
  3. Paste the selected content. Oxygen's smart paste functionality will attempt to convert the content to DITA.
Method 2
  1. Save your MS Office Word document as HTML.
  2. Once you obtain that HTML, you have two possibilities:
    • In Oxygen, select File->Import->HTML File to import the HTML as XHTML. Then open the XHTML file in Oxygen; the "Transformation Scenarios" view should list four pre-configured transformation scenarios that convert XHTML to DITA topics, tasks, references, or concepts.
    • Open the HTML file in any Web browser, select all of its content, and copy it. Then open Oxygen, create a new DITA topic in the Author visual editing mode, and paste the selected content. Oxygen's smart paste functionality will attempt to convert the HTML to DITA.
Method 3
  1. Open the Word document in the free LibreOffice application and save it as DocBook.
  2. Open the DocBook document in Oxygen.
  3. Run the predefined transformation scenario called DocBook to DITA.
Method 4
  1. If the Word document is in the new DOCX format, you can open it in Oxygen's Archive Browser view and then open the document.xml file contained in the archive.
  2. Run the predefined transformation scenario called DOCX DITA. This Ant scenario runs the following build file over the DOCX archive: OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.word2dita/build-word2dita.xml. It should produce a DITA project that contains a DITA map and multiple topics.
  3. You may need to do some reconfiguring to map DOCX styles to DITA content. 
Note: This method may also be helpful if you want to run the conversion automatically from scripts, since it is based on the DITA-OT and the DITA For Publishers plugins.
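
Before reconfiguring the style mapping for Method 4, it can help to see which paragraph styles a document actually uses. The following is a minimal Python sketch, not part of Oxygen or DITA For Publishers. It relies only on the fact that a DOCX file is a ZIP archive whose main content is stored in word/document.xml. The function name paragraph_style_counts is hypothetical. The sketch reports style IDs only; it does not reproduce what the converter does with them.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DOCUMENT_PART = "word/document.xml"


def paragraph_style_counts(docx_path):
    """Count the paragraph style IDs used in a DOCX file.

    Paragraphs that carry no explicit style are counted under the key None.
    (Hypothetical helper for illustration only.)
    """
    # A DOCX file must be a ZIP archive; fail early with a clear message.
    if not zipfile.is_zipfile(docx_path):
        raise ValueError(f"{docx_path!r} is not a ZIP archive")

    # Read and parse the main document part from inside the archive.
    with zipfile.ZipFile(docx_path) as archive:
        with archive.open(DOCUMENT_PART) as part:
            root = ET.parse(part).getroot()

    counts = Counter()
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # The style ID, if any, is stored in w:pPr/w:pStyle/@w:val.
        style = paragraph.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        counts[style_id] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python list_styles.py path/to/document.docx
    for style_id, count in paragraph_style_counts(sys.argv[1]).most_common():
        print(f"{style_id or '(no explicit style)'}: {count}")
```

The output lists each style ID with the number of paragraphs that use it, which gives a starting point for deciding which DOCX styles need a mapping to DITA elements.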

5 comments:

  1. Anonymous, 11:09 PM

    Wow--this is really helpful information. I've got a lot of Word docs I'm looking at converting to DITA. Started looking at your docs first (before I download a trial of your product)--I must say, you've got a lot of useful documentation and it looks really good at first glance.

  2. Anonymous, 12:22 PM

    I tried method 4 but the transformation ran into an error when used on the document.xml. The log says: archive is not a ZIP archive ... did I understand correctly that the transformation is to be used on the document.xml within the docx-archive?

    Replies
    1. Maybe you could write to us at "support@oxygenxml.com" and give more details, for example a screenshot showing which transformation scenario you apply, along with the entire error message.

  3. Anonymous, 7:25 PM

    @Radu Coravu

    Tried Method 4, transformation failed with:


    BUILD FAILED
    C:\Program Files\Oxygen XML Author 16\frameworks\dita\DITA-OT\plugins\net.sourceforge.dita4publishers.word2dita\build-word2dita.xml:151: The following error occurred while executing this line:
    C:\Program Files\Oxygen XML Author 16\frameworks\dita\DITA-OT\plugins\net.sourceforge.dita4publishers.word2dita\build-word2dita.xml:195: Fatal error during transformation using C:\Program Files\Oxygen XML Author 16\frameworks\dita\DITA-OT\plugins\net.sourceforge.dita4publishers.word2dita\xsl\docx2dita.xsl: An empty sequence is not allowed as the value of variable $styleName; SystemID: file:/C:/Program%20Files/Oxygen%20XML%20Author%2016/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.word2dita/xsl/simple2dita.xsl; Line#: 1862; Column#: -1

    Replies
    1. This depends on your current Word content and also on which version of the DITA For Publishers converter is used. For example, Oxygen 19.0 comes with a newer converter, so you might also want to try it for converting your Word documents.
      The DITA For Publishers Word to DITA User's Guide can be found here:
      http://www.dita4publishers.org/d4p-users-guide/user_docs/d4p-users-guide/word2dita/word2dita-intro.html
