Word to DITA Conversion FAQ
How can I fix unrecognized style warnings?
When a Word document is converted, styles that have no mapping in the Word styles mapping table (on the preferences page) are converted to simple paragraph elements, and a warning is reported for each of them in the Results view.
- Open the Options → Preferences → Plugins → Batch Documents Converter preferences page in Oxygen.
- To fix the warning Unrecognized "Document Title" style for "p" Word element, add a new row to the Word styles mapping table with the following cells:
  - Type "p" in the Word element cell, because the unrecognized style was found on a Word paragraph.
  - Type "Document Title" in the Word style cell.
  - In the HTML elements cell, enter a corresponding HTML element. For this style, the corresponding element is <h1>, as in the default mapping of the "Title" style, so type "h1:fresh" in this cell. The ":fresh" suffix instructs the converter to create a new element every time it finds a paragraph with this style. Without it, the converter tries to reuse elements and combines each sequence of consecutive paragraphs with the same style into a single element.
- To fix the warning Unrecognized "Keyboard Key" style for "r" Word element, add a new row to the Word styles mapping table with the following cells:
  - Type "r" in the Word element cell, because the unrecognized style was found on a run of Word characters.
  - Type "Keyboard Key" in the Word style cell.
  - In the HTML elements cell, enter a corresponding HTML element. For this style, the corresponding element is <kbd>, as in the default mapping of the "HTML Keyboard" style, so type "kbd" in this cell.
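After these changes, the Word styles mapping table contains two new rows:
Word element | Word style | HTML elements
p | Document Title | h1:fresh
r | Keyboard Key | kbd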
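For context, the values in the Word element cell appear to correspond to WordprocessingML elements in the .docx package: "p" to a paragraph (w:p) and "r" to a run of characters (w:r). The fragment below is a hypothetical illustration of word/document.xml showing where the two styles from the warnings would be applied. The style IDs DocumentTitle and KeyboardKey and the text content are assumptions; Word usually derives a style ID from the style's display name, while the mapping table uses the name reported in the warning.

```xml
<!-- Hypothetical fragment of word/document.xml (WordprocessingML) -->
<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <!-- A paragraph (w:p) with the "Document Title" paragraph style: Word element "p" -->
  <w:p>
    <w:pPr>
      <w:pStyle w:val="DocumentTitle"/>
    </w:pPr>
    <w:r>
      <w:t>Installation Guide</w:t>
    </w:r>
  </w:p>
  <!-- A run (w:r) with the "Keyboard Key" character style: Word element "r" -->
  <w:p>
    <w:r>
      <w:t xml:space="preserve">Press </w:t>
    </w:r>
    <w:r>
      <w:rPr>
        <w:rStyle w:val="KeyboardKey"/>
      </w:rPr>
      <w:t>Enter</w:t>
    </w:r>
  </w:p>
</w:body>
```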
For more information about configuring the Word styles mapping, see the Conversions from Word section of the documentation.
How can I configure the styles mapping when the wanted element doesn't exist in HTML?
When configuring the mapping for a custom style, you may often find that DITA has a corresponding element but HTML does not.
For example, suppose a Word document contains a custom character style named "Filepath". DITA has a corresponding <filepath> element, but HTML has no equivalent.
To handle this case, follow these steps:
1. Open the Options → Preferences → Plugins → Batch Documents Converter preferences page and add the following row to the Word styles mapping table:
   r | Filepath | i.filepath
2. Convert the Word document to DITA. The characters styled with the Filepath style in Word are converted to <i> elements with the @outputclass attribute set to "filepath" in DITA.
3. Move the output files into your project, select them, and apply the "Rename element" refactoring operation, using the //i[@outputclass = 'filepath'] XPath expression to match the target <i> elements and rename them to the <filepath> DITA element.
4. Apply the "Remove attribute" refactoring operation, using the //filepath[@outputclass = 'filepath'] XPath expression to match the target elements and delete their @outputclass attributes.
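As an illustration, the following hypothetical DITA fragment (the sentence and path are made up) shows how a paragraph is expected to change after steps 2, 3, and 4 of this procedure:

```xml
<!-- After step 2: output of the Word to DITA conversion -->
<p>Edit the <i outputclass="filepath">config/settings.xml</i> file.</p>

<!-- After step 3: the element is renamed, but @outputclass is still present -->
<p>Edit the <filepath outputclass="filepath">config/settings.xml</filepath> file.</p>

<!-- After step 4: the redundant @outputclass attribute is removed -->
<p>Edit the <filepath>config/settings.xml</filepath> file.</p>
```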
Instead of steps 3 and 4, you can create a custom refactoring operation that makes both changes in a single pass:
- Create an XSLT file (for example, batch-converter-post-processing.xsl) that copies the document unchanged, except for the <i> elements with the "filepath" @outputclass attribute, which it replaces with <filepath> elements without copying the @outputclass attribute:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs" version="2.0">
      <!-- Identity template: copy all nodes and attributes unchanged -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>
      <!-- Replace <i outputclass="filepath"> with <filepath>; only child
           nodes are processed, so the attributes of <i> are not copied -->
      <xsl:template match="i[@outputclass = 'filepath']">
        <xsl:element name="filepath">
          <xsl:apply-templates select="node()"/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
- Create an XML Refactoring operation descriptor (for example, batch-converter-post-processing.xml) that references the stylesheet and provides a name and a description for the operation:

    <?xml version="1.0" encoding="UTF-8"?>
    <refactoringOperationDescriptor
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.oxygenxml.com/ns/xmlRefactoring"
        id="op_qzq_y2x_nsb" name="Post-processing Batch Documents Converter">
      <description>Post-process the resulting DITA documents from the Word conversion using the Batch Documents Converter add-on.</description>
      <script type="XSLT" href="batch-converter-post-processing.xsl"/>
    </refactoringOperationDescriptor>
- Copy both files into a folder that Oxygen XML Editor scans when it loads custom refactoring operations.
- Apply the new custom operation, "Post-processing Batch Documents Converter", which now appears in the list of refactoring operations in the XML Refactoring dialog box.
How can I configure the styles mapping for paragraphs styled as code blocks?
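Suppose the code blocks in your Word document are paragraphs styled with a custom style named "Code Snippet". To add a mapping for this style, follow these steps: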
- Open the Options → Preferences → Plugins → Batch Documents Converter preferences page in Oxygen.
- Add a new row to the Word styles mapping table.
- Enter "p" in the Word element cell, and enter "Code Snippet" in the Word style cell to match the paragraphs styled with "Code Snippet".
- Enter "pre:separator('\n')" in the HTML elements cell.
The <pre> element is the HTML element that corresponds to these paragraphs. Because each sequence of consecutive "Code Snippet" paragraphs should be merged into one element, the ":fresh" suffix is not used. The ":separator('\n')" suffix sets the separator inserted between consecutive paragraphs of the same style when they are merged, in this case a line break. Without it, the merged <pre> element would contain all its text on a single line.
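For example, assuming three consecutive paragraphs styled with "Code Snippet" contain the (made-up) lines below, this mapping should merge them into a single <pre> element in which the '\n' separator places each original paragraph on its own line:

```xml
<!-- Hypothetical result for three consecutive "Code Snippet" paragraphs -->
<pre>int a = 1;
int b = 2;
int c = a + b;</pre>
```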
With this configuration, each sequence of code block paragraphs results in one <pre> element in the converted DITA output. To obtain <codeblock> elements in DITA instead, set a class attribute on the resulting <pre> HTML element (enter "pre.codeblock:separator('\n')" in the HTML elements cell) and create a custom refactoring operation, as described in How can I configure the styles mapping when the wanted element doesn't exist in HTML?
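A minimal sketch of such a post-processing stylesheet, adapted from the filepath example. It assumes, by analogy with the i.filepath case, that the "pre.codeblock:separator('\n')" mapping produces <pre outputclass="codeblock"> elements in the DITA output; check your converted files before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <!-- Identity template: copy all other nodes and attributes unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace <pre outputclass="codeblock"> with <codeblock>.
       Only child nodes are processed, so the attributes of <pre>,
       including @outputclass, are not copied. Text nodes, including
       the line breaks inserted by the separator, are kept as-is. -->
  <xsl:template match="pre[@outputclass = 'codeblock']">
    <codeblock>
      <xsl:apply-templates select="node()"/>
    </codeblock>
  </xsl:template>
</xsl:stylesheet>
```

This stylesheet can be referenced from its own refactoring operation descriptor, in the same way as batch-converter-post-processing.xsl, or its second template can be added to that stylesheet so that a single operation handles both the filepath and the codeblock cases.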