Friday, June 16, 2017

Checking terminology when editing in Oxygen XML Editor

Share to Facebook Share to Twitter Email This Share on Google Plus Share on Tumblr
In this blog post I will offer a general overview about the current possibilities you have available to impose your own language checking rules when working with Oxygen XML Editor.

Built-in support

Oxygen comes bundled with the popular Hunspell spell checker and along with the regular bundled dictionaries for English, German, Spanish, and French, it allows you to install new dictionaries, either for other languages or custom dictionaries (for example, dictionaries for medical terms) that you can build separately: https://www.oxygenxml.com/doc/versions/19.0/ug-editor/topics/spell-dictionary-Hunspell.html.

Besides the spell checker, Oxygen also has support for Auto-correct and you can add your own Auto-correct pairs to Oxygen.

Commercial alternatives

Acrolinx is a very popular commercial tool for checking content for consistency and terminology. The plugins that Acrolinx developed for Oxygen standalone, Oxygen Eclipse plugin and Oxygen Web Author allow you to run the Acrolinx checker directly from inside the application.

HyperSTE is another popular commercial tool for checking content and terminology. They also have a plugin for Oxygen standalone.

Open-source alternatives

LanguageTools is an open-source proof­reading program for English, French, German, Polish, and more than 20 other languages . There is an open-source plugin for Oxygen available on GitHub.

The DITA Open Toolkit terminology checker plugin from Doctales contains Schematron rules to check that various words adhere to the terminology dictionaries that are custom built using DITA.

Building your own terminology checker

The fastest and simplest way to build a simple terminology checker is by using Schematron rules. The Doctales plugin is a good example for this.

At some point, as the terminology dictionary keeps growing, you may encounter delays and slow downs when editing the documents and validating it using the custom Schematron rules. So an alternative to this is by using our Author SDK to build your own Oxygen plugin, which can use our API to check the content and then add highlights. The LanguageTools open-source plugin may be a good starting example for this.

Monday, June 12, 2017

Batch converting HTML to XHTML

Share to Facebook Share to Twitter Email This Share on Google Plus Share on Tumblr

Suppose you have a bunch of possibly "not-wellformed" HTML documents already created and you want to process them using XSLT. For example, you may want to migrate the HTML documents to DITA using the predefined XHTML to DITA Topic transformation scenario available in Oxygen. So you need to create valid XML wellformed XHTML documents from the existing HTML documents and you need to do this in a batch processing automated fashion.

There are lots of open source projects that deliver processors that can convert HTML to its wellformed XHTML equivalent. For this blog post, we'll use HTML Tidy. Here are some steps to automate this process:
  1. Create a new folder on your hard drive (for example, I created one on my Desktop: C:\Users\radu_coravu\Desktop\tidy).
  2. Download the HTML Tidy executable specific for your platform (http://binaries.html-tidy.org/) and place it in the folder you created in step 1.
  3. In that same folder, create an ANT build file called build.xml with the following content:
    <project basedir="." name="TidyUpHTMLtoXHTML" default="main">
        <basename property="filename" file="${file}"/>
      <target name="main">
          <exec command="tidy.exe -o ${output.dir}/${filename} ${file}"/>
      </target>
    </project>
  4. In the Oxygen Project view, link the entire folder where the original HTML documents are located.
  5. Right-click the folder, choose Transform->Configure Transformation Scenarios... and create a new transformation scenario of the type: ANT Scenario. Modify the following properties in the transformation scenario:
    1. Change the scenario name to something relevant, like HTML to XHTML.
    2. Change the Working Directory to point to the folder where the ANT build file is located (in my case: C:\Users\radu_coravu\Desktop\tidy).
    3. Change the Build file to point to your custom build.xml (in my case: C:\Users\radu_coravu\Desktop\tidy\build.xml).
    4. In the Parameters tab, add a parameter called file with the value ${cf} and a parameter called output.dir with the value of the path to the output folder where the equivalent XHTML files will be stored (in my case, I set it to: C:\Users\radu_coravu\Desktop\testOutputXHTML).
  6. Apply the new transformation scenario on the entire folder that contains the HTML documents. When it finishes, in the output folder you will find the XHTML equivalents of the original HTML files (XHTML documents that can later be processed using XML technologies such as XSLT or XQuery).