How Special Paste works in Oxygen

Read time: 4 minute(s)

If you've worked with one of the XML vocabularies for which Oxygen has out of the box support like DITA, Docbook, TEI, XHTML you've probably already used the support Oxygen has for converting content pasted in the application from external applications like Microsoft Word, Excel or from any web browser. This is a very useful feature for converting various types of content to XML because it preserves and converts styling, links, lists, tables and image references.

The feature relies on the fact that when copying content in the applications mentioned above, they set in the clipboard the HTML equivalent of the copied content. So all Oxygen has to do is clean up that HTML, make it wellformed XHTML and apply conversion XSLT stylesheets over it.

This support is not hardcoded and anybody who is developing an Oxygen framework customization for a certain XML vocabulary can provide conversion stylesheets for external pasted HTML content.

I will describe how this works for the DITA framework and you can do the same for yours. You can also use this information to modify the way in which smart paste works for the bundled framework configurations.

In the Preferences->Document Type Association page you can choose to edit (or extend) the DITA document type association.
In the Extensions tab the Extensions bundle implementation is set to DITAExtensionsBundle which resides in the DITA Java extensions archive dita.jar.

The DITAExtensionsBundle is an extension of the ExtensionsBundle API and it provides its own external object extension handler:

  /**
   * @see ro.sync.ecss.extensions.api.ExtensionsBundle#createExternalObjectInsertionHandler()
   */
  @Override
  public AuthorExternalObjectInsertionHandler createExternalObjectInsertionHandler() {
    return new DITAExternalObjectInsertionHandler();
  }

The DITAExternalObjectInsertionHandler extends the base class AuthorExternalObjectInsertionHandler and provides a reference to its specific conversion stylesheet:
```
  /**
   * @see ro.sync.ecss.extensions.api.AuthorExternalObjectInsertionHandler#getImporterStylesheetFileName(ro.sync.ecss.extensions.api.AuthorAccess)
   */
  @Override
  protected String getImporterStylesheetFileName(AuthorAccess authorAccess) {
    return "xhtml2ditaDriver.xsl";
  }
```
Note:
The Extensions tab also allows you to specify the external object insertion handler as a separate extension.
In the same Document Type edit dialog in the Classpath tab you will see that there is a reference to a framework-specific resources folder like:${framework}/resources/
If you look on disk in the DITA framework resources folder: "OXYGEN_INSTALL_DIR\frameworks\dita\resources" you will find the xhtml2ditaDriver.xsl stylesheet there. The stylesheet imports various other stylesheets which you could probably fully reuse and which apply various cleanups on HTML produced with MS Word. It also handles the conversion between the pasted HTML content and DITA so it is a good starting point, you can copy the entire set of XSLT stylesheets to your framework and use those as a starting point.