
Deleting Elements in a Refactoring Operation

7 Dec 2021

Recently, a writer wanted to remove the index from their DITA book. This required the following:

  • Removing the <indexlist> element from the map:

    <backmatter>
      <booklists>
        <indexlist/>
      </booklists>
    </backmatter>
  • Removing topic-level <indexterm> elements from topic prologs:

    <topic id="feature_A">
      <title>About Feature A</title>
      <prolog>
        <metadata>
          <keywords>
            <indexterm>feature A</indexterm>
          </keywords>
        </metadata>
      </prolog>
  • Removing inline <indexterm> elements from topic content:

    <p>This is about <indexterm>feature B</indexterm>feature B.</p>

Oxygen provides a "Delete element" refactoring operation. However, it does precisely what it says: it deletes the specified elements and leaves everything else in place, including any containers that are now empty:

<topic id="feature_A">
  <title>About Feature A</title>
  <prolog>
    <metadata>
      <keywords>

      </keywords>
    </metadata>
  </prolog>

I decided to create an XSLT refactoring operation that does the following:

  • Deletes the specified elements

  • Deletes any containing (ancestor) elements that became empty as a result

  • Updates whitespace/newline formatting around deleted elements as needed

  • Serves as an easily customizable template for other element deletion uses

Fortunately, as described in Custom Refactoring Operations, Oxygen allows us to package up customized XSLT refactoring operations in an easy-to-use way. For the XML descriptor file, put this content into a remove-index.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<refactoringOperationDescriptor
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.oxygenxml.com/ns/xmlRefactoring" id="remove-index"
    name="Remove index from a DITA book">
    <description>Remove index terms and backmatter index from a DITA book.</description>
    <script type="XSLT" href="remove-index.xsl"/>
    <category>DITA</category>
</refactoringOperationDescriptor>

For the XSLT file itself, put this content into a remove-index.xsl file:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">

  <!-- elements to delete -->
  <xsl:variable name="elements-to-delete" select="('indexterm', 'indexlist')"/>

  <!-- delete up to (and including) these elements, if they become empty -->
  <xsl:variable name="delete-up-to" select="('prolog', 'backmatter')"/>


  <!-- baseline identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>


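  <!-- remove elements-to-delete; the explicit priority resolves the tie with the
       "containing element" template below (e.g., for nested <indexterm> elements) -->
  <xsl:template match="*[name() = $elements-to-delete]" priority="1"/>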

  <!-- remove whitespace/newlines before elements-to-delete -->
  <xsl:template match="text()
                       [following-sibling::*[1]
                         [name() = $elements-to-delete]]
                       [matches(., '^\s*\n\s*$')]"/>


  <!-- remove elements that contain our to-be-deleted elements,
       but only if they become empty -->
  <xsl:template match="*[ancestor-or-self::*[name() = $delete-up-to]]
                        [descendant::*[name() = $elements-to-delete]]">

    <!-- apply templates to this element's contents and see what we get -->
    <xsl:variable name="contents" as="node()*">
      <xsl:apply-templates select="node()"/>
    </xsl:variable>

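    <!-- if child elements remain, copy this element (and its preceding whitespace/newlines),
         keep its attributes, and put its contents inside -->
    <xsl:if test="$contents[self::*]">
      <!-- xsl:copy-of and a select-less xsl:copy keep this valid XSLT 2.0
           (xsl:copy/@select was only added in XSLT 3.0) -->
      <xsl:copy-of select="preceding-sibling::node()[1][self::text()][matches(., '^\s*\n\s*$')]"/>
      <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <xsl:sequence select="$contents"/>
      </xsl:copy>
    </xsl:if>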
  </xsl:template>

  <!-- remove whitespace/newlines before containing elements
       (we re-add whitespace/newlines above, if needed) -->
  <xsl:template match="text()
                       [following-sibling::*[1]
                         [ancestor-or-self::*[name() = $delete-up-to]]
                         [descendant::*[name() = $elements-to-delete]]]
                       [matches(., '^\s*\n\s*$')]"/>

</xsl:stylesheet>

At the beginning of the refactoring operation, two XSLT variables are defined:

  • elements-to-delete - the element names to delete, regardless of their contents

  • delete-up-to - the highest-level containing element names to delete, if they become empty

The refactoring operation works as follows:

  • The elements-to-delete elements are always deleted.

    • Any whitespace/newline text() nodes directly preceding them are also deleted.

  • Any elements that (1) contain an elements-to-delete element as a descendant, (2) are contained by or are themselves a delete-up-to element, and (3) become empty due to the element deletion, are deleted.

    • To determine if a "containing" element becomes empty due to the deletion, <xsl:apply-templates> is called, then the results are checked to see if any elements remain. This is what allows the deletion to continue dynamically up through the containing elements.

  • To conditionally keep the whitespace/newline text() node directly preceding a "containing" element, two templates work together:

    • A standalone unconditional template always deletes the whitespace/newline text() node preceding a containing element, whether it will be kept or not.

    • Inside the template that conditionally keeps containing elements, that same preceding text() node is re-included if the containing element is kept.
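
One caveat about the steps above: the emptiness check looks only for remaining child elements. That suits element-only containers such as <prolog> and <backmatter>, but if delete-up-to were ever extended to mixed-content containers, an element holding only text would be discarded along with its text. The following is a minimal sketch of an override, not part of the original operation. It assumes a hypothetical file named remove-index-mixed.xsl saved next to remove-index.xsl, and it treats non-whitespace text as remaining content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- remove-index-mixed.xsl (hypothetical name), saved next to remove-index.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- reuse the variables and all other templates unchanged -->
  <xsl:import href="remove-index.xsl"/>

  <!-- same pattern as the original "containing element" template, except that
       elements-to-delete themselves are excluded so they are never kept;
       import precedence makes this template win over the imported one -->
  <xsl:template match="*[ancestor-or-self::*[name() = $delete-up-to]]
                        [descendant::*[name() = $elements-to-delete]]
                        [not(name() = $elements-to-delete)]">

    <xsl:variable name="contents" as="node()*">
      <xsl:apply-templates select="node()"/>
    </xsl:variable>

    <!-- keep the element if an element OR non-whitespace text remains -->
    <xsl:if test="$contents[self::* or self::text()[normalize-space()]]">
      <xsl:copy-of select="preceding-sibling::node()[1][self::text()][matches(., '^\s*\n\s*$')]"/>
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:sequence select="$contents"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the override imports the original stylesheet rather than copying it, the identity template and the whitespace templates stay in one place. The descriptor's script element would need to point at the new file for Oxygen to run it.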

The following example shows a <prolog> element that disappears completely because it does not contain anything other than an <indexterm> element:

Before refactoring:

<topic id="feature_A">
  <title>About Feature A</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm>feature A</indexterm>
      </keywords>
    </metadata>
  </prolog>

After refactoring:

<topic id="feature_A">
  <title>About Feature A</title>

The following example shows a <prolog> element that is partially kept because it also contains a <resourceid> element:

Before refactoring:

<topic id="feature_A">
  <title>About Feature A</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm>feature A</indexterm>
      </keywords>
    </metadata>
    <resourceid id="feature_A"/>
  </prolog>

After refactoring:

<topic id="feature_A">
  <title>About Feature A</title>
  <prolog>
    <resourceid id="feature_A"/>
  </prolog>
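
The map-level case from the beginning of the article should behave the same way. The following trace was worked out by hand from the templates rather than captured from a run, and the title and chapter in it are hypothetical, added only for context. Because <booklists> contains nothing but <indexlist/>, it loses its only child element, which in turn leaves <backmatter> with no child elements, so the whole <backmatter> element and the whitespace before it are removed.

Before refactoring:

```xml
<bookmap>
  <title>Product Guide</title>
  <chapter href="feature_A.dita"/>
  <backmatter>
    <booklists>
      <indexlist/>
    </booklists>
  </backmatter>
</bookmap>
```

After refactoring:

```xml
<bookmap>
  <title>Product Guide</title>
  <chapter href="feature_A.dita"/>
</bookmap>
```

If <booklists> also held another list element, both <booklists> and <backmatter> would be kept and only <indexlist/> would be removed.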

This same refactoring code can be adapted to other use cases by editing the elements-to-delete and delete-up-to variables as needed.
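
As one possible adaptation, the sketch below removes <othermeta> elements from topic prologs and from map <topicmeta> elements. Instead of editing a copy of the stylesheet, it imports remove-index.xsl and redeclares the two variables. The file name remove-othermeta.xsl and the choice of elements are assumptions made for illustration only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- remove-othermeta.xsl (hypothetical name), saved next to remove-index.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- reuse every template from the original stylesheet -->
  <xsl:import href="remove-index.xsl"/>

  <!-- these declarations have higher import precedence, so the imported
       templates use these values instead of the index-related ones -->
  <xsl:variable name="elements-to-delete" select="('othermeta')"/>
  <xsl:variable name="delete-up-to" select="('prolog', 'topicmeta')"/>

</xsl:stylesheet>
```

This relies on the XSLT rule that, when a global variable is declared more than once, the declaration with the highest import precedence is used throughout. The operation would also need its own descriptor file whose script element references the new stylesheet. Editing the two variables directly in a copy of remove-index.xsl works just as well. The import approach only avoids duplicating the templates.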