Thursday, May 21, 2015

Schematron Checks to help Technical Writing


The Oxygen XML Editor User's Manual is written in DITA. In an older post I described in more detail how we collaborate internally on our User's Guide Project. And we also made available a copy of our User's Manual as a project on GitHub.

Over the years of working on it, we progressively developed a set of simple rules which were originally kept in a plain text document. The problem is that nobody can really remember all these rules when actually writing. So recently we started to migrate these rules to Schematron and have them reported automatically as validation warnings and errors while editing the topics. And with the release of Oxygen 17 we can now also add quick fixes for each of these problems.

So below I will try to tell you what each rule imposes and what its Schematron implementation looks like. If you want to quickly test these rules on your side, you can add them to the Schematron file which is used by default to validate DITA topics, located in: OXYGEN_INSTALL_DIR/frameworks/dita/resources/dita-1.2-for-xslt2-mandatory.sch.
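
For readers who prefer to try the rules in a separate file first, the wrapper around the patterns could look roughly like the sketch below. It rests on a few assumptions: the Schematron namespace is bound both as the default namespace and to the sch prefix, so that either style of pattern used in this post can be pasted in unchanged, and queryBinding="xslt2" is set because some of the rules use XPath 2.0 functions such as replace() and tokenize(). The sample pattern is only a shortened copy of rule 4 from the list below, included so the schema is not empty. The bundled Oxygen file may be organized differently.

    <sch:schema xmlns="http://purl.oclc.org/dsdl/schematron"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
                queryBinding="xslt2">
     <!-- Paste the patterns from this post here, one after another -->
     <sch:pattern>
      <sch:rule context="*[contains(@class, ' topic/image ')]">
       <sch:assert test="not(@scale)">
        Avoid the @scale attribute on images.
       </sch:assert>
      </sch:rule>
     </sch:pattern>
    </sch:schema>

Because the rules match on @class values, the topics need to be validated with their DTD or schema available so that the default @class attributes are present.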

  1. Try as much as possible to add at least an indexterm element in each topic. This is useful when the Index page is created for the PDF output or the Index tab is created for the WebHelp output. As this is not a requirement, we wanted to report this issue as a warning rather than an error. The end result looks like this:

    And the Schematron pattern looks like this:
    <pattern xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
     <rule context="/*">
      <assert test="prolog/metadata/keywords/indexterm" role="warn" sqf:fix="addFragment">
          It is recommended to add an 'indexterm' in the current '<name/>' element.
      </assert>
      <!-- Quick fix to add the indexterm element and its parents -->
      <sqf:fix id="addFragment">
          <sqf:description>
              <sqf:title>Add the 'indexterm' element</sqf:title>
          </sqf:description>      
          <sqf:add match="(title | titlealts | abstract | shortdesc)[last()]" position="after" use-when="not(prolog)">
             <prolog xmlns=""><metadata><keywords><indexterm> </indexterm></keywords> </metadata></prolog>
          </sqf:add>
      </sqf:fix>
     </rule>
    </pattern>
  2. The ID of each topic must be equal to the file name (minus the extension). One of the outputs we produce (I think CHM) had a limitation when building the context mapping between help IDs and actual HTML content so this was an important rule for us, thus an error is reported on this. Also a quick fix is added to auto-correct the topic ID based on the file name. The end result looks like this:

    and the Schematron pattern is:
    <!-- Topic ID must be equal to file name -->
    <sch:pattern>
     <sch:rule context="/*[1][contains(@class, ' topic/topic ')]">
      <sch:let name="reqId" value="replace(tokenize(document-uri(/), '/')[last()], '\.dita$', '')"/>
      <sch:assert test="@id = $reqId" sqf:fix="setId">
       Topic ID must be equal to file name.
      </sch:assert>
      <sqf:fix id="setId">
       <sqf:description>
        <sqf:title>Set "<sch:value-of select="$reqId"/>" as a topic ID</sqf:title>
        <sqf:p>The topic ID must be equal to the file name.</sqf:p>
       </sqf:description>
       <sqf:replace match="@id" node-type="attribute" target="id" select="$reqId"/>
      </sqf:fix>
     </sch:rule>
    </sch:pattern>
  3. Report when plain links or related links to web resources have the same text inside them as the value of the @href attribute. We had cases in which writers would input web links like this:
    <xref href="http://www.google.com" format="html" scope="external">http://www.google.com</xref>
    which is redundant because when you set no text to the link, the publishing uses the @href attribute value as the link text. So we wanted to report such cases as warnings and to have a quick fix which removes the link text:

    The Schematron pattern looks like this:
    <sch:pattern>
     <sch:rule context="*[contains(@class, ' topic/xref ') or contains(@class, ' topic/link ')]">
      <sch:report test="@scope='external' and @href=text()" sqf:fix="removeText">
       Link text is same as @href attribute value. Please remove.
      </sch:report>
      <sqf:fix id="removeText">
       <sqf:description>
        <sqf:title>Remove redundant link text, text is same as @href value.</sqf:title>
       </sqf:description>
       <sqf:delete match="text()"/>
      </sqf:fix>
     </sch:rule>
    </sch:pattern>
  4. Avoid using the @scale attribute on images. We wanted to smooth scale images in an external image editor so it was prohibited to use the @scale attribute on images. The Schematron pattern for this:
    <pattern>
     <rule context="*[contains(@class, ' topic/image ')]">
      <assert test="not(@scale)">
       Dynamically scaled images are not properly displayed. You
       should scale the image with an image tool and keep it within
       the recommended width and height limits.
      </assert>
     </rule>
    </pattern>
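
Unlike the first three rules, the @scale rule above has no quick fix. A minimal sketch of one is shown here, assuming the sch and sqf prefixes are declared on the schema root as in the other patterns. The fix id removeScale is an illustrative name. The fix only deletes the attribute, so the image itself still has to be resized in an image editor.

    <sch:pattern>
     <sch:rule context="*[contains(@class, ' topic/image ')]">
      <sch:assert test="not(@scale)" sqf:fix="removeScale">
       Dynamically scaled images are not properly displayed. Scale
       the image with an image tool instead.
      </sch:assert>
      <!-- Quick fix: drop the @scale attribute from the current image -->
      <sqf:fix id="removeScale">
       <sqf:description>
        <sqf:title>Remove the @scale attribute</sqf:title>
       </sqf:description>
       <sqf:delete match="@scale"/>
      </sqf:fix>
     </sch:rule>
    </sch:pattern>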

We have an upcoming webinar dedicated to Schematron Quick Fixes. There is a W3C working group for XML Quick Fixes and you can also read the SQF Quick Fix specification if you want to become more familiar with the technology.
We also have a GitHub project which tries to combine the notion of a style guide for writing documentation inside a company with a very simple manner of defining checks which can be applied to impose the style guide rules.

I would be interested in your feedback, especially if you have checks that you perform right now on your content and you consider that they might benefit others.

7 comments:

  1. Patrik Stellmann, 9:00 AM

    Another potentially more general use-case is the file name within the href attribute of an xref element, which breaks when renaming a file or moving referenced content to another file. Since in my framework the ids are globally unique I can still identify the correct file name and provide a QuickFix for it. However, this requires some custom Java code so my sqf code is not general. Furthermore in standard dita the ids are not necessarily globally unique so there could be multiple quick-fix-options - which is not supported by sqf yet (or at least not very well - see sqf issue#3).

    Another thought: I've set the requirement to my own framework that every warning and error should be resolvable. So looking at your first example with the missing indexterm there might be cases that it is intentional to omit the indexterm. Thus, another quickfix could be to mark the topic as "without index term by intention" - for instance by adding a comment or a PI. Of course, the schematron rule would have to check for this marker as well to hide the warning afterwards.

    More of my general schematron use-cases:
    - Empty topic, title, body, p, li, sli, ... element (Most elements should not be empty, so in my framework I marked those that are allowed to be empty.)
    - Whitespace at the end of an element with text content (often an indication that the sentence is not completed because you got interrupted when writing it)
    - Every draft-comment element (only as role="info")
    - Inconsistent existence of title element within ol/ul (every or no list item element should have a title)
    - Two consecutive ol or ul elements (should be merged into a single list)
    - image element without @href

    Patrik

    Replies
    1. Hi Patrik,

      Great point.

      Indeed a Schematron which identifies broken links and conrefs can also be easily created for DITA. But creating a quick fix for it is quite impossible as you do not know the new place where the referenced topic was placed.

      About that warning we give that an indexterm should be added to each topic, it bothers me too when contributing to the manual because in some topics you just don't want to add an index term. So indeed having a quick fix which would say something like "Ignore this warning" and adding a custom processing instruction before the root element could indeed be an answer for this.

      Thanks also for providing your other schematron rules, I can see potential in porting most of them to our user's manual as well.
      Regards,
      Radu

    2. Hi Radu,

      about the quickfix for broken links: I'm using a java custom extension with maps to keep track of all files and ids in the current book. Thus, it is very easy to identify the file(s) where the id is defined in. At least for keyrefs you should have something similar in the dita framework when dealing with dita maps. I would expect this could be extended to store the ids as well. And calling java methods from xsl/schematron/sqf is no problem either (I'm already doing this a lot).

      BTW: I have no experience with the DITA keyrefs and how oxygen supports it, but a schematron validation that checks for known keys might be useful as well - of course this would require to call methods from the dita extension bundle as well, but probably without the need to extend the java classes.

      Regards,
      Patrik

    3. Hi Patrik,

      For DITA topics we already have a special validation stage which is pure Java based and checks for missing key references and so on. So we have this covered with this special DITA validation stage.
      What I like about this pure Java validation stage is that it is very fast, it parses the XML with SAX and it also has a DTD cache so that it does not have to load the DTDs from disk every time the XML is parsed.

      Regards,
      Radu

  2. Another idea would be to provide the current key space as a document accessible through a custom protocol; for example, oxygen:/dita/keyscope may return an XML document with all key info, and then this can be accessed from Schematron using the document function, without the need for a Java extension.

  3. I just remembered another set of checks that turned out to be very useful when using variants by filtering the content according to @product and @audience (or other attributes containing value lists):
    - The attribute should contain no value that is missing in an ancestor element with the same attribute. (You can only remove values on descendant elements, not add one.)
    - The target of an xref must be available for all variants as the xref itself. (Otherwise there will be a variant where the xref can't be resolved.)

    This might be irrelevant for your user's manual but I'd consider it to be useful for DITA in general.

    (I didn't implement this with schematron yet, but I use a pure xslt script to check a framemaker document which we didn't migrate to oxygen yet.)

    Replies
    1. Hi Patrik,

      We already have both checks implemented in our internal Java code for DITA (in the "Validate and check for completeness" action). And I agree they are useful.
      Basically when validating you can specify all filters that will be used to produce the manual in all possible outputs. And the validation will be done for each filter, thus testing if a link may become broken when one filter is applied.

      I also implemented some Schematron checks based on your feedback, for example the check for empty elements and the check that there are no consecutive unordered lists (for which a quick fix could also be implemented). And we found some issues our writers are working to fix right now. So thanks for that.

      Regards,
      Radu

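
Some of the checks discussed in the comments above can be sketched in a few lines each. The patterns below are illustrations under stated assumptions, not the rules actually used for the Oxygen manual. They assume the sch prefix is bound to the Schematron namespace and the XSLT 2 query binding is in use. The processing instruction name no-indexterm is a made-up marker, and the list of elements treated as "should not be empty" is only a sample. Quick fixes are left out.

    <!-- Variant of rule 1: stay silent when the topic carries the
         hypothetical no-indexterm processing instruction before its root -->
    <sch:pattern>
     <sch:rule context="/*">
      <sch:assert test="prolog/metadata/keywords/indexterm
                        or /processing-instruction('no-indexterm')" role="warn">
       It is recommended to add an 'indexterm' in the current '<sch:name/>' element.
      </sch:assert>
     </sch:rule>
    </sch:pattern>

    <!-- Empty title, paragraph or list item: no child elements and no text -->
    <sch:pattern>
     <sch:rule context="*[contains(@class, ' topic/title ')
                          or contains(@class, ' topic/p ')
                          or contains(@class, ' topic/li ')]">
      <sch:report test="not(*) and normalize-space() = ''" role="warn">
       The '<sch:name/>' element should not be empty.
      </sch:report>
     </sch:rule>
    </sch:pattern>

    <!-- Two unordered lists in a row, separated only by whitespace -->
    <sch:pattern>
     <sch:rule context="*[contains(@class, ' topic/ul ')]">
      <sch:report test="following-sibling::node()[not(self::text()[normalize-space() = ''])][1]
                        [contains(@class, ' topic/ul ')]" role="warn">
       Consecutive unordered lists should be merged into a single list.
      </sch:report>
     </sch:rule>
    </sch:pattern>

With the first pattern, a topic that intentionally has no index term would start with <?no-indexterm?> on the line before its root element.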