Friday, September 15, 2017

Small problems with the DITA standard


Don't get me wrong, I think DITA is a great standard for writing technical documentation: it has lots of reuse and linking potential, and it is in general well thought out and comprehensive.

Over the years, various Oxygen XML Editor clients, or I personally, have encountered various limitations and quirks in the DITA standard that I want to share below. These complaints are not related to the publishing part at all, and I think some of them will probably be resolved as the DITA standard evolves from one version to another. Also, a good part of these issues probably cannot be effectively fixed, because the DITA standard has strived (and succeeded) to remain backward compatible. So here we go:
  • When I create a DITA specialization in order to add a new attribute, according to the specification I need to add that attribute to all DITA elements. I think the main idea was that the new attribute is a profiling attribute, so it makes sense to add it to all elements, but sometimes, just sometimes, you need to funnel this behavior and make the attribute available only on certain elements while still keeping the specialization a valid DITA specialization.
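    As an illustration, the usual DTD-based attribute domain specialization looks roughly like the sketch below ("platformType" is a made-up attribute name); because the new entity is plugged into the shared %props-attribute-extensions; parameter entity, the attribute automatically shows up on every element:
    <!-- Declare the new profiling attribute (hypothetical name). -->
    <!ENTITY % platformTypeAtt-d-attribute
      "platformType CDATA #IMPLIED">
    <!-- Plug it into the DTD shell; this entity is referenced by the
         universal attributes, so every element picks up the attribute. -->
    <!ENTITY % props-attribute-extensions
      "%platformTypeAtt-d-attribute;">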
  • The existing xml:lang attribute cannot be used to profile and filter out content. In the past we had users mixing languages in the same DITA Map and expecting to produce output for one language or the other by filtering based on the xml:lang attribute.
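    What such users would have liked to write in the DITAVAL is something along these lines (not supported, since xml:lang is not a profiling attribute):
    <val>
      <!-- Hypothetical: exclude all German content from the output. -->
      <prop att="xml:lang" action="exclude" val="de-DE"/>
    </val>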
  • You cannot easily profile/filter out an entire column from a CALS table. For example, you cannot add a profiling attribute directly to the DITA colspec element in order to remove entire table columns when publishing. The alternatives are to use a DITA simpletable and define the profiling attribute on each cell in the column, or to perform some kind of output customization based on a magic outputclass attribute set on a certain element in the table.
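    The workaround looks something like this (the attribute and its values are just examples):
    <!-- Not possible: profiling an entire CALS column through its colspec. -->
    <!-- <colspec colname="internalNotes" audience="internal"/> -->
    <!-- Workaround: repeat the profiling attribute on every cell in the column. -->
    <simpletable>
      <sthead>
        <stentry>Feature</stentry>
        <stentry audience="internal">Internal notes</stentry>
      </sthead>
      <strow>
        <stentry>PDF export</stentry>
        <stentry audience="internal">Known limitation</stentry>
      </strow>
    </simpletable>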
  • There are too many constraints imposed when writing DITA specializations. Robert Anderson, the DITA OT project manager and an OASIS member involved in defining the DITA standard, has two interesting blog posts on this topic.
  • With most material being published for the web, and with the need to dynamically include media resources (video, audio) in the published HTML content, it's a pity that the DITA standard does not yet have specialized <audio> and <video> elements. Again we need to rely on the magic outputclass attribute to give semantics to the too-generic DITA <object> element.
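    Today, embedding a video typically ends up looking something like the snippet below, where the outputclass value and the way it is interpreted depend entirely on the publishing customization:
    <object outputclass="video" data="media/introduction.mp4">
      <desc>Short product introduction video.</desc>
      <param name="controls" value="true"/>
    </object>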
  • Sometimes there are two or more ways of doing the same thing, for example choosing between a CALS table and a simple table, or choosing between conkeyref, keyref and conref to reuse small pieces of text. Why have <simpletable> in the DITA standard at all, as long as a CALS table without any cell spanning is simple enough? The Lightweight DITA project is an alternative to DITA which tries to simplify the standard and eliminate such problems: http://dita.xml.org/blog/lightweight-dita.
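    For example, the same small phrase can be reused in at least three ways (the file name, IDs and key name below are made up):
    <!-- Direct reuse, with a hard-coded path to the reused element: -->
    <ph conref="reusable.dita#reuse_topic/productName"/>
    <!-- Indirect reuse through a key defined in the DITA Map: -->
    <ph conkeyref="reusable/productName"/>
    <!-- Key reference, with the text taken from the key definition itself: -->
    <ph keyref="productName"/>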
  • DITA elements which have conrefs or conkeyrefs need to also have the required content specified in them. So I cannot simply do this:
    <table conref="path/to/target.dita#topicID/elementID"/>
    and instead I need to do this:
    <table conref="path/to/target.dita#topicID/elementID">
      <tgroup cols="1">
        <tbody>
          <row>
            <entry/>
          </row>
        </tbody>
      </tgroup>
    </table>
    and have all the required table and tgroup elements, plus their required attributes, filled out, even though the expanded conref will replace the entire original element.
  • You cannot use a key to refer directly to a subtopic element. If the standard allowed a DITA Map to refer directly to a subtopic element like this:
    <keydef href="topics/reusableComponents.dita#topicID/tableID" keys="reused_table"/>
    you could reuse the table without needing to specify the ID of the reused element on each conkeyref:
    <table conkeyref="reused_table"/>
  • Some DITA elements (e.g. <li>, <entry>, <section>) have a very relaxed content model in the specification, allowing both text and block elements, in any order. So when using visual editing tools, this leads technical writers to create DITA content looking like this:
        <li>
            Preview:
            <p>Here are some of the preview</p>
        </li>
    as the visual editing tool cannot by default impose an editing constraint if the standard does not. Usually for such cases additional Schematron checks can be handy.
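    For example, a minimal Schematron rule (just a sketch) could flag text mixed directly with block elements inside a list item:
    <pattern xmlns="http://purl.oclc.org/dsdl/schematron">
      <rule context="li[p]">
        <!-- Report significant text nodes that sit next to block elements. -->
        <report test="text()[normalize-space()]">
          Avoid mixing plain text with block elements inside a list item.
        </report>
      </rule>
    </pattern>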
  • DITA content is not automatically profiled based on the new DITA 1.3 deliveryTarget attribute. So setting deliveryTarget="pdf" on a DITA element will not automatically filter it out of the HTML-based outputs; the attribute is treated just like any other profiling attribute and it has to be excluded explicitly in the DITAVAL file.

This concludes my complaint list. Is there anything else you have encountered in the DITA standard that bothers you?

11 comments:

  1. Hi Radu,

    thank you for this summary, you should pass it on to the OASIS DITA TC. I agree with most of your points, except these:

    - xml:lang indicates the language of the textual content of an element. If you have mixed-language content, it would not work to filter the content with this attribute. However, a second "lang" or "language" attribute should be available for filtering.
    - deliveryTarget="pdf" should filter out HTML: this would add semantics to attribute values, which would always lead to unexpected results. You can achieve the same thing, I think, by simply doing:
    <prop action="exclude" att="deliveryTarget"/>
    <prop action="include" att="deliveryTarget" val="pdf"/>


    What is missing from the list is:
    - The need for a WEB Map, because you, for example, don't have the xnal domain available on a standard map. So you're forced to use a BOOK Map to publish HTML-based outputs. That's weird and irritates our writers.
    - Constraints and rules on how to use the release management domain and how it should be processed.
    - A standard element for cover images, as a replacement for <data name="cover-image"/> on bookmaps.

    This is what strikes me ATM.

    1. Good idea about posting on the DITA TC list, I just did that.
      About xml:lang being used for profiling: if people do not want to filter based on it, they will not use it in the DITAVAL...
      Interesting idea for a new WebMap specialization...
      Also, adding some special constructs to the BookMap specialization to define the cover image and text would indeed be interesting.
      I never used the release management domain for anything so I cannot express any opinion on it.

  2. François Violette, 12:29 AM

    I share some of the pain points that have been mentioned already (how/when to choose between conref, keyref and conkeyref... and how to transition, required elements when con(key)reffing, and table conditioning). I won't go back over these.

    My current picks tonight:

    - A single <prodname> allowed in <prodinfo> leads to verbose tagging when writing mostly for the Web about highly modular solutions. Specialization, preprocessing or heavy tool support is required much too quickly.

    - With <othermeta> elements, for instance, subjectSchemes do not take into account the value of @name when filtering the values of @content (and/or its limitations) => with two <othermeta> elements that have different @name values, controlled values cannot be enforced on othermeta/@content (with these two attributes only).

    - Mandatory attributes on elements that carry the @conref attribute could be allowed to be missing in the referencing files, unless they need to be overridden. For instance: @version for <vrm>. The way -dita-use-conref-target is presented in the documentation, and the various discussions about the subject, seem to reflect this ambiguity.

    François

    1. Unfortunately the blogging platform does not seem to render XML tags. So I did not understand your first point.
      The second point is probably about the "data" element and restricting the values of one attribute based on the values of another attribute. Indeed, it would be cool if the Subject Scheme Map had this capability. Maybe, even if the standard does not offer this, Oxygen could rely on some kind of trick using the @outputclass attribute on the elementdef to offer such restricted values depending on a certain @name attribute value, something like:
      <enumerationdef>
        <elementdef name="data" outputclass="@name='test'"/>
        <subjectdef keyref="audienceSbjKey"/>
      </enumerationdef>

  3. Anonymous, 10:56 AM

    I have to say I am seriously disappointed with DITA as a whole concept and "way of life".

    I learned the basics last year quickly and easily with http://learningdita.com/ and created a huge amount of raw XML content with oXygen.

    So far so good.

    When it comes to the next steps, the amount of investment required seems massively disproportionate:

    1. Creating basic results to get buy-in within the organization that is supposed to adopt DITA

    2. Tailoring the workflows to produce what you really need.

    3. As you note, feedback to the standardization bodies seems geologically slow. Where is the support for API docs? How long have HTTP APIs been around?

    It seems like a large part of oXygen's user base is DITA-aware, but for new entrants to this field I don't see it as exactly adopter-friendly.

    The new collaboration tool is a big step in the right direction for getting buy-in but this doesn't address the workflow / publication issues.

    DITA is too useful to be left to a dispersed ecosystem of consultants and one-off project coders.

      Indeed, as a tech writer it is hard to convince an organization to switch to another editing format. Usually consultants are the ones capable of doing this, because they also speak with management. You somehow need to present the financial advantage of the switch, and usually such an advantage comes from lowered translation costs.

      I also agree that it takes time and technical knowledge to produce output that adheres to particular company guidelines. For example, for our classic WebHelp output we created an online skin builder; probably in a future version we'll have something similar for our PDF output.

      About incorporating API docs support in DITA: the standard cannot grow indefinitely, so probably at some point the DITA standard needs to contain only base elements and let separate satellite standards add various commonly used specializations.

      I think that a dispersed ecosystem is an advantage of any standard in general. Otherwise there are closed-source solutions you can turn to, but in such cases you depend fully on a certain tool and on the trainers offered by that particular company.
      I agree there needs to be more involvement from the community in coding the DITA Open Toolkit publishing engine, but this is not something directly related to the standard.

  4. François Violette, 10:59 AM

    I can see that... It would be nice if we could edit posts.

    If you can force that update on your side:

    - Single prodname in prodinfo
    - With othermeta elements
    - For instance: @version for vrm


    1. I can approve posts but it seems that the blogging platform does not let me edit them. Anyway now I understand what you mean.

    2. François Violette, 2:05 PM

      Thanks for looking into it. Pity I didn't preview the post!

  5. Great list Radu, thanks for that!

    I have often thought about the xml:lang and the several-ways-of-referencing problems you describe.

    Concerning the deliveryTarget attribute, I noticed another problem: in my understanding, deliveryTarget="pdf" is a delivery format and not (as the attribute name suggests) a delivery target. A delivery target would be "UserManual", "ServiceManual", "InstallationGuide" or something like that. This definition of "delivery target" would group two other profiling attributes: the existing "audience" and a new "deliveryFormat". I think in many cases an author should not think about the output format. He/she should think about (and set filtering for) the publication channel he or she wants to write for. Of course, the author should keep in mind which audience and export format are related to a specific delivery target.

    Another problem I have run into is the predefined, fixed and non-adjustable and/or filtering logic for combined filtering attributes and for combined filtering values. However, I think not many users are facing this problem. For details, have a look at this user group thread:
    https://groups.yahoo.com/neo/groups/dita-users/conversations/topics/42463

      I gave Jarno Elovirta a link to that particular DITA Users List discussion. Jarno is considering making available some kind of Java or JavaScript API which the filtering stage could call, passing the decision about filtering content to user-defined code.
