Generating Google Structured Data from your DITA frequently asked questions

Contributed by: Radu Coravu on 17 May 2022

HTML pages published on the web can contain metadata written according to the Google Structured Data specification. When an HTML page contains such metadata, the Google search engine can present content such as lists of frequently asked questions directly in the search results page, without the reader having to open the target HTML page. The steps below automatically generate Google Structured Data metadata for DITA frequently asked questions when publishing DITA content to Oxygen WebHelp Responsive output, which can be customized through its publishing template mechanism.
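
To make the target format concrete, here is a minimal Python sketch (it is not part of the Oxygen tooling) of the FAQPage object that the steps below aim to produce. The helper names `faq_json_ld` and `as_script_tag` are hypothetical and exist only for this illustration; the property names are the same schema.org ones used by the stylesheet in step 3.

```python
import json


def faq_json_ld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize the object as the script element that goes in the HTML page."""
    # json.dumps escapes quotes and backslashes inside the strings.
    # "</" is additionally escaped so the text cannot close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Question and answer taken from the sample topic in step 1.
pairs = [
    (
        "How do I register to receive notifications for new blog posts?",
        "Each blog HTML page has at the end a form in which you can fill your "
        "email address if you want to be notified when new posts are made.",
    )
]
print(as_script_tag(faq_json_ld(pairs)))
```

The printed script element has one `Question` entry per question/answer pair, which is the structure the XSLT in step 3 emits for each section of the DITA topic.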

  1. In your DITA project, create a topic with a specific @outputclass attribute value to signal that you want the Google structured data to be automatically generated for it.
    <topic id="frequently_asked_questions" outputclass="google-structured-data-faq">
        <title>Frequently Asked Questions</title>
        <body>
            <section>
                <title>How do I register to receive notifications for new blog posts?</title>
                <p>Each blog HTML page has at the end a form in which you can fill your email address if
                    you want to be notified when new posts are made.</p>
            </section>
    ..............
  2. Inside a WebHelp publishing template folder, there is a template descriptor file (with the .opt extension) that can reference various XSLT stylesheets used for customizations. Here, we'll add a reference to a stylesheet that processes these specially marked DITA topics and produces a script containing the details of each question/answer pair.
    <publishing-template>
        <name>.....</name>
        ......
        <webhelp>
            ......
            <xslt>
                ....
                <extension file="xslt/addGoogleStructuredData.xsl" id="com.oxygenxml.webhelp.xsl.dita2webhelp"/>
                .....
            </xslt>
        </webhelp>
    </publishing-template>
  3. Create the addGoogleStructuredData.xsl XSLT stylesheet, which processes the topic contents and adds a script to the HTML head containing the frequently asked questions in Google Structured Data format.
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="xs"
      version="2.0">
      <!-- When the root topic is marked as an FAQ, emit the structured data script,
           then continue with the default processing of the prolog. -->
      <xsl:template match="*[contains(@class, ' topic/prolog ')]">
        <xsl:if test="/*[@outputclass='google-structured-data-faq']">
          <xsl:apply-templates select="/*" mode="google-structured-data-faq"/>
        </xsl:if>
        <xsl:next-match/>
      </xsl:template>
      
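      <!-- Builds the FAQPage JSON-LD script; each section becomes one question/answer pair. -->
      <xsl:template match="*" mode="google-structured-data-faq">
        <script type="application/ld+json">
         {
          "@context": "https://schema.org",
          "@type": "FAQPage",
          "mainEntity": [
            <xsl:for-each select="body/section">
              <!-- Escape backslashes and double quotes so the values remain valid JSON strings. -->
              <xsl:variable name="question"
                select="replace(replace(normalize-space(title), '\\', '\\\\'), '&quot;', '\\&quot;')"/>
              <!-- Join the answer blocks with a space so adjacent paragraphs do not run together. -->
              <xsl:variable name="answer"
                select="replace(replace(normalize-space(string-join(*[not(self::title)], ' ')), '\\', '\\\\'), '&quot;', '\\&quot;')"/>
              {
                "@type": "Question",
                "name": "<xsl:value-of select="$question"/>",
                "acceptedAnswer": {
                "@type": "Answer",
                "text": "<xsl:value-of select="$answer"/>"
                }
              }
              <xsl:if test="position() &lt; last()">,</xsl:if>
            </xsl:for-each>
          ]
         }
        </script>
      </xsl:template>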
      
    </xsl:stylesheet>
  4. Publish the DITA XML content to a website using the WebHelp Responsive transformation.
  5. Test your HTML page using the Google Rich Results Test: https://search.google.com/test/rich-results.
  6. Once Google indexes your page, search for it on Google to check how it is presented in the results.
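
Before submitting a page to the online test in step 5, a quick local check can catch malformed JSON early, for example an unescaped quote in a question title. The following is a minimal sketch using only the Python standard library; the names `JsonLdExtractor` and `check_faq` are hypothetical, and passing this check only shows that the script parses as JSON and declares a FAQPage, not that Google will display rich results for it.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def check_faq(html):
    """Return the question names of the first FAQPage block.

    Raises ValueError if a block is not valid JSON or no FAQPage block exists.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError as err:
            raise ValueError(f"Invalid JSON-LD: {err}") from err
        if isinstance(data, dict) and data.get("@type") == "FAQPage":
            return [entry["name"] for entry in data.get("mainEntity", [])]
    raise ValueError("No FAQPage JSON-LD block found")


# Small inline sample standing in for a published HTML page.
sample = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [
  {"@type": "Question", "name": "How do I register?",
   "acceptedAnswer": {"@type": "Answer", "text": "Use the form."}}
]}
</script></head><body></body></html>"""
print(check_faq(sample))
```

In practice, you would read the published HTML file for the FAQ topic from your WebHelp output folder and pass its contents to `check_faq` in place of the inline sample.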