Modelling multi-language glossaries in EPUB

The EPUB reader Unibok already supported glossaries with a single description per term, but one publisher wanted to add several translations. The reader had to be adjusted so that readers could choose their target language and then be shown only the translation in that language.

Unibok is a solution for storing and reading EPUBs for teachers and pupils in Norwegian schools. The reader is based on the Readium framework, and the books are all reflowable EPUB 3. The publishers create the EPUBs specifically for Unibok, and the schools buy annual licenses to access the books.

The problem

For a new book, one publisher needed to include a glossary with translations in several languages, for pupils who have recently moved to Norway and are learning Norwegian. We already supported a glossary with one description per term, but needed to add support for several translations. The reader would need to be adjusted so that pupils can choose their target language and are then shown only the translation in that language.

For this to work, the EPUB would need to include the translations in a format that the reader can understand, and the reader would need to select the correct one based on the user's settings.

Existing glossary markup:

<dl epub:type="glossary">
  <dt data-glossary="term" id="def1">
    <dfn>ord</dfn>
  </dt>
  <dd data-glossary="definition">
    <p>word</p>
  </dd>
</dl>

The standard

As far as we could tell, the EPUB 3 standards do not describe this exact need. The EPUB Dictionaries and Glossaries profile, whose first version was finalized in 2015, describes some similar use cases, but not this one exactly.

One option for the solution would be to create the book as a dictionary publication, with the markup that the profile describes for such publications. The markup would look like this:

<body epub:type="dictionary" lang="no">
  <article id="def1"> 
    <dfn>ord</dfn>
    <span epub:type="tran" lang="en">word</span>
    <span epub:type="tran" lang="fr">mot</span>
  </article>
</body>

In addition, the publishers would need to include a search key map, which should list all possible forms of every term included in the dictionary file. It looks something like this:

<search-key-map xml:lang="no" xmlns="http://www.idpf.org/2007/ops">
  <search-key-group href="dict.xhtml#def1">
    <match title="ord" value="ord">
      <value value="ordet" />
      <value value="ordene" />
    </match>
  </search-key-group>
</search-key-map>

This seemed like overkill for the publisher’s need: the search key map exists so that a reading system knows which entry to look up when a user clicks on an arbitrary word in the text. In this case, the words would be predefined by the publisher, and pupils would not be able to translate arbitrary words in the text.
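To illustrate what the search key map is for, here is a minimal Python sketch of how a reading system might turn it into a lookup table from word forms to dictionary entries. The function name `build_lookup` and the lowercasing of keys are assumptions made for the example; this is not how Readium or Unibok implements lookup, and real reading systems may match words in more elaborate ways.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the search key map sample above.
OPS_NS = "{http://www.idpf.org/2007/ops}"

SEARCH_KEY_MAP = """
<search-key-map xml:lang="no" xmlns="http://www.idpf.org/2007/ops">
  <search-key-group href="dict.xhtml#def1">
    <match title="ord" value="ord">
      <value value="ordet" />
      <value value="ordene" />
    </match>
  </search-key-group>
</search-key-map>
"""


def build_lookup(skm_xml):
    """Map every listed word form to the href of its dictionary entry.

    Hypothetical helper: keys are lowercased so that lookups can be
    case-insensitive, which is an assumption of this sketch.
    """
    root = ET.fromstring(skm_xml)
    lookup = {}
    for group in root.iter(OPS_NS + "search-key-group"):
        href = group.get("href")
        # Both <match> and its nested <value> elements carry a word form.
        for element in group.iter():
            if element.tag in (OPS_NS + "match", OPS_NS + "value"):
                key = element.get("value")
                if key:
                    lookup[key.lower()] = href
    return lookup


lookup = build_lookup(SEARCH_KEY_MAP)
# A clicked word is resolved by looking up its lowercased form.
entry = lookup.get("Ordene".lower())
```

With the sample above, all three forms of "ord" resolve to the same entry, while a word that the publisher has not listed resolves to nothing.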

The other option we identified was to extend the current glossary format to include descriptions in several languages, using the HTML lang attribute.

The solution

After testing the two options, we decided to go with the smallest change to the existing solution: extending the current glossary component to allow several description elements, each marked with the HTML lang attribute. The reader then displays the description that matches the language selected by the pupil.

New glossary markup:

<dl epub:type="glossary">
  <dt data-glossary="term" id="def1">
    <dfn>ord</dfn>
  </dt>
  <dd data-glossary="definition" lang="en">
    <p>word</p>
  </dd>
  <dd data-glossary="definition" lang="fr">
    <p>mot</p>
  </dd>
</dl>
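The selection step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Unibok reader's actual code: the function name `definitions_for` is hypothetical, each term is assumed to have a single dt element, and the namespace declaration for the epub prefix is added only so that the fragment parses as standalone XML.

```python
import xml.etree.ElementTree as ET

# The new glossary markup, with the epub namespace declared so that
# the fragment can be parsed on its own.
GLOSSARY = """
<dl xmlns:epub="http://www.idpf.org/2007/ops" epub:type="glossary">
  <dt data-glossary="term" id="def1">
    <dfn>ord</dfn>
  </dt>
  <dd data-glossary="definition" lang="en">
    <p>word</p>
  </dd>
  <dd data-glossary="definition" lang="fr">
    <p>mot</p>
  </dd>
</dl>
"""


def definitions_for(glossary_xml, term_id, chosen_lang):
    """Return the definition texts of term_id written in chosen_lang.

    Hypothetical helper: assumes each term has one <dt>, followed by
    one <dd> per language.
    """
    root = ET.fromstring(glossary_xml)
    results = []
    in_term = False
    for child in root:
        if child.tag == "dt":
            # A new term starts; remember whether it is the one we want.
            in_term = child.get("id") == term_id
        elif child.tag == "dd" and in_term and child.get("lang") == chosen_lang:
            results.append("".join(child.itertext()).strip())
    return results


french = definitions_for(GLOSSARY, "def1", "fr")
german = definitions_for(GLOSSARY, "def1", "de")
```

The sketch returns an empty list when no description exists in the chosen language; what the reader should show in that case is a separate design decision that is not covered here.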

The trade-off

One guiding principle for Unibok has always been to adhere to the standard, so that publishers can reuse their EPUBs in any future system that arrives on the market. This solution breaks that principle. However, since no standard addresses this specific need exactly, that would probably have been the case whichever option we chose.

In the end, we decided that it would be best to go for the pragmatic solution, the one that wouldn’t add too much to the publisher’s production workflow.