epubtools

Library to convert and check EPUB 2 and 3

Repository
Git URL https://github.com/transpect/epubtools.git
SVN URL https://github.com/transpect/epubtools
Base URI http://transpect.io/epubtools/

epub:create-ocf

This step creates the directory structure of the OCF Abstract Container. The path to the source HTML file must be provided as an option.
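As a rough illustration of what an OCF directory structure contains, the following Python sketch writes the minimal layout: a mimetype file and META-INF/container.xml pointing to the package document. It is not the step's implementation. The helper name create_ocf_skeleton and the OPF location OEBPS/content.opf are assumptions made for this example.

```python
from pathlib import Path

# Assumption for this sketch: the package document lives at OEBPS/content.opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def create_ocf_skeleton(root: Path, opf_path: str = "OEBPS/content.opf") -> None:
    """Write a minimal OCF layout below `root` (hypothetical helper)."""
    meta_inf = root / "META-INF"
    meta_inf.mkdir(parents=True, exist_ok=True)
    # Create the directory that will later hold the OPF and the content files.
    (root / Path(opf_path).parent).mkdir(parents=True, exist_ok=True)
    # The mimetype file holds exactly this ASCII string, without a trailing newline.
    (root / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (meta_inf / "container.xml").write_text(
        CONTAINER_XML.format(opf_path=opf_path), encoding="utf-8"
    )
```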

Import

<p:import href="http://transpect.io/epubtools/modules/create-ocf/xpl/create-ocf.xpl"/>

Synopsis

<epub:create-ocf xmlns:epub="http://transpect.io/epubtools">
  <p:input port="meta"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-ocf>

epub:create-opf

This step expects a file list of the EPUB content files in this form:

<cx:document>
  <c:file name="OEBPS/chapter01.xhtml"/>
</cx:document>

It provides the OPF file on the result port. The files port emits only the file reference of the content.opf, not the file references listed in the OPF file.
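A file list of this shape could be generated outside of XProc, for instance with the Python sketch below. The namespace URIs are assumptions: the text only shows the prefixes cx and c, and the sketch binds them to the URIs conventionally used for XML Calabash extensions and XProc step vocabulary. The function name build_file_list is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings; the documentation only shows the prefixes.
C_NS = "http://www.w3.org/ns/xproc-step"
CX_NS = "http://xmlcalabash.com/ns/extensions"


def build_file_list(paths):
    """Serialize a cx:document element with one c:file child per path."""
    ET.register_namespace("c", C_NS)
    ET.register_namespace("cx", CX_NS)
    root = ET.Element(f"{{{CX_NS}}}document")
    for path in paths:
        ET.SubElement(root, f"{{{C_NS}}}file", {"name": path})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_file_list(["OEBPS/chapter01.xhtml", "OEBPS/chapter02.xhtml"]))
```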

Import

<p:import href="http://transpect.io/epubtools/modules/create-opf/xpl/create-opf.xpl"/>

Dependencies

Synopsis

<epub:create-opf xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source"/>
  <p:input port="meta"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="terminate-on-error" select="'yes'"/>
  <p:option name="use-svg" select="'yes'"/>
  <p:option name="create-a11y-meta" required="false" select="'yes'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-opf>

epub:create-ops

Hashed file names are patched into the HTML before it is split.

Import

<p:import href="http://transpect.io/epubtools/modules/create-ops/xpl/create-ops.xpl"/>

Dependencies

Synopsis

<epub:create-ops xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="attach-cover-xsl" primary="false"/>
  <p:input port="create-svg-cover-xsl" primary="false"/>
  <p:input port="cover-svg" primary="false"/>
  <p:output port="result" primary="true"/>
  <p:output port="html"/>
  <p:output port="files" primary="false"/>
  <p:output port="report" sequence="true" primary="false"/>
  <p:output port="splitting-report" sequence="true"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="css-filename" required="false" select="'stylesheet.css'"/>
  <p:option name="use-svg" select="'yes'"/>
  <p:option name="terminate-on-error" required="false" select="'yes'"/>
  <p:option name="debug" required="false" select="'no'"/>
  <p:option name="debug-dir-uri" required="false" select="'debug'"/>
  <p:option name="status-dir-uri" select="'status'"/>
  <p:option name="create-a11y-meta" select="'yes'"/>
  <p:option name="create-font-subset" required="false" select="'false'"/>
  <p:option name="font-subset-min-file-size" required="false" select="0"/>
  <p:option name="create-svg-cover" required="false" select="'false'"/>
  <p:option name="convert-svg-cover" required="false" select="'false'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:create-ops>

tr:font-obfuscate

This step applies the EPUB font obfuscation algorithm to all font files that are declared in the CSS.
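For orientation, here is a minimal Python sketch of the font obfuscation algorithm as defined in the EPUB OCF specification: the key is the SHA-1 digest of the publication's unique identifier with whitespace removed, and the first 1040 bytes of the font are XORed with that key. Whether the step implements it exactly this way is not confirmed by the text above. The function names are hypothetical.

```python
import hashlib

OBFUSCATED_BYTES = 1040  # only the beginning of the font file is modified


def obfuscation_key(unique_identifier: str) -> bytes:
    """SHA-1 digest of the identifier with space, tab, CR and LF removed."""
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def obfuscate_font(data: bytes, unique_identifier: str) -> bytes:
    """XOR the first 1040 bytes with the repeating 20-byte key."""
    key = obfuscation_key(unique_identifier)
    head = bytes(
        byte ^ key[i % len(key)] for i, byte in enumerate(data[:OBFUSCATED_BYTES])
    )
    return head + data[OBFUSCATED_BYTES:]
```

Because XOR is its own inverse, applying obfuscate_font a second time with the same identifier restores the original bytes.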

Import

<p:import href="http://transpect.io/epubtools/modules/font-obfuscate/xpl/font-obfuscate.xpl"/>

Dependencies

Synopsis

<tr:font-obfuscate xmlns:tr="http://transpect.io">
  <p:input port="source" primary="true"/>
  <p:input port="meta" primary="false"/>
  <p:output port="result"/>
  <p:option name="targetdir"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug-dir-uri'"/>
</tr:font-obfuscate>

tr:create-font-subset

This pipeline creates font subsets. The characters used in each font will be displayed in a character set. The subset is created using the pyftsubset Python script from fonttools https://github.com/fonttools.
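To show how a character set can be turned into a pyftsubset call, the following Python sketch collects the code points of a text and builds the command line. It only assembles the arguments and does not run anything. The helper names and file paths are made up for this example, and the pipeline's own wrapper script may pass different options.

```python
def unicodes_argument(text: str) -> str:
    """Return the distinct code points of `text` as a comma-separated list."""
    codepoints = sorted({ord(ch) for ch in text})
    return ",".join(f"U+{cp:04X}" for cp in codepoints)


def pyftsubset_command(font_path: str, text: str, output_path: str) -> list[str]:
    """Build (but do not execute) a pyftsubset command line."""
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes_argument(text)}",
        f"--output-file={output_path}",
    ]


if __name__ == "__main__":
    # Hypothetical font paths, for illustration only.
    print(" ".join(pyftsubset_command("fonts/Example.ttf", "A quick fox", "out/Example.ttf")))
```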

Import

<p:import href="http://transpect.io/epubtools/modules/fontsubsetter/xpl/fontsubsetter.xpl"/>

Dependencies

Synopsis

<tr:create-font-subset xmlns:tr="http://transpect.io">
  <p:input port="source" primary="true"/>
  <p:input port="expand-css"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:option name="script-path" select="'../../../scripts/pyftsubset.sh'"/>
  <p:option name="min-file-size-kb" select="0"/>
  <p:option name="debug" required="false" select="'yes'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</tr:create-font-subset>

epub:html-splitter

Sample invocation (for debugging purposes):

calabash/calabash.sh \
    -i source=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
    -i meta=file:/$(cygpath -ma a9s/publisher/series/epubtools/heading-conf.xml) \
    -o result=tmp.html -o report=report.xml -o files=files.xml \
    file:/$(cygpath -ma epubtools/modules/html-splitter/xpl/html-splitter.xpl) \
    base-uri=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
    debug=yes \
    debug-dir-uri=file:/$(cygpath -ma ../content/output/debug)

Calabash seems to suppress some XSLT errors, for instance when a stylesheet is looping. It may therefore be necessary to replace collection()[…] with document(…) in the XSL (alternative variable declarations are already included in the XSL file, commented out) and run Saxon from the command line, for example like this:

saxon -xsl:epubtools/modules/html-splitter/xsl/html-splitter.xsl -it:main \
      collection-uri=file:/path/to/debugdir/epubtools/html-splitter/…/splitter-input.catalog.xml \
      -s:[any.xml] debug=yes debug-dir-uri=file:/other/path/to/debug/dir

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/html-splitter.xpl"/>

Dependencies

Synopsis

<epub:html-splitter xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="css-xml"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:output port="report"/>
  <p:output port="unused-css-resources" sequence="true"/>
  <p:output port="splitting-report" sequence="true"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:html-splitter>

epub:insert-amzn-region-magnification

This pipeline adds Amazon's Region Magnification markup to each div which includes the magnification class:

<div class="magnification">
  <p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</div>

The script adds Region Magnification markup according to Amazon's Kindle Publishing Guidelines.

<div id="amzn-id-myBook-1-txt" class="source-mag"><a class="app-amzn-magnify" data-app-amzn-magnify="{"targetId":"magTarget-amzn-id-myBook\
      _000019-1","sourceId":"magSource-amzn-id-myBook-1","ordinal":1}">
  <p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</a></div><div id="amzn-id-myBook-1-magTarget" class="target-mag">
  <p>A quick brown fox jumps over a lazy dog.</p>
</div>

Please note that you later need to execute a bash script which escapes the quotes. The script can be found here: scripts/escape-for-amzn-region-magnification.sh
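The quote escaping can be pictured with the Python sketch below. It is not a port of the bash script, whose exact behavior is not described here. It assumes that the inner double quotes of the data-app-amzn-magnify value are to be replaced with the entity &quot; and that the value contains no nested braces. The function name is hypothetical.

```python
import re

# Matches the attribute including its raw JSON value (assumed to have no nested braces).
MAGNIFY_ATTR = re.compile(r'(data-app-amzn-magnify=")(\{.*?\})(")', re.DOTALL)


def escape_magnify_attributes(html: str) -> str:
    """Replace the double quotes inside each magnify attribute value with &quot;."""

    def _escape(match: re.Match) -> str:
        return match.group(1) + match.group(2).replace('"', "&quot;") + match.group(3)

    return MAGNIFY_ATTR.sub(_escape, html)
```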

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/insert-amzn-region-magnification.xpl"/>

Synopsis

<epub:insert-amzn-region-magnification xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="amzn-region-magnification" select="'false'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:insert-amzn-region-magnification>

epub:split-css

This pipeline splits the CSS according to the value of the css-handling option. With the value regenerated-per-split, the CSS is analyzed and split for each HTML chunk. Commonly used CSS properties are stored in a global CSS stylesheet, whereas unused CSS properties are filtered out. With the value unchanged, CSS and HTML remain unchanged. By default, a new CSS stylesheet is generated from the CSS XML representation.
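The idea behind regenerated-per-split can be sketched in Python as a simple partition of rules. This is a strongly simplified model, not the pipeline's logic: rules are keyed by a single class name, and "commonly used" is assumed to mean "used by more than one chunk". The function name and data shapes are made up for this example.

```python
def partition_css(rules: dict[str, str], chunks: dict[str, set[str]]):
    """Partition rules into global, per-chunk and unused ones.

    rules:  class name -> declaration block
    chunks: chunk name -> set of class names used in that chunk
    """
    global_css: dict[str, str] = {}
    per_chunk: dict[str, dict[str, str]] = {name: {} for name in chunks}
    unused: dict[str, str] = {}
    for class_name, declarations in rules.items():
        users = [name for name, classes in chunks.items() if class_name in classes]
        if not users:
            unused[class_name] = declarations  # filtered out
        elif len(users) > 1:
            global_css[class_name] = declarations  # shared stylesheet
        else:
            per_chunk[users[0]][class_name] = declarations  # chunk-specific
    return global_css, per_chunk, unused
```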

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/split-css.xpl"/>

Dependencies

Synopsis

<epub:split-css xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="css-xml" primary="false"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:output port="unused-css-resources" sequence="true" primary="false"/>
  <p:option name="target" required="true"/>
  <p:option name="css-handling" required="true"/>
  <p:option name="svg-scale-hack" required="true"/>
  <p:option name="basename" required="true"/>
  <p:option name="html-subdir-name" required="true"/>
  <p:option name="common-source-dir-elimination-regex" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:split-css>

epub:zip-package

This step expects a file manifest as input and creates a zip package. The file manifest should have this form:

Import

<p:import href="http://transpect.io/epubtools/modules/zip-package/xpl/zip-package.xpl"/>

Dependencies

Synopsis

<epub:zip-package xmlns:epub="http://transpect.io/epubtools">
  <p:input port="ocf-filerefs"/>
  <p:input port="opf-fileref"/>
  <p:input port="ops-filerefs"/>
  <p:input port="meta"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:zip-package>
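The packaging rule that matters most for EPUB archives is that the mimetype entry comes first and is stored uncompressed. The Python sketch below illustrates that rule with the standard zipfile module. It is not the step's implementation, and the function name and the plain list of relative paths are assumptions for this example.

```python
import zipfile
from pathlib import Path


def zip_epub(source_dir: Path, target: Path, files: list[str]) -> None:
    """Zip `files` (paths relative to `source_dir`) into an EPUB container."""
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry must be the first one and must not be compressed.
        archive.write(
            source_dir / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED
        )
        for name in files:
            if name == "mimetype":
                continue  # already written above
            archive.write(source_dir / name, name, compress_type=zipfile.ZIP_DEFLATED)
```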

epub:convert

This step takes an HTML file as input and converts it to an EPUB file. You need a configuration for the HTML splitting and the OPF metadata. Examples can be found in the sample directory. Invoke this step on the command line with:

calabash/calabash.sh -i source=sample/b978-3-646-92351-3.xhtml \
  -i conf=sample/hierarchy.xml -i meta=sample/epub-config.xml epub-convert.xpl

Note that it’s advisable to make all file inputs absolute URIs, by using cygpath on Cygwin or readlink -f on Unixy systems. For bash, this is, e.g., source=file:/$(cygpath -ma sample/b978-3-646-92351-3.xhtml)
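Where neither cygpath nor readlink is at hand, an absolute file URI can also be derived with a few lines of Python, as sketched here. The helper name to_file_uri is hypothetical. Note that pathlib produces the file:/// form rather than the file:/ form used in the shell examples.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Resolve `path` against the current directory and return a file URI."""
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    print("source=" + to_file_uri("sample/b978-3-646-92351-3.xhtml"))
```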

8/2023: new feature: if the HTML contains a <div>, <section> or <nav> element with

@class="as-nav"
and
@epub:type="loi"
(or "lot"), that section is removed from the content and moved into the navigation page as a <nav> element. This does not work for EPUB2. The structure is expected to be <ol><li><a href="link-to-fig">Fig 1: xyz</a></li></ol>.
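The behavior described above can be modeled in a few lines of Python. This is a sketch only, not the XSLT used by the step: it assumes XHTML input, treats as-nav as one token of the class attribute, and returns the detached elements instead of writing a navigation page. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def extract_nav_sections(root: ET.Element) -> list[ET.Element]:
    """Detach as-nav lists of illustrations/tables and return them as nav elements."""
    moved = []
    for parent in list(root.iter()):
        for child in list(parent):
            local_name = child.tag.split("}")[-1]
            if (
                local_name in {"div", "section", "nav"}
                and "as-nav" in child.get("class", "").split()
                and child.get(f"{{{EPUB_NS}}}type") in {"loi", "lot"}
            ):
                parent.remove(child)  # remove from the content
                child.tag = f"{{{XHTML_NS}}}nav"  # becomes a nav element
                moved.append(child)
    return moved
```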

Import

<p:import href="http://transpect.io/epubtools/xpl/epub-convert.xpl"/>

Dependencies

Synopsis

<epub:convert xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="schematron"/>
  <p:input port="attach-cover-xsl"/>
  <p:input port="custom-schematron" sequence="true"/>
  <p:input port="cover-svg"/>
  <p:input port="create-svg-cover-xsl" primary="false"/>
  <p:output port="result" primary="true"/>
  <p:output port="chunks" primary="false"/>
  <p:output port="opf" primary="false"/>
  <p:output port="files" primary="false"/>
  <p:output port="report" sequence="true" primary="false"/>
  <p:output port="html"/>
  <p:output port="baseuri" primary="false"/>
  <p:output port="input-for-schematron" primary="false"/>
  <p:option name="target" select="''"/>
  <p:option name="terminate-on-error" select="'yes'"/>
  <p:option name="clean-target-dir" select="'no'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="use-svg" required="false" select="''"/>
  <p:option name="create-a11y-meta" required="false" select="'yes'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
  <p:option name="status-dir-uri" select="'status'"/>
  <p:option name="id-in-report-heading" select="'false'"/>
  <p:option name="create-font-subset" required="false" select="'true'"/>
  <p:option name="create-svg-cover" required="false" select="'false'"/>
  <p:option name="convert-svg-cover" required="false" select="'false'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:convert>

GitHub sync date: 2025-01-08+01:00