epubtools

This step creates the directory structure of the OCF Abstract Container. The path to the source HTML file must be provided as an option.

Import

<p:import href="http://transpect.io/epubtools/modules/create-ocf/xpl/create-ocf.xpl"/>

Synopsis

<epub:create-ocf xmlns:epub="http://transpect.io/epubtools">
  <p:input port="meta"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-ocf>
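
As an illustration of the synopsis above, the following minimal sketch wraps epub:create-ocf in a small XProc 1.0 pipeline. The wrapper's name, its ports and the html-uri option are hypothetical; only the port and option names inside epub:create-ocf come from the synopsis. It also assumes that an XML catalog resolves the http://transpect.io/epubtools/ import URI to a local checkout.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="ocf-example">

  <!-- Hypothetical wrapper interface: a metadata document and the
       location of the source HTML file. -->
  <p:input port="meta"/>
  <p:output port="result" sequence="true"/>
  <p:option name="html-uri" required="true"/>

  <p:import href="http://transpect.io/epubtools/modules/create-ocf/xpl/create-ocf.xpl"/>

  <epub:create-ocf>
    <p:input port="meta">
      <p:pipe step="ocf-example" port="meta"/>
    </p:input>
    <!-- base-uri is the only required option of the step -->
    <p:with-option name="base-uri" select="$html-uri"/>
  </epub:create-ocf>

</p:declare-step>
```

The wrapper's result port is left unbound on purpose, so XProc connects it to the primary result output of the last step.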

This step expects a file list of the EPUB content files in this form:

<cx:document>
  <c:file name="OEBPS/chapter01.xhtml"/>
</cx:document>

It provides the OPF file on the result port. The files port emits only the file reference of content.opf, not the file references listed in the OPF file.

Import

<p:import href="http://transpect.io/epubtools/modules/create-opf/xpl/create-opf.xpl"/>

Dependencies

Synopsis

<epub:create-opf xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source"/>
  <p:input port="meta"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="terminate-on-error" select="'yes'"/>
  <p:option name="use-svg" select="'yes'"/>
  <p:option name="create-a11y-meta" required="false" select="'yes'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-opf>
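
The file list described above could be passed to epub:create-opf inline, as in this hedged sketch. It assumes that the file list belongs on the source port, that the c and cx prefixes are bound to the usual XProc step and XML Calabash extension namespaces, and that 'EPUB3' is an accepted value of the target option. The wrapper's name, ports and option are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="opf-example">

  <!-- Hypothetical wrapper interface -->
  <p:input port="meta"/>
  <p:output port="result"/>
  <p:option name="base-uri" required="true"/>

  <p:import href="http://transpect.io/epubtools/modules/create-opf/xpl/create-opf.xpl"/>

  <epub:create-opf>
    <!-- Assumption: the file list is read from the source port -->
    <p:input port="source">
      <p:inline>
        <cx:document>
          <c:file name="OEBPS/chapter01.xhtml"/>
        </cx:document>
      </p:inline>
    </p:input>
    <p:input port="meta">
      <p:pipe step="opf-example" port="meta"/>
    </p:input>
    <p:with-option name="base-uri" select="$base-uri"/>
    <!-- Assumption: 'EPUB3' is a valid target; the default is 'EPUB2' -->
    <p:with-option name="target" select="'EPUB3'"/>
  </epub:create-opf>

</p:declare-step>
```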

Hashed file names are patched into the HTML before it is split.

Import

<p:import href="http://transpect.io/epubtools/modules/create-ops/xpl/create-ops.xpl"/>

Dependencies

Synopsis

<epub:create-ops xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="attach-cover-xsl" primary="false"/>
  <p:input port="create-svg-cover-xsl" primary="false"/>
  <p:input port="cover-svg" primary="false"/>
  <p:output port="result" primary="true"/>
  <p:output port="html"/>
  <p:output port="files" primary="false"/>
  <p:output port="report" sequence="true" primary="false"/>
  <p:output port="splitting-report" sequence="true"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="css-filename" required="false" select="'stylesheet.css'"/>
  <p:option name="use-svg" select="'yes'"/>
  <p:option name="terminate-on-error" required="false" select="'yes'"/>
  <p:option name="debug" required="false" select="'no'"/>
  <p:option name="debug-dir-uri" required="false" select="'debug'"/>
  <p:option name="status-dir-uri" select="'status'"/>
  <p:option name="create-a11y-meta" select="'yes'"/>
  <p:option name="create-font-subset" required="false" select="'false'"/>
  <p:option name="font-subset-min-file-size" required="false" select="0"/>
  <p:option name="create-svg-cover" required="false" select="'false'"/>
  <p:option name="convert-svg-cover" required="false" select="'false'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:create-ops>

This step applies the EPUB font obfuscation algorithm to all font files that are declared in the CSS.

Import

<p:import href="http://transpect.io/epubtools/modules/font-obfuscate/xpl/font-obfuscate.xpl"/>

Dependencies

xproc-util transpect.github.io

Synopsis

<tr:font-obfuscate xmlns:tr="http://transpect.io">
  <p:input port="source" primary="true"/>
  <p:input port="meta" primary="false"/>
  <p:output port="result"/>
  <p:option name="targetdir"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug-dir-uri'"/>
</tr:font-obfuscate>
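
A minimal sketch of a call to tr:font-obfuscate follows. The wrapper's name, ports and the font-target-dir option are hypothetical, and the meaning of targetdir (presumably the directory that receives the obfuscated fonts) is an assumption, since the synopsis does not describe it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:tr="http://transpect.io"
                version="1.0" name="obfuscate-example">

  <!-- Hypothetical wrapper interface -->
  <p:input port="source" primary="true"/>
  <p:input port="meta" primary="false"/>
  <p:output port="result"/>
  <p:option name="font-target-dir" required="true"/>

  <p:import href="http://transpect.io/epubtools/modules/font-obfuscate/xpl/font-obfuscate.xpl"/>

  <!-- The primary source input is connected implicitly to the
       wrapper's source port; meta has to be bound explicitly. -->
  <tr:font-obfuscate>
    <p:input port="meta">
      <p:pipe step="obfuscate-example" port="meta"/>
    </p:input>
    <p:with-option name="targetdir" select="$font-target-dir"/>
  </tr:font-obfuscate>

</p:declare-step>
```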

This pipeline creates font subsets. The characters used in each font are listed in a character set. The subset is created with the pyftsubset Python script from fonttools (https://github.com/fonttools).

Import

<p:import href="http://transpect.io/epubtools/modules/fontsubsetter/xpl/fontsubsetter.xpl"/>

Dependencies

Synopsis

<tr:create-font-subset xmlns:tr="http://transpect.io">
  <p:input port="source" primary="true"/>
  <p:input port="expand-css"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:option name="script-path" select="'../../../scripts/pyftsubset.sh'"/>
  <p:option name="min-file-size-kb" select="0"/>
  <p:option name="debug" required="false" select="'yes'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</tr:create-font-subset>

Sample invocation (for debugging purposes):

calabash/calabash.sh \
    -i source=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
    -i meta=file:/$(cygpath -ma a9s/publisher/series/epubtools/heading-conf.xml) \
    -o result=tmp.html -o report=report.xml -o files=files.xml \
    file:/$(cygpath -ma epubtools/modules/html-splitter/xpl/html-splitter.xpl) \
    base-uri=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
    debug=yes \
    debug-dir-uri=file:/$(cygpath -ma ../content/output/debug)

Calabash seems to suppress some XSLT errors, for instance when a stylesheet is looping. It might therefore be necessary to replace collection()[…] with document(…) in the XSL (alternative variable declarations are already included in the XSL file, commented out) and to run Saxon from the command line, for example like this:

saxon -xsl:epubtools/modules/html-splitter/xsl/html-splitter.xsl -it:main \
      collection-uri=file:/path/to/debugdir/epubtools/html-splitter/…/splitter-input.catalog.xml \
      -s:[any.xml] debug=yes debug-dir-uri=file:/other/path/to/debug/dir

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/html-splitter.xpl"/>

Dependencies

xproc-util transpect.github.io

Synopsis

<epub:html-splitter xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="css-xml"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:output port="report"/>
  <p:output port="unused-css-resources" sequence="true"/>
  <p:output port="splitting-report" sequence="true"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="target" select="'EPUB2'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:html-splitter>
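
The sample command line invocation above could be mirrored from within an XProc 1.0 pipeline roughly as follows. This is a sketch under assumptions: the wrapper's name and ports are hypothetical, and, like the command line sample, it binds only source and meta, which presumes that conf and css-xml have usable default bindings in the step declaration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="split-example">

  <!-- Hypothetical wrapper interface -->
  <p:input port="source" primary="true"/>
  <p:input port="meta" primary="false"/>
  <p:output port="result" primary="true"/>
  <p:output port="report" primary="false">
    <p:pipe step="split" port="report"/>
  </p:output>
  <p:option name="base-uri" required="true"/>

  <p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/html-splitter.xpl"/>

  <!-- source is connected implicitly to the wrapper's source port -->
  <epub:html-splitter name="split">
    <p:input port="meta">
      <p:pipe step="split-example" port="meta"/>
    </p:input>
    <p:with-option name="base-uri" select="$base-uri"/>
    <!-- Same debug settings as in the command line sample, with a
         relative debug directory instead of an absolute URI -->
    <p:with-option name="debug" select="'yes'"/>
    <p:with-option name="debug-dir-uri" select="'debug'"/>
  </epub:html-splitter>

</p:declare-step>
```

The remaining output ports of the step (files, unused-css-resources, splitting-report) are simply not read here; XProc discards unconnected non-primary outputs.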

This pipeline adds Amazon's Region Magnification markup to each div which includes the magnification class:

<div class="magnification">
  <p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</div>

The script adds Region Magnification markup according to Amazon's Kindle Publishing Guidelines.

<div id="amzn-id-myBook-1-txt" class="source-mag"><a class="app-amzn-magnify" data-app-amzn-magnify="{"targetId":"magTarget-amzn-id-myBook\
      _000019-1","sourceId":"magSource-amzn-id-myBook-1","ordinal":1}">
  <p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</a></div><div id="amzn-id-myBook-1-magTarget" class="target-mag">
  <p>A quick brown fox jumps over a lazy dog.</p>
</div>

Please note that you later need to execute a bash script which escapes the quotes. The script can be found here: scripts/escape-for-amzn-region-magnification.sh

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/insert-amzn-region-magnification.xpl"/>

Synopsis

<epub:insert-amzn-region-magnification xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="amzn-region-magnification" select="'false'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:insert-amzn-region-magnification>
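
To try the step on the magnification markup described above, a pipeline along these lines might be used. It is a sketch: the surrounding XHTML document and the wrapper are hypothetical, and it assumes that the value 'true' switches the feature on (the synopsis only shows the default 'false').

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="magnify-example">

  <p:output port="result" sequence="true"/>

  <p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/insert-amzn-region-magnification.xpl"/>

  <epub:insert-amzn-region-magnification>
    <p:input port="source">
      <!-- Hypothetical input document; only the div is taken from
           the description of the step -->
      <p:inline>
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <title>Sample</title>
          </head>
          <body>
            <div class="magnification">
              <p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
            </div>
          </body>
        </html>
      </p:inline>
    </p:input>
    <!-- Assumption: 'true' enables the markup insertion -->
    <p:with-option name="amzn-region-magnification" select="'true'"/>
  </epub:insert-amzn-region-magnification>

</p:declare-step>
```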

This pipeline splits the CSS according to the value of the css-handling option. With the value regenerated-per-split, the CSS is analyzed and split for each HTML chunk. Commonly used CSS properties are stored in a global CSS stylesheet, whereas unused CSS properties are filtered out. With the value unchanged, CSS and HTML remain unchanged. By default, a new CSS stylesheet is generated from the CSS XML representation.

Import

<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/split-css.xpl"/>

Dependencies

Synopsis

<epub:split-css xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="css-xml" primary="false"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:output port="unused-css-resources" sequence="true" primary="false"/>
  <p:option name="target" required="true"/>
  <p:option name="css-handling" required="true"/>
  <p:option name="svg-scale-hack" required="true"/>
  <p:option name="basename" required="true"/>
  <p:option name="html-subdir-name" required="true"/>
  <p:option name="common-source-dir-elimination-regex" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:split-css>
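
The following sketch shows how css-handling might be set to regenerated-per-split when calling epub:split-css. The wrapper and its option names are hypothetical; because the valid values of the other required options are not documented here, they are merely passed through from the caller.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="css-example">

  <!-- Hypothetical wrapper interface -->
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="css-xml" primary="false"/>
  <p:output port="result" sequence="true" primary="true"/>
  <p:option name="target" required="true"/>
  <p:option name="svg-scale-hack" required="true"/>
  <p:option name="basename" required="true"/>
  <p:option name="html-subdir-name" required="true"/>
  <p:option name="common-source-dir-elimination-regex" required="true"/>

  <p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/split-css.xpl"/>

  <!-- source may carry a sequence of documents, so each p:with-option
       gets an empty context to avoid evaluating against a sequence. -->
  <epub:split-css>
    <p:input port="css-xml">
      <p:pipe step="css-example" port="css-xml"/>
    </p:input>
    <p:with-option name="css-handling" select="'regenerated-per-split'"><p:empty/></p:with-option>
    <p:with-option name="target" select="$target"><p:empty/></p:with-option>
    <p:with-option name="svg-scale-hack" select="$svg-scale-hack"><p:empty/></p:with-option>
    <p:with-option name="basename" select="$basename"><p:empty/></p:with-option>
    <p:with-option name="html-subdir-name" select="$html-subdir-name"><p:empty/></p:with-option>
    <p:with-option name="common-source-dir-elimination-regex"
                   select="$common-source-dir-elimination-regex"><p:empty/></p:with-option>
  </epub:split-css>

</p:declare-step>
```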

This step checks whether links are available. To do: an additional report, and perhaps selectable link attribute names.

Import

<p:import href="http://transpect.io/epubtools/modules/link-checker/xpl/link-checker.xpl"/>

Dependencies

xproc-util transpect.github.io

Synopsis

<tr:check-links xmlns:tr="http://transpect.io">
  <p:input port="source" primary="true"/>
  <p:output port="result"/>
  <p:option name="only" select="''"/>
  <p:option name="never" select="''"/>
  <p:option name="debug-dir-uri" select="'debug-dir-uri'"/>
  <p:option name="status-dir-uri" select="'status'"/>
</tr:check-links>

This step expects a file manifest as input and creates a zip package. The file manifest should have this form:

Import

<p:import href="http://transpect.io/epubtools/modules/zip-package/xpl/zip-package.xpl"/>

Dependencies

Synopsis

<epub:zip-package xmlns:epub="http://transpect.io/epubtools">
  <p:input port="ocf-filerefs"/>
  <p:input port="opf-fileref"/>
  <p:input port="ops-filerefs"/>
  <p:input port="meta"/>
  <p:output port="result" primary="true"/>
  <p:output port="files" primary="false"/>
  <p:option name="base-uri" required="true"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
</epub:zip-package>

This step takes an HTML file as input and converts it to an EPUB file. You need a configuration for the HTML splitting and for the OPF metadata. Examples can be found in the sample directory. Invoke this step on the command line with:

calabash/calabash.sh -i source=sample/b978-3-646-92351-3.xhtml \
  -i conf=sample/hierarchy.xml -i meta=sample/epub-config.xml epub-convert.xpl

Note that it’s advisable to make all file inputs absolute URIs, by using cygpath on Cygwin or readlink -f on Unixy systems. For bash, this is, e.g., source=file:/$(cygpath -ma sample/b978-3-646-92351-3.xhtml)

8/2023: new feature: if the HTML contains a <div>, <section> or <nav> element with

@class="as-nav"

and

@epub:type="loi"

(or "lot"), that section is removed from the content and moved into the navigation page as a <nav> element. This does not work for EPUB2. The structure is expected to be <ol><li><a href="link-to-fig">Fig 1: xyz</a></li></ol>.
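
As an illustration, an input document that should trigger this feature might look like the sketch below. The surrounding document is hypothetical; the epub prefix is assumed to be bound to the EPUB ops namespace used for epub:type, which is a different namespace from the http://transpect.io/epubtools one used for the step names.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>Sample</title>
  </head>
  <body>
    <!-- List of illustrations: class and epub:type as described above;
         use epub:type="lot" for a list of tables -->
    <section class="as-nav" epub:type="loi">
      <ol>
        <li><a href="link-to-fig">Fig 1: xyz</a></li>
      </ol>
    </section>
  </body>
</html>
```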

Import

<p:import href="http://transpect.io/epubtools/xpl/epub-convert.xpl"/>

Dependencies

Synopsis

<epub:convert xmlns:epub="http://transpect.io/epubtools">
  <p:input port="source" primary="true"/>
  <p:input port="conf" sequence="true" primary="false"/>
  <p:input port="meta" primary="false"/>
  <p:input port="schematron"/>
  <p:input port="attach-cover-xsl"/>
  <p:input port="custom-schematron" sequence="true"/>
  <p:input port="cover-svg"/>
  <p:input port="create-svg-cover-xsl" primary="false"/>
  <p:output port="result" primary="true"/>
  <p:output port="chunks" primary="false"/>
  <p:output port="opf" primary="false"/>
  <p:output port="files" primary="false"/>
  <p:output port="report" sequence="true" primary="false"/>
  <p:output port="html"/>
  <p:output port="baseuri" primary="false"/>
  <p:output port="input-for-schematron" primary="false"/>
  <p:option name="target" select="''"/>
  <p:option name="terminate-on-error" select="'yes'"/>
  <p:option name="clean-target-dir" select="'no'"/>
  <p:option name="debug" select="'no'"/>
  <p:option name="use-svg" required="false" select="''"/>
  <p:option name="create-a11y-meta" required="false" select="'yes'"/>
  <p:option name="debug-dir-uri" select="'debug'"/>
  <p:option name="status-dir-uri" select="'status'"/>
  <p:option name="id-in-report-heading" select="'false'"/>
  <p:option name="create-font-subset" required="false" select="'true'"/>
  <p:option name="create-svg-cover" required="false" select="'false'"/>
  <p:option name="convert-svg-cover" required="false" select="'false'"/>
  <p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:convert>
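
The command line invocation shown earlier could also be expressed as a small XProc 1.0 pipeline. This is a sketch: the wrapper is hypothetical, the relative hrefs are resolved against the location of the pipeline file, and all inputs other than source, conf and meta are left to their defaults, as the command line example suggests is possible.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:epub="http://transpect.io/epubtools"
                version="1.0" name="convert-example">

  <p:output port="result"/>

  <p:import href="http://transpect.io/epubtools/xpl/epub-convert.xpl"/>

  <epub:convert>
    <!-- Sample files named in the command line example -->
    <p:input port="source">
      <p:document href="sample/b978-3-646-92351-3.xhtml"/>
    </p:input>
    <p:input port="conf">
      <p:document href="sample/hierarchy.xml"/>
    </p:input>
    <p:input port="meta">
      <p:document href="sample/epub-config.xml"/>
    </p:input>
    <!-- Optional: write intermediate results to the debug directory -->
    <p:with-option name="debug" select="'yes'"/>
  </epub:convert>

</p:declare-step>
```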

Git URL	`https://github.com/transpect/epubtools.git`
SVN URL	`https://github.com/transpect/epubtools`
Base URI	`http://transpect.io/epubtools/`

epubtools

Library to convert and check EPUB 2 and 3
