epubtools
Library to convert and check EPUB 2 and 3
Git URL | https://github.com/transpect/epubtools.git |
SVN URL | https://github.com/transpect/epubtools |
Base URI | http://transpect.io/epubtools/ |
epub:create-ocf
This step creates the directory structure of the OCF Abstract Container. The path to the source HTML file must be provided as an option.
Import
<p:import href="http://transpect.io/epubtools/modules/create-ocf/xpl/create-ocf.xpl"/>
Synopsis
<epub:create-ocf xmlns:epub="http://transpect.io/epubtools">
<p:input port="meta"/>
<p:output port="result" sequence="true" primary="true"/>
<p:output port="files" primary="false"/>
<p:option name="base-uri" required="true"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-ocf>
epub:create-opf
This step expects a file list of the EPUB content files in this form:
<cx:document>
<c:file name="OEBPS/chapter01.xhtml"/>
</cx:document>
It provides the OPF file on the result port. The files port provides only the file reference of content.opf itself, not the file references listed in the OPF file.
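Such a file list can be generated from the content directory. The following shell function is a minimal sketch and not part of epubtools: the name make_file_list is hypothetical, and the namespace bindings for cx (XML Calabash extensions) and c (XProc step vocabulary) are assumptions that should be checked against the pipeline. File names are written unescaped, so they must not contain &, < or ".

```shell
# Hypothetical helper: print a file list for all .xhtml files in a subdirectory.
# $1 = EPUB root directory, $2 = content subdirectory (e.g. OEBPS)
# The namespace URIs below are assumptions, not taken from epubtools.
make_file_list() (
  root="$1"
  subdir="$2"
  printf '<cx:document xmlns:cx="http://xmlcalabash.com/ns/extensions" xmlns:c="http://www.w3.org/ns/xproc-step">\n'
  for f in "$root/$subdir"/*.xhtml; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matches
    printf '  <c:file name="%s/%s"/>\n' "$subdir" "$(basename "$f")"
  done
  printf '</cx:document>\n'
)
```

For example, make_file_list /path/to/epub OEBPS > files.xml writes one c:file element per XHTML file, with names relative to the EPUB root.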
Import
<p:import href="http://transpect.io/epubtools/modules/create-opf/xpl/create-opf.xpl"/>
Dependencies
Synopsis
<epub:create-opf xmlns:epub="http://transpect.io/epubtools">
<p:input port="source"/>
<p:input port="meta"/>
<p:output port="result" primary="true"/>
<p:output port="files" primary="false"/>
<p:option name="base-uri" required="true"/>
<p:option name="target" select="'EPUB2'"/>
<p:option name="terminate-on-error" select="'yes'"/>
<p:option name="use-svg" select="'yes'"/>
<p:option name="create-a11y-meta" required="false" select="'yes'"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</epub:create-opf>
epub:create-ops
Hashed file names are patched into the HTML before it is split.
Import
<p:import href="http://transpect.io/epubtools/modules/create-ops/xpl/create-ops.xpl"/>
Dependencies
Synopsis
<epub:create-ops xmlns:epub="http://transpect.io/epubtools">
<p:input port="source" primary="true"/>
<p:input port="conf" sequence="true" primary="false"/>
<p:input port="meta" primary="false"/>
<p:input port="attach-cover-xsl" primary="false"/>
<p:input port="create-svg-cover-xsl" primary="false"/>
<p:input port="cover-svg" primary="false"/>
<p:output port="result" primary="true"/>
<p:output port="html"/>
<p:output port="files" primary="false"/>
<p:output port="report" sequence="true" primary="false"/>
<p:output port="splitting-report" sequence="true"/>
<p:option name="base-uri" required="true"/>
<p:option name="target" select="'EPUB2'"/>
<p:option name="css-filename" required="false" select="'stylesheet.css'"/>
<p:option name="use-svg" select="'yes'"/>
<p:option name="terminate-on-error" required="false" select="'yes'"/>
<p:option name="debug" required="false" select="'no'"/>
<p:option name="debug-dir-uri" required="false" select="'debug'"/>
<p:option name="status-dir-uri" select="'status'"/>
<p:option name="create-a11y-meta" select="'yes'"/>
<p:option name="create-font-subset" required="false" select="'false'"/>
<p:option name="font-subset-min-file-size" required="false" select="0"/>
<p:option name="create-svg-cover" required="false" select="'false'"/>
<p:option name="convert-svg-cover" required="false" select="'false'"/>
<p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:create-ops>
tr:font-obfuscate
This step applies the EPUB font obfuscation algorithm to all font files that are declared in the CSS.
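For orientation: the font obfuscation algorithm specified for EPUB XORs the first 1040 bytes of a font file with a 20-byte key, which is the SHA-1 digest of the publication's unique identifier with whitespace removed. The shell sketch below illustrates that algorithm on the assumption that this is what the step applies. It is not the step's implementation; the function name obfuscate_font is hypothetical, and sha1sum from GNU coreutils is assumed to be available.

```shell
# Sketch of the EPUB font obfuscation algorithm (assumed, see above).
# $1 = unique identifier, $2 = input font file, $3 = output font file
obfuscate_font() (
  # Key: SHA-1 digest (40 hex digits) of the identifier without whitespace
  key=$(printf '%s' "$1" | tr -d ' \t\r\n' | sha1sum | cut -c1-40)
  i=0
  : > "$3"
  # XOR each of the first 1040 bytes with the key byte at position i mod 20
  for byte in $(od -An -v -tu1 -N1040 "$2"); do
    pos=$(( (i % 20) * 2 + 1 ))
    keyhex=$(printf '%s' "$key" | cut -c"$pos"-"$(( pos + 1 ))")
    printf "\\$(printf '%03o' "$(( byte ^ 0x$keyhex ))")" >> "$3"
    i=$(( i + 1 ))
  done
  # Copy everything after byte 1040 unchanged
  tail -c +1041 "$2" >> "$3"
)
```

Because XOR is its own inverse, running the function a second time with the same identifier restores the original font file.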
Import
<p:import href="http://transpect.io/epubtools/modules/font-obfuscate/xpl/font-obfuscate.xpl"/>
Dependencies
Synopsis
<tr:font-obfuscate xmlns:tr="http://transpect.io">
<p:input port="source" primary="true"/>
<p:input port="meta" primary="false"/>
<p:output port="result"/>
<p:option name="targetdir"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug-dir-uri'"/>
</tr:font-obfuscate>
tr:create-font-subset
This pipeline creates font subsets. The characters used in each font are displayed in a character set. The subset is created using the pyftsubset Python script from fonttools (https://github.com/fonttools).
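A subsetting call on the command line might look like the sketch below. pyftsubset and its --text-file and --output-file options come from fonttools; the wrapper name subset_font, the DRY_RUN switch and the file names are hypothetical and do not describe what scripts/pyftsubset.sh actually does.

```shell
# Hypothetical wrapper around pyftsubset (fonttools).
# $1 = font file, $2 = text file with the characters to keep, $3 = output font
# With DRY_RUN=yes the command is only printed, not executed.
subset_font() (
  font="$1"
  chars="$2"
  out="$3"
  set -- pyftsubset "$font" "--text-file=$chars" "--output-file=$out"
  if [ "${DRY_RUN:-no}" = yes ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
)
```

For example, DRY_RUN=yes subset_font MyFont.ttf chars.txt MyFont-subset.ttf prints the pyftsubset command line without running it.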
Import
<p:import href="http://transpect.io/epubtools/modules/fontsubsetter/xpl/fontsubsetter.xpl"/>
Dependencies
Synopsis
<tr:create-font-subset xmlns:tr="http://transpect.io">
<p:input port="source" primary="true"/>
<p:input port="expand-css"/>
<p:output port="result" sequence="true" primary="true"/>
<p:option name="script-path" select="'../../../scripts/pyftsubset.sh'"/>
<p:option name="min-file-size-kb" select="0"/>
<p:option name="debug" required="false" select="'yes'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</tr:create-font-subset>
epub:html-splitter
Sample invocation (for debugging purposes):
calabash/calabash.sh \
-i source=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
-i meta=file:/$(cygpath -ma a9s/publisher/series/epubtools/heading-conf.xml) \
-o result=tmp.html -o report=report.xml -o files=files.xml \
file:/$(cygpath -ma epubtools/modules/html-splitter/xpl/html-splitter.xpl) \
base-uri=file:/$(cygpath -ma ../content/output/debug/epubtools/create-ops/pre-split.html) \
debug=yes debug-dir-uri=file:/$(cygpath -ma ../content/output/debug)
Calabash seems to suppress some XSLT errors, for instance when a stylesheet is looping. It may therefore be necessary to replace collection()[…] with document(…) in the XSL (alternative variable declarations are already included in the XSL file, commented out) and to run Saxon from the command line, for example like this:
saxon -xsl:epubtools/modules/html-splitter/xsl/html-splitter.xsl -it:main \
collection-uri=file:/path/to/debugdir/epubtools/html-splitter/…/splitter-input.catalog.xml \
-s:[any.xml] debug=yes debug-dir-uri=file:/other/path/to/debug/dir
Import
<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/html-splitter.xpl"/>
Dependencies
Synopsis
<epub:html-splitter xmlns:epub="http://transpect.io/epubtools">
<p:input port="source" primary="true"/>
<p:input port="conf" sequence="true" primary="false"/>
<p:input port="meta" primary="false"/>
<p:input port="css-xml"/>
<p:output port="result" primary="true"/>
<p:output port="files" primary="false"/>
<p:output port="report"/>
<p:output port="unused-css-resources" sequence="true"/>
<p:output port="splitting-report" sequence="true"/>
<p:option name="base-uri" required="true"/>
<p:option name="target" select="'EPUB2'"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
<p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:html-splitter>
epub:insert-amzn-region-magnification
This pipeline adds Amazon's Region Magnification markup to each div which includes the magnification class:
<div class="magnification">
<p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</div>
The script adds Region Magnification markup according to Amazon's Kindle Publishing Guidelines.
<div id="amzn-id-myBook-1-txt" class="source-mag"><a class="app-amzn-magnify" data-app-amzn-magnify="{"targetId":"magTarget-amzn-id-myBook\
_000019-1","sourceId":"magSource-amzn-id-myBook-1","ordinal":1}">
<p id="p-1-02">A quick brown fox jumps over a lazy dog.</p>
</a></div><div id="amzn-id-myBook-1-magTarget" class="target-mag">
<p>A quick brown fox jumps over a lazy dog.</p>
</div>
Please note that you later need to execute a bash script which escapes the quotes.
The script can be found here: scripts/escape-for-amzn-region-magnification.sh
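The nested double quotes inside data-app-amzn-magnify are what make the generated markup ill-formed. As an illustration only, and not a description of what escape-for-amzn-region-magnification.sh does, one possible fix is to switch the attribute delimiters to single quotes so that the JSON value can keep its double quotes. The function name fix_magnify_quotes is hypothetical, and the sketch assumes that the whole attribute sits on one line and that the JSON value contains no nested braces.

```shell
# Hypothetical filter: delimit the data-app-amzn-magnify value with single
# quotes so that the double quotes inside the JSON no longer end the attribute.
# Reads HTML from stdin, writes the patched HTML to stdout.
fix_magnify_quotes() {
  sed "s/data-app-amzn-magnify=\"\({[^}]*}\)\"/data-app-amzn-magnify='\1'/g"
}
```

It can be used as a filter, for example fix_magnify_quotes < chapter01.xhtml > chapter01.fixed.xhtml, where both file names are placeholders.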
Import
<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/insert-amzn-region-magnification.xpl"/>
Synopsis
<epub:insert-amzn-region-magnification xmlns:epub="http://transpect.io/epubtools">
<p:input port="source" sequence="true"/>
<p:output port="result" sequence="true"/>
<p:option name="amzn-region-magnification" select="'false'"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</epub:insert-amzn-region-magnification>
epub:split-css
This pipeline splits the CSS according to the value of the option css-handling.
With the value regenerated-per-split, the CSS is analyzed and split for each HTML chunk. Commonly used CSS properties are stored in a global CSS stylesheet, whereas unused CSS properties are filtered out.
With the value unchanged, CSS and HTML remain unchanged.
By default, a new CSS stylesheet is generated from the CSS XML representation.
Import
<p:import href="http://transpect.io/epubtools/modules/html-splitter/xpl/split-css.xpl"/>
Dependencies
Synopsis
<epub:split-css xmlns:epub="http://transpect.io/epubtools">
<p:input port="source" sequence="true" primary="true"/>
<p:input port="css-xml" primary="false"/>
<p:output port="result" sequence="true" primary="true"/>
<p:output port="unused-css-resources" sequence="true" primary="false"/>
<p:option name="target" required="true"/>
<p:option name="css-handling" required="true"/>
<p:option name="svg-scale-hack" required="true"/>
<p:option name="basename" required="true"/>
<p:option name="html-subdir-name" required="true"/>
<p:option name="common-source-dir-elimination-regex" required="true"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</epub:split-css>
tr:check-links
This step checks whether links are available. To do: an additional report, and perhaps selectable link attribute names.
Import
<p:import href="http://transpect.io/epubtools/modules/link-checker/xpl/link-checker.xpl"/>
Dependencies
Synopsis
<tr:check-links xmlns:tr="http://transpect.io">
<p:input port="source" primary="true"/>
<p:output port="result"/>
<p:option name="only" select="''"/>
<p:option name="never" select="''"/>
<p:option name="debug-dir-uri" select="'debug-dir-uri'"/>
<p:option name="status-dir-uri" select="'status'"/>
</tr:check-links>
epub:zip-package
This step expects a file manifest as input and creates a zip package. The file manifest should have this form:
Import
<p:import href="http://transpect.io/epubtools/modules/zip-package/xpl/zip-package.xpl"/>
Dependencies
Synopsis
<epub:zip-package xmlns:epub="http://transpect.io/epubtools">
<p:input port="ocf-filerefs"/>
<p:input port="opf-fileref"/>
<p:input port="ops-filerefs"/>
<p:input port="meta"/>
<p:output port="result" primary="true"/>
<p:output port="files" primary="false"/>
<p:option name="base-uri" required="true"/>
<p:option name="debug" select="'no'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
</epub:zip-package>
epub:convert
This step takes an HTML file as input and converts it to an EPUB file. You need a configuration for the HTML splitting and for the OPF metadata. Examples can be found in the sample directory. Invoke this step on the command line with:
calabash/calabash.sh -i source=sample/b978-3-646-92351-3.xhtml \
-i conf=sample/hierarchy.xml -i meta=sample/epub-config.xml epub-convert.xpl
Note that it’s advisable to make all file inputs absolute URIs, by using cygpath on Cygwin or readlink -f on Unixy systems. For bash, this is, e.g., source=file:/$(cygpath -ma sample/b978-3-646-92351-3.xhtml)
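Both variants can be wrapped in one small helper. This is a sketch under assumptions: the function name to_file_uri is hypothetical, readlink -f is assumed to behave as in GNU coreutils, and characters such as spaces are not percent-encoded.

```shell
# Hypothetical helper: turn a (relative) path into an absolute file: URI.
# Uses cygpath when it is available (Cygwin), otherwise readlink -f.
to_file_uri() {
  if command -v cygpath >/dev/null 2>&1; then
    printf 'file:/%s\n' "$(cygpath -ma "$1")"
  else
    printf 'file:%s\n' "$(readlink -f "$1")"
  fi
}
```

An input can then be passed as, e.g., -i source=$(to_file_uri sample/b978-3-646-92351-3.xhtml).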
8/2023: New feature: if the HTML contains a <div>, <section> or <nav> element with @class="as-nav" and @epub:type="loi" (or "lot"), that section is removed from the content and moved into the navigation page as a <nav> element. This does not work for EPUB2. The structure is expected to be <ol><li><a href="link-to-fig">Fig 1: xyz</a></li></ol>.
Import
<p:import href="http://transpect.io/epubtools/xpl/epub-convert.xpl"/>
Dependencies
Synopsis
<epub:convert xmlns:epub="http://transpect.io/epubtools">
<p:input port="source" primary="true"/>
<p:input port="conf" sequence="true" primary="false"/>
<p:input port="meta" primary="false"/>
<p:input port="schematron"/>
<p:input port="attach-cover-xsl"/>
<p:input port="custom-schematron" sequence="true"/>
<p:input port="cover-svg"/>
<p:input port="create-svg-cover-xsl" primary="false"/>
<p:output port="result" primary="true"/>
<p:output port="chunks" primary="false"/>
<p:output port="opf" primary="false"/>
<p:output port="files" primary="false"/>
<p:output port="report" sequence="true" primary="false"/>
<p:output port="html"/>
<p:output port="baseuri" primary="false"/>
<p:output port="input-for-schematron" primary="false"/>
<p:option name="target" select="''"/>
<p:option name="terminate-on-error" select="'yes'"/>
<p:option name="clean-target-dir" select="'no'"/>
<p:option name="debug" select="'no'"/>
<p:option name="use-svg" required="false" select="''"/>
<p:option name="create-a11y-meta" required="false" select="'yes'"/>
<p:option name="debug-dir-uri" select="'debug'"/>
<p:option name="status-dir-uri" select="'status'"/>
<p:option name="id-in-report-heading" select="'false'"/>
<p:option name="create-font-subset" required="false" select="'true'"/>
<p:option name="create-svg-cover" required="false" select="'false'"/>
<p:option name="convert-svg-cover" required="false" select="'false'"/>
<p:option name="pull-up-epub-type-to-body" required="false" select="'false'"/>
</epub:convert>
GitHub sync date: 2025-01-08+01:00