transpect

An Open Source framework for converting and checking data

Transpect was designed to provide generic and stable modules for common conversion and checking tasks. To address complex and diverse data, transpect offers a cascading configuration that can override specific transformation and checking rules. Each component within the framework is Open Source and uses standard technologies such as XSLT 2.0 and XProc.

Data Conversion

Transpect offers many modules to parse and convert a wide range of XML-based formats such as DOCX, IDML, EPUB, NLM JATS/BITS and TEI. Additionally, there are tools for converting text-based formats such as CSS and LaTeX, as well as extensions, for example for checking PDF and image files. The following list of supported formats is not exhaustive.

Format | Parse | Generate | Remark
EPUB 2/3 | | | Including Landmarks, Fixed Layout, Media Overlays, Structural Semantics Vocabulary
HTML, CSS | | | Including conversion from CSS to ➼ CSSa
Images | | | Technical metadata can be extracted; conversion is possible with 3rd-party software
InDesign Markup Language (IDML) | | | Styles are retained as ➼ CSSa. Generation is currently limited to one main story.
Math (MathML, OMML, LaTeX) | | | Conversion from MathType and OMML to MathML and LaTeX
Office Open XML, OpenDocument (ODT) | | | Styles are retained as ➼ CSSa. Some OOXML features (SmartArt, drawings) are not supported yet but are retained.
XML formats, e.g. DocBook, NLM JATS/BITS/HoBots, TEI | | |
PDF | | | Parsing is possible with 3rd-party extensions but limited due to the nature of the format

Many converters generate the intermediate format ➼ Hub XML. Hub XML is a DocBook 5.1 derivative that allows for documents that lack a proper section hierarchy and uses ➼ CSSa for expressing layout information. It is used as a common intermediate format to represent raw conversion results of, for example, OOXML, ODT, and IDML documents.
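To illustrate how layout information can travel as attributes on a DocBook-like element, here is a minimal Python sketch. The sample paragraph, the `css:` attribute names and both namespace URIs are assumptions made for illustration; they are not taken from the Hub XML or CSSa documentation, which should be consulted for the actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, used here for illustration only.
DOCBOOK_NS = "http://docbook.org/ns/docbook"
CSS_NS = "http://www.w3.org/1996/css"

# Hypothetical Hub-like fragment: a paragraph outside any section hierarchy
# whose layout is described by attributes in the assumed CSS namespace.
SAMPLE = f"""
<para xmlns="{DOCBOOK_NS}" xmlns:css="{CSS_NS}"
      role="Heading1" css:font-weight="bold" css:font-size="14pt">
  A heading that is not wrapped in a section
</para>
"""


def css_declarations(element):
    """Return the element's CSS-namespace attributes as a CSS declaration string."""
    prefix = "{" + CSS_NS + "}"
    props = {
        name[len(prefix):]: value
        for name, value in element.attrib.items()
        if name.startswith(prefix)
    }
    # Sort by property name so the result is deterministic.
    return "; ".join(f"{key}: {value}" for key, value in sorted(props.items()))


para = ET.fromstring(SAMPLE)
print(css_declarations(para))
```

Keeping layout in namespaced attributes means a later step can either turn it into real CSS, as sketched here, or ignore it without touching the document structure.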

Checking Data

Transpect implements Schematron and schema validation. Furthermore, many modules integrate error detection and recovery methods. Reports are stored as Schematron SVRL documents. The report messages can be displayed in an HTML view of the document at the location of the error.
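As a rough sketch of what consuming such a report can look like, the following Python snippet reads an SVRL document and lists each message together with the location it refers to. The sample report and the function name `collect_messages` are made up for illustration; this is not transpect's own reporting code, which is built on XProc and XSLT.

```python
import xml.etree.ElementTree as ET

# Namespace of the Schematron Validation Report Language (SVRL).
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical report with one failed assertion and one successful report.
SAMPLE_REPORT = """
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert test="title" location="/book/chapter[2]">
    <svrl:text>A chapter must have a title.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report test="count(para) &gt; 100" location="/book/chapter[3]">
    <svrl:text>This chapter is unusually long.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
"""


def collect_messages(svrl_xml):
    """Return (kind, location, message) tuples, failed assertions first."""
    root = ET.fromstring(svrl_xml)
    messages = []
    for kind in ("failed-assert", "successful-report"):
        for item in root.findall(f"svrl:{kind}", SVRL_NS):
            text = item.findtext("svrl:text", default="", namespaces=SVRL_NS)
            # Collapse whitespace so multi-line messages stay on one line.
            messages.append((kind, item.get("location"), " ".join(text.split())))
    return messages


for kind, location, message in collect_messages(SAMPLE_REPORT):
    print(f"{kind} at {location}: {message}")
```

The `location` attribute holds an XPath expression pointing into the checked document, which is what makes it possible to attach a message to the right place in a rendered view.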

Configuration Cascade

Default transformation and checking rules (XSLT, Schematron, CSS, …) may be superseded by specific rules. These rules are specified according to the group of content that the input belongs to, for example per company, per production line, or per product.
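The idea of a cascade can be sketched as a lookup that tries the most specific location first and falls back step by step to the defaults. The directory layout (`config/<company>/<production line>/<product>/`, with `config/default/` as fallback), the file names and both function names below are hypothetical; transpect's actual configuration mechanism may differ, for example by combining rules rather than selecting whole files.

```python
from pathlib import PurePosixPath


def candidate_paths(filename, levels, root="config"):
    """List lookup locations from most specific to least specific."""
    paths = []
    # Walk from the deepest level (e.g. product) up to the first (e.g. company).
    for depth in range(len(levels), 0, -1):
        paths.append(PurePosixPath(root, *levels[:depth], filename))
    # The default rules are always the last resort.
    paths.append(PurePosixPath(root, "default", filename))
    return [str(path) for path in paths]


def resolve(filename, levels, available, root="config"):
    """Return the first candidate that exists in `available`, or None."""
    for path in candidate_paths(filename, levels, root):
        if path in available:
            return path
    return None


# Hypothetical set of rule files that exist in the configuration.
available = {
    "config/default/checks.sch",
    "config/default/to-html.xsl",
    "config/acme/checks.sch",
    "config/acme/journals/to-html.xsl",
}
levels = ["acme", "journals", "physics-letters"]

print(resolve("checks.sch", levels, available))
print(resolve("to-html.xsl", levels, available))
```

In this sketch, a product without rules of its own inherits those of its production line, then those of its company, and finally the defaults, so only the differences need to be maintained at each level.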

Open Source

Transpect is published under the BSD 2-clause license, also known as the FreeBSD License. This permissive license imposes minimal restrictions on the redistribution of the software. Therefore, you can use the software in commercial and even closed-source projects.

The license terms can be found here: ➼ http://opensource.org/licenses/BSD-2-Clause

Industry Standards

The technologies behind transpect are industry standards like XProc, XSLT 2.0 and Schematron. Their specifications are publicly available through international standards organizations such as W3C and ISO.

XProc A language to specify a sequence of operations to be performed on XML documents.
XSLT 2.0 XSLT is a programming language for transforming XML documents. Because of several limitations of version 1.0, we recommend using XSLT 2.0.
Schematron A rule-based schema language to validate XML documents.
RelaxNG RelaxNG is an XML schema language to specify patterns for the structure of an XML document.

le-tex

Transpect is developed and maintained by le-tex, a Leipzig-based company which provides professional services for publishers.