tr:file-uri file-uri

xproc-util/file-uri/xpl/file-uri.xpl

Import URI: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl

This step accepts either a file system path or a URL in its 'filename' option. It normalizes the value so that both a file system path and a file: URL are available. If the filename starts with http: or https:, the file will be retrieved and stored locally. Please note that this retrieval will not work for remote directories.

Its primary uses are

  • giving users the liberty to either specify a URL or an OS-specific path for input file parameters;
  • making XML catalog resolution available to any URI, not just when accessing resources through catalog-enabled methods such as doc();
  • retrieving the resource via p:http-request and storing it locally if, after optional catalog resolution, the 'filename' URI is still an http:/https: URI.

Examples for 'filename' values

  • C:/temp/file.docx
  • c:\temp\file.docx
  • file:/C:/temp/file.docx
  • file:///C:/temp/file.docx
  • /tmp/file.docx
  • subdir/file.docx
  • https://github.com/me/myrepo/blob/master/file.docx?raw=true
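To make the normalization concrete, here is a minimal Python sketch that mirrors the non-HTTP branches of the analyze-filename p:choose in the subpipeline, tried in the same order. It is an illustration, not part of transpect: the function name normalize is hypothetical, HTTP(S) URLs and relative paths are deliberately left out, and the percent-decoding done by tr:unescape-uri is assumed to correspond to urllib.parse.unquote.

```python
import re
from urllib.parse import unquote


def normalize(filename: str) -> dict:
    """Return local-href and os-path for a local filename (sketch).

    Branches follow the order of analyze-filename; HTTP(S) URLs and
    relative paths are out of scope for this sketch.
    """
    if re.match(r"^file://///[^/]", filename):
        # Windows UNC path URI: file://///server/share -> \\server\share
        os_path = re.sub(r"^file:///", "", filename).replace("/", "\\")
        return {"local-href": filename, "os-path": os_path}
    if re.match(r"^file:/", filename):
        # Unix file URI, or Windows file: URI containing a drive letter
        path = re.sub(r"^file:/+(([a-z]:)/)?", r"\2/", filename, flags=re.I)
        # Assumption: tr:unescape-uri amounts to percent-decoding
        return {"local-href": filename, "os-path": unquote(path)}
    if filename.startswith("/"):
        # Unix filename
        return {"local-href": "file:" + filename, "os-path": filename}
    if re.match(r"^[a-z]:", filename, flags=re.I):
        # Windows path with forward or backward slashes
        return {"local-href": "file:///" + filename.replace("\\", "/"),
                "os-path": filename}
    if re.match(r"^\\\\[^\\]", filename):
        # Windows UNC path: \\server\share -> file://///server/share
        return {"local-href": "file:///" + filename.replace("\\", "/"),
                "os-path": filename}
    raise ValueError(f"HTTP URL or relative path, not covered: {filename}")
```

Applied to the examples above, C:/temp/file.docx yields the local-href file:///C:/temp/file.docx, and /tmp/file.docx yields file:/tmp/file.docx.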

Relative Paths

Relative paths will be resolved against the current working directory. Most of the time this is more useful than the static base URI, but it might not always be what the user wants. It is therefore a good idea to absolutize paths, for example with the shell command substitution $(readlink -f subdir/file.docx) or, under Cygwin, $(cygpath -ma subdir/file.docx).
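The following Python sketch illustrates how a relative path ends up being resolved. It mimics the cwd attribute expression of the empty-result step (backslashes become slashes when the file separator is \, and a trailing slash is ensured) and then approximates XPath resolve-uri() with urllib.parse.urljoin. The function names and the simplified construction of the base file: URI are assumptions for illustration only.

```python
import re
from urllib.parse import urljoin


def cwd_base(cwd: str, file_separator: str) -> str:
    """Mimic the 'cwd' attribute expression of the empty-result step."""
    if file_separator == "\\":
        cwd = cwd.replace("\\", "/")
    # Append a slash unless the path already ends with one
    return re.sub(r"([^/])$", r"\1/", cwd)


def resolve_relative(filename: str, cwd: str, file_separator: str = "/") -> str:
    """Resolve a relative filename against the working directory (sketch)."""
    base = cwd_base(cwd, file_separator)
    # Simplified file: URI for the base directory (Unix vs. drive-letter path)
    base_uri = ("file:" if base.startswith("/") else "file:///") + base
    # urljoin applies RFC 3986 reference resolution, similar to resolve-uri()
    return urljoin(base_uri, filename)
```

Because the base is the working directory and not the pipeline’s location, the same relative argument resolves differently depending on where the pipeline is started from, which is why absolutizing paths on the command line is the safer option.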

XML Catalogs

If a catalog is provided on the catalog port and an XSLT stylesheet for catalog resolution is supplied on the resolver port, http:/https: URIs will be catalog-resolved first, see below.

Storage Location for HTTP Downloads

It is possible to specify a temporary directory in the 'tmpdir' option. By default, it will be the subdir 'tmp' of the user’s home directory. The 'tmpdir' option accepts both a file: URL and an OS path, thanks to this normalization step.

Please note that temporary files will not be deleted by this step.

Unique File Names for HTTP Downloads

If the option 'make-unique' is true (which it is by default), the files that are fetched by p:http-request will get a random string like _0fa8d348 appended to their base name.
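The naming rule can be sketched in Python by transcribing the XPath regular expressions of the local-href step in the HTTP branch, plus the tmpdir fallback of the tmp-dir step. Function and parameter names are hypothetical; uuid.uuid4() stands in for p:uuid, and only the first 8 characters are used, as in substring(/*/@uuid, 1, 8).

```python
import re
import uuid
from typing import Optional


def default_tmpdir(tmpdir: str, user_home: str) -> str:
    """'tmpdir' if it is non-blank, else the tmp subdir of the home dir."""
    return tmpdir if tmpdir.strip() else user_home + "/tmp/"


def download_href(url: str, tmp_dir_href: str, make_unique: bool = True,
                  uuid_str: Optional[str] = None) -> str:
    """Build the local-href for a downloaded file (sketch of the XPath)."""
    basename = re.sub(r"^.+/", "", url)  # last path segment (plus query)
    # Stem: everything before the first '.', '?' or '#'
    stem = re.sub(r"(.+?)([.?#].+)?", r"\1", basename)
    # Suffix: the extension(s), with any query string or fragment removed
    suffix = re.sub(r"[?#].*$", "", re.sub(r"^[^?#.]+", "", basename))
    unique = ""
    if make_unique:
        # p:uuid output, shortened to its first 8 characters
        unique = "_" + (uuid_str or str(uuid.uuid4()))[:8]
    return tmp_dir_href + stem + unique + suffix
```

For the GitHub URL from the examples list and a UUID starting with 0fa8d348, the file would be stored as file_0fa8d348.docx in the temporary directory; the ?raw=true query does not end up in the file name.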

Output format

The output is a c:result element with the following attributes:

os-path
OS-specific path. Always present unless there is an error-status attribute.
local-href
file: URI. Always present unless there is an error-status attribute.
error-status
Present only if 'filename' was an HTTP(S) URI and retrieving or checking the resource failed (the non-2xx status, or 999 if the request raised an error).
href
The post-catalog-resolution URI of the resource (only for HTTP(S) URIs).
orig-href
The pre-catalog-resolution URI of the resource (only if it differs from href).
lastpath
For ordinary files, the non-directory part including suffix. For directories, the last path component without trailing slash.
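The lastpath value is computed by a single regular-expression replacement on local-href (see the lastpath step at the end of the subpipeline). A Python transcription of that expression, with a hypothetical function name, looks like this:

```python
import re


def lastpath(local_href: str) -> str:
    """Last path component, ignoring trailing slashes (like @lastpath)."""
    # Same pattern as the XPath replace(); input that does not match
    # the pattern is returned unchanged, as replace() would do
    return re.sub(r"^.+/([^/]+)/*$", r"\1", local_href)
```

So file:/tmp/file.docx yields file.docx, and a directory URI such as file:/home/me/dir/ yields dir.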

Input Ports

Name / Documentation / Connections

source

Its only purpose is to prevent the default readable port from being connected to the catalog or resolver port.

catalog

If it is a <catalog> document in the namespace urn:oasis:names:tc:entity:xmlns:xml:catalog, it will be used for catalog resolution of URIs that start with 'http'.

resolver

An XSLT stylesheet that provides a named template called 'resolve'. This template takes a parameter $uri and produces a document <result unresolved="{$uri}"/>. If the URI could be resolved to another URI, the result will take the form <result unresolved="{$uri}" resolved="{$resolved-uri}"/>.

By default, this step only provides trivial (i.e., identity plus URL escaping) catalog resolution.

You have to supply an XSLT-based catalog resolver on the resolver port in order to use catalog resolution, because native catalog resolution is available neither for p:http-request nor through an XPath function. Without such a resolver, you can’t programmatically decide whether to retrieve a file via p:http-request or to use the local file.

You may use the repository version of the XSLT-based resolver. However, in order to avoid network traffic, you should consider using a local copy. Rather than importing it via its absolute or relative file system path, use the transpect approach: add a <nextCatalog> entry for the resolver’s XML catalog to your project catalog. You can then import the XSLT-based resolver by its canonical URI.

  • Default document: ../xsl/without-resolver.xsl
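The contract of the 'resolve' template can be illustrated with a deliberately simplified Python sketch of OASIS catalog lookup. This is not the transpect XSLT resolver: it only handles <uri> and <rewriteURI> entries (exact matches first, then the longest matching uriStartString), and it ignores nextCatalog, delegateURI, xml:base and URI normalization. The function name resolve and the dict standing in for the <result> element are assumptions.

```python
import xml.etree.ElementTree as ET

CAT = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def resolve(uri: str, catalog_xml: str) -> dict:
    """Simplified catalog lookup returning the resolver's result shape."""
    root = ET.fromstring(catalog_xml)
    result = {"unresolved": uri}
    # Exact <uri name="..." uri="..."/> entries are consulted first
    for entry in root.iter(CAT + "uri"):
        if entry.get("name") == uri:
            result["resolved"] = entry.get("uri")
            return result
    # Otherwise the <rewriteURI> entry with the longest matching prefix wins
    best_start, best_prefix = "", None
    for entry in root.iter(CAT + "rewriteURI"):
        start = entry.get("uriStartString", "")
        if start and uri.startswith(start) and len(start) > len(best_start):
            best_start, best_prefix = start, entry.get("rewritePrefix")
    if best_prefix is not None:
        result["resolved"] = best_prefix + uri[len(best_start):]
    return result
```

A URI that no entry matches comes back with only the unresolved key, which corresponds to the <result unresolved="{$uri}"/> case described for the resolver port.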

Output Ports

Name / Documentation / Connections

result

A c:result document with a local-href and an os-path attribute.

Options

Name / Documentation / Default

filename

A URI or an OS-specific path. Relative paths will be resolved against the current working directory (see Relative Paths above). A future improvement might use the XSLT-based catalog resolver in order to detect whether a given http: URL will actually resolve to a local file.

make-unique

Whether to store files retrieved over HTTP with a unique random name in the temp dir.

Default: 'true'

fetch-http

Whether to fetch files referenced by URIs matching '^https?:'.

Default: 'true'

check-http

Whether to check that the HTTP status of '^https?:' URIs matches '2\d\d'. check-http and fetch-http should be made mutually exclusive. For the time being, if both are 'true', fetch-http takes precedence. Since both default to 'true', you need to set fetch-http to 'false' if you only want to check (check-http can keep its default value 'true').

Default: 'true'

tmpdir

URI or OS path of a directory for storing files retrieved via HTTP.

Default: '' (empty; the subdirectory tmp of the user’s home directory is then used)
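How fetch-http and check-http interact can be sketched in Python as follows. The function names are hypothetical; the branch order follows the analyze-filename p:choose (fetch, then check, then pass-through). Note that the pipeline itself tests starts-with(@status, '2') rather than the '2\d\d' pattern mentioned for check-http, and a request that raises an error is reported with status 999.

```python
import re
from typing import Optional


def http_branch(uri: str, fetch_http: bool, check_http: bool) -> str:
    """Which analyze-filename branch handles the (catalog-resolved) URI."""
    if not re.match(r"^https?:", uri):
        return "non-http"
    if fetch_http:
        return "fetch"         # GET request, response stored locally
    if check_http:
        return "check"         # HEAD request, status only
    return "pass-through"      # neither fetched nor checked


def error_status(status: Optional[str]) -> Optional[str]:
    """Value of @error-status, or None if the request succeeded.

    None stands for a request that raised an error (reported as 999).
    """
    status = "999" if status is None else status
    return None if status.startswith("2") else status
```

With the default values (both 'true'), an HTTP URI is always fetched; only http_branch(uri, False, True) selects the check-only branch.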

Subpipeline

Step / Inputs / Outputs / Options

pos:info info

p:xslt catalog-resolve

stylesheet

resolver on file-uri

source

p:document

catalog on file-uri

result

template-name = 'resolve'

p:sink d52e300

source

result on catalog-resolve

p:add-attribute empty-result

source

 <c:result/>

result

attribute-name = 'cwd'

match = '/*'

attribute-value = replace( if (/*/@file-separator = '\') then replace(/*/@cwd, '\\', '/') else /*/@cwd, '([^/])$', '$1/' )

p:set-attributes add-orig-href

If the URL has been catalog-resolved, the original URL will be copied here from the preceding XSLT step, in an orig-href attribute. Apart from that, the XSLT step has to produce an href attribute.

Please note that despite its name, the @href attribute doesn’t necessarily contain a URI. If $filename is an OS path, @href will contain this path.

source

result on empty-result

attributes

result on catalog-resolve

result

match = '/c:result'

p:group d52e332

p:variable catalog-resolved-uri

/c:result/@href

p:choose analyze-filename

matches($catalog-resolved-uri, '^file://///[^/]')

Windows UNC path URI. file:///// → \\ .

p:add-attribute d52e343

source

 <c:result/>

result

attribute-name = 'local-href'

match = '/*'

attribute-value = $catalog-resolved-uri

p:add-attribute d52e356

source

result on d52e343

result

match = '/*'

attribute-name = 'os-path'

attribute-value = replace(replace($catalog-resolved-uri, '^file:///', ''), '/', '\\')

matches($catalog-resolved-uri, '^file:/')

Unix file URI or Windows file: URI containing a drive letter.

p:add-attribute local-href

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = $catalog-resolved-uri

p:sink d52e372

source

result on local-href

tr:unescape-uri unescape-uri

result

uri = replace($catalog-resolved-uri, '^file:/+(([a-z]:)/)?', '$2/', 'i')

p:add-attribute d52e379

source

result on local-href

result

match = '/*'

attribute-name = 'os-path'

attribute-value = /c:result

matches($catalog-resolved-uri, '^/')

Unix Filename

p:add-attribute d52e398

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:', $catalog-resolved-uri)

p:add-attribute d52e403

source

result on d52e398

result

match = '/*'

attribute-name = 'os-path'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^[a-z]:', 'i')

Windows path, either with forward or backward slashes.

p:add-attribute d52e414

source

source on file-uri

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/'))

p:add-attribute d52e419

source

result on d52e414

result

match = '/*'

attribute-name = 'os-path'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^https?:') and $fetch-http = 'true'

HTTP URL. Since there is no system property for a temporary directory, the file is stored in the subdirectory tmp of the user’s home directory unless tmpdir is given. Optionally, a random suffix is appended to the file name.

p:uuid uuid

source

 <doc uuid=""/>

result

match = '/*/@uuid'

p:sink d52e441

source

result on uuid

tr:file-uri tmp-dir

source

result

filename = ($tmpdir[normalize-space()], concat(/c:result/@user-home, '/tmp/'))[1]

p:group d52e451

p:variable tmp-dir-href

result on tmp-dir

/c:result/@local-href

p:add-attribute local-href

source

result on uuid

result

attribute-name = 'local-href'

match = '/*'

attribute-value = concat( $tmp-dir-href, replace( replace($catalog-resolved-uri, '^.+/', ''), '(.+?)([.?#].+)?', '$1' ), if ($make-unique = 'true') then concat('_', substring(/*/@uuid, 1, 8)) else '', replace(replace(replace($catalog-resolved-uri, '^.+/', ''), '^[^?#.]+', ''), '[?#].*$', '') )

p:sink d52e471

source

result on local-href

p:identity d52e473

source

 <c:request method="GET" detailed="true"/>

result

p:add-attribute d52e484

source

result on d52e473

result

match = '/c:request'

attribute-name = 'href'

attribute-value = $catalog-resolved-uri

p:try http-request

p:group d52e492

p:http-request d52e496

source

result on d52e484

result

p:catch d52e499

error

p:identity d52e503

source

 <c:response status="999"/>

result

p:choose store-http-resource

not(starts-with(/c:response/@status, '2'))

cx:message d52e520

source

result

message = concat('Cannot retrieve ', $catalog-resolved-uri, '. Status: ', /c:response/@status)

p:sink d52e525

source

result on d52e520

p:add-attribute d52e527

source

 <c:result/>

result

attribute-name = 'error-status'

match = '/c:result'

attribute-value = /c:response/@status

/c:response/c:body/(.[normalize-space(.)] | c:data)

p:store d52e548

source

result on http-request

result

href = /doc/@local-href

tr:file-uri http-to-local-result_binary

source

result on d52e548

result

filename = /doc/@local-href

p:otherwise

p:store d52e572

source

result on http-request

result

omit-xml-declaration = 'false'

href = /doc/@local-href

tr:file-uri http-to-local-result_xml

source

result on d52e572

result

filename = /doc/@local-href

p:add-attribute d52e595

source

result

match = '/c:result'

attribute-name = 'href'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^https?:') and $check-http = 'true'

HTTP URL: only check that the returned status is OK (2xx).

p:identity d52e608

source

 <c:request method="HEAD" detailed="true" status-only="true"/>

result

p:add-attribute d52e619

source

result on d52e608

result

match = '/c:request'

attribute-name = 'href'

attribute-value = escape-html-uri($catalog-resolved-uri)

p:try http-request-check

p:group d52e628

p:http-request d52e632

source

result on d52e619

result

p:catch d52e635

error

p:identity d52e639

source

 <c:response status="999"/>

result

p:sink d52e653

source

p:identity d52e655

source

 <c:result/>

result

p:choose attach-error-status

not(starts-with(/c:response/@status, '2'))

p:add-attribute d52e675

source

result on d52e655

result

attribute-name = 'error-status'

match = '/c:result'

attribute-value = /c:response/@status

p:otherwise

p:identity d52e686

source

result on d52e655

result

p:add-attribute d52e690

source

result

match = '/c:result'

attribute-name = 'href'

attribute-value = $catalog-resolved-uri

matches($catalog-resolved-uri, '^https?:')

HTTP URL, do not fetch content or check availability.

p:identity d52e703

source

source on file-uri

result

matches($catalog-resolved-uri, '^\\\\[^\\]')

Windows UNC path. \\ → file:///// .

p:add-attribute d52e711

source

 <c:result/>

result

attribute-name = 'os-path'

match = '/*'

attribute-value = $catalog-resolved-uri

p:add-attribute d52e724

source

result on d52e711

result

match = '/*'

attribute-name = 'local-href'

attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/'))

p:otherwise

Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be a relative path.

tr:file-uri cwd-uri

source

source on file-uri

result

filename = concat(/c:result/@cwd, '/')

tr:file-uri resolved-uri

source

result on cwd-uri

result

filename = resolve-uri($catalog-resolved-uri, /c:result/@local-href)

p:add-attribute lastpath

source

result

attribute-name = 'lastpath'

match = '/*'

attribute-value = replace(/*/@local-href, '^.+/([^/]+)/*$', '$1')

Used by