tr:file-uri file-uri
xproc-util/file-uri/xpl/file-uri.xpl
Import URI: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
This step accepts either a file system path or a URL in its 'filename' option. It will normalize them so that both a file system path and a file: URL are available. If filename starts with http: or https:, the file will be retrieved and stored locally. Please note that this retrieval will not work for remote directories.
Its primary uses are
- giving users the liberty to specify either a URL or an OS-specific path for input file parameters;
- making XML catalog resolution available to any URI, not just when accessing resources through catalog-enabled methods such as doc();
- storing the file locally by means of p:http-request if, after optional catalog resolution, the 'filename' URI is still http:/https:.
Examples for 'filename' values:
- C:/temp/file.docx
- c:\temp\file.docx
- file:/C:/temp/file.docx
- file:///C:/temp/file.docx
- /tmp/file.docx
- subdir/file.docx
- https://github.com/me/myrepo/blob/master/file.docx?raw=true
Relative Paths
Relative paths will be resolved against the current working directory, which is better than the static base URI most of the time but which might not always be what the user wants. It is a good idea to absolutize paths, as in $(readlink -f subdir/file.docx) or $(cygpath -ma subdir/file.docx).
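When the 'filename' value is assembled by a wrapper script rather than by a shell, the same absolutization can be done there. The following is a minimal Python sketch under that assumption; the function name absolutize is hypothetical and not part of this step. os.path.realpath resolves the path against the current working directory and follows symlinks, roughly like readlink -f.

```python
import os


def absolutize(path: str) -> str:
    """Return an absolute path with symlinks resolved.

    Relative paths are resolved against the current working directory,
    so the value passed on no longer depends on where the pipeline runs.
    """
    return os.path.realpath(path)


if __name__ == "__main__":
    # The path does not need to exist for it to be absolutized.
    print(absolutize("subdir/file.docx"))
```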
XML Catalogs
If a catalog is provided on the catalog port and an XSLT stylesheet for catalog resolution is supplied on the resolver port, http:/https: URIs will be catalog-resolved first, see below.
Storage Location for HTTP Downloads
It is possible to specify a temporary directory in the 'tmpdir' option. By default, it will be the subdir 'tmp' of the user’s home directory. The 'tmpdir' option accepts both a file: URL and an OS path, thanks to this normalization step.
Please note that temporary files will not be deleted by this step.
Unique File Names for HTTP Downloads
If the option 'make-unique' is true (which it is by default), the files that are fetched by p:http-request will get a random string like _0fa8d348 appended to their base name.
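The naming rule can be illustrated with a Python sketch that mirrors the regular expressions of the local-href computation listed in the subpipeline table. It is an illustration only: the function name download_href and its parameters are hypothetical, and a Python UUID stands in for the value that p:uuid provides.

```python
import re
import uuid
from typing import Optional


def download_href(tmp_dir_href: str, uri: str, make_unique: bool = True,
                  unique_id: Optional[str] = None) -> str:
    """Build the local file: URI under which an HTTP download is stored."""
    # Last path component, still carrying any query string or fragment.
    last = re.sub(r'^.+/', '', uri)
    # Base name: everything before the first '.', '?' or '#'.
    base = re.sub(r'(.+?)([.?#].+)?', r'\1', last)
    # Suffix: what follows the base name, minus query string and fragment.
    suffix = re.sub(r'[?#].*$', '', re.sub(r'^[^?#.]+', '', last))
    tag = ''
    if make_unique:
        # Only the first eight characters of the UUID are used.
        tag = '_' + (unique_id or str(uuid.uuid4()))[:8]
    return tmp_dir_href + base + tag + suffix
```

Note that the random string is inserted between the base name and the suffix, so the file extension is preserved.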
Output format
The output is a c:result element with the following attributes:
- os-path: OS-specific path. This is always present except when there is an error-status attribute.
- local-href: file: URI. This is always present except when there is an error-status attribute.
- error-status: This may only happen if the 'filename' was an HTTP URI and if there was an error retrieving the resource.
- href: The post catalog-resolution URI of the resource (if it is an HTTP URI).
- orig-href: The pre catalog-resolution URI of the resource (if different from post catalog).
- lastpath: For ordinary files, the non-directory part including suffix. For directories, the last path component without trailing slash.
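A consumer of this output should check for error-status before relying on os-path or local-href. The Python sketch below shows one way to do that. The sample document and the helper names usable_path and lastpath are hypothetical. The lastpath helper mirrors the regular expression that the subpipeline applies to local-href.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample of what the step might emit for /tmp/file.docx.
SAMPLE = ('<c:result xmlns:c="http://www.w3.org/ns/xproc-step" '
          'os-path="/tmp/file.docx" local-href="file:/tmp/file.docx" '
          'lastpath="file.docx"/>')


def usable_path(result_xml: str) -> str:
    """Return os-path, or raise if the step reported an error-status."""
    result = ET.fromstring(result_xml)
    status = result.get('error-status')
    if status is not None:
        raise RuntimeError('retrieval failed with status ' + status)
    return result.get('os-path')


def lastpath(local_href: str) -> str:
    """Last path component of a file: URI, without a trailing slash."""
    return re.sub(r'^.+/([^/]+)/*$', r'\1', local_href)
```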
Input Ports
Name | Documentation | Connections |
---|---|---|
sourceⓅ | Present only to prevent the default readable port from being connected to the catalog or resolver ports. | |
catalog | If it is a <catalog> document in the namespace urn:oasis:names:tc:entity:xmlns:xml:catalog, it will be used for catalog resolution of URIs that start with 'http'. | |
resolver | An XSLT stylesheet that provides the named template that the catalog-resolve step calls (template-name = 'resolve'). By default, this step only provides trivial (i.e., identity plus URL escaping) catalog resolution. You have to supply an XSLT-based catalog resolver on the resolver port in order to use catalog resolution. That is because native catalog resolution is not available for p:http-request or by XPath function. This means that you can’t programmatically decide whether to retrieve a file via p:http-request or use the local file. You may use the repository version of the XSLT-based resolver. However, in order to avoid network traffic, you should consider using a local copy. In order to avoid importing it via its absolute or relative file system path, you should use the transpect approach of importing the resolver’s XML catalog via | |
Output Ports
Name | Documentation | Connections |
---|---|---|
resultⓅ | A c:result document with a local-href and an os-path attribute. | |
Options
Name | Documentation | Default |
---|---|---|
filename | A URI or an OS-specific identifier. Relative paths will be resolved against the current working directory (the 'cwd' reported by pos:info), not against static-base-uri(). A future improvement might use the XSLT-based catalog resolver in order to detect whether a given http: URL will actually resolve to a local file. | |
make-unique | Whether to store files retrieved over HTTP with a unique random name in the temp dir. | 'true' |
fetch-http | Whether to fetch files referenced by URIs matching '^https?:'. | 'true' |
check-http | Whether to check that the HTTP status of '^https?:' URIs matches '2\d\d'. check-http and fetch-http should be made mutually exclusive. For the time being, if both are given, fetch-http has precedence. With the given default values, this means that you need to specify both check-http=true and fetch-http=false if you only want to check. | 'true' |
tmpdir | URI or OS name of a directory for storing files retrieved via HTTP. | '' |
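The interplay of fetch-http and check-http, and the fallback for an empty tmpdir, can be summarized in a short Python sketch. It follows the order of the branches in the subpipeline, where the fetch test comes before the check test. The function names http_mode and effective_tmpdir are hypothetical. Option values are compared as strings, as in the step itself.

```python
def http_mode(fetch_http: str = 'true', check_http: str = 'true') -> str:
    """Decide what happens to an http:/https: URI.

    fetch-http takes precedence: the check-only branch is reached only
    when fetch-http is not 'true'.
    """
    if fetch_http == 'true':
        return 'fetch'
    if check_http == 'true':
        return 'check'
    return 'none'


def effective_tmpdir(tmpdir: str, user_home: str) -> str:
    """Use tmpdir if it is non-blank, else the 'tmp' subdir of the home dir."""
    return tmpdir if tmpdir.strip() else user_home + '/tmp/'
```

With the defaults, http_mode() yields 'fetch'; passing fetch_http='false' is what switches the step to checking only.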
Subpipeline
Step | Inputs | Outputs | Options |
---|---|---|---|
pos:info info | | | |
p:xslt catalog-resolve | | result | template-name = 'resolve' |
p:sink d52e300 | | | |
p:add-attribute empty-result | | result | attribute-name = 'cwd' match = '/*' attribute-value = replace( if (/*/@file-separator = '\') then replace(/*/@cwd, '\\', '/') else /*/@cwd, '([^/])$', '$1/' ) |
p:set-attributes add-orig-href. If the URL has been catalog-resolved, the original URL will be copied here from the preceding XSLT step, in an orig-href attribute. Apart from that, the XSLT step has to produce an href attribute. Please note that despite its name, the @href attribute doesn’t necessarily contain a URI. If $filename is an OS path, @href will contain this path. | | result | match = '/c:result' |
p:group d52e332 | | | |
p:variable catalog-resolved-uri | /c:result/@href | | |
p:choose analyze-filename | | | |
matches($catalog-resolved-uri, '^file://///[^/]') | Windows UNC path URI. file:///// → \\. | | |
p:add-attribute d52e343 | | result | attribute-name = 'local-href' match = '/*' attribute-value = $catalog-resolved-uri |
p:add-attribute d52e356 | | result | match = '/*' attribute-name = 'os-path' attribute-value = replace(replace($catalog-resolved-uri, '^file:///', ''), '/', '\\') |
matches($catalog-resolved-uri, '^file:/') | Unix file URI or Windows file: URI containing a drive letter. | | |
p:add-attribute local-href | | result | match = '/*' attribute-name = 'local-href' attribute-value = $catalog-resolved-uri |
p:sink d52e372 | | | |
tr:unescape-uri unescape-uri | | result | uri = replace($catalog-resolved-uri, '^file:/+(([a-z]:)/)?', '$2/', 'i') |
p:add-attribute d52e379 | | result | match = '/*' attribute-name = 'os-path' attribute-value = /c:result |
matches($catalog-resolved-uri, '^/') | Unix filename. | | |
p:add-attribute d52e398 | | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:', $catalog-resolved-uri) |
p:add-attribute d52e403 | | result | match = '/*' attribute-name = 'os-path' attribute-value = $catalog-resolved-uri |
matches($catalog-resolved-uri, '^[a-z]:', 'i') | Windows path, with either forward or backward slashes. | | |
p:add-attribute d52e414 | | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/')) |
p:add-attribute d52e419 | | result | match = '/*' attribute-name = 'os-path' attribute-value = $catalog-resolved-uri |
matches($catalog-resolved-uri, '^https?:') and $fetch-http = 'true' | HTTP URL. Since there is no system property for a temp dir, store it in the subdir tmp of the user’s home dir. Optionally generate a random name. | | |
p:uuid uuid | | result | match = '/*/@uuid' |
p:sink d52e441 | | | |
tr:file-uri tmp-dir | | result | filename = ($tmpdir[normalize-space()], concat(/c:result/@user-home, '/tmp/'))[1] |
p:group d52e451 | | | |
p:variable tmp-dir-href | /c:result/@local-href | | |
p:add-attribute local-href | | result | attribute-name = 'local-href' match = '/*' attribute-value = concat( $tmp-dir-href, replace( replace($catalog-resolved-uri, '^.+/', ''), '(.+?)([.?#].+)?', '$1' ), if ($make-unique = 'true') then concat('_', substring(/*/@uuid, 1, 8)) else '', replace(replace(replace($catalog-resolved-uri, '^.+/', ''), '^[^?#.]+', ''), '[?#].*$', '') ) |
p:sink d52e471 | | | |
p:identity d52e473 | | result | |
p:add-attribute d52e484 | | result | match = '/c:request' attribute-name = 'href' attribute-value = $catalog-resolved-uri |
p:try http-request | | | |
p:group d52e492 | | | |
p:http-request d52e496 | | result | |
p:catch d52e499 | error | | |
p:identity d52e503 | | result | |
p:choose store-http-resource | | | |
not(starts-with(/c:response/@status, '2')) | | | |
cx:message d52e520 | | result | message = concat('Cannot retrieve ', $catalog-resolved-uri, '. Status: ', /c:response/@status) |
p:sink d52e525 | | | |
p:add-attribute d52e527 | | result | attribute-name = 'error-status' match = '/c:result' attribute-value = /c:response/@status |
/c:response/c:body/(.[normalize-space(.)] | c:data) | | | |
p:store d52e548 | | result | href = /doc/@local-href |
tr:file-uri http-to-local-result_binary | | result | filename = /doc/@local-href |
p:otherwise | | | |
p:store d52e572 | | result | omit-xml-declaration = 'false' href = /doc/@local-href |
tr:file-uri http-to-local-result_xml | | result | filename = /doc/@local-href |
p:add-attribute d52e595 | | result | match = '/c:result' attribute-name = 'href' attribute-value = $catalog-resolved-uri |
matches($catalog-resolved-uri, '^https?:') and $check-http = 'true' | HTTP URL: only check that the return status is OK. | | |
p:identity d52e608 | | result | |
p:add-attribute d52e619 | | result | match = '/c:request' attribute-name = 'href' attribute-value = escape-html-uri($catalog-resolved-uri) |
p:try http-request-check | | | |
p:group d52e628 | | | |
p:http-request d52e632 | | result | |
p:catch d52e635 | error | | |
p:identity d52e639 | | result | |
p:sink d52e653 | | | |
p:identity d52e655 | | result | |
p:choose attach-error-status | | | |
not(starts-with(/c:response/@status, '2')) | | | |
p:add-attribute d52e675 | | result | attribute-name = 'error-status' match = '/c:result' attribute-value = /c:response/@status |
p:otherwise | | | |
p:identity d52e686 | | result | |
p:add-attribute d52e690 | | result | match = '/c:result' attribute-name = 'href' attribute-value = $catalog-resolved-uri |
matches($catalog-resolved-uri, '^https?:') | HTTP URL, do not fetch content or check availability. | | |
p:identity d52e703 | | result | |
matches($catalog-resolved-uri, '^\\\\[^\\]') | Windows UNC path. \\ → file://///. | | |
p:add-attribute d52e711 | | result | attribute-name = 'os-path' match = '/*' attribute-value = $catalog-resolved-uri |
p:add-attribute d52e724 | | result | match = '/*' attribute-name = 'local-href' attribute-value = concat('file:///', replace($catalog-resolved-uri, '\\', '/')) |
p:otherwise | Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be a relative path. | | |
tr:file-uri cwd-uri | | result | filename = concat(/c:result/@cwd, '/') |
tr:file-uri resolved-uri | | result | filename = resolve-uri($catalog-resolved-uri, /c:result/@local-href) |
p:add-attribute lastpath | | result | attribute-name = 'lastpath' match = '/*' attribute-value = replace(/*/@local-href, '^.+/([^/]+)/*$', '$1') |
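The branch tests of the analyze-filename p:choose can be mirrored in Python to see which branch a given 'filename' value takes and what local-href and os-path result from it. This is a sketch for illustration, not the step's implementation: the function name normalize and the branch labels are hypothetical, urllib.parse.unquote stands in for tr:unescape-uri on the assumption that the latter percent-decodes the path, and the HTTP and relative branches are only classified, not resolved.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote


def normalize(name: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (branch, local_href, os_path); tests run in table order."""
    if re.search(r'^file://///[^/]', name):
        # UNC file: URI, e.g. file://///server/share -> \\server\share
        os_path = re.sub(r'^file:///', '', name).replace('/', '\\')
        return 'unc-file-uri', name, os_path
    if re.search(r'^file:/', name):
        # Keep an optional drive letter, collapse the leading slashes.
        stripped = re.sub(r'^file:/+(([a-z]:)/)?',
                          lambda m: (m.group(2) or '') + '/',
                          name, flags=re.I)
        return 'file-uri', name, unquote(stripped)
    if name.startswith('/'):
        return 'unix-path', 'file:' + name, name
    if re.search(r'^[a-z]:', name, re.I):
        return 'windows-path', 'file:///' + name.replace('\\', '/'), name
    if re.search(r'^https?:', name):
        # Fetching or checking depends on fetch-http and check-http.
        return 'http', None, None
    if re.search(r'^\\\\[^\\]', name):
        return 'unc-path', 'file:///' + name.replace('\\', '/'), name
    # Anything else is treated as a path relative to the working directory.
    return 'relative', None, None
```

Because the drive-letter test requires a single letter followed by a colon, an https: URL does not match it and falls through to the HTTP branch.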