Work in progress
by Gérald Jean Francis Banon
June 2023
Updated in October 2025
The integrity of digital information is an old problem [1]. One feature that determines information integrity and still deserves special attention for archiving purposes is the insertion of hyperlinks in a Web resource. Such a hyperlink must work forever without the need to be edited again in the Web resource, also called here the source resource.
Currently, a widely used model for a hyperlink in a source resource is the "absolute hyperlink", which has three components: a persistent, location-independent destination resource identifier, the domain name of the identifier resolver that redirects the user to the current location of the requested destination resource, and a URL scheme. Such a hyperlink is usually called a persistent hyperlink because it is based on a persistent, location-independent identifier.
A sample of a persistent hyperlink, found in a journal article published by Elsevier [2], is:
https://doi.org/10.1016/j.rse.2021.112667.
The corresponding hyperlink source code is:
<a href="https://doi.org/10.1016/j.rse.2021.112667" target="_blank">https://doi.org/10.1016/j.rse.2021.112667</a>
In this example:
The URL scheme is: https
The resolver domain name is: doi.org
The destination resource identifier is: 10.1016/j.rse.2021.112667
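The decomposition above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `split_persistent_hyperlink` is hypothetical, and it assumes the identifier is everything in the URL path after the leading slash, as in the DOI example.

```python
from urllib.parse import urlsplit


def split_persistent_hyperlink(url: str) -> dict:
    """Split an absolute persistent hyperlink into its three components.

    Hypothetical helper: assumes the destination resource identifier is
    the URL path without its leading slash.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,                  # e.g. https
        "resolver": parts.netloc,                # e.g. doi.org
        "identifier": parts.path.lstrip("/"),    # e.g. 10.1016/j.rse.2021.112667
    }


components = split_persistent_hyperlink("https://doi.org/10.1016/j.rse.2021.112667")
for name, value in components.items():
    print(f"{name}: {value}")
```

Of the three values printed, only the identifier is meant to be persistent; the scheme and the resolver domain name are hard-coded into the source resource.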
The disadvantage of the so-called persistent hyperlink in a Web resource (such as the hyperlink above on this page) is that, of its three components, only the destination resource identifier is unlikely to change. There is no guarantee, however, that the resolver's domain name and the scheme (the communication protocol) will remain unchanged forever.
In other words, the integrity of such a hyperlink and, consequently, of the Web resource, which depends on the persistence of the resolver domain name and the scheme, cannot be guaranteed in the long term.
The purpose of this note is to illustrate, by means of three examples, the existence of a digital service that overcomes the potential risk to information integrity when using the usual persistent hyperlinks.
The solution adopted here is to use, as URIs [3], some kind of uniform resource global name in a "relative hyperlink", rather than some kind of uniform resource locator in an "absolute hyperlink".
Here, the uniform resource global name consists of a global namespace prefix (global in the sense that the prefix may have been registered with the Internet Assigned Numbers Authority (IANA)) followed by a name within the scope of that namespace. Finally, the Archive (Digital Repository) hosting the source resource is adapted to also serve as a proxy resolver: based on the namespace prefix, it directs the client's resolution request to the appropriate resolver and ultimately returns the URL of the destination resource to the client's browser for redirection.
Although the resolver's domain name is made implicit, the proposed solution still relies on the existence of a global resolver. For this reason, the relative hyperlinks presented below are not called fully persistent but almost fully persistent. For a solution that does not necessarily require the use of a global resolver, see [4].
The HTML hyperlinks in this section are said to be "almost fully persistent"† in the sense that each one uses a persistent, location-independent destination resource identifier that is resolved without explicitly mentioning the respective resolver's domain name (n2t.net in the first example using ARK, doi.org in the second example using DOI, and urlib.net in the third using IBI). This property can be verified by looking at the value of the respective href attribute in the source code of this HTML page.
Observation: The above examples work because this page (the Web resource containing the hyperlinks) has been deposited in an Archive hosted on an experimental computational platform called the URLib. On this platform, each Archive serves as a proxy resolver.
Table 1 illustrates how the Archive $localSite works as a proxy resolver to resolve the above three examples. As with any resolver, the URL path component (e.g., ark:13030/c7cv4br18; see Line 1 of Table 1) is the URI of the destination resource.
The character string of this URI up to the last colon (:) (e.g., ark) is the namespace prefix that identifies a possible resolver for the resolution of the destination resource identifier (e.g., 13030/c7cv4br18).
In turn, a possible resolver (e.g., $arkResolverURL for ark — see Line 2 of Table 1) is assigned to each namespace prefix (e.g., ark) thus forming a mapping called "Prefix-Resolver".
Finally, based on the output of the prefix-resolver mapping corresponding to a given namespace prefix, the proxy server triggers the appropriate resolver (e.g., see Line 3 of Table 1).
In this implementation, the information contained in the usual persistent hyperlink, consisting of the concatenation of the URL scheme and the resolver domain name, migrates from the source resource to the prefix-resolver mapping accessible by a CGI script running in the Archive.
This way, the integrity of the source resource can be preserved over time, even if that information changes in the future. Such changes have no impact on the source resource and can easily be reflected in the prefix-resolver mapping.
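The proxy resolver steps described above (extract the namespace prefix, look it up in the prefix-resolver mapping, build the request for the global resolver) can be sketched as follows. This is a minimal illustration under stated assumptions, not the URLib implementation: the function names are hypothetical, and the resolver URLs in the mapping are placeholders standing in for $arkResolverURL, $doiResolverURL and $ibiResolverURL.

```python
# Hypothetical prefix-resolver mapping. The URLs are placeholders for
# $arkResolverURL, $doiResolverURL and $ibiResolverURL, not real resolvers.
PREFIX_RESOLVER = {
    "ark": "https://ark-resolver.example",
    "urn:doi": "https://doi-resolver.example",
    "ibi": "https://ibi-resolver.example",
}


def namespace_prefix(uri: str) -> str:
    """Return the character string of the URI up to its last colon."""
    prefix, colon, _name = uri.rpartition(":")
    if not colon:
        raise ValueError(f"no namespace prefix in {uri!r}")
    return prefix


def global_resolver_url(request_path: str, mapping=PREFIX_RESOLVER) -> str:
    """Build the global resolver URL for the URI found in the request path."""
    uri = request_path.lstrip("/")           # e.g. ark:13030/c7cv4br18
    prefix = namespace_prefix(uri)           # e.g. ark
    if prefix not in mapping:
        raise LookupError(f"no resolver registered for prefix {prefix!r}")
    return f"{mapping[prefix]}/{uri}"        # same form as Line 3 of Table 1


for path in (
    "/ark:13030/c7cv4br18",
    "/urn:doi:10.1016/j.rse.2021.112667",
    "/ibi:8JMKD3MGP3W34R/44C25PS",
):
    print(path, "->", global_resolver_url(path))
```

In this sketch, a change of resolver domain name or scheme is handled by editing a single entry of `PREFIX_RESOLVER`; the hyperlinks in the source resource are left untouched.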
Table 1 - Proxy resolver operation
Proxy resolver | http://$localSite/ark:13030/c7cv4br18 |
Mapping value for ARK | ark ↦ $arkResolverURL |
ARK global resolver | $arkResolverURL/ark:13030/c7cv4br18 |
Proxy resolver | http://$localSite/urn:doi:10.1016/j.rse.2021.112667 |
Mapping value for DOI | urn:doi ↦ $doiResolverURL |
DOI global resolver | $doiResolverURL/urn:doi:10.1016/j.rse.2021.112667 |
Proxy resolver | http://$localSite/ibi:8JMKD3MGP3W34R/44C25PS |
Mapping value for IBI | ibi ↦ $ibiResolverURL |
IBI global resolver | $ibiResolverURL/ibi:8JMKD3MGP3W34R/44C25PS |
‡To be interpreted as a relative hyperlink by the browser, the URI must be preceded by a period (.) followed by a slash mark (/).
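The effect of the leading "./" can be checked with Python's standard `urllib.parse.urljoin`, used here as a stand-in for the browser's reference resolution. The base URL `http://archive.example/page.html` is a hypothetical location of the source resource, chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URL of the source resource inside the Archive.
base = "http://archive.example/page.html"

# Without "./", the text before the first colon ("ark") is parsed as a
# URI scheme, so the reference is absolute and the base is ignored.
as_written = urljoin(base, "ark:13030/c7cv4br18")

# With "./", the reference is relative and is resolved against the base,
# so the request goes to the Archive acting as proxy resolver.
with_dot_slash = urljoin(base, "./ark:13030/c7cv4br18")

print(as_written)
print(with_dot_slash)
```

The second result has the same form as Line 1 of Table 1, with the hypothetical host `archive.example` in place of $localSite.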