Bibliographic Protocol Level 1: Link Resolution and Metapage Retrieval

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) Robert D. Cameron and Serban G. Tatu (2000). All Rights Reserved.

Abstract

BibP (bibliographic protocol) links bibliographic identifiers of published works to bibliographic services for those works. Identifiers follow the Universal Serial Item Name (USIN) scheme, providing a scholar-friendly conventional notation for journal articles, books and institutional publications, as well as a generic framework that can scale to identify documents in any organized collection. A hierarchical resolution model emphasizes bibliographic services available through local libraries backed up by publisher-specified and global services. Resolution is achieved through existing DNS technology coupled with appropriate client-side support. Deployment of BibP clients with most of the popular web browsers is possible today; this paper presents one such client, written in JavaScript.

1. Introduction

BibP (bibliographic protocol) is a web-based protocol for linking bibliographic references via Universal Serial Item Names [USIN]. It is intended to allow linking to each bibliographic item as a conceptual entity, independent of any particular copy or service with respect to that item. Indeed, it is even intended to allow linking to items which may not exist on-line; resolution of such a link could yield a metapage that identifies existing print-based services (library holdings, document delivery) for accessing the item. In this regard, BibP is a proposed reference linking solution that seeks to maintain integrated access to both newly published on-line items and the vast body of print-based literature.

The BibP/USIN approach applies the principles underlying the Uniform Resource Name (URN) concept [RFC1737] to the particular problem of bibliographic linking based on a decentralized model. Members of the URN WG have identified the need for "contextualized resolution" as a complementary approach to the top-down authoritative resolution of URNs using the Dynamic Delegation and Discovery Service (DDDS). Bibliographic protocol provides such contextualized resolution in the bibliographic domain, addressing the need for "appropriate copy" service from local libraries. The focus on the bibliographic domain also allows for the development of a simple approach to decentralized resolution based on existing DNS support for relative domain names [RFC1034]. The model requires no new development or deployment of DNS technology. In addition, the problems of namespace definition and management [RFC2611] are considerably simplified by restriction to bibliographic identifiers of the USIN system.

From the author perspective, reference linking with BibP is intended to be as simple and scholar-friendly as possible. For example, to denote the paper by Norman Paskin entitled "Information Identifiers" as it appears on pages 135-6 of volume 10, issue 2 of the journal Learned Publishing, the BibP link is formed from the journal ISSN, volume and page using the USIN conventional syntax: bibp:ISSN/0953-1513:10@135. Similarly, bibp:RDNS(ietf.org)/RFC:2396 is the minimal syntax that denotes the report by T. Berners-Lee, R. Fielding and L. Masinter entitled "Uniform Resource Identifiers (URI): Generic Syntax," published as Request for Comments 2396 of the Internet Engineering Task Force. In general, BibP links to most published documents can be constructed using elements of existing identification standards combined in a minimal way according to USIN syntactic conventions. In the parlance of Paskin [Idents], USINs are compound identifiers; this contrasts with the simple identifiers (or dumb pointers) of the DOI system [DOI].

Ultimately, the BibP framework is envisioned to facilitate access to bibliographic items through a library-based network of BibP servers. In essence, each library-operated server will provide information and access to items emphasizing locally-available resources and agreements; networking will provide access to items not locally available. For example, a university library may operate a BibP server as the default server for its students and faculty, providing access to bibliographic items in accord with university holdings, interlibrary loan options and site licensing arrangements. However, the framework is also envisioned to allow other options. For example, commercial document delivery services may compete to provide BibP service to industrial clients.

As a first step in the staged development of a multi-level specification for BibP, this report addresses the basic client-server interaction in resolving an individual BibP link and retrieving an appropriate metapage. In particular, we present both a Level 1 specification for this interaction and a scalable client-side implementation of that specification. This work is sufficient for initial deployment of BibP-based links and servers.

The remainder of this paper is organized as follows. Section 2 describes the syntactic framework and conventions for Universal Serial Item Names under BibP Level 1. BibP Level 1 itself is addressed in Section 3, with accompanying notes discussing the rationale and planning for further development. A scalable client-side implementation of the specification, in the form of a JavaScript program, is presented in Section 4. Section 5 concludes the paper with a discussion of the further development of BibP.

2. Universal Serial Item Names

Previous work has proposed a system of Universal Serial Item Names (USINs) for the persistent identification of documents published or otherwise organized in serial collections [USIN]. The overall framework defines a concept of publication domains within which standardized codes are used to identify particular collections. In principle, each collection may then have its own particular system of hierarchical enumeration and labeling to identify particular published items within the collection. In this way, the USIN framework is generic and extensible; it can be readily scaled to provide for unambiguous identification of documents in any organized collection.

This section defines a precise syntactic framework for USINs, slightly modified from the original proposal to better account for the encoding requirements of HTML documents. Within this framework, each collection potentially has its own syntax. However, the USIN proposal also outlines a conventional predefined syntax that provides substantial coverage of the existing literature published as journal articles, books, book articles (including papers in published proceedings) and institutional reports in numbered series. The conventional syntax is formalized here and is used as the basis of identification under BibP Level 1. Mechanisms for defining customized syntax for particular publication domains or collections are left for future work.

2.1 Grammatical Notation

The grammatical notation used for describing the syntax of USINs is based on EBNF. Terminal symbols (symbols that will actually appear in the syntactic forms) are enclosed in quotation marks. Nonterminal symbols (names of syntactic classes) are expressed as identifiers with possible embedded hyphens or underscores. Alternative syntactic forms are separated by the vertical bar ("|"). Parentheses ("(" and ")") are used to group syntactic phrases. Square brackets ("[" and "]") are used for optional phrases. Braces ("{" and "}") are used for phrases to be repeated zero or more times. Names of nonprinting characters are enclosed in angle brackets ("<" and ">").

2.2 Character Set

Under BibP Level 1, USINs are character strings composed of characters in the following classes.

UC_LETTER = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | 
            "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" |
            "U" | "V" | "W" | "X" | "Y" | "Z"
LC_LETTER = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | 
            "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" |
            "u" | "v" | "w" | "x" | "y" | "z"
LETTER = UC_LETTER | LC_LETTER
DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
ALPHANUMERIC = LETTER | DIGIT
EXTENDER  = "_" | "-" 
SEPARATOR = "/" | ":" | "!" | "@" | "$" | "*" | "~" | "+" | "," | "."
PAREN = "(" | ")"
WHITE = <newline> | <space> 

The USIN framework is designed to accommodate future extension of the USIN character set in support of internationalization. That is, non-ASCII characters of Unicode/ISO 10646 [Unicode] may be added to the LETTER, DIGIT, and EXTENDER character classes. However, USINs are designed to be parsed based on recognition of SEPARATOR and PAREN characters. Thus, carefully written USIN parsers under BibP Level 1 may accommodate future extensions to the USIN character set without modification.

Closely related to the USIN is the notion of a USIN Octet Sequence (UOS), an encoding of a USIN as a sequence of 8-bit bytes. USINs themselves are simply character strings without any particular constraint on their representation. Thus a USIN may be represented as a sequence of handwritten or printed marks on paper. Alternatively, it may be represented as a series of 16-bit quantities in the UCS-2 format of Unicode/ISO 10646 [Unicode]. However, when a USIN is to be communicated under BibP Level 1, it is always encoded as a USIN Octet Sequence, as described in Section 3.1 following.

2.3 Lexical Elements and Generic Grammar

USINs are made up of lexical elements known as symbols, operators and phrases.

symbol = ALPHANUMERIC {[EXTENDER] ALPHANUMERIC}
operator = SEPARATOR {SEPARATOR}
phrase = "(" {ALPHANUMERIC | EXTENDER | SEPARATOR} ")"

Symbols are generally names or numerals that identify particular entities within some level of the identification hierarchy. Parenthesized phrases play a similar role but provide wider-ranging syntax for imported notations and/or internal structure. Operators are generally syntactic markers that guide the interpretation of symbols and phrases.
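
The three lexical rules above can be illustrated with a small tokenizer. The following is only a sketch under the ASCII character set of BibP Level 1; the function name tokenizeUsin and the token object shape are assumptions made for illustration and are not part of this specification.

```javascript
// SEPARATOR characters from Section 2.2 (ASCII only, per BibP Level 1).
var SEPARATORS = "/:!@$*~+,.";

function isAlphanumeric(ch) {
  return /^[A-Za-z0-9]$/.test(ch);
}

// Split a whitespace-free USIN into symbols, operators and phrases.
// Throws an Error on any character sequence the lexical rules do not allow.
function tokenizeUsin(usin) {
  var tokens = [];
  var i = 0;
  while (i < usin.length) {
    var start = i;
    var ch = usin.charAt(i);
    if (isAlphanumeric(ch)) {
      // symbol = ALPHANUMERIC {[EXTENDER] ALPHANUMERIC}
      i++;
      while (i < usin.length) {
        var c = usin.charAt(i);
        if (isAlphanumeric(c)) {
          i++;
        } else if ((c === "_" || c === "-") && isAlphanumeric(usin.charAt(i + 1))) {
          i += 2; // an extender must be followed by an alphanumeric
        } else {
          break;
        }
      }
      tokens.push({ type: "symbol", text: usin.slice(start, i) });
    } else if (SEPARATORS.indexOf(ch) >= 0) {
      // operator = SEPARATOR {SEPARATOR}
      while (i < usin.length && SEPARATORS.indexOf(usin.charAt(i)) >= 0) i++;
      tokens.push({ type: "operator", text: usin.slice(start, i) });
    } else if (ch === "(") {
      // phrase = "(" {ALPHANUMERIC | EXTENDER | SEPARATOR} ")"
      var close = usin.indexOf(")", i);
      var body = close < 0 ? null : usin.slice(i + 1, close);
      if (body === null || !/^[A-Za-z0-9_\-\/:!@$*~+,.]*$/.test(body)) {
        throw new Error("Malformed phrase at position " + i);
      }
      i = close + 1;
      tokens.push({ type: "phrase", text: usin.slice(start, i) });
    } else {
      throw new Error("Unexpected character '" + ch + "' at position " + i);
    }
  }
  return tokens;
}
```

For instance, tokenizeUsin("ISSN/0953-1513:10(2)@135") yields eight tokens: the symbols ISSN, 0953-1513, 10 and 135, the operators "/", ":" and "@", and the phrase (2).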

WHITE characters (whitespace) may be embedded in a USIN only in accord with the following hyphenation convention. A hyphenation substring consisting of a single hyphen ("-") followed by zero or more whitespace characters may be inserted before an operator or parenthesized phrase. Whitespace inserted in this way has no semantic effect. This hyphenation convention is systematic: for each grammatical rule of the USIN syntax, a hyphenation substring is implicitly permitted before each operator or parenthesized phrase.

The hyphenation convention permits a USIN appearing in plain text to be formatted over more than one line. Cut-and-paste operations on USINs displayed in this manner may thus extract USINs with embedded whitespace. USIN processing software will normally remove the embedded whitespace prior to further work.
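
As an illustration of the removal step, the following sketch strips hyphenation substrings from a USIN copied out of formatted text. The function name stripHyphenation is hypothetical. The sketch assumes that a hyphen outside a parenthesized phrase is a hyphenation mark exactly when it is followed, after optional whitespace, by a SEPARATOR character or an opening parenthesis; this follows from the symbol rule, under which an extender within a symbol is always followed by an alphanumeric.

```javascript
// Remove hyphenation substrings (a hyphen followed by zero or more
// whitespace characters) that precede an operator or parenthesized phrase.
// Hyphens inside parenthesized phrases are left untouched.
function stripHyphenation(text) {
  var SEPARATORS = "/:!@$*~+,.";
  var out = "";
  var inPhrase = false;
  var i = 0;
  while (i < text.length) {
    var ch = text.charAt(i);
    if (!inPhrase && ch === "-") {
      // Look past any whitespace that follows the hyphen.
      var j = i + 1;
      while (j < text.length && /\s/.test(text.charAt(j))) j++;
      var next = text.charAt(j);
      if (next !== "" && (next === "(" || SEPARATORS.indexOf(next) >= 0)) {
        i = j; // drop the hyphen and the whitespace
        continue;
      }
    }
    if (ch === "(") inPhrase = true;
    if (ch === ")") inPhrase = false;
    out += ch;
    i++;
  }
  return out;
}
```

Applied to a USIN broken over three lines as "ISSN/0953-1513-", ":10-" and "(2)@135", the function returns ISSN/0953-1513:10(2)@135; the hyphen inside the ISSN is preserved because it is followed by a digit.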

The USIN framework allows symbols, operators and phrases to be combined in a variety of ways, depending on the identification needs of particular publication domains and collections. However, a USIN must always satisfy the following generic grammar of permissible USIN forms (after removal of hyphenation substrings).

form = symbol | form phrase | form operator symbol

The generic grammar of forms reflects the hierarchical left-to-right structure of USINs. The most elementary form of a USIN is a single symbol. All other USINs are formed hierarchically by extending known forms with additional identification elements consisting of phrases or operator-symbol combinations.
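
Because the generic grammar is left-recursive in a purely linear way, it can be checked with a single regular expression: one symbol followed by any number of phrases or operator-symbol pairs. The sketch below assumes the ASCII character classes of Section 2.2; the names FORM and isValidForm are illustrative only.

```javascript
// Lexical elements as regular expression fragments (ASCII only).
var SYMBOL = "[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*";
var OPERATOR = "[/:!@$*~+,.]+";
var PHRASE = "\\([A-Za-z0-9_\\-/:!@$*~+,.]*\\)";

// form = symbol | form phrase | form operator symbol
var FORM = new RegExp(
  "^" + SYMBOL + "(?:" + PHRASE + "|" + OPERATOR + SYMBOL + ")*$"
);

// Returns true when a whitespace-free USIN satisfies the generic grammar.
function isValidForm(usin) {
  return FORM.test(usin);
}
```

Under this check, ISSN/0953-1513:10@135!title and RDNS(sfu.ca).CMPT/PhD:2000 are accepted, while a string ending in a dangling operator, such as ISSN/, is rejected.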

2.4 Syntactic Framework

The syntactic framework for USINs identifies publication-domains, collections, items, and attributes as the four key syntactic structures. The term USIN may refer to any one of these structures, which are hierarchically related as follows.

USIN = publication-domain | collection | item | attribute
collection = publication-domain "/" collection-label
item = (collection | item) item-extension
attribute = (collection | item | attribute) "!" attribute-specifier

For example, consider the USIN ISSN/0953-1513:10@135!title. The publication domain is ISSN, the space of all serial publications registered with an International Standard Serial Number [ISO3297]. The collection is the set of all articles published in the journal whose ISSN is 0953-1513, namely, Learned Publishing. The item extensions :10 and @135 specify respectively volume 10 of the journal and the article that appears on page 135 of that volume (using the conventional syntax described later). Attribute notation is used to specify the title of the article as the object of interest.
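
The decomposition in this example can be sketched in code. The following is a simplified illustration, not a full parser: it assumes a whitespace-free USIN in which the first "/" outside parentheses ends the publication domain and the first "!" outside parentheses begins the attribute part. The function name splitUsin and the field names of the result are hypothetical.

```javascript
// Split a whitespace-free USIN into the four structures of Section 2.4.
// Operators inside parenthesized phrases, as in "(3/4)", are ignored.
function splitUsin(usin) {
  var depth = 0;
  var slash = -1; // first top-level "/" after the publication domain
  var bang = -1;  // first top-level "!" introducing attributes
  for (var i = 0; i < usin.length; i++) {
    var ch = usin.charAt(i);
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (depth === 0 && ch === "/" && slash < 0 && bang < 0) slash = i;
    else if (depth === 0 && ch === "!" && bang < 0) bang = i;
  }
  var end = bang < 0 ? usin.length : bang;
  var domain = slash < 0 ? usin.slice(0, end) : usin.slice(0, slash);
  var rest = slash < 0 ? "" : usin.slice(slash + 1, end);
  // The collection label is the leading symbol; the remainder consists
  // of item extensions.
  var label = /^[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*/.exec(rest);
  return {
    domain: domain,
    collectionLabel: label ? label[0] : "",
    itemExtensions: label ? rest.slice(label[0].length) : rest,
    attributes: bang < 0 ? "" : usin.slice(bang)
  };
}
```

For ISSN/0953-1513:10@135!title, the result has domain ISSN, collection label 0953-1513, item extensions :10@135 and attribute part !title.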

2.5 Publication Domains

Publication domains represent namespaces within which publications and other collections are assigned identifiers according to a specific scheme and/or authority. The syntax presented here is used both for the three initial domains supported under BibP Level 1 (namely ISSN, ISBN, and RDNS) and for future domains. Although the initial domains provide for substantial coverage of referenced literature, the general syntax accommodates future development of a richer hierarchical domain structure to provide for both greater coverage and the development of more mnemonic forms.

Publication domains may be simple, hierarchical, and/or parameterized.

publication-domain = symbol 
                     | publication-domain "." symbol
                     | publication-domain phrase

When a parenthesized phrase is appended to a publication domain, it may be considered to instantiate that domain for the particular string value given in parentheses.

Under BibP Level 1, two simple domains are predefined, represented by the symbols ISSN and ISBN. As noted previously, the ISSN domain consists of those serial publications that may be identified by an International Standard Serial Number. Similarly, the ISBN domain is the space of those publications identified by an International Standard Book Number [ISO2108].

RDNS is a parameterized domain that uses a restricted subset of names assigned under the Domain Name System [RFC1034] to identify publication namespaces for individual institutions. For example, RDNS(sfu.ca) denotes a publication namespace for Simon Fraser University, while RDNS(ietf.org) denotes a similar namespace for the Internet Engineering Task Force. Here, the parameter string must be a well-established domain name under DNS that is both owned by the institution and has a clear interpretation as a code for that institution.

The domain parameter for RDNS is case-insensitive, following the conventions for DNS. For example, RDNS(sfu.ca) and RDNS(SFU.CA) are equivalent. Following DNS tradition, the lower case version of the RDNS parameter is considered the canonical and preferred form.
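
A minimal sketch of this canonicalization follows. The function name canonicalizeRdns is hypothetical; the sketch lowercases only the parenthesized RDNS parameter at the start of a USIN and leaves the rest of the string, including any subdomain codes and collection labels, unchanged.

```javascript
// Rewrite the RDNS parameter of a USIN in its canonical lower-case form.
// USINs that do not begin with an RDNS domain are returned unchanged.
function canonicalizeRdns(usin) {
  return usin.replace(/^RDNS\(([^)]*)\)/, function (match, name) {
    return "RDNS(" + name.toLowerCase() + ")";
  });
}
```

For example, canonicalizeRdns("RDNS(SFU.CA).CMPT/TR") returns RDNS(sfu.ca).CMPT/TR.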

Hierarchical divisions of an institution may be identified by hierarchical RDNS domains. The subdomains are identified by unambiguous codes for the divisions as used by the institution itself. For example, RDNS(sfu.ca).CMPT denotes the School of Computing Science at Simon Fraser University using the four-letter code CMPT unambiguously used by SFU for the School. Alternatively, RDNS(cs.sfu.ca) also denotes the School, using its well-established DNS name.

The astute reader may note that the parameterized domain syntax used for RDNS differs from the quoted DNS names originally proposed [USIN]. It is slightly cleaner and simplifies the USIN Octet Sequence representation (see Section 3.1) by eliminating the need for escape-encoding of quotation marks.

2.6 Collections and Collection Labels

Collections are sets of documents organized by a particular serial numbering scheme. For example, a journal is typically a collection organized using volume, issue and page numbering, while a technical report series is a collection organized by a numbering scheme specified by the issuing institution. A book may be a collection of articles (for example, the proceedings of a conference) or may be considered a singleton collection (a single document in its own right).

Collection labels are symbols that identify particular collections within the context of a publication domain.

collection-label = symbol

Collection labels must always conform to this syntax, but particular publication domains may impose further restrictions.

In the context of the ISSN domain, collection labels are restricted to the following ISSN syntax [ISSN].

collection-label(ISSN) = ISSN
ISSN = DIGIT DIGIT DIGIT DIGIT ["-"] DIGIT DIGIT DIGIT DIGITX
DIGITX = DIGIT | "X" | "x"

The embedded hyphen within an ISSN is preferred and canonical for USIN syntax, but may be omitted. Similarly, the upper case X is the preferred and canonical form for the ISSN check digit denoting 10, but x is considered equivalent. BibP servers must accept serial-codes in any of these forms. However, when generating or otherwise reporting USINs within the ISSN domain, BibP servers must use the canonical forms.
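
The acceptance and reporting rules for ISSN collection labels might be implemented along the following lines. This is a sketch only: the function name canonicalizeIssn is hypothetical, and the code checks syntax alone without verifying the check digit.

```javascript
// Accept an ISSN collection label in any permitted form and return the
// canonical form (embedded hyphen, upper-case X), or null if the label
// does not match the ISSN syntax.
function canonicalizeIssn(label) {
  var m = /^(\d{4})-?(\d{3}[\dXx])$/.exec(label);
  if (!m) return null;
  return m[1] + "-" + m[2].toUpperCase();
}
```

Thus both 09531513 and 0953-1513 yield 0953-1513, and 0361-526x yields 0361-526X.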

Collection labels within the ISBN domain similarly follow ISBN syntax [ISO2108].

collection-label(ISBN) = ISBN
ISBN = INTEGER "-" INTEGER "-" INTEGER "-" DIGITX |
       DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGITX
INTEGER = DIGIT {DIGIT}

The preferred and canonical form of ISBNs includes the correct hyphenation to separate it into four fields for the country/group coding, publisher coding, title coding and check digit. Each of the first three fields is variable length, but in total these fields must contain exactly nine digits. As with ISSNs, x may be used for the check digit, but X is preferred and canonical.
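
The ISBN syntax constraints can be sketched as follows. The function name normalizeIsbn is hypothetical, and the code checks syntax only. Note that an unhyphenated ISBN cannot be hyphenated without knowledge of the group and publisher code ranges, so this sketch merely upper-cases its check digit.

```javascript
// Check an ISBN collection label against the grammar above. Returns the
// label with an upper-case check digit, or null if the syntax is invalid.
function normalizeIsbn(label) {
  var h = /^(\d+)-(\d+)-(\d+)-([\dXx])$/.exec(label);
  if (h) {
    // The first three fields must contain exactly nine digits in total.
    if ((h[1] + h[2] + h[3]).length !== 9) return null;
    return h[1] + "-" + h[2] + "-" + h[3] + "-" + h[4].toUpperCase();
  }
  // Unhyphenated alternative: nine digits followed by the check digit.
  if (/^\d{9}[\dXx]$/.test(label)) return label.toUpperCase();
  return null;
}
```

With the made-up label 0-123-45678-x, the function returns 0-123-45678-X; the label 0-12-45678-9 is rejected because its first three fields contain only eight digits.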

With RDNS domains, collection labels should be the identifiers actually used by the institution. For example, the Internet Engineering Task Force uses RFC to refer to documents in its Request for Comments Series, so this collection may be identified by the USIN RDNS(ietf.org)/RFC. The technical report series of SFU's School of Computing Science is denoted RDNS(sfu.ca).CMPT/TR. Theses published by an institution are conventionally denoted by the abbreviation for the degree, so RDNS(sfu.ca).CMPT/PhD denotes the Ph.D. thesis series of the School.

2.7 Items and Item Extensions

Item is the generic term used to refer to an individual document or group of documents that form an identified division within the hierarchical identification scheme of a collection. For example, the volumes, issues and articles of a journal are all items.

Within the context of a specific collection, item extensions are the USIN suffixes that specify items. In general, the syntax and interpretation of item extensions depends on the particular collection or publication domain involved. However, each item extension conforms to the generic grammar of Section 2.3.

The USIN conventional syntax predefines a number of item extensions for common forms of hierarchical identification. These involve the operators ":" for introducing the principal enumeration scheme of a collection, "@" for page-based article specification and "$" for direct article specification by symbol or count.

Whenever a collection is explicitly divided into enumerated divisions, the ":" operator is used to introduce the division label. Volumes of a journal are a typical use, so :10 is the item-extension specifying volume 10 of Learned Publishing in the USIN ISSN/0953-1513:10@135. Although volumes will be denoted by integer numerals in most cases, the conventional syntax also permits arbitrary symbols. For example, ISSN/0098-5589:SE-12 denotes Volume SE-12 of IEEE Transactions on Software Engineering.

Report numbers, year numbers and other top-level enumeration elements are also introduced using the ":" operator. For example, RDNS(ietf.org)/RFC:2396 denotes RFC 2396 of the IETF Request for Comment series, while RDNS(sfu.ca).CMPT/PhD:2000 denotes PhD theses published in the year 2000 by the SFU School of Computing Science.

The USIN convention for journals also includes a syntax for issue numbers as a second level of enumeration, namely a parenthesized phrase. Thus ISSN/0953-1513:10(2) denotes volume 10, issue 2 of Learned Publishing. Special issues and combined issues typically use non-numeric issue strings. For example, ISSN/0038-0644:20(S2) denotes special issue S2 of volume 20 of Software--Practice & Experience (December 1990), while ISSN/0361-526X:36(3/4) denotes combined issue 3/4 of volume 36 of Serials Librarian (1999). Note that the parenthesized notation for issues is also quite common in bibliographic citations; the USIN convention takes advantage of this for mnemonic effect.

The second conventional operator under the USIN system is the "@" operator for specifying articles in books or journals by starting page number. For example, the article at page 135 of Learned Publishing 10(2) is denoted ISSN/0953-1513:10(2)@135. For journals that are paginated by volume, such as this one, the issue number may be omitted; ISSN/0953-1513:10@135 is thus equivalent to the USIN just given.
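
The conventional journal syntax described so far can be assembled mechanically from its parts. The sketch below is illustrative; the function name journalArticleUsin and its parameter order are assumptions, and no validation of the arguments is attempted.

```javascript
// Build the conventional USIN for a journal article from its ISSN, volume
// label and starting page. The issue is optional and, when given, is
// written as a parenthesized phrase after the volume.
function journalArticleUsin(issn, volume, page, issue) {
  var usin = "ISSN/" + issn + ":" + volume;
  if (issue !== undefined && issue !== null) {
    usin += "(" + issue + ")";
  }
  return usin + "@" + page;
}
```

For example, journalArticleUsin("0953-1513", 10, 135, 2) returns ISSN/0953-1513:10(2)@135, and omitting the issue returns ISSN/0953-1513:10@135.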

In the event that more than one article starts on a given page, the articles are numbered sequentially with an alphabetic code: a for the first, b for the second, c for the third, and so on. In the rare event that there are more than 26 articles on the page, the code allows arbitrary base 26 numerals such as aa for the 27th item, ab for the 28th and so on.
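
The alphabetic code is a base 26 numeral without a zero digit, which the following sketch computes from the position of the article on the page. The function name articleCode is hypothetical.

```javascript
// Convert the 1-based position of an article among those starting on the
// same page into its alphabetic code: 1 -> "a", 26 -> "z", 27 -> "aa".
function articleCode(n) {
  var code = "";
  while (n > 0) {
    var r = (n - 1) % 26;                       // letter index, 0 = "a"
    code = String.fromCharCode(97 + r) + code;  // prepend the letter
    n = Math.floor((n - 1) / 26);               // move to the next digit
  }
  return code;
}
```

This reproduces the sequence given above: articleCode(27) is aa and articleCode(28) is ab.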

The final form of item extension in the USIN conventional syntax uses the "$" operator to specify articles in unpaginated e-journals or other contexts by a numeric or symbolic label. Numeric labels indicate either explicit or implicit enumeration within contents lists. However, where a clear symbolic label exists either in plain text or encoded in the article URL, then the symbolic form is preferred and canonical. For example, the article "Towards Universal Serial Item Names" published in Volume 1, Issue 3 of the Journal of Digital Information is denoted ISSN/1368-7506:1(3)$Cameron, where Cameron is the symbolic code clearly used in the JoDI URL to distinguish this article from others in the same issue.

The generic USIN grammar supports the definition of many additional forms of item extension. Future developments of the USIN system will likely introduce additional conventional syntax as well as mechanisms for specifying domain- or collection-dependent syntax.

2.8 Attributes and Attribute Specifiers

Attributes are properties or metadata elements that pertain to a particular collection or item. The USIN syntax reserves the "!" operator for introducing attribute-specifiers, as shown in the grammar of Section 2.4 above. An attribute-specifier itself consists of a symbol naming the attribute, with an optional parenthesized phrase to specify a parameter value.

attribute-specifier = symbol [phrase]

For example, ISSN/0953-1513!title denotes the title of the journal whose ISSN is 0953-1513, namely Learned Publishing; ISSN/0953-1513:10@135!title denotes the article title "Information Identifiers"; and ISSN/0953-1513:10@135!author(1) denotes the first (and only, in this case) author of this article, namely, Norman Paskin.

In general, attributes denote publication facts about particular items or collections. Attributes are not intended to account for classification or other metadata that may be attributed to items by third parties. Philosophically, third-party metadata is considered to be interpretative, not factual. Different third parties may well describe and/or classify the same document in quite different ways. Thus the attribute sets used with USINs may be expected to be substantially narrower than general metadata element sets such as those of Dublin Core [RFC2413].

Of particular importance to the further development of the BibP network as it evolves towards the concept of a universal citation database [UCD] is the parameterized ref attribute. This attribute refers to the bibliographic references in a document, identified by numeric or symbolic citation tag. For example, RDNS(ietf.org)/RFC:2XXX!ref(UCD) denotes the document cited as UCD in this RFC.

The ref attribute supports even broader coverage of the literature than that provided by the direct identification provisions of the USIN conventional syntax. Any documents that are cited within other documents may be identified by specification of the citing document and a citation tag. Effectively, this provides for universal coverage of all documents that are transitively reachable by citation.

The attribute framework of the USIN scheme is substantially an area for future work, however. No requirements for attribute processing are specified under BibP Level 1, except to recognize that attribute syntax is valid.

3. BibP Level 1: Description and Rationale

BibP Level 1 establishes the syntax of BibP links together with requirements on HTTP-based client-server interactions for resolving individual links and retrieving bibliographic metapages for display to the user. A BibP client is a web browser or other user agent that either has built-in support for BibP (BibP-aware user agent) or operates in conjunction with an appropriate client-side script. (Section 4 presents one such script-based implementation of BibP link resolution). The BibP client resolves BibP links by identifying an appropriate BibP server and generating a well-formatted BibP request to that server. Upon receiving the request, the BibP server is responsible for generating an HTML page presenting relevant bibliographic and service information with respect to the cited item.

3.1 BibP Link Syntax

A BibP link is a uniform resource identifier (URI) [RFC2396] of the form bibp:UOS, where UOS is a USIN Octet Sequence as described below. In the parlance of RFC2396, BibP links are absolute URIs whose scheme is bibp and whose scheme-specific-part is a UOS of a cited USIN. The UOS is considered an opaque part because its structure has no meaning with respect to the network.

In the normal case, a UOS is simply the representation of a USIN as an ASCII character string [ASCII]. Under BibP Level 1, the only exception is that WHITE characters must be encoded according to the following grammar.

WHITESPACE = {CR | LF | HT | SPACE}
CR = "%" "0" ("D" | "d")
LF = "%" "0" ("A" | "a")
HT = "%" "0" "9"
SPACE = "%" "2" "0"

That is, the URI transformation of escape encoding [URI] must be applied to the normal ASCII spacing control characters to produce the UOS. Also note that the WHITESPACE grammar permits newlines to be encoded using any of the common file format conventions with various combinations of CR and LF characters.
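
As an illustration, the following sketch produces a BibP link from a USIN that may contain hyphenation whitespace. It handles only the characters of the WHITE class of Section 2.2 (space and the CR and LF characters that make up newlines); the function names encodeWhitespace and toBibpLink are hypothetical.

```javascript
// Escape-encode the spacing characters of a USIN to obtain a UOS.
function encodeWhitespace(usin) {
  var ESCAPES = { "\r": "%0D", "\n": "%0A", " ": "%20" };
  return usin.replace(/[\r\n ]/g, function (ch) {
    return ESCAPES[ch];
  });
}

// Form the BibP link for a USIN: the bibp scheme followed by the UOS.
function toBibpLink(usin) {
  return "bibp:" + encodeWhitespace(usin);
}
```

A USIN entered without whitespace passes through unchanged, so toBibpLink("ISSN/0953-1513:10@135") is simply bibp:ISSN/0953-1513:10@135.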

As described previously, the USIN character set is subject to future extension to include non-ASCII characters of Unicode/ISO 10646 for the purpose of internationalization. These characters may be represented within a UOS by first expressing them as octet sequences in the UTF-8 format of Unicode and then applying the URI-encoding transformation to the octets. Because UTF-8 octet sequences for non-ASCII characters always have their high-order bit set, the first hex digit of the escaped encoding will be 8 through F. Thus character sequences of the following grammar may be expected.

A_F = "A" | "B" | "C" | "D" | "E" | "F" | 
      "a" | "b" | "c" | "d" | "e" | "f"
HEX8_F = "8" | "9" | A_F
HEX = DIGIT | A_F
UTF-8_encoded = "%" HEX8_F HEX
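
The two-step transformation (UTF-8 encoding followed by escape encoding) is what the standard JavaScript function encodeURIComponent performs for non-ASCII characters. The sketch below applies it only to runs of non-ASCII characters, leaving ASCII USIN characters untouched. The function name encodeNonAscii is hypothetical, and because non-ASCII USIN characters are a future extension, the input in the usage note is purely illustrative. The sketch assumes well-formed input; encodeURIComponent throws a URIError on an unpaired surrogate.

```javascript
// Escape-encode each run of non-ASCII characters as UTF-8 octets.
// ASCII characters, including all Level 1 USIN characters, are unchanged.
function encodeNonAscii(usin) {
  return usin.replace(/[^\x00-\x7F]+/g, function (run) {
    return encodeURIComponent(run);
  });
}
```

For example, the single character U+00E9 is encoded as %C3%A9; both escapes begin with a hex digit in the range 8 through F, as the grammar above requires.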

Finally, although the canonical and preferred representation of the USIN characters under BibP Level 1 is indeed as normal ASCII octets, URI-encoded forms thereof are permitted and considered equivalent. Thus a UOS may also contain character sequences of the following grammar.

HEX2_7 = "2" | "3" | "4" | "5" | "6" | "7"
ASCII_encoded = "%" HEX2_7 HEX

The USIN syntax is designed to make considerations of escape encoding completely transparent to the user. Under BibP Level 1, the whitespace-free form of every USIN may be entered directly as a normal ASCII character sequence. Escaped forms will normally only be generated by BibP-aware document composition software supporting the cut-and-paste of USINs or other software that performs escape encoding as part of general URI processing.
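
Software that receives a UOS containing escaped ASCII characters may wish to recover the canonical unescaped characters before further processing. The following sketch decodes escapes whose first hex digit is 2 through 7 and leaves all other escapes in place. The function name decodeAsciiEscapes is hypothetical. Note that %20 falls within this range and is decoded to a space, so hyphenation removal would normally follow.

```javascript
// Replace escape sequences for ASCII characters (%20 through %7F) with
// the characters they denote. Control-character and UTF-8 escapes, whose
// first hex digit lies outside 2 through 7, are left as they are.
function decodeAsciiEscapes(uos) {
  return uos.replace(/%([2-7][0-9A-Fa-f])/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}
```

For example, decodeAsciiEscapes("ISSN%2F0953-1513%3a10%40135") returns ISSN/0953-1513:10@135.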

3.2 Server Identification Hierarchy

The first step in BibP link resolution is identification of an appropriate BibP server to handle the request. In order of preference, a BibP client must select from the following servers.

  1. The local bibhost, if it exists (Section 3.3).
  2. The document-specified citehost, if it exists (Section 3.4).
  3. A known global server (such as usin.org) (Section 3.5).

This server identification hierarchy provides for a scalable BibP network with particular provisions for library- and publisher-operated BibP servers. Library-operated servers that provide access to local holdings and site licensing information will generally be made available through the bibhost mechanism. Publisher-operated servers that provide particular support for the BibP links contained in a given document may be specified with the citehost mechanism. The citehost is consulted directly if the local bibhost is unavailable, and is also passed as a parameter in bibhost-based resolution to provide for indirect consultation (see Section 3.7). Both mechanisms provide for scalability by reducing the load on global BibP servers as the overall BibP network grows.
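
The order of preference can be expressed as a simple selection function. This is a sketch only: the function name selectServer and its parameters are assumptions, the availability of the local bibhost is taken as already determined by the client, and the host name cite.example.org in the usage note is a made-up value.

```javascript
// Choose the BibP server for a link according to the hierarchy above.
//   bibhostAvailable: true if a conforming local bibhost was found
//   citehost: host name specified by the citing document, or null
//   globalServer: host name of a known global server
function selectServer(bibhostAvailable, citehost, globalServer) {
  if (bibhostAvailable) return "bibhost";  // 1. local bibhost
  if (citehost) return citehost;           // 2. document-specified citehost
  return globalServer;                     // 3. known global server
}
```

For example, selectServer(false, "cite.example.org", "usin.org") returns cite.example.org, while selectServer(false, null, "usin.org") falls through to usin.org.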

A BibP-aware user agent may provide a finer-grained hierarchy for server identification by allowing users to specify overriding servers at any position within the hierarchy. For example, a particular BibP-aware web browser may specify three separate configuration settings, one each for overriding the BibP server determination at the bibhost, citehost and global levels.

3.3 Default Local Server

The key characteristic of BibP Level 1 is the ability for a locally available server to act as the default BibP server for a particular user environment. The following conventions apply.

  1. The DNS (Domain Name System) alias bibhost is used to identify the default BibP server (if one exists) in the local environment of the web browser or other user agent. For example, if a web browser accessing a BibP link is operating in the univ.edu domain, then the typical configuration of the local DNS resolver would interpret the relative domain name bibhost as the fully qualified domain name bibhost.univ.edu (if it exists). In accord with the recommendations of RFC 2219 [RFC2219], bibhost is the conventional DNS alias for the BibP protocol.
  2. A bibhost server must signal its ability to respond to BibP Level 1 requests by providing HTTP access to the BibP Identification Icon at the URL http://bibhost/bibp1.0/bibpicon.jpg. A user agent tests for the existence of a conforming bibhost by issuing an HTTP HEAD or GET request for this URL. If an error response is received, the user agent directs resolution of the BibP link to the next level in the server hierarchy.
  3. A bibhost server must provide HTTP-based access to a JavaScript-based implementation of BibP link resolution at the URL http://bibhost/bibp1.0/bibres.js. See Section 4 for a suitable script. A user agent may implement link resolution through this script or by some other method. If the script is unavailable, a user agent may direct resolution of a BibP link to the next level in the server hierarchy.

Use of the DNS alias bibhost provides a browser-independent and highly configurable mechanism for identifying local BibP servers. Using the normal configuration options available with typical DNS software, it is possible to configure a local BibP server on either a per-client basis or a per-domain basis. Configuration of the DNS resolver on a client machine can specify the machine to be used as bibhost for that client only. However, configuration of a DNS nameserver to provide a bibhost definition for an entire local domain will normally be a much more convenient option. Such a configuration can generally be made without requiring any configuration actions on individual client machines on the network, assuming only that the DNS resolvers on those machines follow the usual practice of including the local domain in the search list for resolution of relative domain names.
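
The effect of the resolver search list on the relative name bibhost can be illustrated with a short sketch. Actual resolution is performed by the DNS resolver of the operating system, not by script code; the function candidateNames is a hypothetical name, and the sketch ignores resolver details such as trying the name unqualified.

```javascript
// Illustrative only: shows which fully qualified names a resolver
// would typically try for a relative name, given its search list.
function candidateNames(relativeName, searchList) {
  var names = [];
  for (var i = 0; i < searchList.length; i++) {
    names.push(relativeName + "." + searchList[i]);
  }
  return names;
}

// A client in cs.univ.edu whose search list also includes univ.edu:
var tried = candidateNames("bibhost", ["cs.univ.edu", "univ.edu"]);
```

Under these assumptions, a single bibhost.univ.edu definition on the domain nameserver serves every client whose search list includes univ.edu.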

The BibP Identification Icon has four roles. First, it provides a graphical trademark serving to visually identify a particular bibhost as a participating server with respect to the BibP network. Second, it provides an extra level of assurance to user agents that bibhost does indeed denote a BibP server rather than a machine that just happens to have that name. Third, it allows distinction between different levels and versions of the BibP protocol that may be supported by a particular BibP server. Fourth and finally, given the restrictive security model of JavaScript and other client-side scripting languages, it also provides for feasible script-based testing of bibhost existence using image preloading.

The availability of a JavaScript-based resolver on the bibhost server provides for flexibility, scalability and maintainability. Although other resolution mechanisms exist, the local script nevertheless provides authors, publishers and user agents the flexibility to delegate link resolution to the local service. Such delegation represents an inherently scalable design in comparison to an implementation that relies on JavaScript served from a single global source. Furthermore, as the BibP protocol evolves, previously published documents can benefit from updated resolution scripts installed on local bibhosts.

The use of the path component bibp1.0 in the URLs for the identification icon and local resolution script identifies specific support for Level 1 of BibP. Future clients dependent on services defined at Level 2 must not assume that these are available from a bibhost identifying itself as a provider of Level 1 services only.

3.4 The Document-Specified Server

In order to identify a BibP server that provides specific and known support for the links in a particular document, publishers or authors may use the citehost mechanism. In the absence of a local bibhost, the citehost denotes the actual BibP server to be used for link resolution and metapage retrieval. When a local bibhost is known, the citehost setting is passed on to the bibhost for consultation or citation as a service relevant to the identified document.

A citehost is specified by the http URL of a server or server subdirectory. To set the citehost to http://www.pubhost.com/bibpserver/, for example, two declarations should be included in the <HEAD> element of the document.

<link rel="citehost" href="http://www.pubhost.com/bibpserver/" />
<script type="text/javascript">
  BibP_citehost = "http://www.pubhost.com/bibpserver/"
</script>

These two declarations respectively identify the citehost to BibP-aware user agents and to JavaScript-based user agents. In particular, the latter declaration is defined to work with the JavaScript resolver presented in Section 4.
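
A BibP-aware user agent might read the first declaration along the following lines. This is a sketch under stated assumptions: findCitehost is a hypothetical name, and links is assumed to be an array of objects with rel and href properties collected from the HEAD element of the document.

```javascript
// Hypothetical helper: locate the citehost among a document's <link>
// declarations. rel values are compared case-insensitively and may be
// a whitespace-separated list of link types.
function findCitehost(links) {
  for (var i = 0; i < links.length; i++) {
    var rel = String(links[i].rel || "").toLowerCase();
    var types = rel.split(/\s+/);
    if (types.indexOf("citehost") !== -1 && links[i].href) {
      return links[i].href;
    }
  }
  return null; // no document-specified citehost
}
```

A null result corresponds to the case in which only the bibhost and global levels of the hierarchy apply.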

3.5 Default Global Server

In the absence of either a local bibhost or a document-specified citehost, a web browser or other user agent must use a known global server as the default BibP server. At the time of writing of this report, the prototype BibP server at usin.org is available and is being further developed as the recommended global server for this purpose.
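
The precedence among the three levels described in Sections 3.3 through 3.5 can be summarized in a small selection function. This is a sketch only; the name selectBibpServer and the shape of its env argument are assumptions made for illustration.

```javascript
// Hypothetical helper: choose the BibP server by precedence.
// A conforming local bibhost wins, then a document-specified citehost,
// then the global default (usin.org at the time of writing).
var GLOBAL_SERVER = "http://usin.org/";

function selectBibpServer(env) {
  if (env.bibhostAvailable) {
    return "http://bibhost/";
  }
  if (typeof env.citehost === "string" && env.citehost !== "") {
    return env.citehost;
  }
  return GLOBAL_SERVER;
}
```

Note that when a bibhost is selected, the citehost is still passed along as a request parameter, so the two settings are not mutually exclusive.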

3.6 Link Translation

After identification of an appropriate server to resolve BibP links, the second step in link resolution is to generate well-formed HTTP requests to that server. The form of those requests is specified using the following translational semantics. A BibP URI of the form bibp:UOS is equivalent to an HTTP URL of one of the following forms.

  http://server/bibp1.0/resolve?usin=UOS
  http://server/bibp1.0/resolve?citehost=citehost&usin=UOS

The second form is used when a document-specified citehost is defined in accordance with Section 3.4. In both cases, server denotes the BibP server determined by the rules of Sections 3.2 through 3.5 above. The path component bibp1.0 indicates that the client is expecting resolution services defined at this level (Level 1) of BibP.

A user agent may use this translation rule either explicitly or implicitly to generate well-formed HTTP requests. If used explicitly, the form of the required HTTP request follows directly from the HTTP 1.1 specification [RFC2616]. However, the translation may be implicit, so long as the HTTP request generated is the same as that specified by the explicit translational semantics.
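
An explicit translation might look like the following sketch. The query layout is taken from the resolver of Section 4; the function name translateBibpLink is hypothetical, and server and citehost are assumed to be http URLs ending in "/", as in the example of Section 3.4. Like that resolver, the sketch performs no URL-encoding of its inputs.

```javascript
// Sketch of the bibp: to http: translation.
// Returns null when the input is not a BibP URI.
function translateBibpLink(bibpUri, server, citehost) {
  var prefix = "bibp:";
  if (bibpUri.indexOf(prefix) !== 0) {
    return null;
  }
  var uos = bibpUri.substring(prefix.length);
  // The citehost parameter is only present when one has been specified.
  var query = citehost ? "citehost=" + citehost + "&usin=" : "usin=";
  return server + "bibp1.0/resolve?" + query + uos;
}

var plain = translateBibpLink("bibp:ISSN/1082-9873:5(5)$paskin",
                              "http://usin.org/");
var withCitehost = translateBibpLink("bibp:ISSN/1082-9873:5(5)$paskin",
                                     "http://bibhost/",
                                     "http://www.pubhost.com/bibpserver/");
```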

3.7 Metapage Response

Given an HTTP request constructed according to the specifications of Section 3.6, a BibP server must generate an appropriate response in the form of an HTML document [HTML]. When a UOS corresponding to a valid USIN for a known document has been cited, the response page should report bibliographic and service metadata in a format intended for human readers as follows.

  1. The canonical form for the USIN should be reported, removing whitespace and performing transformations as described previously.
  2. Basic bibliographic metadata for the appropriate document type should be provided. For journal articles, this typically includes authors, title, journal, volume, issue, year, month and pagination. For books, author, title, publisher, publisher address, date and total pagination are usual. Other document types include the appropriate bibliographic elements commonly accepted to establish a bibliographic citation.
  3. Additional bibliographic metadata may be provided. This may include an abstract, keywords, classification metadata, known reviews, citations received, additional author information and so on. However, a BibP server must provide only factual metadata in the public domain or copyrighted metadata where explicit permission has been obtained. Linking to copyrighted materials (e.g., reviews) is preferred over reproducing them.
  4. Known service metadata for the cited item should be provided. This may include on-line full-text access, paper-based library holdings, document delivery options, additional bibliographic sources and so on. A BibP server operating as bibhost for a particular domain will be expected to emphasize locally available services for that domain.
  5. If citehost has been specified, a link to, or information from, the appropriate document metapage at citehost must be provided.
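
Since BibP 1.0 leaves the page format open, the following is only one possible shape for a minimal metapage generator, covering items 1, 2, 4 and 5. Every name here is hypothetical, including renderMetapage and the fields usin, citation, services and citehost of its record argument; the URL of the citehost metapage is an assumption based on the request form used by the resolver of Section 4.

```javascript
// Escape text for safe inclusion in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical sketch: build a human-readable metapage from a record.
function renderMetapage(record) {
  var usin = escapeHtml(record.usin);
  var lines = ["<html><head><title>" + usin + "</title></head><body>"];
  lines.push("<h1>" + usin + "</h1>");                      // item 1
  lines.push("<p>" + escapeHtml(record.citation) + "</p>"); // item 2
  lines.push("<ul>");                                       // item 4
  for (var i = 0; i < record.services.length; i++) {
    var s = record.services[i];
    lines.push('<li><a href="' + escapeHtml(s.url) + '">' +
               escapeHtml(s.label) + "</a></li>");
  }
  lines.push("</ul>");
  if (record.citehost) {                                    // item 5
    var url = record.citehost + "bibp1.0/resolve?usin=" + record.usin;
    lines.push('<p><a href="' + escapeHtml(url) +
               '">Metapage at citehost</a></p>');
  }
  lines.push("</body></html>");
  return lines.join("\n");
}
```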

Under BibP 1.0, no additional constraints are placed on the metadata to be provided on the response page or its format. The intent is to provide a relatively open framework to allow the development of alternative models for document metapages.

BibP 1.0 servers may freely format metadata for human readers, without consideration of how this data may be extracted under program control. However, subsequent development of BibP is expected to specify formal requirements for server-to-server interaction for sharing of basic bibliographic and service metadata.

3.8 Fault Handling

BibP servers must provide mechanisms to handle errors, ambiguities and unknowns.

3.9 Future Server Requirements

It is anticipated that BibP Level 2 will impose additional requirements on BibP servers, particularly in the areas of server-to-server interactions and acceptance of metadata submissions. BibP Level 3 is further planned to incorporate the capture and dissemination of the citing relationship (from citing works to cited works) as metadata, as a step towards the universal citation database [UCD]. BibP Level 1 server software should be designed to accommodate these evolving requirements.

4. A JavaScript Resolver for BibP

This section presents and documents a JavaScript program for client-side resolution of BibP. This program is intended to be embedded in the HEAD element of HTML documents to implement client-side resolution with browsers that provide JavaScript support. The program has been written to use only those JavaScript features that are relatively standard and are expected to remain so. The script is effective with Netscape Navigator (versions 3 through 6), Internet Explorer (versions 4 through 5.5) and Opera (version 4), although full bibhost support is not yet available in Opera.

The prefix BibP_ is used for all global functions and variables of the resolver so that the resolver can be freely mixed with other client-side JavaScript that respects this prefix. The full script, in a relatively condensed form for ease of cut-and-paste, is presented immediately below and documented in the following subsections.

<script type="text/javascript">
<!-- // bibres.js  version 1.1
     // (c) Robert D. Cameron and Serban Tatu, November 2000. 
     // GNU General Public License, Version 2 applies.
var BibP_BaseURL;
var BibP_nocitehost = typeof(BibP_citehost) == "undefined";
function BibP_SetBaseURL (server) {
  BibP_BaseURL = server + "bibp1.0/resolve?" +
  (BibP_nocitehost ? "usin=" : "citehost="+ BibP_citehost+ "&usin=")}
BibP_SetBaseURL(BibP_nocitehost ? "http://usin.org/" :BibP_citehost);
function BibP_onMouseOver () {
  window.status = "bibp:" + this.href.substring(BibP_BaseURL.length); 
  return true}
function BibP_onMouseOut () {window.status = "";  return true}
function BibP_ProcessLink(L, srchKey) {
  var spot = L.href.indexOf(srchKey);
  if (spot != -1) {
    L.href = BibP_BaseURL + L.href.substring(spot + srchKey.length);
    L.onmouseover = BibP_onMouseOver;
    L.onmouseout = BibP_onMouseOut}}
var BibP_Icon = new Image ();      // To test for local bibhost icon.
function BibP_onIcon () {
  if (BibP_Icon.height!=0) {
    var oldBase = BibP_BaseURL;
    BibP_SetBaseURL("http://bibhost/");
    for (var i = 0; i < document.links.length; i++) 
      BibP_ProcessLink(document.links[i], oldBase)}}
function BibP_onLoad () {
  for (var i = 0; i < document.links.length; i++)
    BibP_ProcessLink(document.links[i], "bibp:")
  BibP_Icon.onload = BibP_onIcon;  // Now test for bibhost.
  BibP_Icon.src = "http://bibhost/bibp1.0/bibpicon.jpg"}
if (typeof(navigator.bibpSupport) == "undefined") {
  window.onload = BibP_onLoad}
// -->
</script>

4.1 Setting BibP_BaseURL

The core strategy of the resolver is to define and use the global variable BibP_BaseURL as the common prefix for resolution of BibP links. That is, given a link of the form bibp:USIN, the link translation of Section 3.6 is performed by concatenation of BibP_BaseURL and USIN. The function BibP_SetBaseURL constructs the prefix given a BibP server as its input parameter and using the global setting of BibP_citehost as described in Section 3.4.

The determination of the server to be used for BibP_BaseURL follows the server identification hierarchy of Section 3.2. Initially, the value of BibP_BaseURL is set based on the document-specified BibP_citehost if it exists, or on the global server usin.org otherwise. However, if a test for the availability of a local bibhost subsequently proves successful, BibP_BaseURL will be adjusted to use bibhost (BibP_onIcon function).

The test for the availability of bibhost uses the image preloading feature of common web browsers to check the required identification icon at http://bibhost/bibp1.0/bibpicon.jpg. After the document has loaded, and links have been processed with the initial value of BibP_BaseURL, the assignment of the src property of BibP_Icon initiates the test for that icon (BibP_onLoad function). On a successful load event, the BibP_onIcon handler is called. If an icon of nonzero height is reported, bibhost is used to establish BibP_BaseURL. A zero height icon indicates either that images were turned off in the browser (an empty icon is trivially loaded without verifying the existence of bibhost) or an erroneous icon.

4.2 Translating and Displaying BibP Links

The function BibP_ProcessLink is responsible for translating BibP links to the appropriate URLs as well as for arranging for correct display of the links in the browser status bar on MouseOver events. It is first used within the BibP_onLoad function to process links based on the initial BibP_BaseURL value (before bibhost testing). Subsequently, it may also be used within the BibP_onIcon function to update the translation if bibhost availability is confirmed.

The two-pass approach assures the availability of BibP link service as soon as a document is loaded. Because the test for bibhost availability proceeds asynchronously with user action, it is possible that a user may access a BibP link after document loading but before bibhost availability is known. In this case, service from the citehost or the global server will be provided.

Link translations are effected within BibP_ProcessLink by changing the stored href attribute associated with each BibP link. In the first pass, the USIN is determined as the substring following the first occurrence of the string "bibp:" and the transformed link is formed by appending this USIN to the value of BibP_BaseURL. The second pass, invoked if bibhost availability has been confirmed, performs a similar transformation, replacing the initial BibP_BaseURL prefix with the updated value.
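
The two passes can be traced on a single href value outside the browser. The sketch below isolates the string transformation of BibP_ProcessLink; the helper name rewriteHref is hypothetical, and the example assumes that no citehost has been specified.

```javascript
// The search-and-replace step of BibP_ProcessLink, without the
// event handlers: everything up to and including srchKey is
// replaced by baseURL.
function rewriteHref(href, srchKey, baseURL) {
  var spot = href.indexOf(srchKey);
  if (spot === -1) {
    return href; // not a BibP link: leave untouched
  }
  return baseURL + href.substring(spot + srchKey.length);
}

var firstBase = "http://usin.org/bibp1.0/resolve?usin=";
var secondBase = "http://bibhost/bibp1.0/resolve?usin=";

// First pass (document load): translate the bibp: form.
var pass1 = rewriteHref("bibp:ISSN/1368-7506:1(3)$Cameron", "bibp:", firstBase);
// Second pass (bibhost confirmed): swap the old prefix for the new one.
var pass2 = rewriteHref(pass1, firstBase, secondBase);
```

Because the second pass searches for the old prefix rather than for "bibp:", links that were not BibP links in the first place remain untouched in both passes.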

A complication of href modification is that the link value displayed in the browser status bar on mouseover events would normally be the actual stored value, not the original bibp: form. The BibP_onMouseOver function arranges to display the original form, while the BibP_onMouseOut function ensures that the status bar is cleared when the mouse is moved off a BibP link.
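
The recovery of the original form for display amounts to stripping the current prefix from the stored href. A minimal sketch, using the hypothetical helper name statusText in place of the event handler:

```javascript
// Rebuild the bibp: form from a translated href, as BibP_onMouseOver
// does when setting window.status. baseURL must be the prefix that
// was used to translate the link.
function statusText(href, baseURL) {
  return "bibp:" + href.substring(baseURL.length);
}

var shown = statusText(
  "http://usin.org/bibp1.0/resolve?usin=ISSN/0953-1513:10@135",
  "http://usin.org/bibp1.0/resolve?usin=");
```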

4.3 Future Development of Client-Side Resolvers

It is anticipated that future versions of web browsers and other user agents will provide direct BibP support. In this case, it will likely be desirable to disable the JavaScript resolver. The resolver provides for this with the test on the property navigator.bibpSupport. Any appropriately defined value for this property will prevent BibP link processing that would otherwise be initiated by the window.onload event.

It is also anticipated that future developments of the JavaScript resolver may expand the coverage of browser support and perhaps add functionality. Updated resolvers should be available through each local bibhost of the BibP network. These resolvers may be directly used in documents served from the domain. For example, if bibhost.xxx.tld has been implemented, it is recommended that documents served from within the xxx.tld domain incorporate client-side resolution using the following declaration in the HEAD element.

<script type="text/javascript" 
           src="http://bibhost.xxx.tld/bibp1.0/bibres.js">
</script>

This provides for automatic updating of client-side resolvers without document modification.

5. Security Considerations

BibP Level 1 is intended to provide public, read-only access to information metapages offered by libraries and other bibliographic service providers. Because requests are implemented through translation to http, no security assumptions beyond those afforded by http should be made in offering the initial metapage. However, the metapage itself may offer authenticated access for licensed or otherwise protected materials through https or other mechanisms.

It is possible that spoofing of the bibhost for a particular domain could provide inaccurate bibliographic or metaservice information. However, such an effect would be localized and should be easy to address by the local domain administrator. A second security concern is the security of the default global service at usin.org as well as the potential use of www.bibhost.com to capture global services. Both of these domains have been registered by the first author of this document; eventually they should be turned over to an appropriate institutional authority.

A further security concern is the substitution of a malicious script in place of the JavaScript resolver under bibp1.0/bibres.js. Server administrators should ensure the security of installed resolvers.

6. Conclusions

Bibliographic Protocol Level 1 provides a new layer of abstraction for web-based reference linking. In essence, the linking to a copy or service with respect to a referenced document is eliminated in favor of a link to the document itself. The link specifies what the cited document is, not how to access it.

Link resolution under BibP is based on an open-architecture model involving a network of library-based and publisher-based servers. A default global BibP service is also defined, but can be overridden by alternative global services as configured in BibP clients. Document-specified citehosts override global servers while library-specified bibhosts override citehosts and global servers. Users may also configure personal bibliographic servers to take precedence over all of these through the flexibility of the bibhost relative domain name mechanism.

A JavaScript-based client-side resolver is incorporated into HTML documents to enable BibP with current web browsers. The protocol may also be implemented natively, without the use of JavaScript. Several such clients have been written, including a BibP-aware version of lynx. The JavaScript resolver allows the use of the protocol with a critical mass of existing web browsers, but is also designed for the graceful introduction of native BibP support over time.

BibP servers may be implemented using a variety of technologies. Trivial bibhost service may be implemented entirely using Apache rewrite rules. An initial prototype with full support for BibP Level 1 and several features of BibP Level 2 has been constructed using Java servlets [BibP-MSc]. Prototypes for local bibhost service have been implemented using PHP/YAZ to provide access to local library catalog information via Z39.50.

This report documents the current state of ongoing developments with bibliographic protocol. It is also intended to initiate an open process of further protocol development.

7. References

[BibP-MSc]
Serban Tatu. "Bibliographic Protocol: Distributed Reference Linking to Document Metaservices on the Web." M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2000. USIN: RDNS(sfu.ca).CMPT/MSc:2000$SerbanTatu
[DOI]
Norman Paskin. "DOI: Current Status and Outlook," D-Lib Magazine Volume 5, Number 5, May 1999. USIN: ISSN/1082-9873:5(5)$paskin
[Idents]
Norman Paskin. "Information Identifiers," Learned Publishing Volume 10, Number 2, April 1997, pp. 135-156. USIN: ISSN/0953-1513:10@135
[ISO2108]
International Organization for Standardization, Information and documentation - International standard book numbering (ISBN), ISO 2108:1992, 1992. USIN: RDNS(iso.ch)/ISO:2108(1992)
[ISO3297]
International Organization for Standardization, Information and documentation - International standard serial numbering (ISSN), ISO 3297:1998, 1998. USIN: RDNS(iso.ch)/ISO:3297(1998)
[RFC1034]
P. Mockapetris. "Domain Names - Concepts and Facilities," Request for Comments 1034, Internet Engineering Task Force, November 1987. USIN: RDNS(ietf.org)/RFC:1034
[RFC1737]
K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource Names," Request for Comments 1737, Internet Engineering Task Force, December 1994. USIN: RDNS(ietf.org)/RFC:1737
[RFC2219]
M. Hamilton and R. Wright. "Use of DNS Aliases for Network Services," Request for Comments 2219, Internet Engineering Task Force, October 1997. USIN: RDNS(ietf.org)/RFC:2219
[RFC2276]
K. Sollins. "Architectural Principles of Uniform Resource Name Resolution," Request for Comments 2276, Internet Engineering Task Force, January 1998. USIN: RDNS(ietf.org)/RFC:2276
[RFC2396]
T. Berners-Lee, R. Fielding and L. Masinter. "Uniform Resource Identifiers (URI): Generic Syntax," Request for Comments 2396, Internet Engineering Task Force, August 1998. USIN: RDNS(ietf.org)/RFC:2396
[RFC2413]
S. Weibel, J. Kunze, C. Lagoze and M. Wolf. "Dublin Core Metadata for Resource Discovery," Request for Comments 2413, Internet Engineering Task Force, September 1998. USIN: RDNS(ietf.org)/RFC:2413
[RFC2611]
L. Daigle, D. van Gulik, R. Iannella, P. Faltstrom. "URN Namespace Definition Mechanisms," Request for Comments 2611, Internet Engineering Task Force, June 1999. USIN: RDNS(ietf.org)/RFC:2611
[RFC2616]
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1," Request for Comments 2616, Internet Engineering Task Force, June 1999. USIN: RDNS(ietf.org)/RFC:2616
[UCD]
Robert D. Cameron. "A Universal Citation Database as a Catalyst for Reform in Scholarly Communication." First Monday, Volume 2, No. 4, April 1997. USIN: ISSN/1396-0466:2(4)$cameron
[Unicode]
The Unicode Consortium. The Unicode Standard, Version 3.0, Addison Wesley Longman, Reading, Massachusetts, 2000. USIN: ISBN/0-201-61633-5
[USIN]
Robert D. Cameron. "Towards Universal Serial Item Names," Journal of Digital Information, Volume 1, Number 3, October 1998. USIN: ISSN/1368-7506:1(3)$Cameron