This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
Copyright (C) Robert D. Cameron and Serban G. Tatu (2000). All Rights Reserved.
BibP (bibliographic protocol) links bibliographic identifiers of published works to bibliographic services for those works. Identifiers follow the Universal Serial Item Name (USIN) scheme, providing a scholar-friendly conventional notation for journal articles, books and institutional publications, as well as a generic framework that can scale to identify documents in any organized collection. A hierarchical resolution model emphasizes bibliographic services available through local libraries backed up by publisher-specified and global services. Resolution is achieved through existing DNS technology coupled with appropriate client-side support. Deployment of BibP clients with most of the popular web browsers is possible today; this paper presents one such client, written in JavaScript.
BibP (bibliographic protocol) is a web-based protocol for linking bibliographic references via Universal Serial Item Names [USIN]. It is intended to allow linking to each bibliographic item as a conceptual entity, independent of any particular copy or service with respect to that item. Indeed, it is even intended to allow linking to items which may not exist on-line; resolution of such a link could yield a metapage that identifies existing print-based services (library holdings, document delivery) for accessing the item. In this regard, BibP is a proposed reference linking solution that seeks to maintain integrated access to both newly published on-line items and the vast body of print-based literature.
The BibP/USIN approach applies the principles underlying the Uniform Resource Name (URN) concept [RFC1737] to the particular problem of bibliographic linking based on a decentralized model. Members of the URN WG have identified the need for "contextualized resolution" as a complementary approach to the top-down authoritative resolution of URNs using the Dynamic Delegation and Discovery Service (DDDS). The bibliographic protocol provides such contextualized resolution in the bibliographic domain, addressing the need for "appropriate copy" service from local libraries. The focus on the bibliographic domain also allows for the development of a simple approach to decentralized resolution based on existing DNS support for relative domain names [RFC1034]. The model requires no new development or deployment of DNS technology. In addition, the problems of namespace definition and management [RFC2611] are considerably simplified by restriction to bibliographic identifiers of the USIN system.
From the author's perspective, reference linking with BibP is intended to be as simple and scholar-friendly as possible. For example, to denote the paper by Norman Paskin entitled "Information Identifiers" as it appears on pages 135-6 of volume 10, issue 2 of the journal Learned Publishing, the BibP link is formed from the journal ISSN, volume and page using the USIN conventional syntax: bibp:ISSN/0953-1513:10@135. Similarly, bibp:RDNS(ietf.org)/RFC:2396 is the minimal syntax that denotes the report by T. Berners-Lee, R. Fielding and L. Masinter entitled "Uniform Resource Identifiers (URI): Generic Syntax," published as Request for Comments 2396 of the Internet Engineering Task Force. In general, BibP links to most published documents can be constructed using elements of existing identification standards combined in a minimal way according to USIN syntactic conventions. In the parlance of Paskin [Idents], USINs are compound identifiers; this contrasts with the simple identifiers (or dumb pointers) of the DOI system [DOI].
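As a minimal sketch of the link construction just described, the following JavaScript assembles the two example links from their parts. The function names journalArticleLink and reportLink are hypothetical and are not defined by BibP; the sketch performs no validation of its inputs.

```javascript
// Hypothetical helper (not part of BibP): builds a BibP link for a journal
// article from the journal ISSN, the volume and the starting page.
function journalArticleLink(issn, volume, startPage) {
  return "bibp:ISSN/" + issn + ":" + volume + "@" + startPage;
}

// Hypothetical helper (not part of BibP): builds a BibP link for a numbered
// report in an institutional series identified by the institution's DNS name.
function reportLink(dnsName, series, number) {
  return "bibp:RDNS(" + dnsName + ")/" + series + ":" + number;
}
```

For instance, journalArticleLink("0953-1513", 10, 135) yields the Learned Publishing link given above, and reportLink("ietf.org", "RFC", 2396) yields the link for RFC 2396.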
Ultimately, the BibP framework is envisioned to facilitate access to bibliographic items through a library-based network of BibP servers. In essence, each library-operated server will provide information and access to items emphasizing locally-available resources and agreements; networking will provide access to items not locally available. For example, a university library may operate a BibP server as the default server for its students and faculty, providing access to bibliographic items in accord with university holdings, interlibrary loan options and site licensing arrangements. However, the framework is also envisioned to allow other options. For example, commercial document delivery services may compete to provide BibP service to industrial clients.
As a first step in the staged development of a multi-level specification for BibP, this report addresses the basic client-server interaction in resolving an individual BibP link and retrieving an appropriate metapage. In particular, we present both a Level 1 specification for this interaction and a scalable client-side implementation of that specification. This work is sufficient for initial deployment of BibP-based links and servers.
The remainder of this paper is organized as follows. Section 2 describes the syntactic framework and conventions for Universal Serial Item Names under BibP Level 1. BibP Level 1 itself is addressed in Section 3, with accompanying notes discussing the rationale and planning for further development. A scalable client-side implementation of the specification, in the form of a JavaScript program, is presented in Section 4. Section 5 concludes the paper with a discussion of the further development of BibP.
Previous work has proposed a system of Universal Serial Item Names (USINs) for the persistent identification of documents published or otherwise organized in serial collections [USIN]. The overall framework defines a concept of publication domains within which standardized codes are used to identify particular collections. In principle, each collection may then have its own particular system of hierarchical enumeration and labeling to identify particular published items within the collection. In this way, the USIN framework is generic and extensible; it can be readily scaled to provide for unambiguous identification of documents in any organized collection.
This section defines a precise syntactic framework for USINs, slightly modified from the original proposal to better account for the encoding requirements of HTML documents. Within this framework, each collection potentially has its own syntax. However, the USIN proposal also outlines a conventional predefined syntax that provides substantial coverage of the existing literature published as journal articles, books, book articles (including papers in published proceedings) and institutional reports in numbered series. The conventional syntax is formalized here and is used as the basis of identification under BibP Level 1. Mechanisms for defining customized syntax for particular publication domains or collections are left for future work.
The grammatical notation used for describing the syntax of USINs is based on EBNF. Terminal symbols (symbols that will actually appear in the syntactic forms) are enclosed in quotation marks. Nonterminal symbols (names of syntactic classes) are expressed as identifiers with possible embedded hyphens or underscores. Alternative syntactic forms are separated by the vertical bar ("|"). Parentheses ("(" and ")") are used to group syntactic phrases. Square brackets ("[" and "]") are used for optional phrases. Braces ("{" and "}") are used for phrases to be repeated zero or more times. Names of nonprinting characters are enclosed in angle brackets ("<" and ">").
Under BibP Level 1, USINs are character strings composed of characters in the following classes.
UC_LETTER = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
LC_LETTER = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
LETTER = UC_LETTER | LC_LETTER
DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
ALPHANUMERIC = LETTER | DIGIT
EXTENDER = "_" | "-"
SEPARATOR = "/" | ":" | "!" | "@" | "$" | "*" | "~" | "+" | "," | "."
PAREN = "(" | ")"
WHITE = <newline> | <space>
The USIN framework is designed to accommodate future extension of the USIN character set in support of internationalization. That is, non-ASCII characters of Unicode/ISO 10646 [Unicode] may be added to the LETTER, DIGIT, and EXTENDER character classes. However, USINs are designed to be parsed based on recognition of SEPARATOR and PAREN characters. Thus, carefully written USIN parsers under BibP Level 1 may accommodate future extensions to the USIN character set without modification.
Closely related to the USIN is the notion of a USIN Octet Sequence (UOS), an encoding of a USIN as a sequence of 8-bit bytes. USINs themselves are simply character strings without any particular constraint on their representation. Thus a USIN may be represented as a sequence of handwritten or printed marks on paper. Alternatively, it may be represented as a series of 16-bit quantities in the UCS-2 format of Unicode/ISO 10646 [Unicode]. However, when a USIN is to be communicated under BibP Level 1, it is always encoded as a USIN Octet Sequence, as described in Section 3.1 following.
USINs are made up of lexical elements known as symbols, operators and phrases.
symbol = ALPHANUMERIC {[EXTENDER] ALPHANUMERIC}
operator = SEPARATOR {SEPARATOR}
phrase = "(" {ALPHANUMERIC | EXTENDER | SEPARATOR} ")"
Symbols are generally names or numerals that identify particular entities within some level of the identification hierarchy. Parenthesized phrases play a similar role but provide wider-ranging syntax for imported notations and/or internal structure. Operators are generally syntactic markers that guide the interpretation of symbols and phrases.
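The lexical rules above can be sketched as a small tokenizer. This is an illustration only, restricted to the ASCII character classes of BibP Level 1; the name tokenizeUsin and the token object shape are assumptions made for this example and are not defined by the specification.

```javascript
// One alternative per lexical element: symbol, operator, phrase.
// The sticky flag (y) forces each match to start exactly at lastIndex.
const TOKEN = /([A-Za-z0-9](?:[_-]?[A-Za-z0-9])*)|([\/:!@$*~+,.]+)|(\([A-Za-z0-9_\-\/:!@$*~+,.]*\))/y;

// Hypothetical helper: splits a whitespace-free USIN into lexical elements.
function tokenizeUsin(usin) {
  const tokens = [];
  TOKEN.lastIndex = 0;
  while (TOKEN.lastIndex < usin.length) {
    const start = TOKEN.lastIndex;
    const m = TOKEN.exec(usin);
    if (m === null) {
      throw new SyntaxError("Unexpected character at position " + start);
    }
    let type = "phrase";
    if (m[1] !== undefined) type = "symbol";
    else if (m[2] !== undefined) type = "operator";
    tokens.push({ type: type, text: m[0] });
  }
  return tokens;
}
```

Applied to RDNS(ietf.org)/RFC:2396, the sketch produces the symbol RDNS, the phrase (ietf.org), the operator /, the symbol RFC, the operator : and the symbol 2396.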
WHITE characters (whitespace) may be embedded in a USIN only in accord with the following hyphenation convention. A hyphenation substring consisting of a single hyphen ("-") followed by zero or more whitespace characters may be inserted before an operator or parenthesized phrase. Whitespace inserted in this way has no semantic effect. This hyphenation convention is systematic: for each grammatical rule of the USIN syntax, a hyphenation substring is implicitly permitted before each operator or parenthesized phrase.
The hyphenation convention permits a USIN appearing in plain text to be formatted over more than one line. Cut-and-paste operations on USINs displayed in this manner may thus extract USINs with embedded whitespace. USIN processing software will normally remove the embedded whitespace prior to further work.
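One way such software might remove hyphenation substrings is sketched below. The function name stripHyphenation is hypothetical. The sketch assumes that a hyphen immediately before an operator or an opening parenthesis (possibly with intervening whitespace) can only be a hyphenation mark, which follows from the rule that a symbol never ends in an EXTENDER; hyphens inside parenthesized phrases are left untouched.

```javascript
const SEPARATORS = "/:!@$*~+,.";

// Hypothetical helper: removes hyphenation substrings (a hyphen followed by
// zero or more whitespace characters) that precede an operator or a phrase.
function stripHyphenation(text) {
  let out = "";
  let inPhrase = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (!inPhrase && ch === "-") {
      // Look past any whitespace that follows the hyphen.
      let j = i + 1;
      while (j < text.length && /\s/.test(text[j])) j++;
      const next = text[j];
      if (next !== undefined && (next === "(" || SEPARATORS.includes(next))) {
        i = j; // drop the hyphen and whitespace, keep the operator or phrase
        continue;
      }
    }
    if (ch === "(") inPhrase = true;
    else if (ch === ")") inPhrase = false;
    out += ch;
    i++;
  }
  return out;
}
```

Note that the hyphen inside an ISSN such as 0953-1513 is followed by a digit and is therefore preserved.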
The USIN framework allows symbols, operators and phrases to be combined in a variety of ways, depending on the identification needs of particular publication domains and collections. However, a USIN must always satisfy the following generic grammar of permissible USIN forms (after removal of hyphenation substrings).
form = symbol | form phrase | form operator symbol
The generic grammar of forms reflects the hierarchical left-to-right structure of USINs. The most elementary form of a USIN is a single symbol. All other USINs are formed hierarchically by extending known forms with additional identification elements consisting of phrases or operator-symbol combinations.
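Because the generic grammar is left-recursive only in a simple way (a leading symbol followed by any number of phrases or operator-symbol pairs), it can be checked with a single regular expression. The following is a sketch under that reading, limited to the ASCII character classes of BibP Level 1; the name isValidForm is an assumption of this example.

```javascript
// Lexical elements, written as regular expression source strings.
const SYMBOL = "[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*";
const OPERATOR = "[/:!@$*~+,.]+";
const PHRASE = "\\([A-Za-z0-9_\\-/:!@$*~+,.]*\\)";

// form = symbol | form phrase | form operator symbol
// is equivalent to: symbol { phrase | operator symbol }
const FORM = new RegExp(
  "^" + SYMBOL + "(?:" + PHRASE + "|" + OPERATOR + SYMBOL + ")*$"
);

// Hypothetical helper: tests a whitespace-free string against the generic
// grammar of permissible USIN forms.
function isValidForm(usin) {
  return FORM.test(usin);
}
```

Under this sketch, ISSN/0953-1513:10(2)@135 is accepted, while a string that begins with an operator or ends with one, such as ISSN/, is rejected.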
The syntactic framework for USINs identifies publication-domains, collections, items, and attributes as the four key syntactic structures. The term USIN may refer to any one of these structures, which are hierarchically related as follows.
USIN = publication-domain | collection | item | attribute
collection = publication-domain "/" collection-label
item = (collection | item) item-extension
attribute = (collection | item | attribute) "!" attribute-specifier
For example, consider the USIN ISSN/0953-1513:10@135!title. The publication domain is ISSN, the space of all serial publications registered with an International Standard Serial Number [ISO3297]. The collection is the set of all articles published in the journal whose ISSN is 0953-1513, namely, Learned Publishing. The item extensions :10 and @135 specify respectively volume 10 of the journal and the article that appears on page 135 of that volume (using the conventional syntax described later). Attribute notation is used to specify the title of the article as the object of interest.
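The decomposition in this example can be sketched in JavaScript as follows. This is an illustration only: it handles just the conventional item-extension forms (the ":", "@" and "$" operators and parenthesized phrases), it is more permissive than the grammar in places (for instance, it accepts an attribute attached directly to a publication domain), and the name splitUsin and the shape of its result are assumptions of this example.

```javascript
const SYM = "[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*";
const PHR = "\\([^()]*\\)";
const USIN_PARTS = new RegExp(
  "^(" + SYM + "(?:" + PHR + "|\\." + SYM + ")*)" + // publication domain
  "(?:/(" + SYM + "))?" +                            // collection label
  "((?:" + PHR + "|[:@$]" + SYM + ")*)" +            // item extensions
  "((?:!" + SYM + "(?:" + PHR + ")?)*)$"             // attribute specifiers
);

// Hypothetical helper: splits a whitespace-free USIN into its four kinds of
// syntactic structure, or returns null if the string does not fit.
function splitUsin(usin) {
  const m = USIN_PARTS.exec(usin);
  if (m === null) return null;
  return {
    domain: m[1],
    collectionLabel: m[2] === undefined ? null : m[2],
    itemExtensions: m[3].match(new RegExp(PHR + "|[:@$]" + SYM, "g")) || [],
    attributes: m[4].match(new RegExp("!" + SYM + "(?:" + PHR + ")?", "g")) || [],
  };
}
```

For the USIN discussed above, the sketch reports the domain ISSN, the collection label 0953-1513, the item extensions :10 and @135, and the single attribute !title.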
Publication domains represent namespaces within which publications and other collections are assigned identifiers according to a specific scheme and/or authority. The syntax presented here is used both for the three initial domains supported under BibP Level 1 (namely ISSN, ISBN, and RDNS) and for future domains. Although the initial domains provide for substantial coverage of referenced literature, the general syntax accommodates future development of a richer hierarchical domain structure to provide for both greater coverage and the development of more mnemonic forms.
Publication domains may be simple, hierarchical, and/or parameterized.
publication-domain = symbol | publication-domain "." symbol | publication-domain phrase
When a parenthesized phrase is appended to a publication domain, it may be considered to instantiate that domain for the particular string value given in parentheses.
Under BibP Level 1, two simple domains are predefined, represented by the symbols ISSN and ISBN. As noted previously, the ISSN domain consists of those serial publications that may be identified by an International Standard Serial Number. Similarly, the ISBN domain is the space of those publications identified by an International Standard Book Number [ISO2108].
RDNS is a parameterized domain that uses a restricted subset of names assigned under the Domain Name System [RFC1034] to identify publication namespaces for individual institutions. For example, RDNS(sfu.ca) denotes a publication namespace for Simon Fraser University, while RDNS(ietf.org) denotes a similar namespace for the Internet Engineering Task Force. Here, the parameter string must be a well-established domain name under DNS that is both owned by the institution and has a clear interpretation as a code for that institution.
The domain parameter for RDNS is case-insensitive, following the conventions for DNS. For example, RDNS(sfu.ca) and RDNS(SFU.CA) are equivalent. Following DNS tradition, the lower case version of the RDNS parameter is considered the canonical and preferred form.
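A sketch of this canonicalization in JavaScript follows. The name canonicalizeRdns is hypothetical. Only the parenthesized domain parameter is lower-cased; the text above makes no case-insensitivity claim for the rest of the USIN, so the remainder is left as written.

```javascript
// Hypothetical helper: lower-cases the DNS name parameter of a leading
// RDNS(...) domain and leaves the remainder of the USIN unchanged.
function canonicalizeRdns(usin) {
  return usin.replace(/^RDNS\(([^()]*)\)/, function (match, name) {
    return "RDNS(" + name.toLowerCase() + ")";
  });
}
```

For example, RDNS(SFU.CA).CMPT/TR becomes RDNS(sfu.ca).CMPT/TR, with the division code CMPT untouched.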
Hierarchical divisions of an institution may be identified by hierarchical RDNS domains. The subdomains are identified by unambiguous codes for the divisions as used by the institution itself. For example, RDNS(sfu.ca).CMPT denotes the School of Computing Science at Simon Fraser University using the four-letter code CMPT unambiguously used by SFU for the School. Alternatively, RDNS(cs.sfu.ca) also denotes the School, using its well-established DNS name.
The astute reader may note that the parameterized domain syntax used for RDNS differs from the quoted DNS names originally proposed [USIN]. It is slightly cleaner and simplifies the USIN Octet Sequence representation (see Section 3.1) by eliminating the need for escape-encoding of quotation marks.
Collections are sets of documents organized by a particular serial numbering scheme. For example, a journal is typically a collection organized using volume, issue and page numbering, while a technical report series is a collection organized by a numbering scheme specified by the issuing institution. A book may be a collection of articles (for example, the proceedings of a conference) or may be considered a singleton collection (a single document in its own right).
Collection labels are symbols that identify particular collections within the context of a publication domain.
collection-label = symbol
Collection labels must always conform to this syntax, but particular publication domains may impose further restrictions.
In the context of the ISSN domain, collection labels are restricted to the following ISSN syntax [ISSN].
collection-label(ISSN) = ISSN
ISSN = DIGIT DIGIT DIGIT DIGIT ["-"] DIGIT DIGIT DIGIT DIGITX
DIGITX = DIGIT | "X" | "x"
The embedded hyphen within an ISSN is preferred and canonical for USIN syntax, but may be omitted. Similarly, the upper case X is the preferred and canonical form for the ISSN check digit denoting 10, but x is considered equivalent. BibP servers must accept serial-codes in any of these forms. However, when generating or otherwise reporting USINs within the ISSN domain, BibP servers must use the canonical forms.
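The accept-any-form, report-canonical-form rule might be implemented along the following lines. Both function names are hypothetical. The check digit test is not required by the text above; it is included as an assumption, using the standard ISSN modulus 11 rule with weights 8 down to 2.

```javascript
// Hypothetical helper: accepts any permitted ISSN form and returns the
// canonical form (embedded hyphen, upper-case X), or null if the label
// does not match the ISSN syntax.
function canonicalizeIssn(label) {
  const m = /^(\d{4})-?(\d{3})([\dXx])$/.exec(label);
  if (m === null) return null;
  return m[1] + "-" + m[2] + m[3].toUpperCase();
}

// Assumption, not part of the BibP text: verifies the ISSN check digit of a
// canonical-form ISSN using weights 8..2 over the first seven digits.
function hasValidIssnCheckDigit(canonical) {
  const chars = canonical.replace("-", "");
  let sum = 0;
  for (let i = 0; i < 7; i++) {
    sum += Number(chars[i]) * (8 - i);
  }
  const check = (11 - (sum % 11)) % 11;
  return chars[7] === (check === 10 ? "X" : String(check));
}
```

With this sketch, the inputs 0361526x and 0361-526X both canonicalize to 0361-526X, which also passes the check digit test.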
Collection labels within the ISBN domain similarly follow ISBN syntax [ISO2108].
collection-label(ISBN) = ISBN
ISBN = INTEGER "-" INTEGER "-" INTEGER "-" DIGITX | DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGITX
INTEGER = DIGIT {DIGIT}
The preferred and canonical form of ISBNs includes the correct hyphenation to separate it into four fields for the country/group coding, publisher coding, title coding and check digit. Each of the first three fields is variable length, but in total these fields must contain exactly nine digits. As with ISSNs, x may be used for the check digit, but X is preferred and canonical.
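The ISBN grammar together with the nine-digit constraint can be sketched as a syntax check. The name isIsbnLabel is hypothetical, and the sketch checks syntax only: it verifies neither the check digit nor whether the hyphens fall at the correct field boundaries for a given publisher.

```javascript
// Hypothetical helper: tests a collection label against the ISBN syntax,
// either ten characters without hyphens or four hyphen-separated fields
// whose first three fields contain exactly nine digits in total.
function isIsbnLabel(label) {
  if (/^\d{9}[\dXx]$/.test(label)) return true;
  const m = /^(\d+)-(\d+)-(\d+)-([\dXx])$/.exec(label);
  return m !== null && (m[1] + m[2] + m[3]).length === 9;
}
```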
With RDNS domains, collection labels should be the identifiers actually used by the institution. For example, the Internet Engineering Task Force uses RFC to refer to documents in its Request for Comments Series, so this collection may be identified by the USIN RDNS(ietf.org)/RFC. The technical report series of SFU's School of Computing Science is denoted RDNS(sfu.ca).CMPT/TR. Theses published by an institution are conventionally denoted by the abbreviation for the degree, so RDNS(sfu.ca).CMPT/PhD denotes the Ph.D. thesis series of the School.
Item is the generic term used to refer to an individual document or group of documents that form an identified division within the hierarchical identification scheme of a collection. For example, the volumes, issues and articles of a journal are all items.
Within the context of a specific collection, item extensions are the USIN suffixes that specify items. In general, the syntax and interpretation of item extensions depends on the particular collection or publication domain involved. However, each item extension conforms to the generic grammar of Section 2.2.
The USIN conventional syntax predefines a number of item extensions for common forms of hierarchical identification. These involve the operators ":" for introducing the principal enumeration scheme of a collection, "@" for page-based article specification and "$" for direct article specification by symbol or count.
Whenever a collection is explicitly divided into enumerated divisions, the ":" operator is used to introduce the division label. Volumes of a journal are a typical use, so :10 is the item-extension specifying volume 10 of Learned Publishing in the USIN ISSN/0953-1513:10@135. Although volumes will be denoted by integer numerals in most cases, the conventional syntax also permits arbitrary symbols. For example, ISSN/0098-5589:SE-12 denotes Volume SE-12 of IEEE Transactions on Software Engineering.
Report numbers, year numbers and other top-level enumeration elements are also introduced using the ":" operator. For example, RDNS(ietf.org)/RFC:2396 denotes RFC 2396 of the IETF Request for Comments series, while RDNS(sfu.ca).CMPT/PhD:2000 denotes PhD theses published in the year 2000 by the SFU School of Computing Science.
The USIN convention for journals also includes a syntax for issue numbers as a second level of enumeration, namely a parenthesized phrase. Thus ISSN/0953-1513:10(2) denotes volume 10, issue 2 of Learned Publishing. Special issues and combined issues typically use non-numeric issue strings. For example, ISSN/0038-0644:20(S2) denotes special issue S2 of volume 20 of Software--Practice & Experience (December 1990), while ISSN/0361-526X:36(3/4) denotes combined issue 3/4 of volume 36 of Serials Librarian (1999). Note that the parenthesized notation for issues is also quite common in bibliographic citations; the USIN convention takes advantage of this for mnemonic effect.
The second conventional operator under the USIN system is the "@" operator for specifying articles in books or journals by starting page number. For example, the article at page 135 of Learned Publishing 10(2) is denoted ISSN/0953-1513:10(2)@135. For journals that are paginated by volume, such as this one, the issue number may be omitted; ISSN/0953-1513:10@135 is thus equivalent to the USIN just given.
In the event that more than one article starts on a given page, the articles are numbered sequentially with an alphabetic code: a for the first, b for the second, c for the third, and so on. In the rare event that there are more than 26 articles on the page, the code allows arbitrary base 26 numerals such as aa for the 27th item, ab for the 28th and so on.
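The alphabetic numbering just described (a through z, then aa, ab and so on) corresponds to what is sometimes called bijective base 26, in which there is no zero digit. A minimal sketch follows; the name articleCode is an assumption of this example, and how the code is attached to the rest of the USIN is not addressed here.

```javascript
// Hypothetical helper: returns the alphabetic code for the n-th article
// (counting from 1) that starts on a given page.
function articleCode(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("article position must be a positive integer");
  }
  let code = "";
  while (n > 0) {
    n -= 1; // shift to 0..25 so that 'a' acts as digit 1, not 0
    code = String.fromCharCode(97 + (n % 26)) + code;
    n = Math.floor(n / 26);
  }
  return code;
}
```

The decrement inside the loop is what makes 26 map to z and 27 to aa, rather than to a two-letter code with a zero digit.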
The final form of item extension in the USIN conventional syntax uses the "$" operator to specify articles in unpaginated e-journals or other contexts by a numeric or symbolic label. Numeric labels indicate either explicit or implicit enumeration within contents lists. However, where a clear symbolic label exists either in plain text or encoded in the article URL, then the symbolic form is preferred and canonical. For example, the article "Towards Universal Serial Item Names" published in Volume 1, Issue 3 of the Journal of Digital Information is denoted ISSN/1368-7506:1(3)$Cameron, where Cameron is the symbolic code clearly used in the JoDI URL to distinguish this article from others in the same issue.
The generic USIN grammar supports the definition of many additional forms of item extension. Future developments of the USIN system will likely introduce additional conventional syntax as well as mechanisms for specifying domain- or collection-dependent syntax.
Attributes are properties or metadata elements that pertain to a particular collection or item. The USIN syntax reserves the "!" operator for introducing attribute-specifiers, as shown in the grammar of Section 2.3 above. An attribute-specifier itself consists of a symbol naming the attribute, with an optional parenthesized phrase to specify a parameter value.
attribute-specifier = symbol [phrase]
For example, ISSN/0953-1513!title denotes the title of the journal whose ISSN is 0953-1513, namely Learned Publishing, ISSN/0953-1513:10@135!title denotes the article title "Information Identifiers" and ISSN/0953-1513:10@135!author(1) denotes the first (and only, in this case) author of this article, namely, Norman Paskin.
In general, attributes denote publication facts about particular items or collections. Attributes are not intended to account for classification or other metadata that may be attributed to items by third parties. Philosophically, third-party metadata is considered to be interpretative, not factual. Different third parties may well describe and/or classify the same document in quite different ways. Thus the attribute sets used with USINs may be expected to be substantially narrower than general metadata element sets such as those of Dublin Core [RFC2413].
Of particular importance to the further development of the BibP network as it evolves towards the concept of a universal citation database [UCD] is the parameterized ref attribute. This attribute refers to the bibliographic references in a document, identified by numeric or symbolic citation tag. For example, RDNS(ietf.org)/RFC:2XXX!ref(UCD) denotes the document cited as UCD in this RFC.
The ref attribute supports even broader coverage of the literature than that provided by the direct identification provisions of the USIN conventional syntax. Any documents that are cited within other documents may be identified by specification of the citing document and a citation tag. Effectively, this provides for universal coverage of all documents that are transitively reachable by citation.
The attribute framework of the USIN scheme is substantially an area for future work, however. No requirements for attribute processing are specified under BibP Level 1, except to recognize that attribute syntax is valid.
BibP Level 1 establishes the syntax of BibP links together with requirements on HTTP-based client-server interactions for resolving individual links and retrieving bibliographic metapages for display to the user. A BibP client is a web browser or other user agent that either has built-in support for BibP (BibP-aware user agent) or operates in conjunction with an appropriate client-side script. (Section 4 presents one such script-based implementation of BibP link resolution). The BibP client resolves BibP links by identifying an appropriate BibP server and generating a well-formatted BibP request to that server. Upon receiving the request, the BibP server is responsible for generating an HTML page presenting relevant bibliographic and service information with respect to the cited item.
A BibP link is a uniform resource identifier (URI) [RFC2396] of the form bibp:UOS, where UOS is a USIN Octet Sequence as described below. In the parlance of RFC2396, BibP links are absolute URIs whose scheme is bibp and whose scheme-specific-part is a UOS of a cited USIN. The UOS is considered an opaque part because its structure has no meaning with respect to the network.
In the normal case, a UOS is simply the representation of a USIN as an ASCII character string [ASCII]. Under BibP Level 1, the only exception is that WHITE characters must be encoded according to the following grammar.
WHITESPACE = (CR | LF | HT | SPACE)*
CR = "%" "0" ("D" | "d")
LF = "%" "0" ("A" | "a")
HT = "%" "0" "9"
SPACE = "%" "2" "0"
That is, the URI transformation of escape encoding [URI] must be applied to the normal ASCII spacing control characters to produce the UOS. Also note that the WHITESPACE grammar permits newlines to be encoded using any of the common file format conventions with various combinations of CR and LF characters.
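As a sketch of the whitespace encoding, the following JavaScript converts a USIN that still contains hyphenation whitespace into a UOS. The name toUos is hypothetical. Only the space, CR and LF characters are handled here, since the WHITE class of Section 2 contains only newline and space.

```javascript
// Hypothetical helper: escape-encodes the whitespace characters embedded in
// a USIN so that the result can be used as a USIN Octet Sequence.
function toUos(usin) {
  return usin.replace(/[ \r\n]/g, function (ch) {
    if (ch === " ") return "%20";
    return ch === "\r" ? "%0D" : "%0A";
  });
}
```

A USIN that was broken across two lines with CR LF line endings thus carries the sequence %0D%0A in its UOS, which the WHITESPACE grammar accepts.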
As described previously, the USIN character set is subject to future extension to include non-ASCII characters of Unicode/ISO 10646 for the purpose of internationalization. These characters may be represented within a UOS by first expressing them as octet sequences in the UTF-8 format of Unicode and then applying the URI-encoding transformation to the octets. Because UTF-8 octet sequences for non-ASCII characters always have their high-order bit set, the first hex digit of the escaped encoding will be 8 through F. Thus character sequences of the following grammar may be expected.
A_F = "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
HEX8_F = "8" | "9" | A_F
HEX = DIGIT | A_F
UTF-8_encoded = "%" HEX8_F HEX
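In JavaScript, the two-step transformation (UTF-8 octets, then URI escape encoding) is what the built-in encodeURIComponent function performs for non-ASCII characters. The sketch below applies it only to runs of non-ASCII characters so that the ASCII USIN characters stay in their canonical unescaped form; the name encodeNonAscii is an assumption of this example.

```javascript
// Hypothetical helper: escape-encodes the UTF-8 octets of every non-ASCII
// character and leaves ASCII characters as they are.
function encodeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]+/gu, function (run) {
    return encodeURIComponent(run);
  });
}
```

For example, the character U+00E9 (e with acute accent) has the UTF-8 octets C3 A9 and is therefore encoded as %C3%A9, with both escapes beginning with a hex digit in the range 8 through F.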
Finally, although the canonical and preferred representation of the USIN characters under BibP Level 1 is indeed as normal ASCII octets, URI-encoded forms thereof are permitted and considered equivalent. Thus a UOS may also contain character sequences of the following grammar.
HEX2_7 = "2" | "3" | "4" | "5" | "6" | "7"
ASCII_encoded = "%" HEX2_7 HEX
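Since escaped and unescaped forms are equivalent, software that compares or stores UOS values may wish to normalize them to the canonical unescaped form first. A sketch of one way to do so follows. The name decodeAsciiEscapes is hypothetical, and the choice to decode only escapes that stand for USIN characters (leaving, for example, %20 in place) is an assumption of this example rather than a requirement of the text.

```javascript
// The ASCII USIN characters: ALPHANUMERIC, EXTENDER, SEPARATOR and PAREN.
const USIN_CHAR = /^[A-Za-z0-9_\-\/:!@$*~+,.()]$/;

// Hypothetical helper: replaces ASCII_encoded escapes by the characters
// they denote whenever those characters belong to the USIN character set.
function decodeAsciiEscapes(uos) {
  return uos.replace(/%([2-7][0-9A-Fa-f])/g, function (escape, hex) {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return USIN_CHAR.test(ch) ? ch : escape;
  });
}
```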
The USIN syntax is designed to make considerations of escape encoding completely transparent to the user. Under BibP Level 1, the whitespace-free form of every USIN may be entered directly as a normal ASCII character sequence. Escaped forms will normally only be generated by BibP-aware document composition software supporting the cut-and-paste of USINs or other software that performs escape encoding as part of general URI processing.
The first step in BibP link resolution is identification of an appropriate BibP server to handle the request. In order of preference, a BibP client must select from the following servers.
1. The local BibP server, bibhost, if it exists (Section 3.3).
2. The citehost, if it exists (Section 3.4).
3. The global BibP server (usin.org) (Section 3.5).
This server identification hierarchy provides for a scalable BibP network with particular provisions for library- and publisher-operated BibP servers. Library-operated servers that provide access to local holdings and site licensing information will generally be made available through the bibhost mechanism. Publisher-operated servers that provide particular support for the BibP links contained in a given document may be specified with the citehost mechanism. The citehost is consulted directly if the local bibhost is unavailable, and is also passed as a parameter in bibhost-based resolution to provide for indirect consultation (see Section 3.7). Both mechanisms provide for scalability by reducing the load on global BibP servers as the overall BibP network grows.
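The order of preference can be sketched as a simple selection function. The names selectBibpServer and isAvailable are hypothetical; the availability test is passed in as a parameter because the way a client probes a server (described in Section 3.3 for bibhost) is outside the scope of this sketch, and a real client would typically perform it asynchronously.

```javascript
// Hypothetical helper: returns the host name of the BibP server to use.
// citehost is the server named by the document, or null if there is none.
// isAvailable is a caller-supplied predicate taking a host name.
function selectBibpServer(citehost, isAvailable) {
  const candidates = ["bibhost"];
  if (citehost) candidates.push(citehost);
  for (const host of candidates) {
    if (isAvailable(host)) return host;
  }
  return "usin.org"; // global server, used when nothing more local responds
}
```

Note that this covers only the direct consultation path; passing the citehost as a parameter to a bibhost is not shown.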
A BibP-aware user agent may provide a finer-grained hierarchy for server identification by allowing users to specify overriding servers at any position within the hierarchy. For example, a particular BibP-aware web browser may specify three separate configuration settings, one each for overriding the BibP server determination at the bibhost, citehost and global levels.
The key characteristic of BibP Level 1 is the ability for a locally available server to act as the default BibP server for a particular user environment. The following conventions apply.
bibhost
is used
to identify the default BibP server (if one exists) in the the
local environment of the web browser or other user agent. For
example, if a web browser accessing a BibP link is operating in the
univ.edu
domain, then the typical configuration of the
local DNS resolver would interpret the relative domain name
bibhost
as the fully qualified domain name
bibhost.univ.edu
(if it exists). In accord with the
recommendations of RFC 2219 [RFC2219],
bibhost
is the conventional DNS alias for the BibP
protocol.bibhost
server must signal its ability to
respond to BibP Level 1 requests by providing HTTP access to the BibP Identification
Icon at the URL
http://bibhost/bibp1.0/bibpicon.jpg
. A user agent tests for
the existence of a conforming bibhost
by issuing an
HTTP HEAD or GET request for this URL. If an error response is
received, the user agent directs resolution of the BibP link to the
next level in the server hierarchy.bibhost
server must provide HTTP-based access to
a JavaScript-based implementation of BibP link resolution at the
URL http://bibhost/bibp1.0/bibres.js
. See Section 4
for a suitable script. A user agent may implement link
resolution through this script or by some other method. If the
script is unavailable, a user agent may direct resolution of a BibP
link to the next level in the server hierarchy.Use of the DNS alias bibhost
provides a
browser-independent and highly configurable mechanism for
identifying local BibP servers. Using the normal configuration
options available with typical DNS software, it is possible to
configure a local BibP server on either a per-client basis or a
per-domain basis. Configuration of the DNS resolver on a client
machine can specify the machine to be used as bibhost
for that client only. However, configuration of a DNS nameserver to
provide a bibhost
definition for an entire local
domain will normally be a much more convenient option. Such a
configuration can generally be made without requiring any
configuration actions on individual client machines on the network,
assuming only that the DNS resolvers on those machines follow the
usual practice of including the local domain in the search list for
resolution of relative domain names.
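The search-list behaviour relied on above can be illustrated with a small sketch. The function name candidateNames and the search lists passed to it are assumptions made for illustration only; they are not part of BibP or of any DNS library, and real resolvers apply further rules (such as also trying the name as given) that are omitted here.

```javascript
// Hypothetical helper: list the fully qualified names that a resolver
// would try for a relative name, given its configured search list.
function candidateNames(relativeName, searchList) {
  // A trailing dot marks a name as already fully qualified.
  if (relativeName.endsWith(".")) {
    return [relativeName];
  }
  // Otherwise each search domain is appended in turn.
  return searchList.map(function (domain) {
    return relativeName + "." + domain;
  });
}

console.log(candidateNames("bibhost", ["univ.edu"]));
// prints [ 'bibhost.univ.edu' ]
```

A client in the univ.edu domain therefore reaches bibhost.univ.edu without any BibP-specific configuration, provided the nameserver defines that name.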
The BibP Identification Icon has four roles. First, it provides
a graphical trademark serving to visually identify a particular
bibhost
as a participating server with respect to the
BibP network. Second, it provides an extra level of assurance to
user agents that bibhost
does indeed denote a BibP
server rather than a machine that just happens to have that name.
Third, it allows distinction between different levels and versions
of the BibP protocol that may be supported by a particular BibP
server. Fourth and finally, given the restrictive security model of
JavaScript and other client-side scripting languages, it also
provides for feasible script-based testing of bibhost
existence using image preloading.
The availability of a JavaScript-based resolver on the
bibhost
server provides for flexibility, scalability and
maintainability. Although other resolution mechanisms exist, the
local script nevertheless provides authors, publishers and user
agents the flexibility to delegate link resolution to the local
service. Such delegation represents an inherently scalable design
in comparison to an implementation that relies on JavaScript served
from a single global source. Furthermore, as the BibP protocol
evolves, previously published documents can benefit from updated
resolution scripts installed on local bibhosts.
The use of the path component bibp1.0
in the URLs
for the identification icon and local resolution script identifies
specific support for Level 1 of BibP. Future clients dependent on
services defined at Level 2 must not assume that these are
available from a bibhost
identifying itself as a
provider of Level 1 services only.
In order to identify a BibP server that provides specific and
known support for the links in a particular document, publishers or
authors may use the citehost
mechanism. In the absence
of a local bibhost
, the citehost
denotes
the actual BibP server to be used for link resolution and metapage
retrieval. When a local bibhost
is known, the
citehost
setting is passed on to the bibhost
for consultation or citation as a service relevant to the
identified document.
A citehost is specified by the http URL of a server or server
subdirectory. To set the citehost
to
http://www.pubhost.com/bibpserver/
, for example, two
declarations should be included in the <HEAD>
element of the document.
<link rel="citehost" href="http://www.pubhost.com/bibpserver/" />
<script type="text/javascript">
BibP_citehost = "http://www.pubhost.com/bibpserver/"
</script>
These two declarations respectively identify the
citehost
to BibP-aware user agents and JavaScript-based
user-agents. In particular,
the latter declaration is defined to work with the
JavaScript resolvers presented in Section 4.
In the absence of either a local bibhost
or a
document-specified citehost
, a web browser or other
user agent must use a known global server as the default BibP
server. At the time of writing of this report, the prototype BibP
server at usin.org
is available and is being further
developed as the recommended global server for this purpose.
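The server hierarchy described so far (local bibhost, then document-specified citehost, then global server) can be summarized in a short sketch. The function selectServer, its parameters and its return shape are hypothetical names chosen for illustration; the availability of the local bibhost is assumed to have been determined separately, for example by the icon test.

```javascript
var GLOBAL_SERVER = "http://usin.org/";  // global default named in the text
var LOCAL_BIBHOST = "http://bibhost/";   // relative name, resolved by local DNS

// bibhostAvailable: boolean result of a prior bibhost test (assumed input).
// citehost: URL string from the document, or undefined if none was declared.
function selectServer(bibhostAvailable, citehost) {
  if (bibhostAvailable) {
    // A local bibhost takes precedence; the citehost is passed along to it.
    return { server: LOCAL_BIBHOST, citehost: citehost };
  }
  if (citehost !== undefined) {
    // Without a bibhost, the citehost itself resolves the link.
    return { server: citehost, citehost: citehost };
  }
  // Neither is available: fall back to the global server.
  return { server: GLOBAL_SERVER, citehost: undefined };
}

console.log(selectServer(false, undefined).server);
// prints http://usin.org/
```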
After identification of an appropriate server to resolve BibP
links, the second step in link resolution is to generate
well-formed HTTP requests to that server. The form of those
requests is specified using the following translational semantics.
A BibP URI of the form bibp:UOS is
equivalent to an HTTP URL of one of the following forms.
http://server/bibp1.0/resolve?usin=UOS
http://server/bibp1.0/resolve?citehost=citehost&usin=UOS
The second form is used when a document-specified
citehost is defined in accordance with Section 3.4. In both
cases, server denotes the BibP server determined by the
rules of Sections 3.2 through 3.5 above. The path component
bibp1.0
indicates that the client is expecting resolution
services defined at this level (Level 1) of BibP.
A user agent may use this translation rule either explicitly or implicitly to generate well-formed HTTP requests. If used explicitly, the form of the required HTTP request follows directly from the HTTP 1.1 specification [RFC2616]. However, the translation may be implicit, so long as the HTTP request generated is the same as that specified by the explicit translational semantics.
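An explicit form of the translation might look like the following sketch. The function name bibpToHttp and the placeholder identifier EXAMPLE-UOS are hypothetical and are not a real USIN; the sketch concatenates strings literally and, as an assumption, applies no URL escaping.

```javascript
// Translate a bibp: URI into the equivalent HTTP URL for a given server.
// server: base URL ending in "/"; citehost: URL string or undefined.
function bibpToHttp(bibpUri, server, citehost) {
  var prefix = "bibp:";
  if (bibpUri.indexOf(prefix) !== 0) {
    throw new Error("not a BibP URI: " + bibpUri);
  }
  var uos = bibpUri.substring(prefix.length);
  // The second form is used only when a citehost has been specified.
  var query = (citehost === undefined)
    ? "usin=" + uos
    : "citehost=" + citehost + "&usin=" + uos;
  return server + "bibp1.0/resolve?" + query;
}

console.log(bibpToHttp("bibp:EXAMPLE-UOS", "http://usin.org/"));
// prints http://usin.org/bibp1.0/resolve?usin=EXAMPLE-UOS
```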
Given an HTTP request constructed according to the specifications of Section 3.6, a BibP server must generate an appropriate response in the form of an HTML document [HTML]. When a UOS corresponding to a valid USIN for a known document has been cited, the response page should report bibliographic and service metadata in a format intended for human readers as follows.
The bibhost
for a particular domain will be expected to emphasize locally
available services for that domain.
Under BibP 1.0, no additional constraints are placed on the metadata to be provided on the response page or its format. The intent is to provide a relatively open framework to allow the development of alternative models for document metapages.
BibP 1.0 servers may freely format metadata for human readers, without consideration of how this data may be extracted under program control. However, subsequent development of BibP is expected to specify formal requirements for server-to-server interaction for sharing of basic bibliographic and service metadata.
BibP servers must provide mechanisms to handle errors, ambiguities and unknowns.
Servers check
resolve
requests for consistency with Section 3.6 and the individual syntax
of USINs for consistency with Section 2 and report any errors.
However, if a syntax error in a resolve
request
consists of additional
keyword=
value parameters with
&
separators, then the server should simply warn of
unknown parameters and continue to respond to the request by
ignoring these parameters.
a
, b
, and so on,
is likely to be common. By reporting all articles on the page, the
BibP server nevertheless provides a credible response to the
ambiguous USIN.
It is anticipated that BibP Level 2 will impose additional requirements on BibP servers, particularly in the areas of server-to-server interactions and acceptance of metadata submissions. BibP Level 3 is further planned to incorporate the capture and dissemination of the citing relationship (from citing works to cited works) as metadata, as a step towards the universal citation database [UCD]. BibP Level 1 server software should be designed to accommodate these evolving requirements.
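The treatment of unknown parameters in resolve requests described in this section can be sketched as follows. The function checkResolveQuery, the shape of its result and the wording of its messages are assumptions for illustration. Treating a parameter without "=" as an error is likewise an assumption, and validation of the USIN syntax itself is not shown.

```javascript
// Check the query string of a resolve request (the part after "?").
// Known parameters are kept; unknown ones produce warnings and are ignored.
function checkResolveQuery(query) {
  var result = { usin: undefined, citehost: undefined, warnings: [], errors: [] };
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq === -1) {
      result.errors.push("malformed parameter: " + pairs[i]);
      continue;
    }
    var key = pairs[i].substring(0, eq);
    var value = pairs[i].substring(eq + 1);
    if (key === "usin" || key === "citehost") {
      result[key] = value;
    } else {
      // Warn, then continue as if the parameter were absent.
      result.warnings.push("unknown parameter ignored: " + key);
    }
  }
  if (result.usin === undefined) {
    result.errors.push("missing usin parameter");
  }
  return result;
}

var r = checkResolveQuery("usin=EXAMPLE-UOS&format=brief");
console.log(r.warnings);
// prints [ 'unknown parameter ignored: format' ]
```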
This section presents and documents a JavaScript program for
client-side resolution of BibP. This program is intended to be
embedded in the HEAD
element of HTML documents to
implement client-side resolution with browsers that provide
JavaScript support. The program has been written to use only those
JavaScript features that are relatively standard and are expected
to remain so. The script is effective with Netscape Navigator
(versions 3 through 6), Internet Explorer (versions 4 through 5.5)
and Opera (version 4), although full bibhost
support
is not yet available in the latter.
The prefix BibP_
is used for all global functions
and variables of the resolver so that the resolver can be freely
mixed with other client-side JavaScript that respects this prefix.
The full script, in a relatively condensed form for ease of
cut-and-paste, is presented immediately below and documented in the
following subsections.
<script type="text/javascript">
<!--
// bibres.js version 1.1
// (c) Robert D. Cameron and Serban Tatu, November 2000.
// GNU General Public License, Version 2 applies.
var BibP_BaseURL;
var BibP_nocitehost = typeof(BibP_citehost) == "undefined";
function BibP_SetBaseURL (server) {
  BibP_BaseURL = server + "bibp1.0/resolve?" +
    (BibP_nocitehost ? "usin=" : "citehost="+ BibP_citehost+ "&usin=")}
BibP_SetBaseURL(BibP_nocitehost ? "http://usin.org/" :BibP_citehost);
function BibP_onMouseOver () {
  window.status = "bibp:" + this.href.substring(BibP_BaseURL.length);
  return true}
function BibP_onMouseOut () {window.status = ""; return true}
function BibP_ProcessLink(L, srchKey) {
  var spot = L.href.indexOf(srchKey);
  if (spot != -1) {
    L.href = BibP_BaseURL + L.href.substring(spot + srchKey.length);
    L.onmouseover = BibP_onMouseOver;
    L.onmouseout = BibP_onMouseOut}}
var BibP_Icon = new Image (); // To test for local bibhost icon.
function BibP_onIcon () {
  if (BibP_Icon.height!=0) {
    var oldBase = BibP_BaseURL;
    BibP_SetBaseURL("http://bibhost/");
    for (var i = 0; i < document.links.length; i++)
      BibP_ProcessLink(document.links[i], oldBase)}}
function BibP_onLoad () {
  for (var i = 0; i < document.links.length; i++)
    BibP_ProcessLink(document.links[i], "bibp:")
  BibP_Icon.onload = BibP_onIcon; // Now test for bibhost.
  BibP_Icon.src = "http://bibhost/bibp1.0/bibpicon.jpg"}
if (typeof(navigator.bibpSupport) == "undefined") {
  window.onload = BibP_onLoad}
// -->
</script>
The core strategy of the resolver is to define and use the
global variable BibP_BaseURL
as the common prefix for
resolution of BibP links. That is, given a link of the form
bibp:
USIN, the link translation of Section 3.6 is
performed by concatenation of BibP_BaseURL
and
USIN. The function BibP_SetBaseURL
constructs
the prefix given a BibP server as its input parameter and using the
global setting of BibP_citehost
as described in
Section 3.4.
The determination of the server to be used for
BibP_BaseURL
follows the server identification hierarchy of
Section 3.2. Initially, the value of BibP_BaseURL
is
set based on the document-specified BibP_citehost
if
it exists, or the global server usin.org
, otherwise.
However, if a test for the availability of a local
bibhost
subsequently proves successful,
BibP_BaseURL
will be adjusted to use bibhost
(BibP_onIcon
function).
The test for the availability of bibhost
uses the
image preloading feature of common web browsers to check the
required identification icon at
http://bibhost/bibp1.0/bibpicon.jpg
. After the document has
loaded, and links have been processed with the initial value of
BibP_BaseURL
, the assignment of the src
property of BibP_Icon
initiates the test for that icon
(BibP_onLoad
function). On a successful load event,
the BibP_onIcon
handler is called. If an icon of
nonzero height is reported, bibhost
is used to
establish BibP_BaseURL
. A zero height icon indicates
either that images were turned off in the browser (an empty icon is
trivially loaded without verifying the existence of
bibhost
) or an erroneous icon.
The function BibP_ProcessLink
is responsible for
translating BibP links to the appropriate URLs as well as for
arranging for correct display of the links in the browser status
bar on MouseOver
events. It is first used within the
BibP_onLoad
function to process links based on the
initial BibP_BaseURL
value (before
bibhost
testing). Subsequently, it may also be used within
the BibP_onIcon
function to update the translation if
bibhost
availability is confirmed.
The two-pass approach assures the availability of BibP link
service as soon as a document is loaded. Because the test for
bibhost
availability proceeds asynchronously with user
action, it is possible that a user may access a BibP link after
document loading but before bibhost
availability is
known. In this case, service from the citehost
or the
global server will be provided.
Link translations are effected within
BibP_ProcessLink
by changing the stored href
attribute associated with each BibP link. In the first pass, the
USIN is determined as the substring following the first occurrence
of the string "bibp:
" and the transformed link is
formed by appending this USIN to the value of
BibP_BaseURL
. The second pass, invoked if
bibhost
availability has been confirmed, performs a similar
transformation, replacing the initial BibP_BaseURL
prefix with the updated value.
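The two passes can be traced outside the browser with a small sketch that mirrors the string handling of BibP_ProcessLink on plain strings. The function rewriteHref and the placeholder EXAMPLE-UOS are hypothetical names for illustration; the base URLs assume that no citehost is defined.

```javascript
// Replace everything up to and including srchKey with baseURL.
// Links that do not contain srchKey are returned unchanged.
function rewriteHref(href, baseURL, srchKey) {
  var spot = href.indexOf(srchKey);
  if (spot === -1) {
    return href;
  }
  return baseURL + href.substring(spot + srchKey.length);
}

var firstBase = "http://usin.org/bibp1.0/resolve?usin=";
var secondBase = "http://bibhost/bibp1.0/resolve?usin=";

// First pass: search for "bibp:" and apply the initial base URL.
var pass1 = rewriteHref("bibp:EXAMPLE-UOS", firstBase, "bibp:");
// Second pass: search for the old base URL and apply the bibhost base URL.
var pass2 = rewriteHref(pass1, secondBase, firstBase);

console.log(pass1); // prints http://usin.org/bibp1.0/resolve?usin=EXAMPLE-UOS
console.log(pass2); // prints http://bibhost/bibp1.0/resolve?usin=EXAMPLE-UOS
```

Because the second pass searches for the old base URL rather than for "bibp:", ordinary links in the document are left untouched by both passes.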
A complication of href
modification is that the
link value displayed in the browser status bar on mouseover events
would normally be the actual stored value, not the original
bibp:
form. The BibP_onMouseOver
function
arranges to display the original form, while the
BibP_onMouseOut
function ensures that the status bar is
cleared when the mouse is moved off a BibP link.
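The status bar text is recovered from the rewritten link by stripping the current base URL, as in this minimal sketch. The function statusText is a hypothetical stand-in for the string computation inside BibP_onMouseOver, and EXAMPLE-UOS is a placeholder rather than a real USIN.

```javascript
// Recover the original bibp: form from a rewritten href.
// Assumes rewrittenHref begins with baseURL, as it does after link processing.
function statusText(rewrittenHref, baseURL) {
  return "bibp:" + rewrittenHref.substring(baseURL.length);
}

var base = "http://usin.org/bibp1.0/resolve?usin=";
console.log(statusText(base + "EXAMPLE-UOS", base));
// prints bibp:EXAMPLE-UOS
```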
It is anticipated that future versions of web browsers and other
user agents will provide direct BibP support. In this case, it will
likely be desirable to disable the JavaScript resolver. The
resolver provides for this with the test on the property
navigator.bibpSupport
. Any appropriately defined value for
this property will prevent BibP link processing that would
otherwise be initiated by the window.onload
event.
It is also anticipated that future developments of the
JavaScript resolver may expand the coverage of browser support and
perhaps add functionality. Updated resolvers should be available
through each local bibhost
of the BibP network. These
resolvers may be directly used in documents served from the domain.
For example, if bibhost.xxx.tld
has been implemented,
it is recommended that documents served from within the
xxx.tld
domain incorporate client-side resolution using the
following declaration in the HEAD element.
<script type="text/javascript" src="http://bibhost.xxx.tld/bibp1.0/bibres.js"> </script>
This provides for automatic updating of client-side resolvers without document modification.
BibP Level 1 is intended to provide public, read-only access to information metapages offered by libraries and other bibliographic service providers. Because requests are implemented through translation to http, no security assumptions beyond those afforded by http should be made in offering the initial metapage. However, the metapage itself may offer authenticated access for licensed or otherwise protected materials through https or other mechanisms.
It is possible that spoofing of the bibhost
for a
particular domain could provide inaccurate bibliographic or metaservice
information. However, such an effect would be localized and should
be easy to address by the local domain administrator. A second
security concern is the security of the default global service at
usin.org
as well as the potential use of
www.bibhost.com
to capture global services. Both of these
domains have been registered by the first author of this document;
eventually they should be turned over to an appropriate
institutional authority.
A potential security concern is the substitution of a malicious
JavaScript applet in place of the JavaScript resolver under
bibp1.0/bibres.js
. Server administrators should ensure the
security of installed resolvers.
Bibliographic Protocol Level 1 provides a new layer of abstraction for web-based reference linking. In essence, the linking to a copy or service with respect to a referenced document is eliminated in favor of a link to the document itself. The link specifies what the cited document is, not how to access it.
Link resolution under BibP is based on an open-architecture model involving a network of library-based and publisher-based servers. A default global BibP service is also defined, but can be overridden by alternative global services as configured in BibP clients. Document-specified citehosts override global servers while library-specified bibhosts override citehosts and global servers. Users may also configure personal bibliographic servers to take precedence over all of these through the flexibility of the bibhost relative domain name mechanism.
A JavaScript-based client-side resolver is incorporated into HTML documents to enable BibP with current web browsers. The protocol may also be implemented natively, without the use of JavaScript. Several such clients have been written, including a BibP-aware version of lynx. The JavaScript resolver allows the use of the protocol with a critical mass of existing web browsers, but is also designed for the graceful introduction of native BibP support over time.
BibP servers may be implemented using a variety of technologies. Trivial bibhost service may be implemented entirely using Apache rewrite rules. An initial prototype with full support for BibP Level 1 and several features of BibP Level 2 has been constructed using Java servlets [BibP-MSc]. Prototypes for local bibhost service have been implemented using PHP/YAZ to provide access to local library catalog information via Z39.50.
This report documents the current state of ongoing development of the bibliographic protocol. It is also intended to initiate an open process of further protocol development.