Copyright 1997, 1998, Robert D. Cameron.
The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Requirements for USINs are analyzed with an emphasis on the use of USINs in scholarly communication. A uniform naming model is described based on the hierarchical naming of serial publications and the hierarchical numbering of serial items. A number of concrete design ideas for USIN syntax are presented. A USIN Global Registry and a USIN Global Database are proposed and analyzed in terms of specific architectural features that interact to meet the requirements of publishers, librarians and scholars. Applications of the USIN concept to literature research, document retrieval, bibliography preparation and addressing the "broken links" problem of the World-Wide Web are considered.
The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Although the initial focus is scholarly literature published in journals, conference proceedings, technical reports and books, the scheme is intended to accommodate extensions to include other types of serialized contributions such as magazine articles, bills of a legislature, decisions of a court or minutes of university committee meetings. The USIN is intended as a vehicle for interoperability between various bibliographic citation applications, including finding citations (literature research), retrieving citations (from on-line sources, libraries or document delivery services), citation indexing, and citation formatting (bibliography preparation). The USIN is also intended as one possible mechanism for migrating the World-Wide Web away from dependence on Uniform Resource Locators (URLs) [4] to a system meeting the requirements for Uniform Resource Names (URNs) [23].
The USIN concept is related to the Serial Item and Contribution Identifier (SICI) [19], the Publisher Item Identifier (PII) [1], and the Digital Object Identifier (DOI) [11] schemes. However, the USIN approach is primarily concerned with the task of document identification in human communication, particularly scholarly, technical and legal communication, whereas the other schemes are more concerned with document delivery, library processing and publisher perspectives. In particular, the USIN should use mnemonic coding and be reproducible by ordinarily literate people (authors, students, librarians, law clerks, and so on) without the need for specialized coding knowledge and check-sum algorithms. The USIN system is also intended for serialized material that is not or cannot be registered with an International Standard Serial Number (ISSN); both SICI and PII rely on ISSNs for serial item identification. Philosophically, the USIN concept is most closely related to the SICI scheme in that they each identify documents with their publication in a particular organized series. The PII and DOI schemes identify documents as items owned by publishers, with numbers possibly assignable in advance of publication and independent of publication numbering. Green and Bide [14] and Paskin [21] provide good overviews of the various current approaches to identification of published articles or other items.
Central to the USIN concept is the notion of publication in an organized serial collection. This is a generalization of the traditional notion of a serial publication. An organized serial collection is defined to be any series of items published with a specific publication numbering framework. A (volume, issue, page) numbering framework might be used for a particular journal. The framework may change over time (e.g., changes in the number of issues per volume), but the numbering for any particular item is set when it is published. Both explicit and implicit elements may be used in the numbering framework, so long as they are fixed at the time of publication. For example, numbering of articles may be by explicit (volume, issue, page) numbering, with a counting rule based on page layout to distinguish multiple articles on a page. The authority for number assignment is usually, but not always, the publisher. For example, ISBN numbering of books satisfies the USIN definition of publication numbering framework and so allows the USIN scheme to be applied to books as well as to conventional serials.
In application to scholarly writing and bibliography preparation, the USIN concept is envisioned to be used with bibliographic processing "plug-ins" to standard word-processing software. These plug-ins should be capable of resolving USIN references into appropriately formatted citations consistent with chosen style guidelines. USIN resolution may be achieved through locally-mounted databases coupled with World-Wide Web access as a backup. Authors could thus use USINs as citation tags for papers of interest, much as they use similar tags with BibTeX, ProCite, EndNote or other bibliographic formatting tools. However, with the USIN approach, authors will be spared the drudgery of creating their own bibliographic databases for use with these products, editors will be spared the task of correcting author errors in citation, and readers will be spared the difficulty of resolving errors in citations that authors and editors miss.
In application to literature databases, the USIN can serve as a standard notation to report the results of a search process. This could open up new opportunities for combining search results from distinct databases. For example, duplications could be filtered by USIN matching, or relevant items from one search might be fed back into a search on a different database. In fact, the USIN idea is intended to serve as the core data element in a scheme for universal citation databases: databases that link every document to the documents it cites and vice versa [6].
In application to the World-Wide Web, the USIN concept has considerable promise as a potential partial solution to the problem of "broken links" [5, 13]. In short, the URLs that are presently used for hypertext links on the World-Wide Web are based on "locations" that specify documents in terms of access protocols, port numbers, directory paths, and filenames. For various reasons, all of these attributes of document location are subject to change and web links frequently become broken as a result. Many proposals to resolve this problem through the creation of some form of Uniform Resource Name have been put forward, but none seem to have progressed beyond the experimental stage [8, 9].
In comparison to the URN approach, the USIN scheme concentrates on the somewhat smaller problem of establishing a universal naming scheme for publications in serialized collections only. One could imagine that USINs could be developed within the overall URN structure as one particular "namespace" [17]. On the other hand, there are several reasons why it may be best to focus on a specific solution for USINs instead of the general URN problem. First of all, it could be argued that the best focus for perpetual naming schemes is to concentrate on those items actually intended to be long-term contributions to the global knowledge archive. From this perspective, publication in an organized serial collection may be the best single indication of such an intent. Second, the act of assigning a document a number within a serial collection represents an important technical opportunity unavailable for general web resources: a specific event in the publication process to which naming scheme protocols can be tied. Third, focussing on the evolving global knowledge archive as a development from the present international network of libraries may suggest different approaches to identifying the "resolution service" for a USIN. For example, users could be allowed to choose their own resolution service from those offered by different local libraries, instead of being forced to accept a network-specified service. In the terminology of the Dexter Hypertext Reference Model [15], we can take advantage of the flexibilities afforded by resolution within the run-time layer to overcome difficulties in storage-layer resolution. For all these reasons, focussing on publications in organized serial collections may be both the right problem to solve and the one for which URN solutions are most feasible.
Applications of the USIN scheme to other areas such as legal citation and legal research are also envisaged. However, these are at present beyond the scope of this paper and are left as an area for future consideration.
This paper is intended as a discussion document to set the framework for development of the USIN concept. Overall, the goal is to propose the requirements that must be met by any USIN system, and to suggest some reasonably concrete design ideas that meet those requirements. Section II focuses on the requirements analysis with a particular emphasis on the concept of scholar-friendly naming. Sections III and IV focus on design concepts that satisfy the USIN requirements, broken down into two main tasks: globally unique naming of serial publications and hierarchical identification of serial items within a particular publication series. Section V then discusses requirements for important USIN support technologies. Section VI concludes the paper.
The goal of this section is to discuss the general requirements that any USIN system must meet, without making premature commitments to particular USIN design ideas. At the same time, the requirements are used to analyze some of the inadequacies of the existing identification standards, primarily SICI and ISSN. This serves both to help establish the need for a new identification scheme and to bring some concreteness to the discussion. The reader who prefers additional concreteness may wish to briefly look ahead to some example design ideas for journal article citation in Section IV.
It may seem obvious that a USIN scheme must meet the basic goal of unambiguous article identification: every article must be denotable and every USIN denoting an article must denote no other article. However, there are difficulties in achieving this goal and the goal is in fact not achieved by the existing SICI coding scheme. In essence, the SICI scheme is prone to failure in some rare cases involving articles appearing on the same page and having similarly abbreviated titles. To deal with the multiple article per page problem, SICI uses a "title code" of up to six characters, usually formed from the initial letters of title words. Different articles on a page can usually be distinguished by this title abbreviation. In principle, however, it is possible to have two or more articles with the same SICI title abbreviation and hence the same overall SICI code. Presumably this is one of the reasons for the 12 ambiguities reported within 4 million SICI strings stored in the Uncover database [22]. Another problem with SICI serial title abbreviation is that it requires human judgment when the title contains symbology; this is a further possible source of ambiguity.
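The collision described above can be illustrated with a short sketch. The `title_code` function below is a deliberately simplified stand-in for the SICI title code (initial letters of up to six title words, with symbols simply ignored), not the rule as the SICI standard states it. The function names and the sample titles are invented for illustration.

```python
import re

def title_code(title: str, max_len: int = 6) -> str:
    """Simplified title code: initial letters of the first `max_len` words.

    Assumption: symbols are dropped, which sidesteps the human judgment
    that real titles containing symbology would require.
    """
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(word[0].upper() for word in words[:max_len])

def colliding_titles(titles):
    """Group titles by code and keep only codes shared by several titles."""
    by_code = {}
    for title in titles:
        by_code.setdefault(title_code(title), []).append(title)
    return {code: group for code, group in by_code.items() if len(group) > 1}

# Invented titles imagined to start on the same page of the same issue.
same_page = ["A Note on Parsing", "A New Order Parameter", "Erratum"]
print(colliding_titles(same_page))
```

Two distinct titles that reduce to the same code are indistinguishable once volume, issue and page also agree, which is the kind of failure the paragraph describes.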
In order to ensure that every article is denotable, a logical first step is to ensure that every serial is denotable. Unfortunately, the existing international standard in serial identification, the ISSN, has an insufficiently large denotation space. The ISSN system is based on an eight-digit identifier with seven working digits and a check digit. The upper limit on the number of serials that can be accommodated is therefore 10 million. When contemplating a universal designation scheme for serial items as fine-grained as the minutes of curriculum committee meetings of a particular university department, it should become clear that the ISSN system as presently constituted will not suffice.
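The structure of the ISSN can be made concrete with a small sketch. The weighting used below (weights 8 down to 2, modulus 11, with the value 10 written as X) is the standard ISSN check-digit rule rather than something specified in this paper, and the function name is chosen here for illustration. Because the eighth character is fully determined by the first seven, the denotation space is bounded by the seven working digits.

```python
def issn_check_digit(working_digits: str) -> str:
    """Return the ISSN check character for seven working digits."""
    if len(working_digits) != 7 or not working_digits.isdigit():
        raise ValueError("expected exactly seven decimal digits")
    # Weights 8, 7, ..., 2 are applied to the seven working digits in order.
    total = sum(int(d) * w for d, w in zip(working_digits, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if value == 10 else str(value)

# Only the seven working digits vary freely, so at most 10**7 serials fit.
ISSN_CAPACITY = 10 ** 7

print(issn_check_digit("0378595"))  # working digits of the ISSN 0378-5955
print(ISSN_CAPACITY)
```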
Although every USIN must denote at most one article, it is reasonable to allow different USINs to denote the same article. For example, issue numbers may be an optional part of the USIN syntax, required only when journals are paginated by issue. In the case that journals are paginated by volume, it could be desirable to allow either form (with or without issue numbers) as an acceptable USIN form. There are many other reasons that alternative forms of a USIN might be desirable and there is no particular reason to rule this option out in the initial requirement specification for USINs.
Nevertheless, of the set of USINs that may legally denote an article, exactly one of them should be specified as the canonical or preferred form. One use for canonical forms is to make it easy to determine whether two different USINs denote the same article: convert them both to canonical form and see if they are the same. For example, if a user searches two distinct databases for articles of interest on a particular topic and both databases return USINs in canonical form, then it is an easy matter to filter out duplicate references to the same article because they are represented by exactly the same string. A second important role for canonical forms is to support indexing of information by USIN. By always associating information with the canonical form of a USIN, it will be possible to retrieve that information given any legal USIN form by first converting to the canonical form.
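As a minimal sketch of duplicate filtering by canonical form, assume a purely illustrative syntax `SERIAL:VOLUME(ISSUE):PAGE` in which the issue is optional, together with a small table recording which serials are paginated by volume. Neither the syntax nor the table is prescribed by this paper. The volume and page numbers are placeholders rather than real citations, and `X.EXAMPLE/NEWS` is an invented serial.

```python
import re

# Hypothetical per-serial information: True means pages run continuously
# through a volume, so the issue number is redundant.
PAGINATED_BY_VOLUME = {"S.ACM/TOPLAS": True, "X.EXAMPLE/NEWS": False}

# Assumed illustrative syntax: SERIAL:VOLUME[(ISSUE)]:PAGE
_USIN = re.compile(
    r"^(?P<serial>[^:]+):(?P<vol>\d+)(?:\((?P<issue>\d+)\))?:(?P<page>\d+)$"
)

def canonical(usin: str) -> str:
    """Convert any accepted form to the single canonical form."""
    m = _USIN.match(usin.strip())
    if m is None:
        raise ValueError(f"not a recognised form: {usin!r}")
    serial = m["serial"]
    by_volume = PAGINATED_BY_VOLUME.get(serial)
    if by_volume is None:
        raise ValueError(f"unregistered serial: {serial!r}")
    vol, page = int(m["vol"]), int(m["page"])
    if by_volume:
        # The issue number, if given, is dropped in the canonical form.
        return f"{serial}:{vol}:{page}"
    if m["issue"] is None:
        raise ValueError("issue number required for this serial")
    return f"{serial}:{vol}({int(m['issue'])}):{page}"

def merge_results(*result_lists):
    """Combine search results, keeping one entry per canonical USIN."""
    seen = {}
    for results in result_lists:
        for usin in results:
            seen.setdefault(canonical(usin), None)
    return list(seen)

print(merge_results(
    ["S.ACM/TOPLAS:19(3):492"],
    ["S.ACM/TOPLAS:19:492", "X.EXAMPLE/NEWS:2(1):5"],
))
```

Once both result lists are in canonical form, duplicate detection reduces to plain string equality, and the same canonical string can serve as an index key.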
A further requirement for USINs is that conversion to canonical form be an algorithmic process based on globally available information. In this way, separate software systems will be able to interoperate by conversion to the common canonical form. The requirement for globally available information is not particularly a restriction on the syntax of USINs, but is a constraint on the implementation of the overall USIN system and how the basic information on USINs and their formation on a serial-by-serial basis must be shared.
Although the primary focus of the USIN concept is on the identification of published articles, there are a number of other related elements worthy of identification at both coarser and finer granularities. On the coarser side, this includes identification of the serial itself, volumes or volume ranges of serials, an index to the volume, individual issues and issue ranges, contents of an issue or special sections of an issue. At a finer level of granularity, it may include named or numbered components of articles, such as article abstracts or individual sections, figures, tables, or equations. Scholars may sometimes want to make reference to these components; other applications include identifying library holdings on a volume/issue basis, checking in serial issues when they arrive at the library or submitting claims for them when they are late, and ordering table of contents pages for awareness services. The SICI scheme includes capabilities for designating some of these components through its code structure identifier (CSI) and derivative part identifier (DPI); the PII and DOI schemes do not appear to account for such components. ANSI Serials Holdings Statements, used to identify holdings in library catalogs, includes a variety of conventions for specifying volumes, issues, ranges of volumes and issues and similar units of collection [2].
It is neither possible nor desirable to define a priori the specific set of secondary serial components that are identifiable in the USIN syntax. Instead the requirement presented here is that the USIN scheme should accommodate specification of these elements through an extensible syntax that can be coupled with a specification of what elements exist on a serial-by-serial basis.
A key requirement central to the entire focus of the USIN concept is that it emphasizes the needs of the people who use USINs over the needs of computers that process them. This encompasses many aspects that can be generally grouped under the term scholar-friendliness. However, this term is not intended to restrict the set of people whose requirements are considered. Instead, it reflects the notion that anyone who uses a USIN to cite prior works may be said to be taking on the role of a scholar in that act.
One might consider that there is a middle ground between accommodating the needs of scholars and the needs of computer systems. However, the goal of establishing USINs as names that will serve to denote published items over the long term should be considered. From this viewpoint, apparent requirements that might derive from the limitations of present-day computer systems (e.g., fixed-length fields, limited storage capacity, etc.) should be avoided. There is little doubt that the processing and storage capabilities of the computer systems that will be available in coming decades will be vastly superior to those of their present-day counterparts.
Nevertheless, scholar-friendliness cannot be considered an absolute requirement at the possible expense of unambiguous article identification, canonical forms or other requirements. Instead, scholar-friendliness should be considered as a desirable trait to be maximized subject to the constraints imposed by other requirements.
Scholars will often need to write down USINs of interest or type them into their computers. To minimize the tedium and the chance of error in these manual processes, USINs should be designed to include only that information necessary to clearly identify the cited work. Redundant forms that include additional information may be allowed but must not be required. For example, for a journal that is paginated by volume and that follows the convention of beginning each article on a new page, it is sufficient to specify the journal, volume number and initial page number to uniquely identify an article. In this case, a USIN specification must not require the inclusion of additional information such as issue number, date or complete page range.
One counterargument is that redundant information helps prevent errors, but one can in turn counter that this approach to error control is obsolescent and inferior. Historically, the requirement for redundant information at data-entry time is designed to allow error detection at some future processing step. This is the basis for three forms of redundancy in the existing SICI scheme for article identification: chronology (date of publication), title codes and check digits. However, these devices provide error detection without error correction. When an error is encountered, there may be a considerable delay (e.g., days in interlibrary loan applications) before the error can be corrected and processing resumed. Consider instead an interactive process supported by a global network. When a scholar enters a USIN, interactive software could immediately consult the global USIN database to verify its correctness and to allow any necessary corrections or resolutions of ambiguity. One existing model for this is the immediate feedback one receives when entering an incorrect URL on the World-Wide Web (Web). In this way, an interactive data entry process can both avoid the tedium of redundant data entry and support a process of immediate error correction as well as detection. Construction of such a global USIN system is probably feasible using the present-day technology of Internet-connected computers; if not, it will certainly become feasible within a small number of years.
A second requirement deriving from scholar-friendliness is to emphasize the use of mnemonic forms for identifying serial publications, and whenever possible, the standard mnemonic forms that are actually used by the community of scholars that use a particular serial. For example, the journal ACM Transactions on Programming Languages and Systems published by the Association for Computing Machinery is widely known by the acronym TOPLAS. An acceptable mnemonic form for identifying this serial might thus be S.ACM/TOPLAS where S might denote a global domain of scholarly societies and ACM is a unique code for the Association for Computing Machinery within that domain. As a second example, a designation such as CA.SFU.CMPT/TR might be acceptable as a globally unique mnemonic code for the Technical Report series of the School of Computing Science at Simon Fraser University. This code is mnemonic and builds on several accepted and standard abbreviations: CA as the ISO country code for Canada, SFU as a unique institutional code for Simon Fraser University in the CA domain (cf. the Internet DNS name sfu.ca), CMPT as the standard 4-letter department ID used by Simon Fraser University for the School of Computing Science and TR as the abbreviation for "Technical Report" as used by the School. The syntax shown in these examples is intended to be illustrative of a possible realization of this USIN requirement, but not prescriptive.
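The hierarchical structure of the two example designations can be sketched as follows, assuming (as the examples suggest, but the paper does not prescribe) that periods separate the levels of the naming authority and that a single slash introduces the series code. The type and function names are chosen here for illustration.

```python
from typing import NamedTuple, Tuple

class SerialName(NamedTuple):
    domain_path: Tuple[str, ...]  # naming authority, most general level first
    series: str                   # series code within that authority

def parse_serial_name(name: str) -> SerialName:
    """Split an illustrative designation such as CA.SFU.CMPT/TR."""
    authority, slash, series = name.partition("/")
    if not slash or not authority or not series:
        raise ValueError(f"expected AUTHORITY/SERIES, got {name!r}")
    parts = tuple(authority.split("."))
    if any(not part for part in parts):
        raise ValueError(f"empty component in authority: {authority!r}")
    return SerialName(parts, series)

print(parse_serial_name("CA.SFU.CMPT/TR"))
print(parse_serial_name("S.ACM/TOPLAS"))
```

Reading the authority from left to right mirrors the delegation of naming responsibility: a country or global domain, then an institution, then a unit within it.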
Incorporation of existing standard mnemonic codes within USIN designations will assist scholars in a number of ways. USINs will be reported to scholars as the results of bibliographic search processes, scholars will enter USINs when doing citation searches, scholars will use USINs when including references in papers and scholars will make note of USINs when they find papers of interest. In all these applications, scholars will find mnemonic forms easier to read, easier to reproduce and generally more useful. However, note that these requirements are met if the mnemonic forms are acceptable as one of the alternative forms on USIN input and are produced during output by any USIN-generating software. More precisely, the requirement for using mnemonic forms applies to the definition of the canonical form of USINs, but does not preclude alternative non-mnemonic forms.
Adopting scholar-friendly mnemonic identification necessarily imposes a further limit on the role of ISSNs within the USIN scheme. Where a serial is unambiguously known by a mnemonic form, that form must be used as canonical in place of the ISSN. Nevertheless, ISSNs are likely to have an important role both in identifying serials for which no mnemonic abbreviation has been defined and for initially identifying serials before their mnemonic identifications have been registered and accepted as globally unique.
A further requirement deriving from the general principle of scholar-friendliness is that existing publication numbering conventions should be employed or adapted wherever possible to identify published articles within a particular serial. For example, articles in traditional print journals will typically be identified by volume number, issue number (if required) and page number, with the possible addition of a code to discriminate multiple articles on a single page. This will be of the greatest assistance to scholars when forming USINs from either copies of the article in question or from a citation of the article in a reference list. It will also be helpful to scholars in decoding USINs and retrieving the items from (physical or virtual) library shelves.
The requirement for the use of publication numbering rules out the article identification mechanisms contemplated by the PII and DOI schemes as a basis for canonical USINs. Both of those schemes emphasize publisher-generated numbers that may be different from the actual numbering on the published serial. This requirement also rules out other reasonable schemes for unambiguous article identification. For example, a scheme based on volume number and sequential article number would be widely applicable as an unambiguous numbering scheme for many journals. But scholars may be unable to easily determine the sequential article number from either a printed copy of the article or a conventional bibliographic citation. If publication numbering exists, it should be used.
One might argue that identification by publication numbering is less scholar-friendly than identification using more mnemonic article attributes, such as author name and key title words. However, this is an instance in which scholar-friendliness should not be considered an absolute at the expense of a system of unambiguous article identification.
One might also prefer to use publication chronology (e.g., dates, month-year combinations) instead of publication numbering. In fact, chronology is a form of numbering that happens also to be correlated with the passage of time. For some types of publication, chronology may be the only numbering that exists and hence must be used. In other cases, acceptable alternative USIN forms may be defined based on chronology. However, chronology is generally more complex and involves more identification pitfalls. For example, if (volume, page) identification generally suffices for article identification in a particular journal, it may be the case that (year, page) identification is inadequate for at least two reasons. First, the journal may publish multiple volumes per year. Second, even if volumes are annual, they may not correspond to calendar years; articles with the same starting page number in two consecutive volumes could still end up being published in the same year. In other cases, serial items may have duplicated and hence ambiguous chronology, for example, when two technical reports are issued on the same date. There are also a number of annoying coding problems for chronology. If numeric codes are used for months, how do you code for month combinations or seasons? If nonnumeric coding is used, should it be in English or the original language, and should abbreviations be used? For all these reasons of potential ambiguity and complexity, identification by simple publication numbering should be used in preference to chronology.
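The first of these pitfalls can be shown with a few invented records for a hypothetical journal that publishes two volumes per year, each paginated from page 1. The data and the function name below are made up for illustration only.

```python
from collections import Counter

# Invented records: two volumes appear in the same calendar year.
ITEMS = [
    {"volume": 11, "year": 1997, "page": 1},
    {"volume": 12, "year": 1997, "page": 1},
    {"volume": 12, "year": 1997, "page": 57},
]

def ambiguous_keys(items, fields):
    """Return the identification keys shared by more than one item."""
    counts = Counter(tuple(item[f] for f in fields) for item in items)
    return sorted(key for key, n in counts.items() if n > 1)

print(ambiguous_keys(ITEMS, ("volume", "page")))  # publication numbering
print(ambiguous_keys(ITEMS, ("year", "page")))    # chronology
```

With these records, (volume, page) separates all three items, whereas (year, page) assigns the same key to the opening articles of both volumes.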
Because technical reports, government publications, court documents and journal papers have various different numbering schemes, alternative syntactic conventions for each type of publication will likely be necessary. In principle, each serial should be accompanied by a definition of its numbering scheme, including syntax and semantics of the USIN designations. However, in order to ease the burden on scholars, efforts should be made to limit the syntactic variations wherever possible. Thus, there should also be methods for defining standardized numbering schemes, with the goal that the vast majority of serials will use one of the standard schemes rather than one of their own design.
From the scholar's point of view, the primary role and need for USINs is in identification of articles. Identifying secondary serial components (volumes, issues, special sections, abstracts, etc.) is a secondary issue of considerably less importance. The requirement for scholar-friendliness then is that the syntax for article identification not be complicated by codes to distinguish articles from other types of component. Instead, where necessary, the syntax for secondary components should include additional coding to indicate that a secondary component is being identified; the absence of such coding should be taken to indicate an article identification.
It should be easy for scholars to construct and analyze USINs manually. Checksums and other calculations should be avoided. Appropriate punctuation should be used to avoid running numeric items together. For example, the code 20000229 used as the SICI specification for February 29, 2000 violates this requirement. Arcane numeric codes should also be avoided. Although numeric month codes 1 through 12 are arguably acceptable, the SICI code 23 meaning "Fall" is not.
It is not uncommon to find a particular serial published in two or more formats, for example, in HTML format on the Web and on paper. From the scholar's viewpoint, it is usually the case that it is the content of the article, not the form of its presentation, that matters. When there is no difference in content, the USIN specification for articles should be fundamentally independent of publication medium. This requirement does not preclude media specification from inclusion as an optional element in a USIN syntax. However, the SICI convention of including the medium format identifier (MFI) as standard practice would not satisfy the requirement for USINs.
It may be the case that a publisher creates separate designations for different formats of a serial, particularly when there may be significant differences in content. In this case, the publication medium or format may be implicitly identified by the choice of publication series designation. However, this does not represent a violation of format independence of the USIN syntax itself.
Scholars will need to make use of USINs as notational elements in a variety of contexts, both formal and informal. Formal contexts include use of USINs as citation tags for bibliographic formatting software and data elements for bibliographic database queries. Informal contexts are generally oriented to the human reader, such as presentation of USINs in reference lists or direct use of USINs as nouns in sentences. In any of these contexts, there is a potential for confusion to be created by interaction of the syntax of the USIN with the notational conventions of its embedding.
The syntax of USINs should be designed to avoid confusions that can be created by common notational features that may be expected in typical embeddings. In particular, both formal and informal settings may embed USINs as notational elements within structures delimited by parentheses, braces or similar bracketing structures. To avoid confusion, USIN syntax should be constrained to allow bracketing symbols only if they occur in matched pairs. For example, if a USIN X is to be acceptable as a parameter in a BibTeX citation tag of the form \cite{X}, then any unmatched braces within X would surely cause confusion. It may be worthwhile to avoid braces altogether because of their use in the TeX family of document languages and similarly to avoid angle brackets ("<" and ">") because of their use in HTML and SGML.
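A check for this constraint might look like the following sketch. It assumes that parentheses and square brackets are the permitted bracketing symbols and that braces and angle brackets are excluded outright, which is one possible reading of the suggestion above rather than a settled rule. The function name and the sample strings are illustrative.

```python
# Assumed permitted bracket pairs (closing symbol -> opening symbol).
PAIRS = {")": "(", "]": "["}
# Assumed excluded symbols, given their roles in TeX, HTML and SGML.
FORBIDDEN = set("{}<>")

def embeds_safely(usin: str) -> bool:
    """True if brackets are properly matched and no excluded symbol occurs."""
    stack = []
    for ch in usin:
        if ch in FORBIDDEN:
            return False
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            # A closing symbol must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack

print(embeds_safely("S.ACM/TOPLAS:19(3):492"))
print(embeds_safely("S.ACM/TOPLAS:19(3:492"))
```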
When USINs are used as elements in ordinary discourse, they may often occur at the end of a sentence or phrase. Punctuation (periods, commas, semicolons and so on) added at this point should not be a source of confusion. The presence or absence of whitespace (blanks, tabs or line breaks) after such a punctuation symbol may be used to discriminate. That is, a period, comma or other punctuation may be used within the USIN syntax only if it is immediately followed by a nonblank character. Any of these punctuation marks followed by whitespace should always denote the end of a sentence or phrase.
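The whitespace rule can be sketched as a simple tokenizing step: a punctuation mark followed by whitespace (or by the end of the text) is treated as sentence punctuation and removed, while the same mark followed by a nonblank character is kept as part of the token. The particular set of punctuation marks and the function names are assumptions made for this example.

```python
# Assumed set of sentence punctuation marks.
SENTENCE_PUNCTUATION = ".,;:!?"

def strip_sentence_punctuation(token: str) -> str:
    """Remove punctuation that ends a whitespace-delimited token."""
    return token.rstrip(SENTENCE_PUNCTUATION)

def tokens_without_sentence_punctuation(text: str):
    """Split on whitespace, then drop each token's trailing punctuation."""
    stripped = (strip_sentence_punctuation(tok) for tok in text.split())
    return [tok for tok in stripped if tok]

sentence = "Reports appear in CA.SFU.CMPT/TR, and articles in S.ACM/TOPLAS."
print(tokens_without_sentence_punctuation(sentence))
```

The periods inside CA.SFU.CMPT/TR survive because each is immediately followed by a nonblank character, while the trailing comma and the final period are discarded.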
A necessary requirement for the USIN system is that USINs, once assigned and validated, remain permanently unambiguous identifiers of their documents. This applies to both canonical and noncanonical USINs. Three hundred years from now, a scholar may come across a USIN designation in an obsolete form of print media. She may highlight it with her data capture pen and expect to see instantly the resolution of it to a full bibliographic reference on her electronic work area. This requirement implies the need for a global registry system and a set of protocols for ensuring that USINs, once assigned, are never reused.
However, it need not be required that canonical USINs always remain canonical, at least in the initial development of the USIN system. Initially, the canonical USIN forms for many serials will include serial designation by ISSN. As globally unique mnemonic designations for these serials are gradually registered and accepted, those forms may become canonical. It may also be the case that changes in the canonical form of serial numbering become desirable, particularly for those aspects of numbering that are not directly reflected in publication numbering (for example, position of an article on a page).
It may be useful to impose constraints on how frequently canonical forms may be varied and/or on how results of USIN processing may be combined. For example, new canonical forms might be allowed to be registered at any time, but taking effect only at certain designated times. When such a time is reached, an updating process might (a) temporarily disallow new USIN processing requests, (b) allow current requests to complete or time out, (c) perform global updating of canonical form information, and (d) allow USIN processing requests to resume. Any application that needs to ensure the completeness of USIN matching could use the simple device of requiring that all USIN processing requests are initiated and completed in the same time frame.
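The updating process outlined in steps (a) through (d) might be sketched as follows. This is a single-threaded illustration under several assumptions: the class and method names are hypothetical, the "time frame" is modelled as an integer epoch returned with every resolution, and the ISSN shown is a placeholder rather than the identifier of any real serial.

```python
class CanonicalFormRegistry:
    """Toy registry in which canonical-form changes take effect in batches."""

    def __init__(self, forms):
        self._forms = dict(forms)  # accepted designation -> canonical form
        self._pending = {}         # old canonical form -> new canonical form
        self._accepting = True
        self.epoch = 0

    def register(self, old_canonical, new_canonical):
        """Record a change at any time; it has no effect until run_update()."""
        self._pending[old_canonical] = new_canonical

    def resolve(self, designation):
        """Return (canonical form, epoch) for an accepted designation."""
        if not self._accepting:
            raise RuntimeError("update in progress")
        return self._forms[designation], self.epoch

    def run_update(self):
        self._accepting = False              # (a) disallow new requests
        # (b) a real system would wait here for in-flight requests
        #     to complete or time out
        for old, new in self._pending.items():   # (c) global update
            for name, canon in list(self._forms.items()):
                if canon == old:
                    self._forms[name] = new
            self._forms[new] = new
        self._pending.clear()
        self.epoch += 1
        self._accepting = True               # (d) resume processing

def same_time_frame(results):
    """True if all (canonical, epoch) results come from one epoch."""
    return len({epoch for _, epoch in results}) <= 1

registry = CanonicalFormRegistry({
    "ISSN:1234-5679": "ISSN:1234-5679",   # placeholder ISSN
    "S.ACM/TOPLAS": "ISSN:1234-5679",
})
before = registry.resolve("S.ACM/TOPLAS")
registry.register("ISSN:1234-5679", "S.ACM/TOPLAS")
registry.run_update()
after = registry.resolve("ISSN:1234-5679")
print(before, after, same_time_frame([before, after]))
```

An application that compares results from different epochs, as in the final line, can detect that its USIN matching may be incomplete and repeat the earlier request.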
Serials evolve. Changes in title, publisher or publication frequency are commonplace. Serials may merge together or split apart. Serials may suspend publication and then resume publication at a later date. Serial publishers are also subject to many kinds of change: renaming, relocation, reorganization and so on. There is no doubt that accommodation of change must be an important design goal for USIN development.
Two issues involving particular forms of change deserve special attention in the development of USIN syntax. The first is that title changes should not necessarily require changes in the USIN code for a serial. This is at odds with the ISSN convention, which requires new ISSNs to be issued when there is any significant change in title. However, in considering mnemonic abbreviations of serial titles, various changes in title may be accommodated with the same mnemonic. If the publisher and readers of a journal wish to retain a particular mnemonic by which the journal is known, the USIN system should respect this. The second issue is that the syntax for identifying components of a particular serial should be flexible and changeable. For example, if a serial starts out with sequentially numbered issues, its USIN syntax should nevertheless accommodate a later reorganization to number the publication by volume. Similarly, if a traditional print journal identifies articles by volume and page number, the USIN syntax should accommodate a later change to an electronic format in which articles are identified by volume and article number.
Articles evolve. Draft versions may be initially circulated in a working paper series, followed by revised versions in conference papers and further revised versions in journals. At various stages an author may circulate intermediate versions to limited groups for review and comment. Post-publication revision of journal articles is also becoming a possibility with novel e-journal policies such as those of Living Reviews in Relativity [24].
The USIN scheme generates distinct identifiers for each separately published version of an article. One possible view of this is that each of these identifiers is in fact an alternative identifier of the same article, with one of them (presumably the most recent) being the canonical form. However, this approach has several serious problems. The first is that there is no good basis for saying when two versions of an article should be treated as the same. How many insertions and/or deletions of text may be accommodated? What about changes in title or authorship? It is difficult to imagine any set of rules that could provide a satisfactory and implementable decision procedure. It is also difficult to imagine any mechanism that could ensure that publishers actually identify these equivalent versions so that the correct mappings to canonical form can be made automatically. Beyond these concerns, there is also a problem with such equivalences automatically being applied to citations: changes in the content of an article between versions may render a citation apparently irrelevant or incorrect. This should not be considered a failure on the part of the citing author. In essence, it is a misrepresentation to map the author's citation of a particular version to any other version than the author intended.
Philosophically, then, USINs are names for particular versions of articles, not names for the more abstract notion of an article that maintains its identity through various versions over time. Systems to support this more abstract notion, at least at the coarse-grain level of publication versioning, might well be built on top of a USIN system, using USINs to identify particular published versions of articles. Finer-grained versioning concepts, such as those of Augment/NLS [12] or Xanadu [20], might also make use of USINs to interoperate with conventional bibliographic databases.
The sharp reader may notice an apparent contradiction between the USIN requirements with respect to changes to serials and changes to articles. The USIN requirement for serial codes does represent the more abstract notion of a serial publication as it goes through various changes rather than the serial as it exists at a single point in time. However, this distinction between the treatment of serial and article identifications reflects a fundamental philosophical view. In this view, serials are like timelines and articles are like points on those lines. The timeline may go through the twists and turns of changes in publisher, title or numbering scheme and still retain its identity. Each point on each line is a separate entity with a separate identity. There may be relationships between points such as "version-of" and "cites", but the separate identities of the points should be maintained in the USIN approach.
The Domain Name System (DNS) of the Internet is a successful model of a hierarchical, globally-unique naming system using distributed authority [18]. Under DNS, a number of global domains such as "edu" (educational institutions, primarily U.S.), "org" (organizations, primarily non-profit) and "ca" (Canadian sites) have been established by common agreement. Each domain is managed by an independent domain authority. Each domain authority assigns unique identifiers within its domain to create subdomains and/or to specify particular computer systems. When a subdomain is created, authority for assigning further identifiers within the subdomain is often passed to a responsible organization. Subdomains may be further divided into subsubdomains and so on.
Consider a USIN scheme that adopts the hierarchical naming idea of DNS, but with a focus on naming serial publications and publishing organizations, not computer resources. The distinction between naming publications and naming computer resources is critical; the failure to make it may be one of the underlying problems of the URN concept. Notations such as the following may be contemplated:
S.ACM/TOPLAS as a designation for ACM Transactions on Programming Languages and Systems published by the Association for Computing Machinery within a global domain for scholarly societies,
S.ACM.SIGPLAN/Notices for SIGPLAN Notices of the ACM's Special Interest Group on Programming Languages,
CA.SFU.CMPT/TR for the Technical Report series of the School of Computing Science of Simon Fraser University,
AU.NLA.ABN.SC/Papers for papers of the Standards Committee of the Australian Bibliographic Network of the National Library of Australia within a global domain for Australia.
In the USIN scheme, then, serial publications are given
identifiers which must be unique in the context of a particular
publication domain.
Thus d1.d2.d3 is interpreted to specify a subdomain d3
within domain d1.d2, which is itself hierarchically specified
as a subdomain d2 within the global domain d1.
In general, domains will denote publishing organizations,
administrative divisions of such organizations or
collectives for identifying organizations or publications.
The USIN syntax shown in this paper is intended to be
illustrative rather than prescriptive of the final form of USINs.
Thus the choice of periods and slash marks as separators is somewhat
arbitrary. One could also argue that the distinction between
slash marks and periods is artificial, i.e., that S.ACM.TOPLAS
would do as well as S.ACM/TOPLAS. However, distinguished
punctuation allows us to infer directly from the form of a
specification that S.ACM/TOPLAS is a serial publication
of the ACM, while S.ACM.SIGPLAN is an administrative
division thereof. One could also question the decision to
reverse the right-to-left structuring of domains under DNS;
the reason for this is to use a consistent left-to-right
hierarchical structuring within all levels of the USIN notation.
Lastly, the final syntax of
domain, subdomain and series identifiers is left as an area for
further work.
However, allowance for case-sensitivity
in such identifiers seems reasonable, e.g., CaS and
CAS could denote separate items.
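As a minimal sketch of the illustrative notation, a serial designation can be split into its domain path and its series identifier. The names SerialName, parse_serial_name and parent_domain are hypothetical, and the sketch covers only unquoted domain identifiers.

```python
from typing import NamedTuple, Tuple

class SerialName(NamedTuple):
    domains: Tuple[str, ...]   # outermost domain first, e.g. ("S", "ACM", "SIGPLAN")
    series: str                # series identifier, e.g. "Notices"

def parse_serial_name(designation: str) -> SerialName:
    """Split 'd1.d2.d3/series' into a domain path and a series identifier."""
    domain_part, slash, series = designation.partition("/")
    if not slash or not domain_part or not series:
        raise ValueError(f"not a serial designation: {designation!r}")
    # Identifiers are kept exactly as written, so CaS and CAS stay distinct.
    return SerialName(tuple(domain_part.split(".")), series)

def parent_domain(name: SerialName) -> Tuple[str, ...]:
    """The domain that contains the innermost subdomain of the name."""
    return name.domains[:-1]

notices = parse_serial_name("S.ACM.SIGPLAN/Notices")
print(notices.domains, notices.series, parent_domain(notices))
```

Because the slash is distinguished from the period, the parser can tell the series identifier from the last subdomain without consulting any registry, which is the inference the distinguished punctuation is meant to allow.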
Prior to international agreements to develop a full domain structure for USINs, it is nevertheless possible to initialize the scheme by building on existing global identification standards.
Given this paper's focus on scholarly literature, three initial USIN domains can be identified: ISSN, ISBN and RDNS.
The ISSN and ISBN domains directly use
the international standard numbering systems for serials
and books. For example, ISSN/0164-0925 is an initial USIN
designation for ACM TOPLAS.
Over time the notation S.ACM/TOPLAS
might be adopted as the canonical designation of this journal,
but ISSN/0164-0925 will always be acceptable.
Similarly ISBN is identified as a global domain based
on International Standard Book Numbers.
Names assigned under the Internet's Domain Name System
are the basis for the third leg of the initial
tripod supporting the USIN scheme.
Whenever a DNS domain name or host name
is clearly associated with a particular publishing organization,
it may be used
as a component of the RDNS (restricted DNS)
domain of the USIN scheme.
For example, acm.org is a DNS domain identified with the
Association for Computing Machinery, so
RDNS."acm.org"/TOPLAS denotes ACM TOPLAS.
Similarly, sfu.ca is a DNS domain for Simon Fraser University,
so RDNS."sfu.ca".CMPT/TR
denotes the Technical Report series of the School of Computing
Science at SFU.
In this last example, one might consider instead
basing the USIN specification on the cs.sfu.ca domain, that is,
RDNS."cs.sfu.ca"/TR.
This form might be allowed, but the form based on
the CMPT designation may be
preferred (canonical), because that designation has been specifically
chosen by SFU in a system of unambiguous codes for its departments.
The syntactic convention of enclosing a DNS name in double quotes when used as an RDNS domain serves two purposes. First, it emphasizes that the hierarchical structure of the DNS name plays no role in the interpretation of that name as an RDNS subdomain. In essence, DNS names are being cited as atomic identifiers for publishing organizations. Second, the quote marks delimit the scope of a DNS name, within which the "." separator is understood not as a part of the USIN syntax, but simply as a character in a quoted DNS name.
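The second purpose of the quotes can be illustrated with a small splitter that treats a quoted DNS name as a single atom. This is a sketch under the illustrative syntax only; the function name split_domain_path is hypothetical.

```python
def split_domain_path(path: str) -> list:
    """Split a USIN domain path on '.', keeping quoted DNS names whole."""
    parts, current, in_quotes = [], [], False
    for ch in path:
        if ch == '"':
            in_quotes = not in_quotes          # quotes delimit a DNS name
        elif ch == "." and not in_quotes:
            parts.append("".join(current))     # '.' outside quotes separates domains
            current = []
        else:
            current.append(ch)                 # includes '.' inside a quoted name
    if in_quotes:
        raise ValueError(f"unterminated quote in {path!r}")
    parts.append("".join(current))
    return parts

print(split_domain_path('RDNS."sfu.ca".CMPT'))
```

The result for the example is the three components RDNS, sfu.ca and CMPT. Note that this sketch discards the quote marks themselves; an implementation that needs to remember which components were DNS names would have to record that separately.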
Unfortunately, there is no constraint within the DNS system that DNS domains are permanently unique designations of organizations or their successors. Under DNS, the essential requirement is that domains are unique at any particular point in time, but it is quite conceivable that a naming authority at some level may reuse or reassign a name. Furthermore, the association between DNS names and organizations breaks down as one descends into the hierarchy of subdomains, subsubdomains and so on. To avoid these problems, the USIN standardization process could include the publication of a list of acceptable DNS names and their associated organizations for use within the RDNS domain of the USIN scheme. These designations should be permanent; the interpretation of a designation within the RDNS domain should be derived from this list, even if that designation is later reassigned to some other purpose within DNS itself. The intention of the list should be to identify all and only those DNS domains that may be clearly identified with publishing organizations.
The astute reader will note that designations such as
RDNS."acm.org"/TOPLAS and
RDNS."sfu.ca".CMPT/TR seem unnecessarily awkward compared to
the earlier examples S.ACM/TOPLAS and CA.SFU.CMPT/TR.
We should hope that forms such as the latter
ultimately become canonical under the USIN system.
One might ask, then, why not just skip the RDNS prefix,
reverse the order of DNS domain names and use those reversed names
directly at the top-level of the
USIN hierarchy in the initial instance? The answer is that
the top-level domain structure of the USIN system
should not be prematurely constrained.
Once established for a particular use, USIN designations are intended to
be reserved permanently for that use.
The RDNS prefix allows existing DNS names to be used as
a way of initializing the USIN system, giving time for
an orderly process of developing an internationally-acceptable
top-level domain structure.
Within the RDNS domain for a particular publishing
organization, the identification of administrative divisions
and publication series should use codes specified by
that organization. In many cases, clear coding schemes are
already in place now. In the important case of universities,
a system of unambiguous mnemonic codes for the
academic departments is typically available in the university
calendar. Codes to denote a publication series of a university
department (e.g., TR for Technical Report,
TN for Technical Note and so on) are often included
on publication lists produced by the department or may
be found on the documents themselves.
Wherever possible, the use
of existing naming schemes should be accommodated in this way,
in order to maximize
the scholar-friendliness of USIN designations.
Occasionally, one finds a DNS domain that directly
corresponds to a particular serial publication.
For example, the electronic journal First Monday
has an associated DNS domain firstmonday.dk.
In this case, the DNS name can be used as a serial publication
name directly within RDNS.
Assuming then that the internet domain for
First Monday is registered on the list of
acceptable RDNS domains, it has
the USIN RDNS/"firstmonday.dk".
In order to ensure the robustness and permanence of USIN designations, one should expect that certain adaptations and accommodations of historical naming schemes will be required. Thus, the USIN system must include a method for describing naming schemes and rules for maintaining consistency. In order to make the greatest use of historical naming schemes, the rules should be designed to accommodate a great deal of variability. Nevertheless, some modifications of historical naming schemes should be expected in order to comply with USIN requirements.
The three initial domains ISSN, ISBN and RDNS provide a plausible initial basis for unified, permanent and globally-unique designations of archivable serial, book and institutional publications. There are undoubtedly many cases in which the coding of USIN specifications will initially be unclear, especially in the case of institutional publications. However, it is certainly a common practice for the serial publications of an institution to be identified using a numbering scheme that serves to unambiguously denote those publications in the local context of an institution. It is certainly also the case that the vast majority of publishing institutions in the industrialized world can now be identified by an appropriate DNS domain. These conditions suggest that it is presently feasible to initiate a USIN system.
Although the ISSN, ISBN and RDNS domains may serve to initialize a USIN system, they will not generally provide a satisfactory basis for the scholar-friendly canonical designations that meet USIN Requirement #4.2. The development of an internationally acceptable domain structure is beyond the scope of this paper. However, to stimulate discussion along these lines, the References section of this paper includes, for each of the cited references, a discussion of possible initial USIN designations and forms that may evolve over time.
This section focusses on the problem of identifying articles and other components within the context of a particular serial. For concreteness, the first subsection starts with a proposed USIN syntax for citing journal articles. Following this, a general model for serial item identification by hierarchical numbering of items within a series is presented. The final subsection returns to the exploration of some additional design ideas for USIN syntax.
The following examples illustrate a proposed syntax for citation of traditional (print) journal articles.
S.ACM/TOPLAS:16@1811
If S.ACM does become the code for
the Association for Computing Machinery in the global domain for
scholarly societies, this is the canonical USIN in the proposed syntax for
the article "A Behavioral Notion of Subtyping" by Barbara H. Liskov
and Jeannette M. Wing appearing in ACM Transactions
on Programming Languages and Systems,
volume 16, number 6, (November 1994),
pages 1811-1841.
S.ACM/TOPLAS:16(6)@1811
S.ACM.SIGPLAN/Notices:32(1)@66
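The examples above can be decomposed mechanically. The following sketch assumes the illustrative syntax serial:volume(issue)@page with an optional issue part; the names ArticleRef, parse_article and same_article are hypothetical, and the comparison assumes pagination with volume scope, so that the issue number is redundant for identification.

```python
import re
from typing import NamedTuple, Optional

class ArticleRef(NamedTuple):
    serial: str
    volume: int
    issue: Optional[str]   # kept as text; None when the issue is omitted
    page: int

_ARTICLE = re.compile(
    r"(?P<serial>[^:]+):(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?@(?P<page>\d+)"
)

def parse_article(usin: str) -> ArticleRef:
    """Parse 'serial:volume(issue)@page'; the '(issue)' part is optional."""
    m = _ARTICLE.fullmatch(usin)
    if m is None:
        raise ValueError(f"not a journal article designation: {usin!r}")
    return ArticleRef(m["serial"], int(m["volume"]), m["issue"], int(m["page"]))

def same_article(a: str, b: str) -> bool:
    """Compare two designations, ignoring the issue number.

    Assumption: pages are numbered with volume scope in the serial concerned.
    """
    x, y = parse_article(a), parse_article(b)
    return (x.serial, x.volume, x.page) == (y.serial, y.volume, y.page)

print(parse_article("S.ACM/TOPLAS:16(6)@1811"))
print(same_article("S.ACM/TOPLAS:16@1811", "S.ACM/TOPLAS:16(6)@1811"))
```

The issue is held as a string rather than an integer so that publisher conventions such as combined issues are not ruled out by the sketch.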
It is possible to contemplate a generic syntax for
the numbering of serial items, avoiding specialized syntax for
each type of item.
For example, the conventions of the Web's
Universal Resource Identifiers [3]
might be adopted to use the "/" punctuation for separation
of all elements within the hierarchical numbering
of a serial item. The designation of the TOPLAS
example might become S.ACM/TOPLAS/16/6/1811.
Unfortunately, there are a number of disadvantages
to a generic syntax for hierarchical numbering.
First, with respect to journal numbering, optional issue numbers are
not easily accommodated.
For example, how does one reconcile S.ACM/TOPLAS/16/1811 as an
article denotation with S.ACM/TOPLAS/16/6 as an issue
denotation?
Second, the mnemonic value of associating specific
symbols (e.g., "@") with specific concepts
(e.g., "at page number") is lost.
Finally, there may be syntactic conflicts between the
universal syntax and existing syntaxes for publisher's numbering schemes.
For example, the "/" separator
for URI syntax conflicts with the combined-issue designations
such as 3/4 that are frequently used by journals such as
The Serials Librarian.
For these reasons, it seems preferable to
avoid specifying a generic universal
syntax for serial numbering and instead allow series-dependent
syntax. Nevertheless, the number of alternative
syntactic schemes should be kept fairly limited to avoid cognitive
burdens for the scholar.
Occasionally, one may find journals with more than one article starting
on a particular page. For example, these might be items of
technical correspondence.
One solution to this problem of starting page ambiguity
is to use sequential denotations with lower case letters.
For example,
S.ACM/CACM:38(1)@43a
and S.ACM/CACM:38(1)@43b could respectively
denote the two short articles "Women and Computing in the UK"
by Alison Adam and "Announcing a New Resource: The WCAR List"
by Laura L. Downey, both appearing on page 43 of
Communications of the ACM, volume 38, number 1 (January 1995).
There are three small problems with this scheme
that may be quite rare but are theoretically possible and
should be addressed.
The first is that there may potentially be more than 26 articles
on a page. However, the scheme easily extends so that
designations such as aa for the 27th article
and aaa for the 703rd article may be used.
Second, there may be an ambiguity in determining the ordering
of articles; pages are two-dimensional while orderings are one-dimensional.
The most scholar-friendly way to resolve this is to follow the
natural text ordering. For publications in English and similar
languages, this is column-major numbering:
articles in column 1 always precede articles in column 2 and so on, while
articles within columns are numbered top to bottom.
Finally, note that page numbers themselves might in
some cases include lower case letters. An example is
preface material in a journal volume numbered using lower
case roman numerals. To handle this case, the USIN scheme
might specify that the underscore ("_") character
can be used as a separator.
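The lower case item codes can be read as numerals in a letter-based system in which z is followed by aa. A minimal sketch of the two conversions follows; the function names are hypothetical, and the underscore separator for page numbers that themselves contain letters is not handled here.

```python
def to_alpha(n: int) -> str:
    """Encode 1, 2, ..., 26, 27, ... as a, b, ..., z, aa, ..."""
    if n < 1:
        raise ValueError("item counts start at 1")
    letters = []
    while n > 0:
        # Shift by one because there is no zero digit in this system.
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("a") + remainder))
    return "".join(reversed(letters))

def from_alpha(code: str) -> int:
    """Decode a lower case item code back to its position on the page."""
    if not code:
        raise ValueError("empty item code")
    n = 0
    for ch in code:
        if not "a" <= ch <= "z":
            raise ValueError(f"not a lower case item code: {code!r}")
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n

# The second article on page 43 receives the suffix used in the CACM example.
print("S.ACM/CACM:38(1)@43" + to_alpha(2))
```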
In practice, scholars will not want to learn the details of how to distinguish multiple articles on a page until it becomes a problem. They may not even be aware of the problem if they are entering a citation from its written form in a reference list. In such a case, the user will likely omit the required lower case code when entering the citation. Interactive USIN processing software should notify the user of the ambiguity and query him or her for its resolution. Batch-oriented software could return the set of all articles on the page and issue a warning report through an appropriate message or log file.
When a journal is not printed on pages, one might expect that article identification by page number is no longer appropriate. Although many electronic journals have in fact retained page-oriented formatting and numbering, many others have chosen not to do so. In particular, there is a growing trend to use the logical document markup capabilities of SGML [7] and HTML in electronic journals. One advantage is that formatting may be left to the reader's software; articles can be viewed and printed in a variety of different formats (with a variety of different paginations) depending on hardware capability and reader preference. In view of this, it seems reasonable to expect that the trend towards unpaginated e-journals will continue.
Consider a variation on the standard USIN journal syntax
that accommodates unpaginated e-journals by
replacing the @page syntax
with $article-number.
(An earlier version of this paper used the more mnemonic #
to denote article numbers, but the $ is easier
to use when USINs may be encoded as URLs.)
Some e-journals have explicit article numbering by volume,
for example, the Chicago Journal of Theoretical Computer Science.
Supposing that S.MITP/CJTCS identifies this
journal, S.MITP/CJTCS:1995$3 then denotes
article 3 in volume 1995, entitled "Rabin Measures"
by Nils Klarlund and Dexter Kozen.
In other cases, articles may be numbered within issues.
Thus ISSN/1201-2459:2(3)$4 would denote
the article "Reflections on Milton and Ariosto"
by Roy Flannagan, published as article 4 in Early
Modern Literary Studies (ISSN 1201-2459), volume 2, number 3.
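One likely reason that $ is easier than # in URLs is that # introduces the fragment identifier in URL syntax, so anything after it is separated from the path. The sketch below shows the effect with Python's standard URL splitter; the resolver address is purely hypothetical.

```python
from urllib.parse import urlsplit

BASE = "http://resolver.example.org/"   # hypothetical resolver address

with_hash = urlsplit(BASE + "S.MITP/CJTCS:1995#3")
with_dollar = urlsplit(BASE + "S.MITP/CJTCS:1995$3")

# With '#', the article number is split off as a fragment.
print(with_hash.path, with_hash.fragment)
# With '$', the whole USIN stays in the path.
print(with_dollar.path, with_dollar.fragment)
```

Since clients conventionally do not send the fragment to the server, a resolver receiving the first form would see only the volume designation.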
When no explicit numbering is provided, article numbers should be determined by issue, if possible, or by volume, otherwise. In general, scholars will determine article numbers by counting through the table of contents. In some cases, this may be a source of ambiguity; if the table of contents includes regular articles, short notes, corrigenda, submission instructions and/or other items, scholars may have difficulty determining what to count and what to omit. With the expected availability of on-line USIN databases, however, a scholar may simply query the database to verify or determine the correct USINs for articles published in a particular issue or volume.
The scheme just illustrated for journal citation is an example of a general concept for serial item identification: the use of a hierarchical numbering system. Abstractly, serial items are identified in the context of their serials by specifying hierarchical numbering tuples. For example, (volume, page) 2-tuples serve to identify articles in some print journals, while (volume, issue, page, item-count) 4-tuples may be required for magazines. In some cases, the hierarchy may be quite deep; items in a particular newspaper may be identified by a 7-level numbering (volume, issue, edition, section, page, column, item-count). In general, this is the essence of serial identification: although the particular scheme employed may vary from serial to serial, every item within every serial may be abstractly identified by some form of hierarchical numbering tuple.
It is interesting to note that a hierarchical enumeration system ("tumbler addressing") was also used as the basis of universal document identification in the proposals for the Xanadu Docuverse [20]. However, those identifications were based on a server/user/document/version/content hierarchy rather than the pure publication numbering hierarchy considered here. In essence, the Xanadu address system attempted to develop a new numbering system to apply to all documents, whereas the USIN approach is to characterize and use existing publication numbering hierarchies within a common framework.
One defining characteristic of the USIN hierarchical numbering model is that every counter within every numbering tuple has a scope that defines the context of its numbering. Issues of a journal are typically numbered from 1 within each volume; they are said to have volume scope. Page numbers may have volume scope or issue scope, depending on the particular serial. An "item-count" for distinguishing multiple articles per page has page scope. The first, or principal, numbering component of a serial is said to have global scope; it is numbered consecutively in perpetuity.
Numbering scope is correlated with, but not synonymous with, hierarchical level. For example, volume scope for page numbers is often used even when volumes are divided into issues. Similarly, although issues are usually given volume scope when volumes exist, they may sometimes be given global scope.
Another important aspect of the model is the use of scope-dependent numbering. In general, this reflects the fact that some properties of a counter at a particular level may depend on the actual values of counters at superior scope levels. Some of the scope dependencies may be relatively minor. For example, a quarterly journal that changes to a bimonthly journal starting with volume 23 exhibits a scope-dependency: issues are numbered 1 through 4 for volumes 1 through 22, and are numbered 1 through 6 thereafter. Scope-dependency may even affect the need for a particular counter in serial item identification. For example, the item-counter for multiple articles per page is not needed for those pages that have only one article starting on a page. Scope-dependencies may even affect the entire numbering system. For example, a print journal may switch to electronic publication at some point with a corresponding switch from a (volume, issue, page) numbering scheme to a (volume, article-number) scheme.
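The quarterly-to-bimonthly example can be written as a small scope-dependent rule: the range of the issue counter is a function of the volume counter above it. The function names are hypothetical and the rule describes only the imagined journal of the example.

```python
def issues_in_volume(volume: int) -> int:
    """Issue count for the example journal: quarterly, then bimonthly."""
    if volume < 1:
        raise ValueError("volumes are numbered from 1")
    return 4 if volume <= 22 else 6

def is_valid_issue(volume: int, issue: int) -> bool:
    """An issue number is valid only relative to its enclosing volume."""
    return 1 <= issue <= issues_in_volume(volume)

print(is_valid_issue(22, 5), is_valid_issue(23, 5))
```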
In general, the numbering scheme for every serial has
a syntactic representation that may be generated
by mapping rules from the abstract representation as a
hierarchical numbering tuple. In the suggested standard journal
article syntax, the (volume, page, item-number)
tuple of (12, 135, 2) maps to the syntactic representation
12@135b.
In general, each number in a hierarchical numbering tuple is first
mapped to a numeral in some encoding system, such
as arabic numerals, roman numerals or "alphabetic numerals"
(a, b, c, ..., aa, ab, ...). Then a syntactic string
for the entire structure may be constructed by
concatenation with appropriate mnemonic operator symbols
as punctuation.
An essential goal of this process is that the syntactic
encoding be uniquely decodable.
Operator symbols must be carefully chosen both to have
mnemonic value and to ensure unambiguous interpretation
of the syntactic forms.
In principle, the order of appearance of numbering elements
may also be considered a design choice, but
for simplicity and to avoid confusion it may be desirable
to enforce a strict left-to-right ordering
of elements according to the numbering hierarchy.
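The mapping between tuples and syntactic forms, and the unique decodability goal, can be sketched for the suggested journal article syntax. The names encode and decode are hypothetical, and representing an absent item-number by 0 is an assumption of this sketch rather than part of the model.

```python
import re

def to_alpha(n: int) -> str:
    """Alphabetic numerals: 1 -> a, 26 -> z, 27 -> aa; 0 -> empty string."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("a") + remainder) + letters
    return letters

def from_alpha(code: str) -> int:
    """Inverse of to_alpha; the empty string decodes to 0."""
    n = 0
    for ch in code:
        n = n * 26 + ord(ch) - ord("a") + 1
    return n

def encode(volume: int, page: int, item: int = 0) -> str:
    """Map (volume, page, item-number) to a string such as 12@135b."""
    return f"{volume}@{page}{to_alpha(item)}"

_FORM = re.compile(r"(\d+)@(\d+)([a-z]*)")

def decode(text: str) -> tuple:
    """Recover the numbering tuple from its syntactic form."""
    m = _FORM.fullmatch(text)
    if m is None:
        raise ValueError(f"not in volume@page form: {text!r}")
    volume, page, suffix = m.groups()
    return int(volume), int(page), from_alpha(suffix)

print(encode(12, 135, 2), decode(encode(12, 135, 2)))
```

Decoding is unambiguous here because the arabic digits, the @ operator and the lower case letters are drawn from disjoint character sets, and the elements appear in strict left-to-right hierarchical order.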
A fourth aspect of the hierarchical numbering model is
that a serial may have parallel numbering hierarchies
for different purposes. In general, these hierarchies have
a common numbering prefix consisting of
one or more of their uppermost numbering levels,
with divergence of numbering below these level(s). The simplest
example is that of the article-identification and issue-identification
hierarchies of journals that are paginated with volume scope.
In this case, the (volume, page) and (volume, issue) hierarchies
may be considered parallel. In general, syntactic devices are
necessary to distinguish which hierarchy is intended
in any particular coding; the (volume, page) and (volume, issue) hierarchies
are distinguished by the @ and () syntax
notations given previously.
Other examples of parallel numbering are given in the later subsection on
secondary component notation.
Finally, chronology is the fifth general property associated with the hierarchical numbering model for serials. Chronology is the association of a date and/or time of publication with a particular serial numbering component. In general, chronology is a fundamental aspect of serial publication and should be defined for all hierarchical numbering components down to some level at which all further structure is considered simultaneously published. For example, traditional print journals have chronology specified to the issue level, while electronic journals may have chronology specified to the article level. In general, chronology is scope-dependent; for example, when a quarterly journal changes to a monthly one, the chronology associated with issue 3 in each volume may change from "Fall" to "March". Chronology may also be irregular and possibly out-of-sequence, that is, with publication numbers assigned out of order of actual publication dates. Chronology itself is also an instance of hierarchical numbering, for example, using (year, month, day) 3-tuples or (year, season) 2-tuples.
One direction for further development is to consider formalization of the model to become a theory of hierarchical numbering. Such a theory would have as its purpose the establishment of certain important properties, such as ensuring that every published item is denotable by a hierarchical numbering tuple, every tuple has a syntactic representation and every syntactic representation is unambiguously decodable. In particular, careful attention should be given to the formulation of arithmetic operations to avoid problems such as the "paradoxes of tumbler arithmetic" in the Xanadu scheme [20]. The theory should also account for the particular properties of hierarchical chronological numbering. In this regard, the theory should be informed by the extensive work of Dershowitz and Reingold in developing the mathematics of many of the world's important calendar systems [10].
The following subsections present a number of additional design ideas for the identification of serial items by hierarchical numbering. Although many of the ideas are illustrated using examples related to journals, they are intended to apply to other types of serial as well.
Beyond article identification, the next most important application
area for USINs may be in the
description of library holdings or document delivery service coverage.
A single volume or issue of a journal is simple to identify
by including numbering only to the desired level.
For example, S.ACM/TOPLAS:16 denotes volume 16
of TOPLAS, while S.ACM/TOPLAS:16(6) denotes
issue 6 thereof.
But holdings are more often described as volume ranges.
In cases where issues are missing, subscriptions are
cancelled and then reinstated, or miscellaneous holdings have
been received by donation, the holdings may be broken up
into a list of individually held items or ranges.
To accommodate these requirements, it seems reasonable to
reserve the comma (",") to separate
elements of a holdings list and the double hyphen "--" to
serve as a range operator.
Consider a holdings pattern for ACM TOPLAS consisting of volumes 2 through 12 and 16 forward, except for the missing issues 2 and 4 of volume 10. The following USIN holdings specification could be descriptive.
S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff
Here, the serial code is specified only once. Commas separate individually held items or ranges. The start and end of a range are indicated by enumeration to the required level of specificity. An end range of "ff" indicates a continuing subscription.
As a syntactic constraint to aid in error detection, holdings
should be listed in strictly ascending order.
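A holdings specification in this notation can be parsed and checked for ascending order along the following lines. This is a sketch for volume and issue numbering only; the function names and the representation of an open-ended range by None are assumptions.

```python
import re

_POINT = re.compile(r"(\d+)(?:\((\d+)\))?")
INFINITY = float("inf")

def parse_point(text: str) -> tuple:
    """Parse '10' or '10(3)' into (volume, issue), with issue None if absent."""
    m = _POINT.fullmatch(text)
    if m is None:
        raise ValueError(f"bad holdings element: {text!r}")
    return int(m[1]), (int(m[2]) if m[2] else None)

def _start_key(point):
    return (point[0], 0 if point[1] is None else point[1])

def _end_key(point):
    # A range ending at a whole volume extends through its last issue.
    return (point[0], INFINITY if point[1] is None else point[1])

def parse_holdings(spec: str) -> tuple:
    """Return (serial, [(start, end), ...]); end is None for a '--ff' range."""
    serial, _, items = spec.partition(":")
    ranges, previous_end = [], (-1, 0)
    for element in items.split(","):
        start_text, dashes, end_text = element.partition("--")
        start = parse_point(start_text)
        if not dashes:
            end = start                      # a single held item
        elif end_text == "ff":
            end = None                       # continuing subscription
        else:
            end = parse_point(end_text)
        # Enforce strictly ascending order as an error-detection aid.
        if _start_key(start) <= previous_end:
            raise ValueError(f"holdings out of order at {element!r}")
        if end is not None and _end_key(end) < _start_key(start):
            raise ValueError(f"range runs backwards: {element!r}")
        previous_end = (INFINITY, 0) if end is None else _end_key(end)
        ranges.append((start, end))
    return serial, ranges

print(parse_holdings("S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff"))
```

Treating an open-ended range as extending to infinity means that any element listed after it is rejected, which is one reasonable reading of the ascending-order constraint.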
Only positive holdings data is shown, following the principle adopted by ANSI Serials Holding Statements [2]. Determination of missing items can be made by reference to either the USIN global database or an appropriate serial "definition" (see the subsection on Serials Definition Language in the following section). For example, using the knowledge that TOPLAS was quarterly during volume 10 tells us that 10(2) and 10(4) are missing for these holdings while 10(5) is not (because it does not exist).
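The determination of missing items can be sketched as a set difference between what a serial definition says exists and what is held. The function name and the dictionary standing in for a serial definition are hypothetical; only the quarterly frequency of volume 10 is taken from the example.

```python
def missing_issues(volume: int, held: set, issues_per_volume: dict) -> list:
    """Issues that exist according to the definition but are not held."""
    expected = range(1, issues_per_volume[volume] + 1)
    return [issue for issue in expected if issue not in held]

# Stand-in for a serial definition: volume 10 had four issues.
definition = {10: 4}

# The example holdings include issues 1 and 3 of volume 10.
print(missing_issues(10, {1, 3}, definition))
```

Issue 5 never appears in the result because the definition says it does not exist, which is the distinction between missing and nonexistent items drawn above.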
The conventions for serials holdings are intended to apply to serials with any form of hierarchical numbering and to any level of specificity. One implication is that the syntax of USINs generally must be structured to avoid conflicts with the "," and "--" symbols of the holdings notation. Another implication is that coverage can be specified to a finer level of detail. For example, a document delivery service may wish to identify "scanned holdings" to the article level, that is, the articles that have already been scanned or digitized and are hence available for short-turnaround delivery.
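The holdings notation can be checked mechanically. The following Python code is a minimal sketch, assuming volume(issue) enumeration only and a serial code that contains no colon; the function names are illustrative and not part of the USIN proposal.

```python
import re

# One holdings element: a volume number with an optional parenthesized issue.
_ITEM = re.compile(r"^(\d+)(?:\((\d+)\))?$")
INFINITY = float("inf")


def parse_item(text):
    """Return (volume, issue); issue is None when only a volume is given."""
    if text == "ff":  # open-ended range: continuing subscription
        return (INFINITY, None)
    match = _ITEM.match(text)
    if match is None:
        raise ValueError(f"unrecognized holdings item: {text!r}")
    volume, issue = match.groups()
    return (int(volume), int(issue) if issue is not None else None)


def _low(item):
    """Sort key for the first issue covered by an item."""
    volume, issue = item
    return (volume, 0 if issue is None else issue)


def _high(item):
    """Sort key for the last issue covered by an item."""
    volume, issue = item
    return (volume, INFINITY if issue is None else issue)


def parse_holdings(spec):
    """Split 'serial:holdings' into the serial code and a list of (start, end) ranges."""
    serial, colon, holdings = spec.partition(":")
    if not colon:
        raise ValueError("missing ':' between serial code and holdings")
    ranges = []
    previous_end = None
    for element in holdings.split(","):
        start, sep, end = element.partition("--")
        first = parse_item(start)
        last = parse_item(end) if sep else first
        if _low(first) > _high(last):
            raise ValueError(f"range runs backwards: {element!r}")
        # Enforce the strictly ascending order suggested for error detection.
        if previous_end is not None and _low(first) <= previous_end:
            raise ValueError(f"holdings not in ascending order at {element!r}")
        previous_end = _high(last)
        ranges.append((first, last))
    return serial, ranges


serial, ranges = parse_holdings("S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff")
```

Rejecting out-of-order lists at parse time is one way to realize the error-detection constraint; deciding which issues are actually missing would still require the serial's definition.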
Secondary component notation is a proposed means of specifying abstracts of articles, tables of contents of issues, indexes of volumes and other secondary components of serials or their articles. In general, secondary component notation is introduced by a USIN for the relevant article, issue, volume or other component, followed by a vertical bar and a component specification. The component specification is typically a standardized mnemonic for the component, possibly followed by a parenthesized enumeration. The following examples are illustrative.
S.ACM/TOPLAS:16|index
S.ACM/TOPLAS:16(6)|contents
S.ACM/TOPLAS:16@1811|abstract
S.ACM/TOPLAS:16@1811|sec(4.1)
S.ACM/TOPLAS:16@1811|fig(3)
It is anticipated that a standard set of mnemonics for standard kinds of components would be globally defined (index, abstract, section, figure, table, equation and so on) while others may be defined for individual publications. However, scope dependencies and numbering syntax for enumerated components will typically be defined on a serial-by-serial basis.
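To make the structure of the secondary component notation concrete, the following Python sketch splits a USIN at the vertical bar into its base USIN, component mnemonic and optional enumeration. It assumes that mnemonics are purely alphabetic and that an enumeration contains no nested parentheses; both are assumptions for illustration, since the paper leaves numbering syntax to each serial.

```python
import re

# Assumed shape of a component specification: mnemonic, then optional "(enumeration)".
_COMPONENT = re.compile(r"^([A-Za-z]+)(?:\(([^()]+)\))?$")


def split_component(usin):
    """Return (base_usin, mnemonic, enumeration); the last two may be None."""
    base, bar, component = usin.partition("|")
    if not bar:
        return base, None, None  # a plain USIN with no secondary component
    match = _COMPONENT.match(component)
    if match is None:
        raise ValueError(f"unrecognized component specification: {component!r}")
    return base, match.group(1), match.group(2)


print(split_component("S.ACM/TOPLAS:16@1811|sec(4.1)"))
print(split_component("S.ACM/TOPLAS:16(6)|contents"))
```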
One may question the need for fine-grained identification of article components. Indeed it is reasonable to consider deployment of an initial USIN system that focusses on article identification. Nevertheless, for a scheme that is designed to serve for article identification and related purposes in perpetuity, it would seem foolhardy not to allow the extension of the scheme using a notation such as the secondary component notation presented here.
The reference notation is a particular application of
the secondary component notation that would
allow designation of an article or other contribution
by indirect reference.
For example, S.ACM/TOPLAS:16@1811|ref(17) denotes
reference 17 of the article starting on page 1811 of volume
16 of TOPLAS. As it happens, this reference is to
an article entitled "A semantic database
model," by Hammer and McLeod appearing in
ACM Transactions on Database Systems, 6(3), pp. 351-386.
Assuming that the appropriate citation database exists,
the indirect reference in this case could map
to the canonical form S.ACM/TODS:6@351.
One use of the reference notation is to guarantee that you can quickly generate an acceptable USIN for every reference in an article, providing that you can generate a USIN for the article itself. During creation of citation databases, it may be desirable to produce a full set of USINs for the reference lists of articles in a fairly expeditious fashion. If the resolution of some references to their direct USIN form is proving problematic, they may be left in indirect form during initial data entry. At a later time, the resolutions of indirect references may be entered either manually or by acquisition of an independently developed citation set for the same article.
Another use of the reference notation is to serve as a unique canonical form for personal communications, unpublished works and other items that cannot otherwise be denoted. In this way, there would be no need to create a classification or coding scheme for such references. Furthermore, each such item would be automatically given a permanent and unique code. For example, if two authors each write articles citing "Famous Person, personal communication", those citations would be given distinct canonical identifiers. This would prevent false positives when doing coreference searches (finding papers that have two or more references in common).
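The two uses of the reference notation can be sketched in Python as follows. The citation database is modelled as a plain dictionary from indirect references to direct USINs; the second article's reference numbers and the personal-communication entries are invented purely for illustration.

```python
def canonical(usin, resolutions):
    """Return the direct USIN if a resolution is recorded, else the indirect form itself."""
    return resolutions.get(usin, usin)


def shared_references(refs_a, refs_b, resolutions):
    """Coreference search: canonical identifiers cited by both reference lists."""
    a = {canonical(ref, resolutions) for ref in refs_a}
    b = {canonical(ref, resolutions) for ref in refs_b}
    return a & b


# Resolution taken from the example in the text; the second entry is hypothetical.
resolutions = {
    "S.ACM/TOPLAS:16@1811|ref(17)": "S.ACM/TODS:6@351",
    "S.ACM/TOPLAS:13@269|ref(4)": "S.ACM/TODS:6@351",  # hypothetical reference number
}

# ref(9) and ref(5) stand for two unrelated personal communications (hypothetical).
article_a = ["S.ACM/TOPLAS:16@1811|ref(17)", "S.ACM/TOPLAS:16@1811|ref(9)"]
article_b = ["S.ACM/TOPLAS:13@269|ref(4)", "S.ACM/TOPLAS:13@269|ref(5)"]

print(shared_references(article_a, article_b, resolutions))
```

Because unresolved references keep their indirect form, the two personal communications never compare equal, which is the behaviour argued for above.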
The reference notation is best supported by article styles
with an explicitly numbered reference list at the back.
If a reference list exists, but is not numbered,
reference numbers may be determined by counting.
Alternatively, if references are cited by symbolic tags,
as in this paper, a possible design choice is to use
the symbolic code itself in the reference notation.
For example, the citation of the SICI standard referenced
in an earlier version of this paper might be given the indirect reference
RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI).
Another style may use numbered endnotes, with
the possibility of more than one reference per note.
In this case, enumeration with endnote number may
use lower case letters; |ref(3c) would
denote the third item cited in endnote 3 of a particular
article.
In general, each serial may define its own reference
numbering conventions, but it is highly desirable
that one of the standard forms be chosen.
In some cases it may be desirable to break a long USIN
over multiple lines. This can be accommodated by the
following hyphenation convention.
A line break may be inserted after any hyphen appearing in
a USIN, without changing its meaning. Furthermore,
any nonhyphenated USIN operator can be converted into a
hyphenated equivalent of that operator by adding a
hyphen to the end. Thus, the hyphenated equivalents
of "." and "/" and "--"
are respectively
".-" and "/-" and "--" (no change).
The following examples illustrate this convention in use.
RDNS."sfu.ca".CMPT/-
TR:97-16|ref(SICI)

S.ACM/TOPLAS:2--15(1),-
15(3),15(5)--17,20--ff

S.ACM/TOPLAS:2--15(1),15(3),15(5)--
17,20--ff

RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI)

The last example illustrates that a newline character is not strictly required after a hyphenated operator. This accommodates reformatting operations that might eliminate an inserted newline character but leave a vestigial hyphen in place. Conversion to canonical form eliminates any hyphenated operators and embedded newlines. USIN processing software should fully recognize the hyphenation convention in the event that a multi-line USIN is entered using a cut-and-paste operation.
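Conversion of a hyphenated USIN to canonical form can be sketched with two regular-expression substitutions. This Python sketch assumes that the hyphenated operators are limited to ".-", "/-" and ",-" and that none of these sequences occurs legitimately inside a serial code; a full implementation would need the complete USIN grammar.

```python
import re


def canonicalize(usin):
    """Remove line breaks after hyphens, then strip vestigial operator hyphens."""
    # Step 1: a line break (and any surrounding whitespace) is only allowed after a hyphen.
    joined = re.sub(r"-\s+", "-", usin)
    # Step 2: ".-", "/-" and ",-" revert to ".", "/" and ","; the range operator "--"
    # and hyphens inside enumerations such as "97-16" are left untouched.
    return re.sub(r"([./,])-(?!-)", r"\1", joined)


print(canonicalize('RDNS."sfu.ca".CMPT/-\nTR:97-16|ref(SICI)'))
print(canonicalize("S.ACM/TOPLAS:2--15(1),-\n15(3),15(5)--17,20--ff"))
```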
This section considers two important models of support technology for a USIN scheme: a USIN Global Registry and a USIN Global Database System. The USIN Global Registry is proposed as a system of institutions and technologies designed to preserve the knowledge of assigned USINs and their denotations for posterity and to support publishers and librarians in the assignment of new USINs for new and/or unassigned works. As differentiated from the Registry, a USIN Global Database System is not intended for USIN updating, but is instead intended to support the day-to-day needs of scholars for access to USIN information. This distinction is conceptually valuable in organizing requirements for the separate purposes of USIN registration and USIN-based information retrieval. It might ultimately be the case that the registry and database components are implemented in a single system, however.
In discussing these technologies, the goal is to present a vision of how USINs may be generated, verified and used in the day-to-day work of publishers, librarians and scholars. At this point in the development of the USIN concept, the focus should be more on the analysis of overall system requirements than on the implementation details of underlying mechanisms. Nevertheless, a number of design ideas are included to help give a more concrete picture of the possible operation of an integrated global USIN system.
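Consider a design for the USIN Global Registry based on four principal components. These are the Serials Definition Language (SDL), the USIN Publication Protocol (UPP), the Serials Registration Protocol (SRP) and the Publication Domain Protocol (PDP).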
Fundamental to the USIN concept is the use of serial designations and numbering schemes for identification of articles and other serial components. In order to formally specify these schemes, consider the creation of a Serials Definition Language (SDL). Each SDL specification would define one serial, establishing its basic identity and publication scheme. In particular, this would include formal specification of the hierarchical numbering scheme of the serial including its abstract structure, scope-dependencies, chronology, and syntactic identification schemes for articles and other serial components. It would also include the specification of the canonical and allowable alternative forms for USIN designations.
In addition to its formal role in the USIN scheme, SDL should also be designed to serve a variety of related purposes. From a serials check-in and claiming perspective, the enumeration and chronology specifications of an SDL definition should also have predictive value as contemplated, for example, by the serial pattern scheme of McNellis [16]. The SDL definition of a serial should also provide a basis for evaluating and interpreting USIN holdings specifications and possibly converting them to MARC Holdings Format. Similarly, from a bibliographic database perspective, it should be possible to verify the enumeration and chronology recorded in a database entry against that specified in an SDL definition. It should also be possible to determine the comprehensiveness of database coverage: are there any issues or articles published that are not in the database, or is the database complete?
The requirements above relate to a fairly narrow definition of serials, namely, in terms of the logical schemes for enumeration, chronology and serial item identification. It is possible to define a language (say, SECIL) that would be limited to these requirements. Such a narrow approach would serve to support a USIN system, but it seems reasonable to consider serial definition from a broader perspective while the opportunity exists. In particular, the definition of a serial logically includes not only its numbering scheme, but also the title, publisher and publication format. Incorporation of such elements into the language would seem necessary to merit the term "serials definition language." Beyond this, one might wish to include additional information, notably classification and indexing information. This reflects a cataloguing perspective and suggests that a nomenclature of SCL (serials cataloguing language) might be appropriate. However, from the viewpoint of designing good modular systems, the SDL approach is arguably preferable, because it focusses on information deriving directly from the serial's publication and relevant to the essence of what the serial is. Cataloguing information is essentially third-party information that may derive from a variety of sources and should be kept separate; it is information about the serial, not information defining it. Detailed exploration of these issues is an area for further work.
When USIN-based bibliographic databases are in widespread use, publishers will find that the sooner an article is assigned a USIN, the sooner it is advertised to large communities of scholars. The USIN Publication Protocol (UPP) is therefore proposed to allow publishers to assign each article a USIN during the publication process, thereby updating the USIN databases automatically.
A major requirement for UPP is to ensure the integrity of assigned USINs from the standpoint of global uniqueness and consistency with the current SDL definitions of serials in question. One approach to this is to maintain within the USIN Global Registry a current publication state for each serial and to define acceptable UPP actions in terms of this state. In essence, the publication state identifies the last issued USIN for the serial, plus a specification of which numbering levels in the hierarchical numbering scheme are currently open. This gives a basis for predicting the counter and date values for upcoming UPP requests.
For example, consider the publication state that might
exist after registering the
article "Collecting Interpretations of Expressions"
by Paul Hudak and Jonathan Young appearing in
ACM TOPLAS, Volume 13, Number 2, April 1991,
pages 269-290 with the USIN S.ACM/TOPLAS:13@269.
The state may include volume and issue counters that are currently
open with values
13 and 2, respectively. A page counter may be closed
at page 290 (nothing more will appear on page 290).
At this point, there may be two legal UPP actions: add another
article in this issue or close it.
As it happens, there is one more article in the issue.
Based on the current publication state, an expectation may be
generated
that the next article will have USIN S.ACM/TOPLAS:13@291.
If the publisher indeed submits that USIN with the
next UPP request, it can be accepted; otherwise, an error
can be reported.
After a "close issue" request has been made, the SDL
definition and publication state can be used to predict the next
publication action and expected date.
In the example, this is an "open new issue" request for issue 3 of
volume 13, July 1991. These may be verified when the
actual request is made.
When issue 4 of this volume is closed, the SDL definition should
tell us that there are no more expected issues in this
volume. The expected sequence of following UPP requests is then
a "close volume" request, followed by an
"open volume" request for volume 14, 1992, an
"open issue" request for issue 1 in January 1992 and an
article publication request with USIN
S.ACM/TOPLAS:14@1. Each of these
expectations may be in turn verified against the actual UPP requests
made.
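The worked example above can be sketched as a small state object. This Python sketch assumes continuous page numbering within a volume, a fixed number of issues per volume and articles that always start on a fresh page; the class and method names are hypothetical, and the end page passed to publish is an invented value.

```python
from dataclasses import dataclass


@dataclass
class PublicationState:
    serial: str
    volume: int
    issue: int
    last_page: int          # page counter is closed at this page
    issues_per_volume: int  # would come from the SDL definition
    issue_open: bool = True

    def expected_usin(self):
        """Predict the USIN of the next article in the open issue."""
        return f"{self.serial}:{self.volume}@{self.last_page + 1}"

    def publish(self, usin, end_page):
        """Accept a UPP article request only if it matches the prediction."""
        if not self.issue_open:
            raise ValueError("no issue is currently open")
        if usin != self.expected_usin():
            raise ValueError(f"expected {self.expected_usin()}, got {usin}")
        self.last_page = end_page

    def close_issue(self):
        """Close the issue and return the next predicted UPP action."""
        self.issue_open = False
        if self.issue < self.issues_per_volume:
            return ("open issue", self.volume, self.issue + 1)
        return ("close volume", self.volume, None)


# State after registering S.ACM/TOPLAS:13@269 (pages 269-290).
state = PublicationState("S.ACM/TOPLAS", volume=13, issue=2,
                         last_page=290, issues_per_volume=4)
print(state.expected_usin())
state.publish("S.ACM/TOPLAS:13@291", end_page=320)  # end page is hypothetical
print(state.close_issue())
```

Handling of exceptions to the predicted pattern, such as combined or skipped issues, is deliberately left out of this sketch.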
Of course, mechanisms will be required to deal with various kinds of exceptions to the predicted publication pattern. For example, when a particular issue is expected, one may instead see a combined issue (with combined enumeration). Alternatively, an issue may be skipped altogether, or a special issue may be inserted into the publication stream between two regular issues. Publication numbering may also be out of order with respect to date of publication. For example, in a technical report series, it is not uncommon for numbers to be assigned in advance of publication, with variable delays between the assignment of a number and actual publication. An apparent publication exception may also be the first indication of an actual change in publication pattern. In this case, the SDL definition should be corrected to reflect the updated publication pattern and reregistered with SRP, described below.
Serials Registration Protocol (SRP) is the proposed service for registering a serial code and its accompanying SDL definition and tracking changes thereto over time. This includes registering changes in publication numbering or chronology, changes in publisher or publication domain, addition of alternative USIN codings, changes to the canonical USIN form and/or deactivations and reactivations. In general, SRP requests would be made with respect to a particular publication-domain/serial-code combination.
Perhaps the most critical function under SRP is the creation of a new serial code within an existing publication domain. The code may be the initial code for a new or previously unregistered serial publication or it may be an alternative code for an existing publication. In either event, creation of a serial code should always be considered with care, because it creates, in the context of the given publication domain, a permanent USIN binding between that code and the serial in question. From this perspective, it is worth considering appropriate verification actions for creation of a new serial code. Of course, verification that the code is previously unassigned is an automatic function that should be implemented by the appropriate query to the USIN Global Registry. Beyond this, there should also be some manual verification to ensure that the code assignment is reasonably consistent with the USIN concept. One option is to use national serial registration centres analogous to those of the current international ISSN network. However, such a system is likely to be too cumbersome for the management of publications at the fine-grained level of, say, minutes of committee meetings of particular university departments. It also does not account for an institutional role in approving the serial codes chosen by administrative divisions within the institution.
An alternative for verifying serial code assignments that overcomes these problems is the following. SRP requests for new serial code creation must be approved by a USIN-certified cataloguing librarian. Certifications are awarded by an appropriate international standards body. Each authority for a publication domain may designate a certified librarian for that domain. When an SRP request to create a new serial code is issued, it is handled by the librarian registered for that domain, if such a librarian exists. Otherwise, verification of the creation request is attempted in the immediately superior publication domain, and so on. For example, a university may designate a single USIN-certified librarian to handle all institutional requests for new serial codes. Regardless of how deeply structured the administrative hierarchy within the university is, all serial code creation requests within the university are passed up the domain hierarchy to be handled by this individual.
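The delegation rule described above amounts to a walk up the domain hierarchy. In the following Python sketch a publication domain is represented as a tuple of components from the top-level domain downward; this representation, the librarian table and the names in it are assumptions made for illustration.

```python
def responsible_librarian(domain, librarians):
    """Find the certified librarian for a domain, deferring to superior domains."""
    while domain:
        if domain in librarians:
            return librarians[domain]
        domain = domain[:-1]  # move to the immediately superior publication domain
    return None  # no certified librarian anywhere in the hierarchy


# Hypothetical registrations: one librarian designated for the whole university.
librarians = {("RDNS", "sfu.ca"): "university serials librarian"}

# A request from a departmental subdomain is passed up to the university level.
print(responsible_librarian(("RDNS", "sfu.ca", "CMPT"), librarians))
```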
The second major function of SRP is to register the publication pattern of a serial and changes to that pattern as required from time to time. As described above, these publication patterns are specified as part of the serial's SDL definition. UPP can be used to check the consistency of the publication patterns against future publication attempts. That is, each time a USIN is specified in a future UPP request, it serves to check that the SDL definition is correctly predicting the actual publication numbering and chronology.
Whenever the publication pattern of a serial is changed, the SDL definition must be modified to account for both future and past publications. The checking of future publications is done by UPP. SRP is responsible for checking that the revised SDL definition correctly accounts for the USINs assigned to past publications. This checking may be done by formally re-evaluating the revised definition against the entire history of actual publication as recorded in the global registry. The checking should satisfy two conditions: (1) every USIN previously registered should be accounted for by the new SDL definition, and (2) the new SDL definition should not "predict" any past publication that does not, in fact, exist. Exhaustive checking or a provably equivalent alternative method should be used. That is, a reduced form of checking that puts at risk the consistency of the USIN system should not be justified on the basis of minor concerns of computer processing efficiency.
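The two consistency conditions can be expressed as a pair of set differences. The following Python sketch works at the issue level and reduces an SDL definition to a table of issues per volume, which is a drastic simplification adopted only for illustration; the function names are hypothetical.

```python
def predicted_issues(serial, issues_per_volume):
    """Enumerate the issue-level USINs implied by a (much simplified) definition."""
    return {
        f"{serial}:{volume}({issue})"
        for volume, count in issues_per_volume.items()
        for issue in range(1, count + 1)
    }


def check_revised_definition(registered, predicted):
    """Return (unaccounted, phantom) sets; both are empty for a consistent definition."""
    unaccounted = registered - predicted  # condition (1): registered but not predicted
    phantom = predicted - registered      # condition (2): predicted but never published
    return unaccounted, phantom


# TOPLAS was quarterly during volume 10, so four issues are registered.
registered = predicted_issues("S.ACM/TOPLAS", {10: 4})

# A revised definition that wrongly claims six issues for volume 10.
revised = predicted_issues("S.ACM/TOPLAS", {10: 6})
print(check_revised_definition(registered, revised))
```

Comparing full sets corresponds to the exhaustive checking recommended above, as opposed to sampling.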
The third major function of SRP is to register canonical and alternative forms of USIN for a serial. When a serial is registered for the first time, the publication-domain/serial-code combination under which it is first registered is the canonical form of USIN. Subsequently, SRP may be used to create alternative USIN forms. When such an attempt is made, the SRP request must specify both the publication-domain/serial-code combination for the current canonical USIN and the new alternative publication-domain/serial-code combination. It may be reasonable to require that permission from the domain authority of both domains be obtained. Any number of alternative forms for a serial may be created in this way.
The SRP request to change the canonical form of a serial must specify the publication-domain/serial-code combination of both the current and proposed new canonical forms. The request is made by the authority for the new publication domain and must be verified by the authority for the currently canonical publication domain. If approved, the change will be scheduled to occur at the next scheduled global synchronization time for changes to USIN canonical forms, or to a later synchronization time specified in the change request. Once the change becomes effective, the canonical form is switched, but both forms remain acceptable.
SRP also can be used to deactivate or reactivate a serial. In essence, deactivation of a serial registers a new publication pattern in which no further publications are predicted. Reactivation requires a new SDL definition that may change the title and future publication pattern of a serial, but still requires consistency with the entire history of previously assigned USINs.
Publication Domain Protocol (PDP) is the final proposed service
of the USIN Global Registry. This protocol is used to create
and register new publication domains, transfer authority for
domains, register the USIN-certified librarians for a domain
and other related functions. In general, these actions
will refer to subdomains of some existing publication domain;
even top-level USIN domains such as ISSN and RDNS
may be considered as subdomains of a global USIN publication domain.
Creation of a code for a new publication domain under PDP parallels the creation of a new serial code under SRP. In both cases, the proposed code must be checked to verify that it is previously unused in the context of the parent publication domain. Furthermore, the manual review of serial codes by a USIN-certified librarian should also occur for new publication domains. Ideally, this manual review should verify that the publication domain corresponds to an actual publishing institution, organization or administrative division thereof and is a scholar-friendly mnemonic designation of that unit consistent with historical practice wherever possible. Alternatively, the publication domain may represent a newly-formed collective or coalition expressly formed for the purpose of organizing the upper levels of the USIN domain structure.
A further parallel with SRP is to suggest that formal domain definitions be registered and revised as required from time to time. These definitions would specify the identity and organizational history of a publishing entity. From a domain definition, then, one should be able to determine the name of a particular publishing entity, its parent organization, its successors and predecessors and so on. However, domain definitions would not have the complexity of serial definitions under SDL, because there are no corresponding requirements in publication domains for enumeration, chronology and other aspects of serial definitions.
PDP should also support the registration of alternative USINs
and changes in canonical USIN for the publishing entities
denoted by publishing domains. The registration of alternative
USINs under PDP could parallel SRP in a straightforward fashion.
However, registration of a new canonical USIN for
a publishing domain is complicated by the implications
for serials and subdomains within that domain.
Consider a proposed change from RDNS."acm.org"
to S.ACM as the canonical USIN for
the Association for Computing Machinery. Normally,
this should imply corresponding changes
for all subordinate serials and subdomains recursively.
Thus, changes in canonical USIN from RDNS."acm.org"/CACM
to S.ACM/CACM, from RDNS."acm.org".SIGPLAN
to S.ACM.SIGPLAN and from
RDNS."acm.org".SIGPLAN/Notices
to S.ACM.SIGPLAN/Notices should all be expected
in the example. However, it may be unwise to automatically
make such changes without review in every instance.
Thus, under PDP, a change in canonical form for
a publishing domain should be carried out by first registering
all the appropriate changes for subordinate serials and subdomains.
This may be enforced under PDP by permitting a registration
of a new canonical form for a publication domain only when
alternative canonical forms for all active subdomains and serials therein
have been registered.
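The precondition for changing the canonical form of a publication domain can be sketched as a simple check over its active members. In this Python sketch, USINs are compared as plain strings and the registry is reduced to a dictionary of registered alternative forms; both are simplifying assumptions, and the function name is hypothetical.

```python
def missing_alternatives(old_domain, new_domain, active_members, alternatives):
    """List active serials and subdomains lacking an alternative form under new_domain."""
    missing = []
    for member in active_members:
        if not member.startswith(old_domain):
            raise ValueError(f"{member} is not within {old_domain}")
        wanted = new_domain + member[len(old_domain):]
        if wanted not in alternatives.get(member, set()):
            missing.append(member)
    return missing


old, new = 'RDNS."acm.org"', "S.ACM"
members = [
    'RDNS."acm.org"/CACM',
    'RDNS."acm.org".SIGPLAN',
    'RDNS."acm.org".SIGPLAN/Notices',
]
# Alternative forms registered so far (the third member has not yet been handled).
alternatives = {
    'RDNS."acm.org"/CACM': {"S.ACM/CACM"},
    'RDNS."acm.org".SIGPLAN': {"S.ACM.SIGPLAN"},
}

# The domain-level change would be refused until this list is empty.
print(missing_alternatives(old, new, members, alternatives))
```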
Finally, PDP should also provide for the deactivation and possible reactivation of domains. Deactivation of a publication domain implies that no further publication activity is contemplated within that domain or its subdomains. Hence deactivation of a domain should only be permitted when all subordinate serials and subdomains have themselves been deactivated. Reactivation of a publication domain may occasionally be contemplated. However, to ensure the permanence of identification of USINs issued in the domain prior to its earlier deactivation, a reactivation request should not be automatically granted. Instead, a "contract" may be first returned identifying previous use of the domain, assigned subdomains and serials and the requirement that new use will respect these. The proposed new domain authority should agree to these terms before the domain can be reactivated.
Now consider how the day-to-day needs of scholars can be directly supported by a USIN Global Database System. Three basic needs can be identified: (a) the need to inquire about the article or other item denoted by a given USIN, (b) the need of authors to cite articles by USIN, and (c) the need to use USINs in literature research, both to denote search keys (citation indexing) and search results. USIN Inquiry Protocol is the first proposed technology to assist users in this regard; it provides for both the interactive inquiry about USINs and for hypertext citation of USINs in World-Wide Web documents. To support citation by USIN in other types of document formatting software, a Bibliographic Retrieval Protocol is proposed coupled with bibliographic formatting "plug-ins" for standard word processing packages. The final subsection discusses the role of the USIN Global Database and USINs generally in literature research.
One of the primary motivations underlying the USIN concept is to address the "broken links" problem on the World-Wide Web: citation of works by Uniform Resource Locator (URL) is prone to failure when the cited item is moved or removed. To solve this problem, it has long been suggested that names of resources rather than their locations should be the basis of citation, but none of the proposals for Uniform Resource Names (URNs) has yet succeeded. A more successful approach may be to concentrate on an important subset of the general problem: links to serially-published documents. For this subset, consider the direct use of USINs as permanent, "unbreakable" links and the development of USIN Inquiry Protocol (UIP) to enable this use. For example, a hypertext reference to a sample TOPLAS article could be coded using the following HTML markup.
<A HREF="uip:S.ACM/TOPLAS:16@1811">A Behavioral Notion of Subtyping</A>
Note that a hyperlink formed in this way makes no reference to any particular computer system. Thus, the requirements of URNs are satisfied; the target of a link is designated by naming what it is instead of where it is located.
Apart from this use in Web-based documents, UIP also supports
direct inquiries about a particular USIN. All that the
scholar need do is type uip:S.ACM/TOPLAS:16@1811
directly into the "location"
field of a favorite Web browser (assuming that the browser
has been updated to include the UIP client-side software).
Ignoring for the moment how it works, the critical issue from a user perspective is what you get when you make a UIP/USIN inquiry, either directly or by activating a hyperlink. One answer is that you retrieve a metadata page, that is, an information page about a document, but not the document itself. In general, the direct retrieval of documents cannot be guaranteed because many of them may not be electronically available. On the other hand, if a document is available on-line, it may be available from a variety of different sources with a variety of different formats and/or pricing structures. The purpose of a metadata page, then, is to provide a full bibliographic description of the article or other item denoted by the target USIN, and a set of links for making further inquiries about the article and/or retrieving a copy of it.
In general, one may consider an ambitious design goal for metadata pages: to provide a comprehensive information resource with respect to the cited items. In addition to basic bibliographic information and links for acquiring copies of articles, a number of other items could be provided. Each article metadata page could include direct links to information about the serial and its publisher. Using the USIN notation it should also be easy to include links for retrieval of contents pages for sibling articles in the same journal issue or volume. Links for exploring other publications by the authors of the article might be included. In particular, links for locating subsequently published corrigenda would be worth highlighting. Information on review articles that discuss the document of interest may be included. In conjunction with a citation database, links for retrieving the sets of articles that are respectively cited by and cite this article could also be considered. Finally, it may be reasonable to consider including links to search services that can locate similar articles by full-text searching using a document surrogate (keywords and other metadata that describe the current document).
It may be the case that the coded USIN in a UIP hyperreference does not refer to a single article, but instead denotes some other serial component or is ambiguous or erroneous. In each of these cases, the page returned through UIP should also strive to provide comprehensive information to the user. For example, in the case of a USIN reference by page number where more than one article starts on the specified page, a menu showing each possible article could be returned together with their correct canonical USINs.
These ambitious goals for the metadata pages returned by UIP servers need not represent an obstacle to server development. The initial implementations of UIP servers may focus on basic capabilities, allowing additional functionality to be added over time. In addition, many of the capabilities could be implemented in a fairly modular fashion. For example, if a particular document delivery service supports web-based document ordering by USIN, then generating the appropriate document ordering link is a simple matter.
Returning to the issue of how UIP may be implemented, note
that the syntax for UIP/USIN citations does not specify the actual
server to be consulted in resolving the UIP request.
Rather it is reasonable to expect that the server would be
specified by an appropriate client-side mechanism, such as a
UIPSERVER browser parameter or environment variable.
Typically, users might choose to set their
UIPSERVER to specify a server operated by
a major local research library or library consortium.
In this way, the metadata pages returned can be formatted
to emphasize local holdings of cited documents,
even when the citing document is remotely located.
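Client-side resolution of a uip: reference might look like the following Python sketch. It assumes that the UIPSERVER environment variable holds the base URL of an HTTP server and that the server accepts the percent-encoded USIN as a path segment under /usin/; the URL layout and the default server name are invented for illustration, since no wire format for UIP is defined here.

```python
import os
from urllib.parse import quote


def resolve_uip(uri, default_server="https://uip.example.org"):
    """Turn 'uip:<USIN>' into a request URL on the user's chosen UIP server."""
    scheme, _, usin = uri.partition(":")
    if scheme.lower() != "uip" or not usin:
        raise ValueError(f"not a UIP reference: {uri!r}")
    # The server comes from the client's configuration, never from the citation.
    server = os.environ.get("UIPSERVER", default_server).rstrip("/")
    # Encode "/", ":" and "@" so that the whole USIN travels as one path segment.
    return f"{server}/usin/{quote(usin, safe='')}"


print(resolve_uip("uip:S.ACM/TOPLAS:16@1811"))
```

Two readers following the same hyperlink could thus reach different servers, each presenting the same cited item in terms of its own local holdings.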
A key goal of the USIN scheme is to support authors of scholarly works in the preparation of bibliographic references. This may be achieved by bibliographic processing "plug-ins" or "add-ons" to standard word processing software that will allow authors to cite works by merely entering USINs at the appropriate citation points. The bibliographic processing modules could then take care of all the remaining details for resolving and formatting the citations: retrieving the actual full bibliographic citations, assigning appropriate in-text reference numbers or labels, formatting the citations according to a chosen style guideline, sorting them according to a user- or style-specified ordering, and incorporating the citations into the document as a reference list at the back or sequentially in footnotes. As well as removing a considerable source of tedium in the preparation of scholarly works, the use of USINs in this way should also improve the accuracy and quality of citations by eliminating manual errors and inconsistencies. Finally, a serendipitous benefit of having the citations in a paper represented as USINs is that the citation set can then be made available as data; citation databases can thus be supported by citation data provision at the source [6].
A modular design for a USIN-based bibliographic processing system would allow many different bibliographic formatting tools to retrieve data from the USIN Global Database using a common retrieval protocol (say BRP: Bibliographic Retrieval Protocol) and a common citation representation format (say BDF: Bibliographic Data Format). This would allow the development of competing bibliographic formatting tools that might cater to different user preferences and to different types of document processing systems. BRP could be designed to work with locally mounted copies of the USIN Global Database for access to the bulk of historic bibliographic data, coupled with direct Internet access to the USIN Global Database for access to the latest references. BDF should provide a highly structured logical format for citation data, so that various transformations on that data can be easily implemented. Ideally, UPP (USIN Publication Protocol) and BDF should be designed together so that bibliographic data in the correct format is gathered directly during the USIN registration process.
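To illustrate why a highly structured format makes transformations easy, the sketch below models one citation record and derives two different renderings from it. BDF is not specified in this paper, so the record type, its field names, and both formatting functions are assumptions made for illustration; the sample values are invented placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CitationRecord:
    """Hypothetical structured citation record (a stand-in for BDF)."""
    usin: str
    authors: Tuple[str, ...]
    title: str
    serial_name: str
    enumeration: str
    year: int


def format_numeric(rec, number):
    """Render a record as a numbered reference-list entry."""
    names = ", ".join(rec.authors)
    return (f"[{number}] {names}. {rec.title}. "
            f"{rec.serial_name} {rec.enumeration}, {rec.year}.")


def format_author_year(rec):
    """Render a record as an in-text author-year label."""
    lead = rec.authors[0] if len(rec.authors) == 1 else f"{rec.authors[0]} et al."
    return f"({lead}, {rec.year})"


if __name__ == "__main__":
    # Invented placeholder data; the USIN here is made up for the example.
    rec = CitationRecord(
        usin="X.EXAMPLE/ES:1(2)$3",
        authors=("A. Author", "B. Writer"),
        title="An Example Title",
        serial_name="Example Serial",
        enumeration="1(2)",
        year=1998,
    )
    print(format_numeric(rec, 1))
    print(format_author_year(rec))
```

Because each formatter reads named fields rather than parsing a preformatted string, competing tools could add new styles without any change to the stored data.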
In support of bibliographic inquiry, retrieval and formatting, the USIN Global Database is designed to provide a comprehensive solution when starting with a set of citations represented as USINs. But consider also the literature research task, that is, the need to find citations of potential interest using various search methods. In this case, the USINs are not known ahead of time; rather, they are the results of the search process. In support of literature research, then, what role should USINs in general, and the USIN Global Database in particular, play?
One possible approach is to expand the requirements for the USIN Global Database to also provide comprehensive support for literature research activities. After all, the USIN Global Database is intended to be comprehensive in its coverage of the citable works and must provide the basic bibliographic data (author, title, serial name, serial enumeration, publication date) for each archived item. With the extension of the database to include abstracts, keywords and classification data for each item, it is possible to contemplate comprehensive support for literature research.
An alternative approach, however, is to support multiple literature databases, each of which provides its own methods of augmenting the basic bibliographic data available from the USIN Global Database. USINs themselves could form the basis of interoperability between the databases, i.e., distinct results from different databases could be easily combined by USIN sorting and matching operations. Such an approach would support different classification schemes that might be appropriate in different subject areas, competition between different full-text searching techniques based on article abstracts and/or article full text, selective databases that target sources relevant to a particular topic or type of material, experimentation with filtering schemes that grade the level or nature of materials, alternative language databases that support searching in languages other than English, and so on.
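The sorting and matching operations mentioned above can be sketched as follows. The sketch assumes that each database reports its hits as USIN strings already in a single canonical form, so that plain string equality and ordering suffice; the database names and the function name are hypothetical.

```python
def merge_results(*result_sets):
    """Combine hit lists from several literature databases by USIN.

    Each argument is a (database_name, list_of_usins) pair. The result is
    a list of (usin, [databases reporting it]) pairs sorted by USIN.
    Assumes all USINs are supplied in one canonical form.
    """
    hits = {}
    for db_name, usins in result_sets:
        for usin in usins:
            sources = hits.setdefault(usin, [])
            if db_name not in sources:  # ignore duplicates within one database
                sources.append(db_name)
    return sorted(hits.items())


if __name__ == "__main__":
    merged = merge_results(
        ("subject-db", ["S.ACM/CACM:37(2)@30", "I.ISOC/RFC:1738"]),
        ("fulltext-db", ["I.ISOC/RFC:1738"]),
    )
    for usin, sources in merged:
        print(usin, sources)
```

Items reported by more than one database appear once, with every source listed, which is the matching step; the sorted output is the sorting step.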
From the standpoint of good modular system design, one can also argue that the USIN Global Database should deal only with the basic bibliographic data that derives from the publication process. Classification, evaluation and review materials should be considered third-party metadata that may come from a variety of sources. Without any agreed-upon method for standardizing what types of metadata should be provided and who should provide them, it would be a poor choice to impose de facto standardization by incorporating a particular third-party metadata scheme into the USIN Global Database.
Nevertheless, it is reasonable to consider a limited extension of the USIN Global Database to support one additional form of metadata, namely citation metadata. A requirement of UPP could be that the USINs of cited references be supplied as part of the publication process. If, as suggested previously, scholars use USINs in writing their documents, it should not be difficult to provide them in the publication process. If this were done, it could support the development of a universal citation database that would in turn be a valuable tool for literature research and a potential catalyst for reform in scholarly communication [6].
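If cited USINs were supplied at publication time, a citation database could be derived by simply inverting that data. The sketch below assumes each submission is available as a mapping from the citing item's USIN to the list of USINs it cites; this representation and the function name are illustrative, not part of UPP. The first citing USIN is the possible eventual USIN suggested for this paper; the second is invented.

```python
from collections import defaultdict


def build_citation_index(submissions):
    """Invert citing -> cited lists into a cited -> citing index.

    `submissions` maps the USIN of each published item to the USINs of
    the references it cites, as might be supplied during publication.
    """
    cited_by = defaultdict(set)
    for citing, cited_list in submissions.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    # Sort the citers so the index is deterministic.
    return {cited: sorted(citers) for cited, citers in cited_by.items()}


if __name__ == "__main__":
    submissions = {
        "S.BCS/JoDI:1(3)$1": ["I.ISOC/RFC:1738", "S.ACM/CACM:37(2)@30"],
        # Invented citing item, for illustration only.
        "X.EXAMPLE/J:1(1)$1": ["I.ISOC/RFC:1738"],
    }
    index = build_citation_index(submissions)
    print(index["I.ISOC/RFC:1738"])
```

No matching of free-text reference strings is needed, which is where conventional citation indexing spends much of its effort.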
The USIN scheme is a proposed system for the global and persistent identification of the publications in organized serial collections. Ultimately some global identification scheme is likely to be developed for interoperation of various article citation applications. Scholars should seize the opportunity that now exists to ensure that the scheme that succeeds is the one that is designed primarily to meet the long-term needs of people (authors and readers), not the short-term needs of particular present-day computer systems belonging to vendors, libraries or document delivery services.
This paper has presented a vision for a scholar-friendly universal identification system for serially published works. It has also presented a number of concrete design proposals for USIN syntax and technological components that can support a global USIN system. In particular, a uniform naming model has been presented based on hierarchical naming of serial publications and hierarchical numbering of serial items. Two important systems in support of the USIN concept have been proposed, specifically, a USIN Global Registry and a USIN Global Database. Designs for each of these systems have been presented at a level that illustrates how specific architectural features can interact to meet the requirements of publishers, librarians and scholars.
There is a great deal more work required to fully realize the USIN concept. The author would be most appreciative of your help.
Andrew Walenstein has helped greatly by providing valuable feedback on several drafts of this paper. Jim Cole, while still questioning some issues from a serials cataloguing perspective, has been a source of considerable encouragement. I am also grateful to the anonymous referees for many constructive criticisms and helpful suggestions.
With no other formal denotation known for this work, it might only
be denotable by reference to this paper. Possible eventual USIN:
S.BCS/JoDI:1(3)$1|ref(1). This assumes that BCS
becomes assigned to the British Computer Society in the international
domain of scholarly societies, and that JoDI is
reserved by BCS to denote the Journal of Digital Information.
Suggested initial USIN: ISSN.8756-0860/Z39.44-1986.
Possible eventual form US.ANSI/ANS:Z39.44-1986.
Suggested initial USIN: RDNS."isoc.org"/RFC:1630.
Possible eventual form I.ISOC/RFC:1630.
Suggested initial USIN: RDNS."isoc.org"/RFC:1738.
Possible eventual form I.ISOC/RFC:1738, where
ISOC might uniquely denote the Internet Society in
a domain I of International organizations.
Here, RFCs are identified in the domain for the Internet Society,
the principal sponsor of the series.
Technically, the "RFC Editor", chartered by the Internet Society,
is said to be the publisher. However, it seems clear enough
that RFC will remain an unambiguous code for this series in the
context of Internet Society sponsored publications.
Suggested initial USIN: RDNS."sfu.ca".CMPT/TR:94-08.
Possible eventual form CA.SFU.CMPT/TR:94-08.
Suggested initial USIN: RDNS/"firstmonday.dk":2(4)$4.
Here, the article number ($4) is determined by
counting. Eventually, the form P.Munksgaard/FirstMonday:2(4)$4
may be used, where Munksgaard is the code
for Munksgaard International Publishers in an international
publishers domain. Another possibility is J.FirstMonday:2(4)$4
based on the concept of a global journal domain J operated by a
publisher consortium.
Suggested initial USINs: ISSN/0001-0782:30@933,
RDNS."acm.org"/CACM:30@933.
Possible eventual form S.ACM/CACM:30@933.
An interesting point to note is that issue numbers are not
required for CACM prior to volume 33.
Suggested initial USIN: RDNS."isoc.org"/RFC:2169.
Possible eventual form I.ISOC/RFC:2169.
Suggested initial USIN: RDNS."isoc.org"/RFC:2168.
Possible eventual form I.ISOC/RFC:2168.
ISBN/0-521-56413-1 and
ISBN/0-521-56474-3. These codes use ISBNs for
the hardback and paperback versions, respectively.
Choosing the code for the hardback version as canonical
may be appropriate.
Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(11).
Initial USINs: ISBN/0-8186-0525-1@465 (paper), ISBN/0-8186-4525-3@465 (microfiche),
ISBN/0-8186-8525-5@465 (casebound).
Possible eventual form I.IEEE/Compcon:28@465.
Suggested initial USIN: ISSN/0169-7552:27@193.
Possible eventual form P.Elsevier/COMNET:27@193.
Here, the code COMNET is used by Elsevier for
this journal.
Possible eventual USIN:
S.BCS/JoDI:1(3)$1|ref(14).
Suggested initial USIN: ISSN/0001-0782:37(2)@30.
Possible eventual form S.ACM/CACM:37(2)@30.
Suggested initial USINs: ISSN/0098-7913:22(4)@1,
RDNS."jaipress.com"/SR:22(4)@1.
The code SR is speculative.
Possible eventual form P.JAI/SR:22(4)@1.
Suggested initial USIN: RDNS."isoc.org"/RFC:2141.
Possible eventual form I.ISOC/RFC:2141.
Suggested initial USIN: RDNS."isoc.org"/RFC:1034.
Possible eventual form I.ISOC/RFC:1034.
This is an interesting case which is published in the
National Information Standards series (ISSN 1041-5653) of
NISO. It has also been given an ISBN. But the code
Z39.56-1996 represents its numbering as an
American National Standard.
Suggested initial USIN: ISSN.1041-5653/Z39.56-1996.
Possible eventual form US.ANSI/ANS:Z39.56-1996.
ISBN/0-89347-055-4.
Suggested initial USIN: ISSN/0953-1513:10@135.
Learned Publishing is published by the
Association of Learned and Professional Society Publishers.
On the path towards mnemonic identification, the USIN
form RDNS."alpsp.org.uk"/LP:10@135 may
temporarily be used before an international domain structure is in
place. Eventually, the canonical form may become
S.ALPSP/LP:10@135 based on a domain S
of scholarly societies.
Suggested initial USINs: ISSN/0361-526X:28@367 and
RDNS."haworth.com"/SL:28@367.
Possible eventual form P.Haworth/SL:28@367.
Suggested initial USIN: RDNS."isoc.org"/RFC:1737.
Possible eventual form I.ISOC/RFC:1737.
Suggested initial USIN: ISSN/1080-2711:3(1)$5.
Possible eventual form EDU.UMICH.PRESS/JEP:3(1)$5.