PILIN Service Usage Model (SUM)

  • 1 Identifier Service Usage Model
    • 1.1 Rationale
  • 2 Notation
  • 3 Description
    • 3.1 Identifiers
    • 3.2 Registered identifiers
    • 3.3 Association and Resolution
    • 3.4 Contexts
    • 3.5 Use and maintenance of identifiers
  • 4 Business Process Modelling
    • 4.1 Classes of Processes and Users
    • 4.2 Management Processes specific to identifiers (Domain processes)
    • 4.3 Identifier Usage (Actioning)
    • 4.4 Provisioning Processes
  • 5 SUM Diagram
  • 6 Usage Scenarios
    • 6.1 Publish identifier, given name and association
    • 6.2 Create unpublished identifier, given name and association
    • 6.3 Register Association
    • 6.4 Reserve a Batch of Names for future use
    • 6.5 Brand Thing with Name before Registering Association
    • 6.6 Generate Label in User Interface
    • 6.7 Verify Identifier URL
    • 6.8 Obtain Arbitrary Copy
    • 6.9 Obtain Appropriate Copy
    • 6.10 Other usage scenarios
  • 7 Applicability
  • 8 Functionality
    • 8.1 Restrictions on choreography of functionality
    • 8.2 Domain Functionality: Provisioning Identifier Contexts
    • 8.3 Domain Functionality: Provisioning Identifiers
    • 8.4 Domain Functionality: Maintain Identifiers
    • 8.5 Domain Functionality: Actioning Identifiers
  • 9 Structure & Arrangement
  • 10 Applicable Standards
  • 11 Design Decisions & Tradeoffs
  • 12 Implementation Guidance & Dependencies
  • 13 Known Uses
  • 14 Data Sources Used
  • 15 Related SUMs
  • 16 Services Used
    • 16.1 Identifier-Specific Services
    • 16.2 Generic Data Source Services
    • 16.3 Services from external Service Usage Models
  • 17 CORE SUMs Used
  • 18 References
  • 19 Glossary


web: http://resolver.net.au/hdl/102.100.272/0N8J991QH

email: policy@pilin.net.au

Change Log

Version | Date       | Status & Changes | Expression identifiers
V1.0    | 2008-01-18 | RELEASE          | PILIN/WG79P7SQH, hdl:102.100.272/WG79P7SQH

To cite the latest version of this work, use http://resolver.net.au/hdl/102.100.272/L8ZDW6PQH
To cite this version of this work, use http://resolver.net.au/hdl/102.100.272/WG79P7SQH

This document is a work in progress and may contain open questions not resolved during the timeline of the PILIN project. It represents the thinking of the PILIN team as at December 2007.

1 Identifier Service Usage Model

1.1 Rationale

The identifier service usage model (SUM) describes the range of possible services available from a generic identifier management system. The SUM is written at the service genre level, describing abstract capabilities not tied to a particular identifier technology, interface, protocol, or scheme.

Since identifiers are pervasive in data processing, it is expected that this service usage model will be used to

  • inform development of systems and other SUMs that use identifiers

  • inform development of concrete service expression level SUMs for specific identifier management systems

For example, the PILIN project uses this service usage model as the basis for developing an identifier management system based on the Handle System.

2 Notation

This service usage model uses vocabulary defined in the PILIN Identifier Systems Ontology [1]. The following are core concepts:

  • A Label is a symbol.

  • A Context is an entity with a particular purpose for organising and managing labels. Contexts have policies regarding labels, and a label can be in one or more contexts. (A label is unique in a given context.)

  • A Name is a pairing of a label with a context the label is in.

  • An Identifier is an association of a name with a thing (the name identifies the thing).

  • Labels, contexts, names and identifiers all have representations, which are usually strings.

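The core concepts above can be sketched as plain data types. This is a minimal illustration only, not part of the PILIN ontology: the class and field names (`Name`, `Identifier`, `Context`, `context_id`, `thing`) are hypothetical, and a thing is represented by an opaque string. It shows that a name is a (label, context) pair, so the same label in two contexts gives two different names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A pairing of a label with a context the label is in."""
    label: str
    context_id: str


@dataclass(frozen=True)
class Identifier:
    """An association of a name with a thing (the name identifies the thing)."""
    name: Name
    thing: str  # opaque stand-in for the thing, e.g. a description or locator


class Context:
    """Organises labels (hypothetical shape); a label is unique within a context."""

    def __init__(self, context_id: str) -> None:
        self.context_id = context_id
        self._labels: set[str] = set()

    def add_label(self, label: str) -> Name:
        # Uniqueness is enforced within this context only.
        if label in self._labels:
            raise ValueError(f"label {label!r} already in context {self.context_id!r}")
        self._labels.add(label)
        return Name(label, self.context_id)


# The same label in two contexts yields two different names.
ctx_a, ctx_b = Context("context-a"), Context("context-b")
name_a = ctx_a.add_label("42")
name_b = ctx_b.add_label("42")
print(name_a == name_b)  # False
ident = Identifier(name_a, "a thing described by some association data")
```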

3 Description

The identifier service usage model (SUM) specifies the expected behaviour of an identifier management system, which provides identifier infrastructure for other applications. The identifier management system exposes services both for managing identifiers, and for providing actors with access to identifiers.

Access to services may be subject to authentication and authorisation. The authentication and authorisation infrastructure is out of scope of this SUM; it is specified in a separate Identity SUM.

The SUM is written at the service genre level, and is not specific to any one technology, interface, protocol, or identifier scheme. It does, however, assume that identifiers are used in the digital realm, with at least some registration and actioning of identifiers performed digitally.

3.1 Identifiers

An identifier is an association of a name with a thing: identifiers distinguish individual things (not necessarily digital objects) from all other things in the world, whether for access, manipulation, processing, or description of the thing. The use of identifiers for things is very common in computer applications.

3.2 Registered identifiers

This SUM describes registered identifiers: identifiers are stored in an identifier management system from which they can be retrieved, and at least some actions on the identifiers are mediated through the identifier management system. Through registration in the system, identifiers are well managed: they are accountable and reproducible. Identifier uniqueness can be guaranteed within the identifier management system. Identifiers can also become persistent, although this quality is not presupposed by this SUM.

By contrast, an unregistered identifier has a name that is either derived from the object at runtime, or else is decided by the namer to be a name without registration. Neither guarantees reproducibility, accountability, persistence or uniqueness.

Identifiers are entered into the identifier registry on creation; this includes a number of possible discrete steps, enumerated under the "Functionality" element below. Once registered, any information associated with the identifier may be updated.

3.3 Association and Resolution

The association between an identifier and a thing, as well as other data necessary in the operation of the identifier, is registered in the identifier registry. The association takes the form of information about the thing, used to distinguish it from other things (and typically to access the thing).

The main service exposed to external actors is Resolution, which returns information about the thing identified by an identifier.

Services on an identifier are not restricted to Resolution, and indeed may not involve the thing at all. For example, this SUM specifies services for creating labels and registering names, independently of how those names are associated with things.

3.4 Contexts

Identifiers have contexts for their names. A context differentiates labels used for distinct purposes and with different authorities ("owners"). The combination of a label and a context for the label gives a name, and the same label can be used in different contexts to give different names. Any label is necessarily unique in its context.

Contexts impose policies on the labels they contain:

  • Association policy: how a name in the context is interpreted as an identifier.

  • Label format policy: what labels are allowed in the context.

  • Access policy: who can perform what actions on a name in the context.

  • Provenance policy: how authority metadata is registered for the name.

Contexts for identifiers also record who currently holds authority over an identifier. Identifier management systems manage contexts as well, and this is modelled here through a distinct context registry.
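As a minimal sketch of how a context might apply two of these policies before a name is registered, assume the label format policy is stored as a regular expression and the access policy as a mapping from actions to permitted users. Both representations, the nine-character pattern, and the names `ContextPolicy` and `check_register` are hypothetical; association and provenance policies are not modelled here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ContextPolicy:
    """Hypothetical policy profile for one context."""
    authority: str                                              # current authority
    label_format: str                                           # label format policy (regex)
    access: dict[str, set[str]] = field(default_factory=dict)   # access policy

    def label_allowed(self, label: str) -> bool:
        # The whole label must match the pattern, not just a prefix.
        return re.fullmatch(self.label_format, label) is not None

    def may(self, user: str, action: str) -> bool:
        return user in self.access.get(action, set())


def check_register(policy: ContextPolicy, user: str, label: str) -> None:
    """Raise if `user` may not register `label` under this context's policies."""
    if not policy.may(user, "register"):
        raise PermissionError(f"{user} may not register names in this context")
    if not policy.label_allowed(label):
        raise ValueError(f"label {label!r} violates the label format policy")


policy = ContextPolicy(
    authority="Example Institution",
    label_format=r"[0-9A-Z]{9}",   # assumed: nine upper-case alphanumerics
    access={"register": {"alice"}},
)
check_register(policy, "alice", "WG79P7SQH")  # passes: permitted user, valid label
```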

3.5 Use and maintenance of identifiers

Compound services may act on an identifier: actors (end users and end services) consume the services exposed by this SUM, the main one being Resolution. The identifier may also be validated throughout its lifespan: various transactions may take place to ensure that the identifier performs as expected, including verifying that it is still correctly actionable and inspecting the information registered with it. All this functionality is covered by this SUM.

4 Business Process Modelling

4.1 Classes of Processes and Users

Identifiers are pervasive in computer systems: they allow computer systems to manipulate a simple representation of a thing (the identifier) rather than the thing itself.

This SUM is restricted to a subset of all possible identifiers: it only addresses identifiers that are passed between systems. Example uses of such identifiers include:

  • Giving a user outside a computer system access to a thing through an identifier

  • Retrieving a thing identified by an identifier

  • Processing or manipulating a thing identified by an identifier

  • Attaching metadata to a thing through its identifier

  • Claiming a relation between two things through their identifiers

  • Tracking attempts to retrieve a thing through its identifier

  • Validating the integrity of a thing through its identifier

  • Referring to a thing through its identifier in a communication

Because of the pervasiveness of identifiers, there is a huge number of workflows in which identifiers can be used. We do not intend to model all possible uses of identifiers, nor all the workflows they can be involved in. Instead, we distinguish two types of process, according to whether they change identifier state or not:

  • Identifier Management processes correspond to Curatorial Actions on identifiers and contexts: processes of creation, management, maintenance, and validation that change the state of identifiers in the system.

  • Identifier Usage processes can originate in other computer systems relying on the identifier system. They correspond to Non-Curatorial Actions that do not change the state of identifiers; instead, they use identifiers to realise goals outside the identifier system.

These types of process in turn define two classes of users:

  • System administrators are authorised to initiate both identifier management processes and identifier usage processes. Administrators carry out management processes within the curation boundary of the system, as defined in the PILIN ontology.

  • External users and services (actors) are not authorised to initiate identifier management processes, but are authorised to initiate identifier usage processes.

The distinction between authorised administrators and unauthorised end users means that an identifier system requires authorisation functionality to determine if an actor can use a service. Such functionality is described in an Identity SUM, and is not further detailed here.

Our discussion of processes is exhaustive for identifier management, since that is the exclusive concern of identifier management systems. Our discussion of processes for identifier usage, which involve both the identifier management system and external systems, is only illustrative. It involves the two main processes through which identifier management systems interact with external systems: resolution and retrieval.

Other identifier usage processes and workflows are possible, but are not in scope for this SUM. Such processes typically involve the same services provided by the identifier management system, but different services contributed by external systems. Limiting discussion of identifier usage to resolution and retrieval already provides a fairly complete description of how identifier management systems interact with the outside world.

Identifier management processes can be further broken down into processes specific to individual identifiers (domain processes), and processes involving the identifier management system as a whole (provisioning processes to make the system accessible, and management processes to maintain its performance). While the latter two classes of process are essential to running an identifier management system, they apply to any registry and are not specific to identifier management systems. For that reason, their functionality is not described in any detail in this SUM, and should be described in a separate Registry SUM.

4.1.1 Processes covered by this SUM

The types of processes covered by this SUM are broken down as follows:

  • Identifier Usage: Enable end actors to carry out actions involving identifiers.

    • Resolve, Retrieve, Get Appropriate Copy…

  • Identifier Management:

    • Domain processes: Provide functionality specific to identifiers registered in the identifier management system.

      • Register identifiers: Create new identifiers and register them in the system.

      • Update identifiers: Update the information recorded with identifiers in the system.

      • Publish identifiers: Make an identifier accessible to a class of end actors through a particular service.

      • Register contexts: Create new identifier contexts and register them in the system.

      • Update contexts: Update the information recorded with contexts in the system.

      • Maintain identifiers: Maintain the accuracy of individual identifiers.

    • Provisioning processes: Provide infrastructure to make the identifier management system a functioning, accessible registry (users, authentication and authorisation policies; business rules and system data structures governing identifiers).

4.1.2 Processes not covered by this SUM

Although publishing identifiers is presented here as a domain process (in scope of our business process modelling), it is realised in this SUM through authorisation infrastructure that provides access to identifier services. So while the business process of publishing an identifier is in scope of the SUM, the services supporting that functionality are not.

Maintaining an identifier management system may also involve System Management functionality, to maintain the performance of the identifier management system as a whole (performance monitoring, analysis and tuning, content inspection, backup and mirroring). This functionality does not involve a discrete identifier business process, and is not discussed any further here.

4.1.3 Identifier Model

This SUM presupposes the PILIN identifier ontology as its underlying conceptual model, as summarised under "Notation".

The SUM describes systems for managing identifiers through data. The following illustrates the data required for managing identifiers:


  • context data = who has authority over the context, date created

  • name data = who has authority over the name, date created

  • association data = access information for the thing (e.g. a URI if the thing is in web-space).

  • thing data = type of thing being identified, general thing metadata

  • identifier data = e.g. authority metadata
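One possible record layout for these five kinds of data is sketched below. It is an assumption for illustration, not a PILIN schema: context data is kept once per context and referenced from each identifier record, while name, association, thing and identifier data are held with the record. An association of `None` models a name registered before its association, as discussed under "Registering Identifiers".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ContextData:
    authority: str   # who has authority over the context
    created: date    # date the context was created


@dataclass
class IdentifierRecord:
    context_id: str                     # refers to context data, stored once per context
    label: str
    name_authority: str                 # name data: who has authority over the name
    name_created: date                  # name data: date created
    association: Optional[str] = None   # association data, e.g. a URI for a web thing
    thing_type: Optional[str] = None    # thing data: type of thing identified
    thing_metadata: dict = field(default_factory=dict)       # thing data: general metadata
    identifier_metadata: dict = field(default_factory=dict)  # identifier data


contexts = {"ctx-a": ContextData("Example Institution", date(2007, 12, 1))}
record = IdentifierRecord("ctx-a", "0001", "Repository Team", date(2008, 1, 18),
                          association="https://repository.example.org/items/1",
                          thing_type="PDF")
# Context data is reached through the record's context rather than copied into it.
print(contexts[record.context_id].authority)  # Example Institution
```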

The higher-order groups of business functions may be broken down into individual business processes as follows:

4.2 Management Processes specific to identifiers (Domain processes)

Domain processes for managing identifiers involve not only the identifier system manager as an actor, but also identifier managers, managers of things, and end users and systems.

Identifier managers, if they exist, are delegated the responsibility of managing individual identifiers by the manager of the overall identifier management system.

The things identified by the identifiers are usually managed by other parties, who create, curate, or publish the things through a distinct data source (the content data source). There may be coordination between the manager of the thing being identified and the identifier manager, as described under "Updating Identifiers" below.

Identifier system managers and identifier managers are system administrators: they can change the state of identifiers on the system. Managers of things and end users are external users: they cannot change identifier state directly. The manager of the thing can arrange with the identifier manager to have the identifier updated, and indeed the identifier manager may rely on the manager of the thing to provide this updated information; but the update is initiated only by the identifier manager, as system administrator.

There are four main processes involving identifiers:

  • Register identifier

  • Update identifier

  • Maintain identifier

  • Publish identifier

Publishing processes are discussed with provisioning processes, since publishing is realised through provisioning of access to the identifier.

4.2.1 Identifier Provisioning: Registering Identifiers

Registering identifiers is a process triggered by identifier managers, who have authority to register identifiers. Registering identifiers is an Identifier Provisioning process: it provides accurate data (including names and association data) to the registry in connection with the identifier.

Registering an identifier requires the assembly of several pieces of data: the name, the association, the context, the authority. Adding each of these to the identifier has distinct business motivations, and imbues the identifier with distinct qualities; so we can differentiate them as different business events—although they do not constitute independent business processes, as they do not have their own business goals:

  • Register name

  • Register association

  • Augment identifier information

  • Register identifier

There can be problems in synchronising the availability of this information, so the process of registering an identifier can be regarded either as a single process, or broken up into steps for adding the name, the association, and other information to the object in the identifier system.

Registering the name in isolation from the association reserves it for future use through association: so long as the system supports name uniqueness, registering the name to one identifier manager prevents it from being used as an identifier by an independent identifier manager.

The label used in a name may need to be generated through some procedure, to conform with the name format prescribed in the identifier’s context (if any). Therefore label creation may be exposed as a separate service used in a range of processes. Moreover, label creation need not be specific to a single identifier system.
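A label creation service might, for example, mint random labels that satisfy the context's format policy and retry on collision. The alphabet and length below are assumptions modelled on the nine-character labels in this document's own handles, and `mint_label` is a hypothetical name. Checking a local set is only a sketch: a real registry would still have to enforce uniqueness atomically when the name is registered.

```python
import secrets

# Hypothetical label format policy: 9 characters from digits and upper-case letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LABEL_LENGTH = 9


def mint_label(existing: set[str], max_attempts: int = 100) -> str:
    """Return a new random label that is not already in `existing`."""
    for _ in range(max_attempts):
        label = "".join(secrets.choice(ALPHABET) for _ in range(LABEL_LENGTH))
        if label not in existing:
            return label
    raise RuntimeError("could not mint an unused label")


used = {"WG79P7SQH", "L8ZDW6PQH"}
new_label = mint_label(used)
used.add(new_label)   # reserve it locally before registering the name
```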

Although registering a name and registering an association are typically tightly coupled in registering an identifier, they may be uncoupled in circumstances such as the following:

  • A name is required for the thing to be created, but association data about the thing is not available until after the thing is created. For instance: a thing is branded with a name on creation (e.g., included amongst its metadata); but the thing’s authoritative locator will not be known until after the thing is created and ingested into a repository as a complete object. So the name is treated as a reserved identifier until the thing is completely created.

  • One or more names have been reserved for use in identifiers, but are only associated with things when required. This may occur:

    • For performance reasons: the identifier system manager does not launch a separate process of name creation for each thing to be identified, but instead reserves (and registers) a batch of names, and may register the associations for those names in a batch as well.

    • To realise a subdivided namespace: a number of institutions share a common name context. In order to prevent collisions between the institutions, distinct runs of names are allocated beforehand to each institution. These runs of names are thus reserved for each institution, by registering them in the shared identifier system.
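The subdivided-namespace case can be sketched as allocating consecutive, non-overlapping runs of labels to each institution in a shared context. The numeric labels, the class name `SharedContext` and its methods are hypothetical; the point is only that names are registered (reserved) per institution before any association exists.

```python
class SharedContext:
    """A name context shared by several institutions (hypothetical)."""

    def __init__(self) -> None:
        self._next = 1                        # next unallocated label number
        self.reserved: dict[str, str] = {}    # label -> institution holding it

    def reserve_run(self, institution: str, count: int) -> list[str]:
        """Register a run of `count` labels for `institution`, without associations."""
        if count < 1:
            raise ValueError("count must be positive")
        labels = [str(n) for n in range(self._next, self._next + count)]
        self._next += count                   # runs can never overlap
        for label in labels:
            self.reserved[label] = institution
        return labels


shared = SharedContext()
print(shared.reserve_run("University A", 3))  # ['1', '2', '3']
print(shared.reserve_run("University B", 2))  # ['4', '5']
```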

4.2.2 Identifier Provisioning: Updating Identifiers

Updating identifiers is an identifier provisioning process triggered by identifier managers, who have authority to update identifiers.

Like registering, updating may affect only some pieces of data associated with the identifier; for instance, the authority for an identifier may be updated without updating its association. Updating each separate piece of information can be seen as a distinct business event in the overall update process.

The manager of the thing being identified may become an actor in the update process in order to ensure accuracy. For example, whoever is managing the thing may provide up-to-date association data when the thing moves (e.g., to a new URL). The identifier manager then needs to update the association data to maintain the resolution capability of the identifier. The coordination between the two actors could be either a push or a pull process.

Identifier deletion may also be included in the updating identifier process, since it causes a change in the registered identifier’s status. For persistent identifiers, the deletion process should not be used. Such identifiers persist as long as the identifier management system supporting them, even if the things they are associated with are no longer accessible.
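A minimal sketch of an update operation that logs each changed field as a separate business event and refuses deletion for identifiers flagged as persistent. The `persistent` flag, the updatable fields and the event log are assumptions for illustration; the SUM leaves these to the identifier management system.

```python
from dataclasses import dataclass, field

UPDATABLE = ("association", "authority")   # hypothetical set of updatable fields


@dataclass
class Record:
    association: str
    authority: str
    persistent: bool = True   # hypothetical flag, e.g. set by context policy
    deleted: bool = False
    events: list[str] = field(default_factory=list)


def update(record: Record, **changes: str) -> None:
    """Apply each change separately, logging one business event per field."""
    for field_name, value in changes.items():
        if field_name not in UPDATABLE:
            raise KeyError(f"{field_name!r} cannot be updated")
        setattr(record, field_name, value)
        record.events.append(f"update {field_name}")


def delete(record: Record) -> None:
    # Persistent identifiers outlive access to their things, so never delete them.
    if record.persistent:
        raise PermissionError("persistent identifiers must not be deleted")
    record.deleted = True
    record.events.append("delete")


rec = Record(association="https://old.example.org/x", authority="Team A")
update(rec, association="https://new.example.org/x")   # the thing has moved
print(rec.events)  # ['update association']
```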

4.2.3 Identifier Provisioning: Registering Contexts

The identifier management system registers names by associating labels with contexts; so it also needs to manage contexts as digital objects. This is particularly necessary if the identifier management system realises multiple contexts, but is also applicable if it only realises a single context subject to change in its metadata. Management of contexts includes managing their names and policy profiles.

4.2.4 Identifier Provisioning: Updating Contexts

The context information registered in the identifier management system needs to be updated as the context changes. This is not typically a recurring or automated process, so it is usually done through manual configuration rather than through a service.

The context provides the framework for the authority over an identifier. (For example, the context enumerates who currently has authority over the context, and by implication the identifiers it contains.) So changes in the authority over identifiers are modelled as updates to the context.

Since an identifier management system can manage several contexts, it can cease managing a context without being decommissioned as a system. This means that there is also a business process for deleting contexts from a system.

4.2.5 Identifier Maintenance: Verification

Identifier Verification is a maintenance process triggered by either the identifier system manager, or individual identifier managers. The aim of the process is to confirm that information is being maintained in the system for that identifier, and that the data stored is valid. If the data is invalid, the process may trigger an update process. Verification is needed to guarantee persistence for an identifier, if this is desirable.

Verification may depend on the name context, the authority metadata, or the thing associated with the identifier.

The typical focus of verification is confirming that the thing being identified can be accessed using association data: of the data associated with identifiers, access information changes the most, and changes to it have the greatest impact on identifier usefulness. Accordingly, the most common identifier object verification is that the identifier remains resolvable (“link rot checker”), and that it resolves to the correct thing.
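A link rot checker can be sketched as follows, assuming association data is an HTTP URL and that any status below 400 counts as resolvable. The fetching function is injected so the check can be run without network access; `http_status` and `find_broken` are hypothetical names built only on the Python standard library. This covers resolvability only; confirming that the identifier resolves to the correct thing would need a further comparison, for example against stored thing metadata.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """HTTP status of a HEAD request to `url`, or 0 if the request fails outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:   # server answered with an error status
        return err.code
    except (OSError, ValueError):            # DNS failure, timeout, malformed URL, ...
        return 0


def find_broken(associations: dict[str, str],
                fetch: Callable[[str], int] = http_status) -> list[str]:
    """Names whose association URL no longer resolves (status 0 or >= 400)."""
    return [name for name, url in associations.items() if not 0 < fetch(url) < 400]


# Usage with a stand-in fetcher, so no network access is needed:
fake_fetch = {"https://example.org/a": 200, "https://example.org/b": 404}.get
print(find_broken({"id-a": "https://example.org/a", "id-b": "https://example.org/b"}, fake_fetch))
# ['id-b']
```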

While maintaining identifiers is a process undertaken by system administrators, only the update function requires authentication. A verification service may be available to external actors without compromising identifier management, as it is a read-only function.

Context objects are also subject to verification through similar mechanisms.

4.2.6 Identifier Maintenance: Querying

Identifier Querying is a maintenance process triggered by either the identifier system manager, or individual identifier managers. The aim of the process is to enable managers to inspect identifier records, either to troubleshoot existing problems or to identify potential causes of disruption to system function.

The process involves inspecting the system for any information associated with an identifier (see Query Identifier in “Functionality” element below).

4.3 Identifier Usage (Actioning)

Actioning an identifier involves processes triggered by external users and systems that use identifiers without changing their state. Actions on the identifier need not be provided by the identifier system: such a service can be provided by any party, once the identifier is registered, and an identifier management system cannot limit the uses to which the identifier can be put outside the system. This SUM does not attempt to describe the full range of such uses.

However, typical identifier actioning processes depend on the data registered with the identifier, including but not limited to association data. So the identifier management system must allow external actors to look up data registered with the identifier, for identifier actioning to be possible. This means that identifier usage processes are distributed between two systems: the identifier management system, which does the lookup, and the external system that uses that lookup to realise the action.

Looking up registered data is an action triggered by end actors. Access may or may not be authenticated.

4.3.1 Obtain Object: Resolve + Retrieve

The most common identifier usage process is one of obtaining a presentation of an object, given the identifier for the object. The object is obtained from a content data source distinct from the identifier management system. Therefore the business process is realised through two system functions. The lookup function on the identifier management system is Resolve, which accesses association data for the identifier; and the function using the lookup in the external system (accessing the thing identified from the content data source) is Retrieve.

There are several types of Obtain processes an actor may require, each of which imposes different behaviour on the Resolve and Retrieve functions:

  • Single copy retrieval: There is a single instance of the thing identified and the actor wishes to obtain it.

  • Arbitrary copy retrieval: Obtain any of the available instances of the thing. No distinction is made between available copies, so any will do.

  • Appropriate copy retrieval: Obtain the instance of the thing most appropriate given the actor’s circumstances, depending for instance on location or format.

  • Multiple copy retrieval: The actor wishes to obtain all available instances of the thing identified.

  • User selected retrieval: Allow the actor to explicitly select the instance to retrieve out of all available instances.

Although all these types of Obtain process may operate on the same identifier, they result in different behaviour as the Resolve and Retrieve functions are set up differently.
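To make the distinction concrete, the sketch below keeps Resolve fixed (it returns every registered instance of the thing) and varies only the selection applied before Retrieve, which is stubbed to return locators. The instance fields (`url`, `region`, `media_type`), the sample name `example/1` and the ranking rule for an "appropriate" copy are all assumptions; a real system might instead push some of this selection into Resolve itself.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Instance:
    url: str
    region: str
    media_type: str


Selector = Callable[[list[Instance]], list[Instance]]

# Hypothetical association data: one name, three registered instances.
REGISTRY: dict[str, list[Instance]] = {
    "example/1": [
        Instance("https://au.example.org/1.pdf", "au", "pdf"),
        Instance("https://uk.example.org/1.pdf", "uk", "pdf"),
        Instance("https://au.example.org/1.html", "au", "html"),
    ],
}


def resolve(name: str) -> list[Instance]:
    """Resolve: look up the association data registered for `name`."""
    return REGISTRY.get(name, [])


def single_copy(instances: list[Instance]) -> list[Instance]:
    if len(instances) != 1:
        raise ValueError("expected exactly one registered instance")
    return instances


def arbitrary_copy(instances: list[Instance]) -> list[Instance]:
    return [random.choice(instances)]   # any copy will do


def appropriate_copy(region: str, media_type: str) -> Selector:
    def select(instances: list[Instance]) -> list[Instance]:
        # Prefer the requested format; among those, prefer the actor's region.
        return [max(instances, key=lambda i: (i.media_type == media_type, i.region == region))]
    return select


def multiple_copies(instances: list[Instance]) -> list[Instance]:
    return list(instances)


def user_selected(index: int) -> Selector:
    return lambda instances: [instances[index]]


def obtain(name: str, select: Selector) -> list[str]:
    """Resolve, select, then hand the chosen locators to Retrieve (stubbed here)."""
    return [instance.url for instance in select(resolve(name))]


print(obtain("example/1", appropriate_copy("uk", "pdf")))
# ['https://uk.example.org/1.pdf']
```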

4.3.2 Examples of non-resolution actions

Other actioning processes made possible by identifiers exist. For example, an external user may annotate a thing by creating a record of the annotation keyed to the thing’s identifier. Another example is to create a collection of things by gathering together their identifiers. In both cases, neither resolution nor retrieval is directly involved (although the user knows enough about the association data to know what they are annotating or collecting). Nonetheless the activity occurs outside the identifier management system, and uses the identifier for a distinct business goal; so it counts as actioning the identifier.

4.4 Provisioning Processes

With the exception of publishing (discussed below), provisioning processes are initiated in order to configure the identifier management system: they do not relate to individual identifiers, and do not fulfil a distinct business goal (other than the trivial goal of “make the identifier system work”). Provisioning processes provide the identifier management system with new users, authentication rules, authorisation policies, business rules, and system data structures.

Provisioning is invoked when an identifier management system is first set up; it can then be applied in order to update identifier system infrastructure at any time. All provisioning processes are triggered by the identifier system manager. User and authentication provisioning in particular may be exposed as services to external systems: an identifier system is often managed within an institution, in which case it will leverage the institution’s identity infrastructure. Provisioning processes may also be realised through configuration rather than as parameterised services. However the business rules and system data structures should whenever possible be captured explicitly in formal, machine readable artefacts, in order to ensure validation and interoperability.

User, authorisation and authentication provisioning is not specific to identifier management, but occurs whenever authorisation needs to be provided for any service. This functionality is captured in an identity service usage model, and is out of scope for this SUM.

4.4.1 Publishing Identifiers

Publishing an identifier means enabling access to an identifier service by an external actor (outside the “curation boundary”): such actors do not normally have access to the identifier when it is first registered. Since all access to identifier information is mediated through some service, identifiers are published only with reference to a service (typically resolution). So publishing an identifier is modelled here as a business process of provisioning authorisation to access a service: a target user group becomes authorised to apply a (resolution) service to that identifier.

Registering an identifier is often orchestrated with publishing the identifier. However, registering and publishing an identifier may be uncoupled in circumstances like the following:

  • Not all information to be registered with the identifier is available yet: the identifier is to be augmented with further information before it meets the system’s business requirements for publication. (For instance annotations, or additional instances for multiple resolution.)

  • The thing associated with the identifier is not yet available for retrieval by external actors. The identifier should not be published until the manager of the thing authorises external resolution to information about the thing.

  • The publishing services of the identifier registry, which expose identifier information, are currently unavailable.

  • The name is meaningful, and publication is delayed until the name is validated against the proposed association.

Access may be enabled for a service in general, or for a particular (service, identifier) pair. For instance, a target user becomes authorised to resolve identifiers for PDFs, but not identifiers for datasets.

An identifier can also be unpublished: the target group is no longer authorised to apply the given service to the given identifier.
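Publishing and unpublishing can thus be sketched as adding and removing authorisation grants. In this minimal illustration (class and method names hypothetical), a grant covers either a service in general or a single (service, identifier) pair, matching the two options described above.

```python
from typing import Optional


class PublicationPolicy:
    """Tracks which user groups may apply which services to which identifiers."""

    def __init__(self) -> None:
        # (group, service, identifier); identifier None means "all identifiers".
        self._grants: set[tuple[str, str, Optional[str]]] = set()

    def publish(self, group: str, service: str, identifier: Optional[str] = None) -> None:
        self._grants.add((group, service, identifier))

    def unpublish(self, group: str, service: str, identifier: Optional[str] = None) -> None:
        self._grants.discard((group, service, identifier))

    def allowed(self, group: str, service: str, identifier: str) -> bool:
        return ((group, service, None) in self._grants
                or (group, service, identifier) in self._grants)


policy = PublicationPolicy()
policy.publish("project-team", "resolve", "example/1")          # one pair only
print(policy.allowed("project-team", "resolve", "example/1"))   # True
policy.unpublish("project-team", "resolve", "example/1")
print(policy.allowed("project-team", "resolve", "example/1"))   # False
```

In this sketch, unpublishing a single pair does not revoke a general grant for the same service; a real policy engine would need explicit rules for how such grants interact.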

5 SUM Diagram

[Figure: SUM diagram]

6 Usage Scenarios

A range of scenarios is possible for the creation, updating, and actioning of identifiers, requiring several distinct interfaces to the identifier system to be exposed. Each scenario corresponds to a single business process.

More detailed narratives involving the resolution and other actioning of identifiers are available separately, both as Usage Scenarios and as Abstract Use Cases [2].

6.1 Publish identifier, given name and association

An identifier is to be created as an association between a name and a thing, and made available for resolution to a target user group. Both the name and the association information are given. The target user group is a small group of external users, which gains access to resolution through authentication. The identifier management system requires identifier uniqueness and identifier accountability (as is typical of identifier systems).

The workflow is as follows:

  • Confirm that the name has not already been registered in the identifier registry for that context.

  • Authenticate and authorise a request to add content to the identifier registry.

  • Create an identifier object in the identifier registry, recording the association of the name and the thing.

  • Add authority metadata for the identifier to the object.

  • Publish the identifier (i.e. authorise resolution queries on the identifier by the target user group).

Under this scenario, all information required to create a resolvable identifier is available at the time the identifier is registered. This is typically the case. This scenario corresponds to the single “Create Identifier” process.
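The workflow above can be sketched as a single function. This is an illustration only, assuming a hypothetical in-memory registry (a dict keyed by (context, label) names), a set of (group, name) pairs standing in for the publication policy, and authentication reduced to a membership check. Passing `publish=False` omits the final step, which corresponds to creating an unpublished identifier.

```python
from datetime import datetime, timezone


class RegistrationError(Exception):
    """Raised when a name is already registered in its context."""


def create_identifier(registry, published, context, label, thing, requester,
                      authorised_writers, target_group, publish=True):
    """Sketch of 'Publish identifier, given name and association'."""
    name = (context, label)
    # 1. Uniqueness: the name must not already be registered in this context.
    if name in registry:
        raise RegistrationError(f"{context}/{label} is already registered")
    # 2. Authenticate and authorise the write (here, a simple membership check).
    if requester not in authorised_writers:
        raise PermissionError(f"{requester} may not add to the registry")
    # 3-4. Create the identifier object and attach authority metadata.
    registry[name] = {
        "association": thing,
        "authority": {"registered_by": requester,
                      "registered_at": datetime.now(timezone.utc).isoformat()},
    }
    # 5. Publish: authorise resolution queries by the target user group.
    if publish:
        published.add((target_group, name))
    return name
```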

6.2 Create unpublished identifier, given name and association

An identifier is to be created as an association between a name and a thing. Both the name and the association information are given. The identifier is not made available for resolution outside the management system until some other condition is met, such as those listed under “Publish Identifier”.

The workflow for Create Unpublished Identifier is as above, but without the final step of publication.

6.3 Register Association

An association is registered between a thing and a name which has already been registered. The registration of the name and the association are uncoupled (for reasons discussed under "Register Identifier"); only when both have been registered is the identifier itself deemed registered. The identifier management system requires accountability of association.

The workflow is as follows:

  • Confirm that the name has already been registered in the identifier registry for that context.

  • Retrieve the identifier object which already exists in the identifier registry, corresponding to the name registered.

  • Register an association of the given name and the thing through the identifier object.

  • Add authority metadata to the identifier object (to document the accountability of the association).

  • Save the updated identifier object.
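A minimal sketch of the Register Association workflow, assuming the same kind of hypothetical in-memory registry, in which a record whose association is `None` is a registered name rather than a full identifier:

```python
from datetime import datetime, timezone


def register_association(registry, context, label, thing, requester):
    """Sketch: attach association data to an already-registered name."""
    name = (context, label)
    # Confirm the name is registered, and retrieve its identifier object.
    record = registry.get(name)
    if record is None:
        raise KeyError(f"{context}/{label} has not been registered as a name")
    # Register the association of the name and the thing.
    record["association"] = thing
    # Authority metadata documents who made the association, and when.
    authority = record.setdefault("authority", {})
    authority["associated_by"] = requester
    authority["associated_at"] = datetime.now(timezone.utc).isoformat()
    # Save the updated identifier object.
    registry[name] = record
    return record
```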

6.4 Reserve a Batch of Names for future use

In this scenario, an identifier manager wishes to reserve a batch of names for future use; this involves registering those names without registering any association for them at this stage. The identifier management system requires identifier uniqueness, and accountability of names.

This scenario uncouples the registration of the name and the association; only when both have been registered is an identifier deemed registered. This occurs under circumstances already given for “Register Identifier”.

The workflow is as follows:

  • Generate a range of labels.

  • For each label:

    • Confirm that the label has not already been registered in the identifier registry for that context.

    • Create a record of the name in the identifier registry (i.e. the label associated with that context).

    • Add authority metadata for the name to the record.
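The batch reservation can be sketched as follows, assuming (for illustration only) that labels are zero-padded sequence numbers and that the registry is a dict keyed by (context, label). Rather than failing on a label that is already registered, this sketch skips it and moves on; that is one possible design choice, and a system could equally reject the whole batch.

```python
from datetime import datetime, timezone
from itertools import count


def reserve_batch(registry, context, size, manager, start=1, width=6):
    """Sketch: reserve `size` names in `context`, with no association data."""
    reserved = []
    for n in count(start):
        if len(reserved) == size:
            break
        label = str(n).zfill(width)          # e.g. 1 -> "000001"
        if (context, label) in registry:     # uniqueness within the context
            continue
        registry[(context, label)] = {
            "association": None,             # a name, not yet an identifier
            "authority": {"reserved_by": manager,
                          "reserved_at": datetime.now(timezone.utc).isoformat()},
        }
        reserved.append(label)
    return reserved
```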

6.5 Brand Thing with Name before Registering Association

In this scenario, a name is required for the thing to be created, but association data about the thing is not available until after the thing is created. The identifier management system requires identifier uniqueness, and accountability of names.

This scenario uncouples the registration of the name and the association; only when both have been registered is an identifier deemed registered. This occurs under circumstances already given for “Register Identifier”.

The workflow is as follows:

  • Confirm that the name has not already been registered in the identifier registry for that context.

  • Create a record of the name in the identifier registry.

  • Add authority metadata for the name to the record.

  • Include the name in the information used to create the thing.

  • Create the thing.

  • Publish the thing.

  • Obtain association data from the service publishing the thing.

  • Register the association data for the name, converting the name into an identifier.
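This scenario interleaves the identifier system with external systems. In the sketch below, `create_thing` and `publish_thing` are hypothetical callbacks standing in for those systems, and `publish_thing` is assumed to return the association data (for example a URL) once the thing has been published.

```python
def brand_then_associate(registry, context, label, create_thing, publish_thing):
    """Sketch of 'Brand Thing with Name before Registering Association'."""
    name = (context, label)
    if name in registry:
        raise ValueError(f"{context}/{label} is already registered")
    # Register the name first, with no association data yet.
    registry[name] = {"association": None, "authority": {}}
    # The thing is created with its name embedded in it (the "brand").
    thing = create_thing(f"{context}/{label}")
    # Association data only exists once the thing has been published.
    association = publish_thing(thing)
    # Registering the association converts the name into an identifier.
    registry[name]["association"] = association
    return thing, association
```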

6.6 Generate Label in User Interface

The label for an identifier may need to be generated through some procedure, to conform with the name format prescribed in the identifier’s context (if any). In this scenario, label creation is integrated with a user interface for registering identifiers in an identifier management system.

The workflow is as follows:

  • Identifier manager requests a new identifier to be registered in a specific context, through an identifier management system.

  • The identifier manager allows the identifier management system to generate a label for the identifier, instead of providing their own label.

  • The identifier management system generates an arbitrary label, which meets the requirements of the identifier management system and the specific context.

  • The identifier management system presents the generated label to the identifier manager through the interface for confirmation.

  • The identifier manager accepts the proposed label.

  • The generated label is registered in the specified context as a new name.
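A sketch of the label-generation dialogue, assuming the context prescribes fixed-length labels over lowercase letters and digits (a hypothetical format) and that the `confirm` callback stands in for the user interface. The check against the registry only avoids proposing labels that are already taken; uniqueness is still enforced when the name is registered.

```python
import secrets
import string

LABEL_ALPHABET = string.ascii_lowercase + string.digits  # assumed context format


def generate_label(registry, context, length=8):
    """Generate a random label not yet registered in `context`."""
    while True:
        label = "".join(secrets.choice(LABEL_ALPHABET) for _ in range(length))
        if (context, label) not in registry:
            return label


def register_with_generated_label(registry, context, confirm, max_attempts=3):
    """Propose generated labels until the manager accepts one via `confirm`."""
    for _ in range(max_attempts):
        label = generate_label(registry, context)
        if confirm(label):  # the manager accepts the proposed label
            registry[(context, label)] = {"association": None}  # new name
            return label
    return None  # every proposal was declined
```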

6.7 Verify Identifier URL

To ensure the identifier management system is working properly, an identifier manager verifies a given identifier. The identifier identifies a digital object, and uses a URL as its association data. Verification involves confirming that the URLs registered with the identifier are all live (“linkrot checker”).

The workflow is as follows:

  • Identifier manager requests verification of an identifier.

  • Verifier resolves the identifier to all registered URLs for the identifier (as for multiple copy retrieval).

  • Verifier confirms that each URL is retrievable, through an appropriate attempt to access the specified network location (e.g. retrieving the HTTP headers for HTTP URLs).

  • If any URLs are not retrievable, verification fails, and the identifier is considered invalid; report this to the identifier manager.
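A linkrot check along these lines can be sketched with the Python standard library. This assumes that a successful HTTP `HEAD` request counts as "retrievable"; some servers reject `HEAD`, so a production checker might fall back to `GET`. The liveness test is injectable so that the verification logic can be exercised without network access.

```python
import http.client
import urllib.request


def http_head_ok(url, timeout=10):
    """Return True if an HTTP HEAD request for `url` succeeds."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        # urlopen raises HTTPError (a URLError, hence an OSError) for 4xx/5xx.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (ValueError, OSError, http.client.HTTPException):
        # ValueError: malformed URL; OSError: URLError, timeouts, socket errors.
        return False


def verify_identifier_urls(urls, is_live=http_head_ok):
    """Linkrot check: valid only if every registered URL is live.

    Returns (valid, dead_urls) so that failures can be reported.
    """
    dead = [url for url in urls if not is_live(url)]
    return (not dead, dead)
```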

6.8 Obtain Arbitrary Copy

A published identifier is resolvable to association data, which can be used as a retrieval key by a retrieve process. The association data registered for the identifier allow access to several instances of the thing identified; the end user does not care which instance ends up retrieved.

The workflow is as follows:

  • Resolve the given identifier to association data, listing ways of accessing all available instances of the thing identified.

  • Select a random instance out of the instances listed.

  • Request to retrieve the selected instance from its content data source, using the association data as a retrieval key.

  • Accept the retrieved representation of the instance of the thing identified.

6.9 Obtain Appropriate Copy

A published identifier is resolvable to association data, which can be used as a retrieval key by a retrieve process. The association data registered for the identifier allow access to several instances of the thing identified. The end user provides information about their current physical location. A selection service uses this information to select the optimal instance for retrieval, out of the instances available.

The workflow is as follows:

  • Resolve the given identifier to association data, listing ways of accessing all available instances of the thing identified.

  • Use the requester location to select the optimal instance for retrieval, out of the instances listed.

  • Request to retrieve the selected instance from its content data source, using the association data as a retrieval key.

  • Accept the retrieved representation of the instance of the thing identified.
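Both copy-selection scenarios come down to choosing one instance from multiply-resolved association data. The sketch below assumes each instance carries a hypothetical `location` given as (latitude, longitude), and treats "optimal" as "nearest by great-circle distance"; that is only one possible selection policy. With no requester location it falls back to a random choice, as in Obtain Arbitrary Copy.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def select_instance(instances, requester_location=None, rng=random):
    """Pick one instance of the thing from its resolved association data."""
    if not instances:
        raise LookupError("identifier resolved to no instances")
    if requester_location is None:
        return rng.choice(instances)  # arbitrary copy
    # Appropriate copy: the instance closest to the requester.
    return min(instances,
               key=lambda i: haversine_km(requester_location, i["location"]))
```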

6.10 Other usage scenarios

Other possible scenarios not described above include:

  • representing the association of two things through an association between their identifiers;

  • aggregating a group of things into a single thing by associating the aggregate’s identifier with the components’ identifiers;

  • associating arbitrary metadata with a thing by keying the metadata to the thing’s identifier; this includes document integrity information.

7 Applicability

The identifier system described here is applicable to any identifiers relying on identifier registration, and for which actors interact with the identifier system through exposed services. Because the system is intended to be accessed by external actors, it is intended for identifiers which can be used meaningfully across systems, via some network, rather than identifiers used only within a single system. So while an identifier system as described could be used to manage program variables, or sequence numbers in a database, such uses would not be optimal. The identifier system described here is properly intended for use with networked identifiers such as URIs.

This description is not constrained to identifier systems with particular qualities, though it may support the implementation of those qualities. In particular, it MAY be used to support identifiers that are:

  • Accountable via the inclusion of provenance and authority metadata with registration of names or associations.

  • Verified via processes to ensure that an identifier continue to be associated with the correct thing, or at least with a consistent class of thing (Validate Identifier), or via processes to ensure that any actions associated with an identifier remain accessible.

  • Persistent via processes that ensure that an identifier’s association data or other metadata are updated promptly when their values change in the world (Update Identifier), and via processes that ensure that the specification of any actions associated with an identifier be updated when the deployment or configuration of that action changes in the world.

On the other hand, two identifier qualities are assumed for identifier systems: identifiers in such systems must be:

  • Unique by enforcing that any name be registered only once in the registry, or any name have only one association registered with it.

  • Shared by allowing identifiers to be used (at minimum resolved) by external actors.

The identifier system described also allows several contexts to be managed in the one system, each with its own policy and access profiles. These may be presented to external actors as distinct concrete contexts, by providing different identifiers for each managed context. (For instance, the same system manages Handles of the form hdl:102.100.6/… and hdl:1159.1/…) Alternatively, these may be presented as a single context, by having a common identifier for all managed contexts. (For instance, all PURLs managed in the system are of the form http://purl.org/lib/..., but of those PURLs some belong to a humanities context and some to a sciences context.) In either case, those managed contexts are still subcontexts of the overall context defined by the identifier management system. This means that the policy and access profiles of the subcontexts cannot conflict with those defined for the overall system.

The identifier system is not applicable unless the contexts it manages are unique within the identifier system. The system cannot successfully associate identifiers with one of two identical contexts.

The identifier system described in this SUM supports resolution of identifiers (and in that regard at least, actioning of identifiers). The Resolve genre is part of the essential functionality of the identifier system, and there is an expectation that any identifier have some association data registered with it that can be accessed. However the requirement for resolvability does not extend any further than the well-ordered retrieval of association data. There is no requirement for instance that the resolution enable access to a digital representation of the thing associated with the identifier, or that the resolution data is of a form that can be actioned through Internet protocols (e.g. a URL). A prose description of the thing still counts as resolution under this definition.

Moreover, since the identifier management system describes services which interact with identifiers as digital objects, the SUM only supports identifiers that are Internet Citable. This constraint is not as restrictive as the constraint that identifiers be web citable (which this SUM does not impose); and we are not aware of any identifier systems which meaningfully fail this requirement.

The SUM is only applicable to identifiers that are registered. Registration may place constraints on an implementation and policies for an identifier management system. This SUM is not applicable if the association data used in the identifiers is so time-unstable that a reasonable registration process would be unable to keep up with changes in the name values. An example of this would be if the coordinates of a randomly moving object were used to identify that object, rather than any more time-stable attributes or an arbitrary name.

At any rate, if an attribute derived from an object at runtime is used as a name, then independently registering that name is redundant.

8 Functionality

The following low-level system functions are provided by the SUM to realise the high-level business processes described above.

In this model of domain functionality, labels, names and identifiers are conceptually treated as the same identifier object by the registry. A name is an identifier object without association data, and a label is an identifier object without association or context data (and by definition not registered, since the act of registration intrinsically provides a context). Contexts are represented by distinct context objects, in a potentially distinct context registry.

8.1 Restrictions on choreography of functionality

Not all functions can be applied to all identifiers at all times: identifiers change state depending on what functions have been applied to them, which constrains the possible functions that can be applied next. As a result, functionality can only apply to an identifier in a certain sequence. Identifiers have three possible states:

  • Identifiers are Registered after the Register identifier function.

    • Identifiers leave Registered state when they are deleted.

  • Identifiers are Published after a Publish event takes place. Publish is Provisioning rather than Domain functionality, since it enables access to the identifier through the authorisation policies of the identifier management system. External actors only have access to Published identifiers, through Actioning functions—which cannot change identifier state.

    • Identifiers leave Published state and go back to being Registered when they are unpublished.

  • Identifiers may become Invalid after validation, which then requires either an Update or a Deletion of the identifier. Validation may instead find that the identifier is valid, in which case there is no change to its state.

The following State UML diagram captures the expected sequence of functions for an identifier, and how they affect identifier state. The functions line up straightforwardly with the business processes.

[Figure: State UML diagram of identifier states and the functions that change them]
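The state rules above can be encoded as a small transition table. This is a sketch, and it makes some assumptions that the text leaves open: Update leaves a Registered or Published identifier in its state and returns an Invalid one to Registered, Delete is refused while an identifier is Published, and validation applies to Registered or Published identifiers. A `DELETED` state is added to represent leaving the Registered state.

```python
from enum import Enum


class State(Enum):
    REGISTERED = "Registered"
    PUBLISHED = "Published"
    INVALID = "Invalid"
    DELETED = "Deleted"


# function -> {current state: next state}; None means "not yet registered".
TRANSITIONS = {
    "register":  {None: State.REGISTERED},
    "publish":   {State.REGISTERED: State.PUBLISHED},
    "unpublish": {State.PUBLISHED: State.REGISTERED},
    "update":    {State.REGISTERED: State.REGISTERED,
                  State.PUBLISHED: State.PUBLISHED,
                  State.INVALID: State.REGISTERED},
    "delete":    {State.REGISTERED: State.DELETED,
                  State.INVALID: State.DELETED},
}


def apply(state, function, valid=True):
    """Return the identifier's state after `function`, or raise ValueError."""
    if function == "resolve":
        # Actioning by external actors: Published only, and state is unchanged.
        if state is not State.PUBLISHED:
            raise ValueError("only Published identifiers can be actioned externally")
        return state
    if function == "validate":
        if state not in (State.REGISTERED, State.PUBLISHED):
            raise ValueError(f"cannot validate an identifier in state {state}")
        return state if valid else State.INVALID
    try:
        return TRANSITIONS[function][state]
    except KeyError:
        raise ValueError(f"{function!r} is not allowed in state {state}") from None
```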

8.2 Domain Functionality: Provisioning Identifier Contexts

Register Context Object: Register a context object in the context registry. This object describes a context to be managed through the identifier management system. The object includes an identifier for the context, an identifier for the context owners and/or managers, and a representation of the policies that the particular context imposes.
Basic workflow steps (without error handling) are:

  • Provide the representations of the policies of the context.

  • Provide the identifiers of the owners or managers of the context.

  • Provide the identifier of the context to register.

  • Provide the identifiers of the business rules and data structures specific to the context.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

    • This includes validating that the identifier is unique within the context registry, and does not refer to a pre-existing context. We do not make that requirement for identifiers in general, but we do make it for the context registry.

  • Validate the name against the applicable system data structures for the context registry.

  • Create a context object, populated with the supplied data.

  • Add any authority metadata to the context object required by business rules, and following conventions given in system data structures.

  • Register the context object in the context registry.

  • Log the activity.

  • Return status results.
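A minimal sketch of Register Context Object, assuming a dict as the context registry, a list as the activity log, and a simple well-formedness rule (non-empty, no whitespace) in place of the system data structures. The uniqueness check reflects the requirement above that context identifiers, unlike identifiers in general, be unique within the context registry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ContextObject:
    identifier: str   # identifier of the context itself
    managers: list    # identifiers of the context owners and/or managers
    policies: dict    # representation of the policies the context imposes
    authority: dict = field(default_factory=dict)


def register_context(context_registry, identifier, managers, policies,
                     requester, administrators, log):
    """Sketch of Register Context Object (error handling kept minimal)."""
    # Authenticate and authorise: reduced here to a membership check.
    if requester not in administrators:
        raise PermissionError(f"{requester} may not register contexts")
    # Assumed data-structure rule: non-empty identifier without whitespace.
    if not identifier or any(ch.isspace() for ch in identifier):
        raise ValueError(f"malformed context identifier {identifier!r}")
    # Context identifiers must be unique within the context registry.
    if identifier in context_registry:
        raise ValueError(f"context {identifier!r} already exists")
    context = ContextObject(
        identifier, list(managers), dict(policies),
        authority={"registered_by": requester,
                   "registered_at": datetime.now(timezone.utc).isoformat()})
    context_registry[identifier] = context
    log.append(("register_context", identifier, requester))
    return context
```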

Business Process: Register context

Update Context Object: Change an attribute of a context object in the context registry. This allows any attribute to be changed. The user is not restricted by the system design from changing the name value (“patching the identifier”), which occasionally proves necessary if the identifier name is meaningful and an error in its semantics has been made. Typically business rules specific to the identifier management system will prevent such a change.
Basic workflow steps (without error handling) are:

  • Provide the context name, and the values to update (or populate).

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • Validate the values to be updated against the applicable system data structures.

  • Populate the context object attributes with the new values, overwriting any old values.

    • Old values may have been left blank.

    • Both the old and the new values may be aggregates. This will be the case for the policy profile.

  • Add any authority metadata to the context object required by business rules, and following conventions given in system data structures.

  • Update the context object in the context registry.

  • Log the activity.

  • Return status results.

Business Process: Update context

Delete Context Object: Cease to provide access to a context object from the context registry. Business rules may constrain when this can happen, if at all; for example, the identifier system may be persistent, in which case contexts are never deleted (during the time span of the persistence). At a minimum, a context should not be deleted if the identifier management system is still managing identifiers that are in that context: identifiers should not be “orphaned”. Those identifiers need to be reassigned to another context first, or eliminated.

Despite the name of the system function, the function need not be implemented through actually deleting the context object, as opposed to ceasing to provide access to it. Whether context objects should actually be deleted to realise this function depends on identifier system policy.
Basic workflow steps (without error handling) are:

  • Provide the request to delete.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request against business rules and system data structures.

  • Cease to provide access to (or Delete) all identifier objects associated with the context object.

  • Cease to provide access to (or Delete) the context object in the context registry.

  • Log the activity.

  • Return status results.
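Delete Context Object can be sketched as a soft delete that refuses to orphan identifiers. Assumptions for illustration: records are dicts with a hypothetical `withdrawn` flag (access ceases once it is set, but the object is kept), identifier names are (context, label) pairs, and a `persistent` flag models a policy under which contexts are never deleted.

```python
def delete_context(context_registry, identifier_registry, context,
                   requester, administrators, log, persistent=False):
    """Sketch of Delete Context Object as a soft delete."""
    if requester not in administrators:
        raise PermissionError(f"{requester} may not delete contexts")
    if persistent:
        raise ValueError("persistent system: contexts are never deleted")
    record = context_registry.get(context)
    if record is None or record.get("withdrawn"):
        raise KeyError(context)
    # Do not orphan identifiers: active identifiers in this context must
    # first be reassigned to another context or withdrawn.
    active = [name for name, rec in identifier_registry.items()
              if name[0] == context and not rec.get("withdrawn")]
    if active:
        raise ValueError(f"{len(active)} identifier(s) still managed in {context}")
    record["withdrawn"] = True  # cease to provide access; object is retained
    log.append(("delete_context", context, requester))
```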

Business Process: Delete context

8.3 Domain Functionality: Provisioning Identifiers

Create Label: Create a label, following a given context’s label policy. The label is a candidate for registration in an identifier object, as part of a name in the given context. This function makes no guarantee of uniqueness; uniqueness can only be policed through name registration for the given context.

Label creation may be stateful (e.g. using sequence numbers); maintaining state requires a distinct name state data source. Examples of labels generated by a system include:

  • Sequence numbers (which reflect the internal state of a counter, unrelated to the thing being named)

  • Random numbers, or strings based on random numbers (including: string encodings of random numbers; or dictionary lookups indexed on random numbers)

  • Timestamps, or strings based on timestamps

  • Requester environment variables (e.g. Ethernet address of requester computer)

  • Any combination of the above.

Basic workflow steps (without error handling) are:

  • Provide a request to create a label. The request may specify a strategy or convention to generate the label. The request may specify a context for the label.

  • Validate the request.

  • Create the label.

  • Validate the label against the applicable system data structures for the identifier registry, and against the label policy for the specified context (if any).

  • Log the activity.

  • Return label.
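The label strategies listed above might be sketched as below. `LabelFactory` and `create_label` are hypothetical; the context's label policy is assumed to be expressible as a regular expression, and the sequence counter lives in memory, whereas a stateful strategy in a real system would keep it in a distinct name state data source. As stated above, nothing here guarantees uniqueness.

```python
import itertools
import re
import secrets
import time


class LabelFactory:
    """Hypothetical label generator illustrating several strategies."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)  # in-memory name state

    def sequence(self, width=6):
        return str(next(self._counter)).zfill(width)

    def random_hex(self, nbytes=4):
        return secrets.token_hex(nbytes)  # 2 hex characters per byte

    def timestamp(self):
        return time.strftime("%Y%m%dT%H%M%S", time.gmtime())

    def combined(self):
        return f"{self.timestamp()}-{self.random_hex(2)}"


STRATEGIES = {"sequence", "random_hex", "timestamp", "combined"}


def create_label(factory, strategy="sequence", pattern=None):
    """Create a label and validate it against an optional context policy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown label strategy {strategy!r}")
    label = getattr(factory, strategy)()
    if pattern is not None and not re.fullmatch(pattern, label):
        raise ValueError(f"label {label!r} violates the context label policy")
    return label
```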

Business Process: Register name, Register identifier

Register Identifier Object: Register an identifier object in the identifier registry. This system function is applicable if an identifier is to be registered, containing both a name and association data, and both are available. However, this SUM models registered names as identifier objects with their association data unpopulated. As a result, this system function can also be used to realise the Register Name business function, with the same service end point and registry as for registering identifiers proper.
Basic workflow steps (without error handling) are:

  • Provide the name to register—i.e. a label, and an identifier for the context the label is in.

  • Provide any other data to be registered in the identifier object. This can include: association data (for resolution), thing data (other metadata about the thing), and identifier data (e.g. authority metadata).

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • Validate the label against the applicable system data structures for the identifier registry and the given context.

  • Create an identifier object, with its name value populated by the given name.

  • If registering an identifier and not just a name:

    • Validate that the association information meets the identifier system requirements.

    • Populate the identifier object association with the given association data, if available.

  • Add any authority metadata to the identifier object required by business rules, and following conventions given in system data structures and policies in the given context.

  • Register the identifier object in the identifier registry.

  • Log the activity.

  • Return status results.

Business Process: Register name, Register identifier

Update Identifier Object: Change an attribute of an identifier object in the identifier registry. This allows any attribute to be changed, including incidental metadata (added for Augment Identifier Information), the association data, the label value, and the context value. If the identifier object has been created without association data (i.e., registered as a name and not as an identifier proper), this functionality is used to provide a value to populate the association data (Register Association). The user is not restricted by the system design from changing the name (label, context) values (“patching the identifier”), which occasionally proves necessary if the identifier name is meaningful and an error in its semantics has been made. Typically business rules specific to the identifier management system will prevent such a change.

System functionality makes no distinction between the business events of Augment Identifier Information, Update Identifier, and Register Association: all are modelled as changing the values of attributes of an identifier object. The functionality allows an attribute to have multiple values, as an aggregate (cf. Multiply Resolve service genre): adding a value without overwriting an existing value can be modelled as overwriting the existing value with an aggregate of the old and new values.
Basic workflow steps (without error handling) are:

  • Provide the identifier name, and the values to update (or populate).

    • If the identifier system does not impose name uniqueness, some other mechanism may be necessary to identify the specific identifier object to update.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • Validate the values to be updated against the applicable system data structures.

  • Populate the identifier object attributes with the new values, overwriting any old values.

    • Old values may have been left blank.

    • Both the old and the new values may be aggregates.

  • Add any authority metadata to the identifier object required by business rules, and following conventions given in system data structures.

  • Update the identifier object in the identifier registry.

  • Log the activity.

  • Return status results.
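Since Register Association, Update Identifier and Augment Identifier Information all reduce to overwriting attribute values, one function can serve all three. In this sketch (hypothetical names, in-memory registry), `append=True` adds a value by overwriting the attribute with an aggregate (here, a list) of the old and new values, as described above.

```python
from datetime import datetime, timezone


def update_identifier(registry, name, updates, requester, append=False):
    """Sketch of Update Identifier Object; raises KeyError if unregistered."""
    record = registry[name]
    for attribute, value in updates.items():
        old = record.get(attribute)
        if append and old is not None:
            # Overwrite with the aggregate of old and new values.
            old_values = old if isinstance(old, list) else [old]
            new_values = value if isinstance(value, list) else [value]
            record[attribute] = old_values + new_values
        else:
            record[attribute] = value  # plain overwrite (old may be blank)
    authority = record.setdefault("authority", {})
    authority["updated_by"] = requester
    authority["updated_at"] = datetime.now(timezone.utc).isoformat()
    return record
```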

Business Process: Register association, Update identifier, Augment Identifier Information

Delete Identifier Object: Cease to provide access to an identifier object from the identifier registry. Business rules may constrain when this can happen, if at all; for example, the identifier may need to have already been unpublished (not resolvable); or the identifier system may be persistent, in which case identifiers are never deleted.
Basic workflow steps (without error handling) are:

  • Provide the request to delete.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request against business rules and system data structures.

  • Cease to provide access to (or Delete) the identifier object in the identifier registry.

  • Log the activity.

  • Return status results.

Business Process: Delete Identifier

8.4 Domain Functionality: Maintain Identifiers

Verify Identifier Object: Verify that information is being maintained in the system for that identifier, and that the data stored is valid.

This functionality verifies that one or more attributes or qualities of an identifier object in the identifier registry are correct, for any attribute that admits verification. (Existence counts as a quality, so verifying for existence confirms that the given identifier is being maintained in the identifier management system.) The method, the success conditions, and the scope of verification are domain- and business-specific.

The most common identifier object verification is that it remains resolvable (“link rot checker”), and that it is resolvable to the correct thing. If the thing identified is stored on a content data source, verification involves access to that data source. Correct resolution is a stronger condition than just resolvability, and may rely on such mechanisms as watermarking and message digesting to confirm correctness: since the thing is managed separately from the identifier, the identifier management system has no direct way of knowing that a different object has not been swapped for the thing the identifier is supposed to identify.

Basic workflow steps (without error handling) are:

  • Provide the identifier name to verify.

  • Provide the attributes and qualities to be verified.

  • Authenticate and authorize (via access controls) the request (if necessary).

  • Validate the request.

  • Confirm that the system allows verification for the nominated attributes and qualities.

  • For each nominated attribute or quality of the identifier object:

    • Retrieve the current value of the attribute or quality.

    • Determine whether the value is valid for the given identifier.

  • Log the activity.

  • Return status results.
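A sketch of identifier verification covering existence, resolvability, and correct resolution. It assumes, hypothetically, that a SHA-256 digest of the thing was stored in the identifier object at registration, and that `fetch` retrieves the bytes at the association URL (raising `OSError` on failure). The digest comparison is one way of detecting that a different object has been swapped in for the thing identified.

```python
import hashlib


def verify_identifier(record, fetch, checks=("exists", "resolvable", "digest")):
    """Sketch of Verify Identifier Object; returns {check: passed}."""
    results = {}
    if "exists" in checks:
        results["exists"] = record is not None
    if record is None:
        return results  # nothing else can be verified
    content = None
    try:
        content = fetch(record["association"])
    except OSError:
        pass  # not retrievable
    if "resolvable" in checks:
        results["resolvable"] = content is not None
    if "digest" in checks and "sha256" in record:
        results["digest"] = (content is not None and
                             hashlib.sha256(content).hexdigest() == record["sha256"])
    return results
```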

Business Process: Maintain Identifier

Query Identifier Object: Retrieve some or all information the identifier management system holds relating to a given identifier. In the architecture proposed in this SUM, this may involve the identifier registry, the context registry, and the activity log, but the functionality applies to any data sources that the identifier management system writes data to, that are keyed to identifier names, and that the identifier system manager decides should be queryable.

The need for this function arises because information associated with the identifier is not restricted to the data added to the registry. Some information is instead keyed to the identifier through external data sources. The boundary between the identifier registry and other data sources is a design decision, and depends on identifier system policy.
Basic workflow steps (without error handling) are:

  • Provide the identifier name to query.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • For each applicable data source that the identifier management system uses:

    • Retrieve all data keyed to the identifier name.

  • Filter the retrieved data down to the information the user actually requires from the query.

  • Log the activity.

  • Return the retrieved data.

Business Process: Maintain Identifier

Verify Context Object: Verify that one or more attributes or qualities of a context object in the context registry are correct, for any attribute that admits verification. (Existence counts as a quality, so verifying for existence confirms that the given context is being maintained in the identifier management system.) The method, the success conditions, and the scope of verification are domain- and business-specific. The typical attributes to verify are the ownership of the context, its existence, and the continuing policing of the individual context policies through the identifier management system.
Basic workflow steps (without error handling) are:

  • Provide the context name to verify.

  • Provide the attributes and qualities to be verified.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • Confirm that the system allows verification for the nominated attributes and qualities.

  • For each nominated attribute or quality of the context object:

    • Retrieve the current value of the attribute or quality.

    • Determine whether the value is valid for the given context.

  • Log the activity.

  • Return status results.

Business Process: Verify Context

Query Context Object: Retrieve all information the identifier management system holds relating to a given context. In the architecture proposed in this SUM, this may involve the context registry and the activity log, but the functionality applies to any data sources that the identifier management system writes data to, that are keyed to context names, and that the identifier system manager decides should be queryable. For instance, a context object query could return all identifiers managed by the identifier management system for that context.
Basic workflow steps (without error handling) are:

  • Provide the context identifier to query.

  • Authenticate and authorize (via access controls) the request.

  • Validate the request.

  • For each applicable data source that the identifier management system uses:

    • Retrieve all data keyed to the context name.

  • Log the activity.

  • Return the retrieved data.

Business Process: Query Context

8.5 Domain Functionality: Actioning Identifiers

As already seen, identifier usage processes are realised through functions of both the identifier management system and external systems. The main instances of those functions are Resolve, provided by the identifier management system, and Retrieve, often provided by an external system (but included in this description).

Different information requirements placed on the identifier management system mean that several identifier lookup functions are possible; we discuss the following:

  • Resolve Identifier: look up a unique piece of association data

  • Multiply Resolve Identifier: look up a range of association data

  • Metadata Search on Identifier: look up an identifier given metadata registered with the identifier

Resolve Identifier Object: As noted, Resolve functionality is typically invoked by an external actor in order to obtain an object identified by an identifier. To that end, Resolve provides the external actor with association data from the identifier management system, describing how to access the thing identified. For example, a UUID may be mapped by Resolve to a URL; the UUID is the identifier for the thing, and the URL describes how to access the thing. Resolution is distinct from Query, which can retrieve any data associated with the identifier in the system, and which is not typically configured to be consumed by another system.

The resolve function retrieves and returns from the identifier registry a single instance of association data registered with the identifier. The decision on which instance to retrieve when more than one instance is available is determined by business rules. The decision on which attribute or attributes to resolve to (i.e., to consider as association data) is captured in the system data structures and the context object.
Basic workflow steps (without error handling) are:

  • Provide the identifier name to resolve.

  • Authenticate and authorize (via access controls) the request (if applicable).

  • Validate the request.

  • Retrieve one association data instance for the identifier keyed to the name from the identifier registry, according to business rules.

  • Log the activity.

  • Return the association data.

Business Process: Deliver Content from Identifier
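Below is a minimal Python sketch of the Resolve workflow, using the UUID-to-URL example from the text. The `Association` record is an assumption. So is the `resolvable_kinds` set, which stands in for the system data structures and context object. The lowest-priority-wins rule stands in for business rules and is likewise assumed. Authorisation is omitted, since it applies only where configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    kind: str       # type of association data, e.g. "URL"
    value: str
    priority: int   # hypothetical business-rule input: lower value wins


def resolve(name: str, registry: dict[str, list[Association]],
            resolvable_kinds: set[str], log: list[str]) -> Association:
    # Validate the request: the name must be registered.
    if name not in registry:
        raise KeyError(f"{name} is not registered")
    # Which attributes count as association data is passed in as a set here.
    candidates = [a for a in registry[name] if a.kind in resolvable_kinds]
    if not candidates:
        raise LookupError(f"{name} has no association data of kinds {sorted(resolvable_kinds)}")
    # Business rule (assumed): pick the candidate with the lowest priority value.
    chosen = min(candidates, key=lambda a: a.priority)
    log.append(f"resolve {name}")
    return chosen
```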

Multiply Resolve Identifier Object: Often there is only one piece of association data per identifier available (e.g. a URL), and only the Resolve function is applicable. But there can be multiple registered pieces of association data, describing different ways of accessing the same thing. (This is especially common if the thing identified is abstract, and the association data provides access to different instances of the thing identified.) The Multiply Resolve function returns all instances of association data. This is used in obtain processes which select one of the available ways of accessing the thing as optimal (an appropriate copy service), given parameters such as location or object format.

The Multiply Resolve function retrieves and returns from the identifier registry all instances of association data registered with the identifier. The decision on which attribute or attributes to resolve to (i.e. to consider as association data) is captured in the system data structures and the context object.
Basic workflow steps (without error handling) are:

  • Provide the identifier name to resolve.

  • Authenticate and authorize (via access controls) the request (if applicable).

  • Validate the request.

  • Retrieve all association data instances for the identifier keyed to the name from the identifier registry.

  • Log the activity.

  • Return the association data instances as an aggregate object.

Business Process: Deliver Content from Identifier
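The Python sketch below pairs Multiply Resolve with the external appropriate copy selection that it typically feeds. The `Copy` record, its `location` and `fmt` attributes, and the prefer-local-copy rule are all hypothetical. Real appropriate copy services may weigh many more parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    url: str
    location: str   # hypothetical attributes the selection step can use
    fmt: str


def multiply_resolve(name: str, registry: dict[str, list[Copy]], log: list[str]) -> tuple[Copy, ...]:
    if name not in registry:
        raise KeyError(f"{name} is not registered")
    log.append(f"multiply-resolve {name}")
    # Return every instance of association data as one aggregate (an immutable tuple).
    return tuple(registry[name])


def select_appropriate_copy(copies: tuple[Copy, ...], preferred_location: str,
                            accepted_formats: set[str]) -> Optional[Copy]:
    # This step belongs to the external appropriate copy service, not the identifier system.
    usable = [c for c in copies if c.fmt in accepted_formats]
    local = [c for c in usable if c.location == preferred_location]
    candidates = local or usable
    return candidates[0] if candidates else None
```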

Metadata Search on Identifier: Given a piece of metadata, return the identifier(s) registered with that metadata. This function can be used in conjunction with Resolve and Retrieve to obtain an object given metadata about the object (which the identifier management system is aware of). This constitutes a variant workflow for Obtain Object that is closer to how users actually go about this:

  • Provide the metadata field (e.g. title)

  • Search the identifier registry for identifiers registered with that metadata

  • Return identifiers matching the search field

  • Select an identifier out of those returned (if there is more than one)

  • Resolve the selected identifier to its resolution data

  • Retrieve the thing identified given the resolution data

Search functionality is usually provided by external systems rather than by identifier management systems. However, it can reside in some identifier management systems (e.g. the Handle System with a relational database backend).
Basic workflow steps (without error handling) are:

  • Provide the metadata field to search.

  • Authenticate and authorize (via access controls) the request (if applicable).

  • Validate the request.

  • Retrieve all identifiers with metadata fields matching the search field.

  • Log the activity.

  • Return the matching identifiers.

Business Process: Deliver Content from Identifier
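Metadata search needs a secondary index from metadata values back to identifier names. The Python sketch below shows one minimal shape for that index and for the search workflow. The `MetadataIndex` class and the case-insensitive matching are assumptions. An identifier system with a relational backend could instead run an equivalent database query.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical secondary index: (field, normalised value) -> identifier names."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    def add(self, name: str, metadata: dict[str, str]) -> None:
        for field, value in metadata.items():
            # Case-folding is an assumed normalisation, so "Report" matches "report".
            self._index[(field, value.casefold())].add(name)

    def search(self, field: str, value: str) -> list[str]:
        # .get() does not insert empty entries into the defaultdict.
        return sorted(self._index.get((field, value.casefold()), set()))


def metadata_search(index: MetadataIndex, field: str, value: str, log: list[str]) -> list[str]:
    # Validate the request.
    if not field or not value:
        raise ValueError("both a metadata field and a value are required")
    matches = index.search(field, value)
    log.append(f"metadata-search {field}={value!r}")
    return matches
```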

Retrieve Object: The Retrieve function takes a retrieval key, and returns from the content data source a presentation of the thing corresponding to the retrieval key. For Resolve and Retrieve to be integrated, the association data returned by Resolve should be a retrieval key which Retrieve can process.

Resolve and Retrieve are often integrated seamlessly, so users often have trouble telling the two apart. Moreover, an external user can invoke Retrieve directly with a retrieval key, bypassing any Resolve function. From the user’s viewpoint, Retrieve alone and Resolve + Retrieve have the same result, so users tend to treat identifiers and retrieval keys as serving the same function. (As a result, retrieval keys are typically identifiers themselves, and Resolve involves mapping one kind of identifier to another, e.g. an ARK URL, as persistent identifier, to a local URL, as retrieval key.)
Basic workflow steps (without error handling) are:

  • Provide the retrieval key to retrieve data with.

  • Authenticate and authorize (via access controls) the request (if applicable).

  • Validate the request.

  • Retrieve the object.

  • Transform the object for presentation. (Optional)

  • Log the activity.

  • Return the object.

Business Process: Deliver Content from Identifier
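The following Python sketch shows why users conflate the two functions. Resolving an identifier and then retrieving with the resulting key returns exactly what retrieving with the key directly returns. The dictionaries standing in for the identifier registry and content data source, and the ARK and URL values, are made-up examples. The sketch assumes the association data is stored as a ready-to-use retrieval key.

```python
from typing import Callable, Optional


def retrieve(key: str, content_source: dict[str, bytes], log: list[str],
             transform: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    if key not in content_source:
        raise KeyError(f"no object for retrieval key {key}")
    obj = content_source[key]
    if transform is not None:      # optional presentation step
        obj = transform(obj)
    log.append(f"retrieve {key}")
    return obj


def resolve_then_retrieve(name: str, identifier_registry: dict[str, str],
                          content_source: dict[str, bytes], log: list[str]) -> bytes:
    # Resolve: persistent identifier -> retrieval key (assumed stored directly).
    key = identifier_registry[name]
    log.append(f"resolve {name}")
    return retrieve(key, content_source, log)
```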

9 Structure & Arrangement

The functionality outlined under Domain Functionality is realised through services as follows:

Register Context Object:
End Point: Register context object{context registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: context registry
Secondary Resource(s): system data structures, business rules, workflow log

Update Context Object:
End Point: Update context object{context registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: context registry
Secondary Resource(s): system data structures, business rules, workflow log

Delete Context Object:
End Point: Delete context object{context registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: context registry
Secondary Resource(s): system data structures, business rules, workflow log

Create Label:
End Point: Create Label{}
Supporting Service Genre(s): activity log {workflow log}
Supporting Service Usage Model(s):
Primary Resource: name state [optional]
Secondary Resource(s): system data structures, workflow log, context registry

Register Identifier Object:
End Point: Register identifier object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry
Secondary Resource(s): system data structures, business rules, workflow log, context registry

Update Identifier Object:
End Point: Update identifier object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry
Secondary Resource(s): system data structures, business rules, workflow log, context registry

Delete Identifier Object:
End Point: Delete identifier object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry
Secondary Resource(s): system data structures, business rules, workflow log, context registry

Verify Identifier Object:
End Point: Verify identifier object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry
Secondary Resource(s): system data structures, business rules, workflow log, context registry, content data source

Query Identifier Object:
End Point: Query identifier object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry, workflow log
Secondary Resource(s):

Verify Context Object:
End Point: Verify context object{context registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: context registry
Secondary Resource(s): system data structures, business rules, workflow log

Query Context Object:
End Point: Query context object{context registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: context registry, workflow log
Secondary Resource(s): identifier registry

Resolve Identifier Object:
End Point: resolve{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry, workflow log
Secondary Resource(s): business rules, system data structures, context registry

Multiply Resolve Identifier Object:
End Point: multiply resolve{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry, workflow log
Secondary Resource(s): system data structures, context registry

Retrieve Identifier Object:
End Point: Retrieve Identifier Object{identifier registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}, filter {identifier object}
Supporting Service Usage Model(s): identity
Primary Resource: identifier registry, workflow log
Secondary Resource(s): system data structures, context registry

Retrieve Content:
End Point: Retrieve{content data registry}
Supporting Service Genre(s): Identity::authenticate {agent}, Identity::authorize {access control policy/authorization data}, activity log {workflow log}, filter {identifier object}
Supporting Service Usage Model(s): identity
Primary Resource: content data source
Secondary Resource(s): —

Once identifiers are registered in a data source, they become digital objects, and the services that this SUM provides are a special case of services interacting with digital objects in a single data source. This SUM therefore describes the services which act as the identifier registry and context registry interfaces. The other data sources used in the SUM are ancillary: they provide authentication, validation and authority for the digital objects going in and out of the registry.

The services outlined do not define an overall application. The domain-specific identifier provisioning functions may be consumed independently of any application, and integrated into other systems’ workflows. Operations may require transactional- and session-based controls, particularly for identifier provisioning functions run in batch mode.

Because the identifier registry and context registry are structurally like any other data source, the services interfacing with the identifier registry to provision and manage it, as defined above, are generic to any data source, and are not described any further here. Only the domain-specific functionality of the SUM is described: registering, updating, maintaining and verifying identifiers.

The following UML interaction diagram summarises the interaction between data sources through services required by the SUM. Authentication and authorisation are not displayed, for brevity.

The following UML activity diagrams give the choreography of services to realise the functionality of the SUM.

Create label. Label generation is not authenticated. Label creation is not logged.

Register name. Registration is authenticated. Uniqueness and Authority are provided.

Register association. Name is already registered. Uniqueness and Authority are provided.

Sample workflow ingesting and identifying an object in an external repository.

Create unpublished identifier, given name and association. Registration of the identifier in a single step.

Publish identifier: provision the authorisation data source with authorisation for a given user profile to resolve an identifier with a given identifier name.

Update identifier association data: an operation triggered from the content repository that stores the thing identified and maintains the association data (i.e., an Update [push] request).

Augment Identifier operation.

Identifier Resolution, orchestrated with external Obtain operation. The association data retrieved through resolution is assumed to map straightforwardly to a retrieval key appropriate for Obtain (e.g. URL).

Multiple Resolution, orchestrated with external (Appropriate Copy) Select and Obtain operations.

Retrieve Identifier Data: allow retrieval of arbitrary identifier object attribute by attribute name.

Query Identifier: retrieve identifier-related information from identifier registry and logs.

Verify Identifier: the association data held for an identifier is validated against the content repository which the association data indicates is pertinent.
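Here is a minimal Python sketch of Verify Identifier. It assumes association data takes the form of URLs, where the host name identifies the content repository that should hold the thing identified. The `repositories` mapping, from host to the set of paths it holds, is a hypothetical stand-in for querying each content repository.

```python
from urllib.parse import urlsplit


def verify_identifier(name: str, identifier_registry: dict[str, list[str]],
                      repositories: dict[str, set[str]], log: list[str]) -> dict[str, bool]:
    """Check each URL registered for `name` against the repository it points to.

    Authentication and authorisation are omitted from this sketch.
    """
    if name not in identifier_registry:
        raise KeyError(f"{name} is not registered")
    results: dict[str, bool] = {}
    for url in identifier_registry[name]:
        parts = urlsplit(url)
        holdings = repositories.get(parts.netloc)   # the repository the URL indicates
        # Valid only if that repository is known and still holds the object.
        results[url] = holdings is not None and parts.path in holdings
    log.append(f"verify-identifier {name}")
    return results
```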

10 Applicable Standards

None. No standards are directly applicable to the SUM as a whole.

The service genres that are a part of the SUM shall be defined in terms of applicable standards and specifications. Applicable standards include identifier schemes (Handle, PURL, DOI, URI), Internet transfer protocols (HTTP, HTTPS, Handle), registry description standards (ISO 2146, JSR 170), authentication standards (PKI, SAML) and authorisation descriptions (XACML).

11 Design Decisions & Tradeoffs

The services exposed by this SUM may be consumed by systems geographically distant from the identifier system. The system consuming the services needs to make provision for caching requests, should the identifier system become unavailable on the network. The system consuming the services may request asynchronous execution of services. If possible, the workflow should be arranged so that the external system need not pause for successful status return from the remote identifier system.

The services exposed by this SUM may be consumed at extreme volumes by external systems. The workflow of such systems may likewise need to allow for asynchronous execution, and may need to locally cache service requests, executing them later in batch mode.
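A consuming system might meet these requirements with a client-side outbox. Requests are sent immediately when the identifier system is reachable and cached otherwise, then replayed in order as a batch. The sketch below is a simplified, single-threaded Python illustration. The `send` callable standing in for the remote service, and the use of `ConnectionError` to signal unavailability, are assumptions. A production system would more likely hand the outbox to a background worker or message queue.

```python
from collections import deque
from typing import Callable


class CachingIdentifierClient:
    """Hypothetical client-side outbox for requests to a remote identifier system."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send          # assumed to raise ConnectionError when unreachable
        self._pending = deque()    # cached requests, oldest first

    @property
    def pending(self) -> int:
        return len(self._pending)

    def submit(self, request: dict) -> bool:
        """Send now if possible; otherwise cache the request and return False without waiting."""
        if self._pending:
            # Queue behind earlier cached requests so they are applied in order.
            self._pending.append(request)
            return False
        try:
            self._send(request)
            return True
        except ConnectionError:
            self._pending.append(request)
            return False

    def flush(self) -> int:
        """Replay cached requests in order (batch mode); stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()
            sent += 1
        return sent
```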

The notion of a "curation boundary", discussed in PILIN project policy documentation [1], allows for a delay between object creation and object identification, especially if the identifier is intended for external rather than internal use.

Because identifier system resolution services are subjected to heavy user load, the amount of data throughput should be minimised if possible. As a result, the information contained in the identifier object, and available through the Retrieve Identifier Data service, should be kept light: there should be a small range of information, and the information should be succinct. (This also improves the likelihood of persistence for the identifier object information, if this is a desired quality for identifiers.) Extensive information about the thing that the identifier is associated with should be managed and published separately from the identifier.

Identifier systems are commonly described as being private, shared, or public; the distinction is whether identifiers can be actioned only in the context of a particular system (location, data source, or application), or whether they can be actioned more broadly, which theoretically ranges to any agent with access to the internet. The openness of an identifier system depends on several design choices, which implementers need to consider in light of their business requirements:

  • Whether resolution of identifiers is subject to authorisation or not.

  • Whether access to resolution services from external actors is allowed.

  • Whether resolution of identifiers can be transacted across widely deployed protocols (e.g. HTTP) or application-specific protocols.

  • Whether resolution services are exposed for use outside of fixed applications, or whether resolution can only be transacted within a fixed application.

  • Whether the identifier policies and services are publicly documented.

  • Whether the context (namespace) of the identifier name is made explicit in the identifier system’s chosen identifier encoding.

Note that the openness of an identifier system depends on the accessibility of resolution services. It does not require that Identifier Provisioning services be similarly open-access, or application- or location-independent. That said, designers face a related question: whether access to Identifier Provisioning should be shared across a range of organisations or identifier contexts and, if so, how to prevent collisions when identifiers are registered or updated.
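A common way to prevent such collisions combines two rules. Registration is conditional, so it fails if the name already exists. Updates are optimistic, so each update must quote the version it read. The Python sketch below illustrates this approach; it is not prescribed by this SUM. The in-memory `IdentifierRegistry`, its `Conflict` exception and the version counter are hypothetical.

```python
import threading
from dataclasses import dataclass


class Conflict(Exception):
    """Raised when a register or update would collide with another agent's change."""


@dataclass
class IdentifierRecord:
    association: str
    version: int = 1


class IdentifierRegistry:
    def __init__(self) -> None:
        self._records: dict[str, IdentifierRecord] = {}
        self._lock = threading.Lock()   # makes check-then-write atomic in this process

    def register(self, name: str, association: str) -> int:
        with self._lock:
            if name in self._records:
                raise Conflict(f"{name} is already registered")
            self._records[name] = IdentifierRecord(association)
            return 1

    def update(self, name: str, association: str, expected_version: int) -> int:
        with self._lock:
            record = self._records[name]
            if record.version != expected_version:
                raise Conflict(f"{name} changed since version {expected_version}")
            record.association = association
            record.version += 1
            return record.version

    def get(self, name: str) -> tuple[str, int]:
        with self._lock:
            record = self._records[name]
            return record.association, record.version
```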

Identifier universality, whereby a single identifier is used in all contexts to refer to the same thing, is a highly desirable quality. Since universality is unenforceable, however, identifier systems may need to provide the ability to map between each other. An actor may have access to an identifier for a thing from system A, but may require an identifier for the same thing from system B instead. There are several possible reasons:

  • Openness: the actor has access to resolution services in B and not A.

  • Precedent: the actor uses systems relying on identifier system B rather than A, including community-specific preferences.

  • Other differences in identifier qualities: e.g. the actor may seek out, or avoid, semantically transparent identifiers.

The common strategy of using URLs as association data is such a mapping. URLs are identifiers, albeit semantically transparent and non-persistent ones. However, they have the advantage of being natively actionable through HTTP to content delivery, an attribute other identifier systems often need to exploit.

Mapping between identifiers, and notions of identifier canonicity, are beyond the scope of this SUM. However, if an identifier system also provides mapping to another identifier system, it needs to take steps to ensure that the resolution services for the target identifier system are also available to its users. It must also ensure that its encoding of the other identifier system’s identifiers is explicit enough to prevent confusion: this applies both to the interface for any resolution services, and to the representation of the identifiers’ namespace, which must establish what the target identifier system is.

If identifiers are semantically transparent, the identifier provisioning and verification functionality needs to provide more stringent guarantees that the identifier name accurately matches the thing it is associated with. This may delay the publishing of an identifier, as identifiers are notoriously difficult to "patch" (have their name changed) once they are available outside a well-controlled context (curation boundary).

Access to logs through a Query Identifier service should be disabled by default. It can lead to unmanageably large result sets, and maintaining logs in a form accessible to queries carries a significant performance cost. Moreover, publishing access logs in particular raises privacy concerns.

12 Implementation Guidance & Dependencies

Identifier systems are expected to be reliable: resolution and other services should be available over a long period of time without interruption. In order to meet this requirement, implementations should not allow single points of failure: the registry should preferably be mirrored across multiple locations, and such factors as load balancing should be incorporated in implementation. If the identifiers are to be persistent, there must be policy provision for the relocation of the system to another host institution, if the identifiers are to outlive the institution’s hosting of the system. Such relocation may include the migration of the identifier objects to a distinct identifier system.
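On the client side, mirroring only removes a single point of failure if consumers can fall back to another mirror. A minimal Python sketch of resolution with failover follows. Each mirror is modelled as a hypothetical callable that either resolves a name or raises `ConnectionError` when unreachable.

```python
from typing import Callable, Sequence

# Hypothetical mirror interface: resolve a name, or raise ConnectionError if unreachable.
Mirror = Callable[[str], str]


def resolve_with_failover(name: str, mirrors: Sequence[Mirror]) -> str:
    """Try each mirror in turn, skipping those that are unreachable.

    Other exceptions (e.g. KeyError for an unregistered name) propagate at once,
    since mirrors hold the same registry and would give the same answer.
    """
    last_error = None
    for mirror in mirrors:
        try:
            return mirror(name)
        except ConnectionError as exc:
            last_error = exc   # this mirror is down: try the next one
    raise ConnectionError(f"no mirror could resolve {name}") from last_error
```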

An identifier system may be used to manage a very large number of identifiers (hundreds of millions). Implementations need to scale with the size of the identifier registry they support. They also need to deliver performance that is predictable across the operational scale of the registry.

The types and range of things that identifiers identify in an identifier management system are determined through an identifier association policy specific to that system. The identifier association policy expresses a particular information model. Any choice of association instantiated through an identifier realises an information model, so the identifier manager should expose their information model publicly, as a way of establishing identifier trustworthiness.

Establishing and ensuring the identity of the thing identified, and the persistence of that identity, is determined through policy, and is contingent on the decisions made in an information model. The choice of association data used to represent the thing identified is also contingent on the identifier association policy. Some choices of association data mean that the value of the registered association data changes even if the thing it refers to does not. For instance, a URL may be used as association data while the thing identified is an abstract object rather than an object instance at a specific location. If association data is changeable, verifying identity becomes especially difficult. Strategies for ensuring the identity of things identified are outside the scope of this SUM.

Persistence as a quality can apply both to identifier association data, and to the services resolving to that association data. If an identifier system is to provide persistent resolution, not only the identifier registry but the services resolving from the registry must ensure a long-term presence.

To prevent performance degradation of the identifier management system, the system data sources should undergo the types of system management functions common to all registries:

  • Index identifier system

  • Monitor identifier system

  • Inspect identifier system

  • Analyse identifier system

  • Back up identifier system

  • Mirror identifier system

All these functions may be triggered by the identifier system manager at any time. Functions may also be triggered automatically within the identifier system following a schedule, and the identifier system manager is notified if anything needs intervention. Indexing needs to be based on identifier names, since identifier associations will be looked up in the identifier registry by identifier name. Other information stored in the identifier system might also be indexed for lookup (e.g. to provide metadata-based discovery of identifiers), although this is lower priority functionality. Since the identifier system is expected to have high network throughput, it is desirable to distribute the delivery points for identifier system lookup operations across the network; this can be achieved through provisioning and maintaining mirroring for the system.

System Management functions are initiated in order to maintain the identifier management system: they do not relate to individual identifiers, and do not fulfil a distinct business goal (other than the trivial goal of “make the identifier system work”). Moreover, such functionality is not specific to identifier systems, but occurs whenever a data source is available online for discovery requests. As a result, these functions are not documented any further in this SUM, and should be detailed in a separate Repository Manager SUM.

13 Known Uses

Identifier services such as are outlined in this SUM, or in service expression instantiations of it, are expected to be widely used in other SUMs. Instances known as of this writing include:

  • Repository Federation

  • OpenURL + Handle Appropriate Copy SUM

  • JADL Integrated Prototype Architecture SUM

  • D3UI Content Authoring SUM

  • D3UI Discovery-to-Delivery SUM

In practice, any data source uses some kind of identifier system, although not all data sources formalise it and expose it in the way described here.

14 Data Sources Used

  • Identifier Registry. The Identifier Registry contains the object identifiers and the metadata permitting identification of the identifier referent.

  • Context Registry. The Context Registry contains the contexts being managed in the identifier management system, along with identification of the ownership and management of the context. It also contains identification of the policies to be applied for a given context by the system.

  • Authentication Data. This data source contains data through which access to the identifier system is authenticated.

  • Authorisation Data. This data source contains data through which access to the identifier system is authorised.

  • Business Rules. This data source contains the business rules applied in the creation and updating of identifiers.

  • System Data Structures. This data source contains the system data structures (e.g. schemata, vocabularies) applied in the creation, updating, and interpretation of identifier objects.

  • Log. A log of transactions on the identifier system.

  • Name state. Records any stateful information required for the generation of names. This is distinct from name registration, which is transacted only on the identifier registry.

  • Content data source. The data source(s) on which the things identified are stored; it is the target of retrieval services.

15 Related SUMs

  • Identity. The Identity service usage model is an integrated set of service genres that provides the entire user identity infrastructure used by this SUM. Functionality includes the creation of user rights and roles, user authentication and user authorization. Note that authentication and authorization are combined within a single service usage model.

  • Repository Manager. The Repository Manager service usage model is an integrated set of service genres that provides the entire infrastructure used by this SUM to support management of the identifier system. It provides functionality to manage a repository, rather than managing individual objects contained within a repository (e.g. individual identifier objects). The functionality it provides includes indexing, monitoring, analysing, backing up, and mirroring a repository. The Repository Manager Service Usage Model is used by this SUM to manage user identity infrastructure (Repository Manager {Identity}), and identifier infrastructure (Repository Manager {Identifier}).

  • Repository Provisioning. The Repository Provisioning service usage model is an integrated set of service genres that provision repositories with the infrastructure needed to enable their interaction with external users, as opposed to the content that those users access. This includes provisioning users, authentication, authorisation policies, business rules, and system data structures. The Repository Provisioning Service Usage Model is used by this SUM to provision identity infrastructure (Repository Provisioning {Identity}) and identifier infrastructure (Repository Provisioning {Identifier}).

  • Browse. The browse service usage model provides an integrated set of service genres used to develop browser-based interfaces for the identifier registry resources.

16 Services Used

16.1 Identifier-Specific Services

  • Register Context Object. Create a context object with at least the name attribute populated, and deposit it in a context registry. Arguably a special case of the deposit service genre.

  • Update Context Object. Update the attributes of a context object in a context registry. A special case of the update and the augment service genres.

  • Register Identifier Object. Create an identifier object with at least the name attribute populated, and deposit it in an identifier registry. Arguably a special case of the deposit service genre.

  • Update Identifier. Update the attributes of an identifier object in an identifier registry. A special case of the update and the augment service genres.

  • Create Label. Create a label that can be used as a candidate for a name. The label must meet the policy guidelines set by the contexts that are to contain it, and is generated using a given algorithm type.

  • Resolve. Given an identifier name return a single instance of a given type of information about the thing that the identifier is associated with.

  • Multiply Resolve. Given an identifier name return all available instances of a given type of information about the thing that the identifier is associated with.

  • Retrieve Identifier Data. Given an identifier name return all accessible information contained in the identifier object stored on the identifier registry.

  • Verify Identifier. Confirm that a particular piece of information contained in an identifier object is accurate.

  • Query Identifier. Retrieve from an identifier management system all available information related to a given identifier.

  • Delete Context Object. Delete a context object from a context registry. Arguably a special case of the delete service genre.

  • Delete Identifier. Delete an identifier object from an identifier registry. Arguably a special case of the delete service genre.
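The Create Label service can draw on a name state data source (see its Structure & Arrangement entry). The Python sketch below contrasts a stateful, counter-based label generator with a stateless random one. The `LabelPolicy` fields, the digits-and-consonants alphabet and the nine-character length are assumptions for illustration. They are not PILIN's label algorithm.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelPolicy:
    # Hypothetical context policy: digits and consonants only (so labels are
    # unlikely to spell words) and a fixed label length.
    alphabet: str = "0123456789BCDFGHJKLMNPQRSTVWXZ"
    length: int = 9


def encode(n: int, policy: LabelPolicy) -> str:
    """Write n in base len(alphabet), left-padded to the policy's fixed length."""
    base = len(policy.alphabet)
    chars = []
    for _ in range(policy.length):
        n, r = divmod(n, base)
        chars.append(policy.alphabet[r])
    if n:
        raise OverflowError("label space for this policy is exhausted")
    return "".join(reversed(chars))


class CounterLabeller:
    """Stateful generation: the counter plays the role of the name state data source."""

    def __init__(self, policy: LabelPolicy, start: int = 0) -> None:
        self.policy = policy
        self.name_state = start

    def create_label(self) -> str:
        label = encode(self.name_state, self.policy)
        self.name_state += 1   # advance the name state before the label is handed out
        return label


def random_label(policy: LabelPolicy) -> str:
    """Stateless alternative: no name state, but duplicates must be caught at registration."""
    return "".join(secrets.choice(policy.alphabet) for _ in range(policy.length))
```

Either way, the label is only a candidate. Uniqueness is finally established when the name is registered on the identifier registry.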

16.2 Generic Data Source Services

  • Provision. Add and manage data and settings to control overall registry behaviour.

  • Workflow Log. Log an activity or action in a log object.

  • Filter. Filter (via provided data) the object (or set of objects) as part of a workflow.

  • Validate. Validate the representation of an object.

  • Retrieve. Deliver an object given a retrieval key.

16.3 Services from external Service Usage Models

  • Identity

    • authenticate. Authenticate a party (including a service acting on behalf of a user) using information from the authentication data store.

    • authorise. Authorize a user (or a service on behalf of a user) to perform a particular behaviour or access a particular resource using business rules and policies from a nominated data store.

  • Browse

    • browse. Retrieve an object (or a set of objects) from a registry to enable browsing of the registry.

  • Repository Manager

    • analyze. Analyze the components of an identifier system.

    • backup. Back up the components of an identifier system.

    • monitor. Monitor the behaviour and performance of an identifier system. Query for performance and behaviour data results.

    • index. Index or reindex the components of an identifier system to support discovery from the registry.

    • inspect. Inspect the components of an identifier system. Query for registry settings and values.

    • mirror. Mirror an identifier system to a second system.

17 CORE SUMs Used

18 References

[1] PILIN Ontology for identifiers and identifier services. http://resolver.net.au/hdl/102.100.272/G9JR4TLQH

hdl: 102.100.272/G9JR4TLQH

[2] PILIN Community Requirements. http://www.pilin.net.au/Project_Documents/Community_Requirements/Requirements.htm

19 Glossary

See PILIN Glossary, http://resolver.net.au/hdl/102.100.272/HHYMV8JQH

Copyright © Monash University

This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Australia License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/au/

This work was created as part of the PILIN project. The PILIN project is funded by the Australian Commonwealth Department of Education, Science and Training, (DEST) under the Systemic Infrastructure Initiative (SII) as part of the Commonwealth Government’s Backing Australia’s Ability – An Innovation Action Plan for the Future (BAA) under the ARROW Project.