Introduction
Purpose
This specification defines the Open Science Archive (OSA) protocol—an open, interoperable standard for the deposition, validation, curation, publication, and discovery of scientific data.
OSA is designed to enable multiple implementations while ensuring compatibility through:
- Standardized wire formats for data exchange
- Well-defined behavioral contracts for actors in the system
- Pluggable validation and curation via OCI containers
- Federated registries for sharing reusable components
Motivation
Scientific data infrastructure follows a common pattern. Successful platforms like the Protein Data Bank (PDB), UniProt, Gene Expression Omnibus (GEO), and services at EMBL-EBI all implement the same core workflow: structured deposition, automated validation, expert curation, and programmatic access.
Despite this shared pattern, each new scientific domain rebuilds this infrastructure from scratch. This fragmentation results in:
- Duplicated effort: Generic pipeline logic is reimplemented rather than reused
- Inconsistent quality: Ad-hoc validation rules vary wildly across repositories
- Poor interoperability: Custom APIs prevent unified tooling and federated access
- High barriers: Emerging fields lack resources to build “PDB-quality” infrastructure
The OSA protocol addresses this by separating infrastructure from domain logic. The protocol defines how data flows through deposition, validation, curation, and publication—the universal “shape” of scientific data management. Domain-specific rules (what makes a protein structure valid vs. a materials dataset) are injected as pluggable components, not hard-coded into the platform.
This separation enables:
- Reusable implementations: A reference implementation serves as “PDB-in-a-box” for any domain
- Shared tooling: A dataset browser built for biology works immediately in physics
- Quality transparency: Machine-readable SemanticGuarantees let consumers filter by verified properties, not just file types
- Institutional flexibility: Existing platforms can expose data via OSA adapters without migration
By standardizing the infrastructure layer, OSA allows scientific domains to focus resources on what matters: their specific validation rules, curation workflows, and discovery interfaces—not rebuilding basic data pipelines.
Scope
This specification is implementation-agnostic. It defines what must be observable over the network, not how systems are organized internally.
In Scope
- Protocol resources: Structure and semantics of Depositions, Records, Profiles, and Tools
- State machines: Valid state transitions and invariants
- API contracts: HTTP endpoints and request/response formats
- Execution contracts: OCI container interfaces for Validators and Curation Tools
- Registry protocols: Discovery and versioning of shared components
Out of Scope
- Internal architecture: How implementations organize code, services, or databases
- Storage mechanisms: Whether data is stored on S3, local disk, or elsewhere
- Identity providers: OSA assumes external OIDC-compatible authentication
- Performance characteristics: Caching strategies, indexing approaches, etc.
- Domain-specific validators: The protocol defines how to package validators, not what they check
Conformance Language
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Audience
This specification is intended for:
- Implementers building OSA-compliant ArchiveNodes or ViewNodes
- Tool developers creating Validators or Curation Tools
- Client developers building applications that interact with OSA nodes
- Governance bodies establishing policies for OSA ecosystems
Architecture
Actors
The OSA protocol defines five types of actors:
ArchiveNode : A service that accepts data submissions, orchestrates validation and curation, and publishes immutable Records. The primary write-path actor.
ViewNode : A read-only service that indexes and presents Records from one or more ArchiveNodes, enabling search and discovery.
Validator : An OCI container that performs automated quality checks on datasets, testing them against SemanticGuarantees.
Curation Tool : An interactive OCI container that provides web-based interfaces for human reviewers to inspect and modify datasets.
Client : Any application (web app, CLI, library) that interacts with ArchiveNodes or ViewNodes to submit or retrieve data.
High-Level Flow
A typical dataset journey through OSA involves:
- Submission: A Client creates a Deposition on an ArchiveNode, uploads files, and submits for review.
- Validation: The ArchiveNode executes Validators (defined by the SubmissionProfile) to test the data against SemanticGuarantees.
- Curation: A human curator uses a Curation Tool (proxied by the ArchiveNode) to review, annotate, or fix issues.
- Publication: Once approved, the ArchiveNode creates an immutable Record with a permanent identifier.
- Discovery: ViewNodes index Records from multiple ArchiveNodes, enabling federated search.
┌────────┐
│ Client │
└────┬───┘
     │ (1) Create Deposition
     ▼
┌─────────────┐
│ ArchiveNode │◄──(2) Run Validators
└─────────────┘
     │
     │ (3) Proxy Curation Tool
     ▼
┌──────────────┐
│   Curator    │
└──────────────┘
     │
     │ (4) Approve → Create Record
     ▼
┌──────────┐  (5) Index
│ ViewNode │◄─────────────┐
└──────────┘              │
     ▲                    │
     │                    │
     └────────────────────┘
      (6) Search/Discover
Key Concepts
Separation of Write and Read Paths : ArchiveNodes handle mutable, workflow-heavy submissions. ViewNodes handle immutable, query-optimized discovery. This separation enables specialized implementations for each concern.
Pluggable Domain Logic : Instead of hard-coding validation rules or curation interfaces, OSA uses OCI containers that can be developed, versioned, and shared independently.
Registry-Based Discovery : Schemas, Validators, and Curation Tools are registered with SRNs, enabling reuse across archives and federated governance.
Provenance by Default : Every Record links back to its source Deposition, the Validators that checked it, and the curator who approved it.
Terminology
Resources
Deposition : A mutable, in-progress dataset submission. The primary resource on the write path. Transitions through states (DRAFT, SUBMITTED, UNDER_REVIEW, APPROVED) before becoming a Record.
Record : An immutable, versioned, published dataset. The primary resource on the read path. Created from an approved Deposition.
SubmissionProfile : A template that defines requirements for a type of submission, including the Schema, required SemanticGuarantees, and available Curation Tools.
SemanticGuarantee : A verifiable assertion about data quality or correctness (e.g., “All dates are ISO 8601”, “No missing required fields”). Enforced by a Validator.
ValidationRun : The result of executing a Validator against a Deposition. Contains pass/fail status and diagnostic messages.
Executables
Validator : An OCI container that takes a dataset as input and produces a pass/fail result, testing conformance to a SemanticGuarantee.
Curation Tool : An OCI container that exposes a web interface for human inspection and modification of Depositions.
CurationToolRef : A registry entry describing a Curation Tool: its OCI image, exposed port, and capabilities (read-only vs read-write).
Infrastructure
Structured Resource Name (SRN)
: A URN-based, globally unique, location-independent identifier for any OSA resource (e.g., urn:osa:example-archive:rec:abc123@v1).
Registry : A versioned, append-only collection of Schemas, SemanticGuarantees, CurationToolRefs, and SubmissionProfiles. Enables discovery and reuse.
NodeID : A globally unique identifier for an ArchiveNode, used as the authority component in SRNs.
Resources
This section specifies what resources exist in the OSA protocol and what properties they MUST have. Implementations MAY add additional properties but MUST include all required fields.
Deposition
A Deposition represents a dataset in progress. The depositor may modify it until it is submitted; after submission, only curators may modify it (see Lifecycles).
Required Properties
| Property | Type | Description |
|---|---|---|
| srn | SRN | Unique identifier (type: dep) |
| status | string | Current state: DRAFT, SUBMITTED, UNDER_REVIEW, or APPROVED |
| profile | SRN | The SubmissionProfile this Deposition targets |
| metadata | object | User-provided descriptive metadata (structure defined by Profile’s Schema) |
| files | array | List of file objects (see below) |
| created_at | ISO 8601 datetime | Timestamp of creation |
| updated_at | ISO 8601 datetime | Timestamp of last modification |
File Object Structure
Each entry in files MUST include:
| Property | Type | Description |
|---|---|---|
| name | string | Filename |
| size | integer | Size in bytes |
| checksum | string | SHA-256 hash (hex-encoded) |
| uploaded_at | ISO 8601 datetime | Upload timestamp |
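As an illustration of the file object structure, the sketch below builds one for a local file. The function name build_file_object is hypothetical and not part of the protocol. It emits a bare hex digest, following the table; whether a node also accepts a prefixed form is not covered here.

```python
import hashlib
import os
from datetime import datetime, timezone


def build_file_object(path: str) -> dict:
    """Return a dict carrying the four required file properties for `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large data files are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "checksum": digest.hexdigest(),  # hex-encoded SHA-256
        # UTC timestamp in ISO 8601 form, e.g. 2024-01-15T10:35:00Z
        "uploaded_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

In practice the ArchiveNode would set uploaded_at itself; a client could use the same hashing loop to verify the checksum returned after an upload.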
Optional Properties
Implementations MAY include:
- validation_runs: Array of ValidationRun objects
- curator_id: User ID of assigned curator (when in UNDER_REVIEW state)
- submitted_at: Timestamp when status changed to SUBMITTED
Record
A Record represents an immutable, published dataset. It is created from an approved Deposition.
Required Properties
| Property | Type | Description |
|---|---|---|
| srn | SRN | Unique identifier (type: rec) with version (e.g., @v1) |
| status | string | Current state: PUBLIC, EMBARGOED, or WITHDRAWN |
| profile | SRN | The SubmissionProfile (copied from source Deposition) |
| metadata | object | Final descriptive metadata |
| files | array | List of published file objects (same structure as Deposition files) |
| provenance | object | Origin information (see below) |
| published_at | ISO 8601 datetime | Timestamp of publication |
Provenance Object Structure
| Property | Type | Description |
|---|---|---|
| source_deposition | SRN | The Deposition this Record was created from |
| approved_by | string | User ID of the approving curator |
| approved_at | ISO 8601 datetime | Approval timestamp |
| guarantees | array | List of SemanticGuarantee SRNs with passing ValidationRuns at time of approval |
The guarantees field enables filtering and discovery based on verified data quality properties. This allows consumers to programmatically select datasets that meet specific semantic requirements (e.g., “all timestamps are ISO 8601 compliant”) without inspecting individual files.
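On the consumer side, guarantee-based selection amounts to a subset test over provenance.guarantees. A minimal sketch, assuming Records have already been fetched as dictionaries shaped like the tables above; the helper names satisfies_all and select_records are hypothetical:

```python
from typing import Dict, Iterable, List


def satisfies_all(record: Dict, required: Iterable[str]) -> bool:
    """True if every required SemanticGuarantee SRN appears in the Record's provenance."""
    verified = set(record.get("provenance", {}).get("guarantees", []))
    return set(required) <= verified


def select_records(records: List[Dict], required: Iterable[str]) -> List[Dict]:
    """Keep only the Records whose provenance lists all required guarantees."""
    wanted = list(required)  # materialize once so a generator can be passed safely
    return [record for record in records if satisfies_all(record, wanted)]
```

No file contents are inspected; the decision rests entirely on the guarantee SRNs recorded at approval time.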
Record Versioning
Records are immutable. If changes are needed, a new version MUST be created with an incremented version number (e.g., @v1 → @v2). The new version MUST reference the previous version in its provenance.
SubmissionProfile
A SubmissionProfile bundles requirements for a submission type.
Required Properties
| Property | Type | Description |
|---|---|---|
| srn | SRN | Unique identifier |
| title | string | Human-readable name (e.g., “Crystallography Dataset”) |
| schema | SRN | Reference to a Schema definition |
| guarantees | array | List of requirement objects (see below) |
| curation_tools | array | List of CurationToolRef SRNs available for this profile |
Guarantee Requirement Object
| Property | Type | Description |
|---|---|---|
| guarantee_srn | SRN | Reference to a SemanticGuarantee |
| required | boolean | If true, this guarantee MUST pass before approval |
SemanticGuarantee
A SemanticGuarantee defines a testable assertion about data quality.
Required Properties
| Property | Type | Description |
|---|---|---|
| srn | SRN | Unique identifier |
| title | string | Human-readable name |
| description | string | What this guarantee asserts |
| validator | SRN | Reference to the Validator OCI image that tests this guarantee |
CurationToolRef
A CurationToolRef describes an interactive review tool.
Required Properties
| Property | Type | Description |
|---|---|---|
| srn | SRN | Unique identifier |
| title | string | Human-readable name (e.g., “3D Molecule Viewer”) |
| image | string | OCI image reference (e.g., docker.io/osa/ngl-viewer:1.2.0) |
| default_port | integer | Port the container’s web server listens on |
| capabilities | array | List of strings: read-only and/or read-write |
Identifiers
All resources in the OSA protocol MUST be addressable by a Structured Resource Name (SRN).
SRN Grammar
SRNs MUST follow this URN-based grammar:
urn:osa:{node-id}:{type}:{local-id}[@{version}]
Components
urn:osa
: The fixed scheme prefix. All OSA identifiers begin with this.
{node-id}
: The globally unique identifier of the originating ArchiveNode (e.g., osa-registry, imperial-mat-sci). Node IDs MUST be DNS-safe (alphanumeric and hyphens only).
{type}
: A short string identifying the resource type:
- dep: Deposition
- rec: Record
- schema: Schema definition
- tool: Curation Tool
- val: Validator
- guarantee: SemanticGuarantee
- profile: SubmissionProfile
{local-id}
: A node-unique, opaque identifier. Implementations MAY use UUIDs, sequential IDs, or other schemes. Local IDs MUST be URL-safe.
@{version} (optional)
: A version identifier. REQUIRED for Records, OPTIONAL for other resources. Versions SHOULD follow Semantic Versioning 2.0 (e.g., @v1.0.0, @v2.3.1).
Examples
urn:osa:osa-registry:profile:crystallography@v1.0.0
urn:osa:imperial-mat-sci:dep:xyz789
urn:osa:imperial-mat-sci:rec:xyz789@v1
urn:osa:osa-registry:guarantee:iso8601-dates
Versioning Semantics
Records MUST include versions: Every Record SRN MUST include a version component (e.g., @v1). This enables immutable references.
Other resources MAY include versions: Schemas, Profiles, and Tools MAY use versions to track evolution while maintaining backwards compatibility.
Version resolution: When an SRN without a version is dereferenced (e.g., in a Profile reference), the registry MUST return the latest version.
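The grammar and versioning rules can be checked mechanically. The sketch below is one possible parser, not a normative one. It assumes that "URL-safe" local IDs are limited to RFC 3986 unreserved characters and that version strings use letters, digits, dots, plus signs, and hyphens; neither restriction is stated by this specification. The names SRN and parse_srn are illustrative.

```python
import re
from typing import NamedTuple, Optional

SRN_TYPES = {"dep", "rec", "schema", "tool", "val", "guarantee", "profile"}

# Assumed character classes: node-id is alphanumeric plus hyphens (per the spec);
# local-id and version classes are this sketch's own reading of "URL-safe".
_SRN_PATTERN = re.compile(
    r"urn:osa:(?P<node_id>[A-Za-z0-9-]+):(?P<type>[a-z]+):"
    r"(?P<local_id>[A-Za-z0-9._~-]+)(?:@(?P<version>[A-Za-z0-9.+-]+))?"
)


class SRN(NamedTuple):
    node_id: str
    type: str
    local_id: str
    version: Optional[str]


def parse_srn(text: str) -> SRN:
    """Split an SRN into its components, enforcing the rules stated above."""
    match = _SRN_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a well-formed SRN: {text!r}")
    srn = SRN(**match.groupdict())
    if srn.type not in SRN_TYPES:
        raise ValueError(f"unknown resource type: {srn.type!r}")
    if srn.type == "rec" and srn.version is None:
        raise ValueError("Record SRNs must include a version")
    return srn
```

For example, parse_srn("urn:osa:imperial-mat-sci:rec:xyz789@v1") yields node_id "imperial-mat-sci", type "rec", local_id "xyz789", and version "v1", while the same string without "@v1" is rejected.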
Lifecycles
Deposition Lifecycle
A Deposition progresses through the following states:
┌─────────┐    submit    ┌───────────┐              ┌──────────────┐
│  DRAFT  │─────────────▶│ SUBMITTED │─────────────▶│ UNDER_REVIEW │
└─────────┘              └───────────┘   (curator   └──────────────┘
     ▲                                    claims)          │
     │                                                     │
     │                                                     ▼
     │                                                ┌──────────┐
     │                                                │ APPROVED │
     │                                                └──────────┘
     │                                                     │
     │                                                     ▼
     └──────────────────────────────────────────── [Record Created]
                    (request changes)
State: DRAFT
Entry conditions: Automatically entered when a Deposition is created.
Permitted operations:
- Metadata MAY be modified
- Files MAY be uploaded or deleted
- Validators MAY be run (for pre-submission checks)
Transition to SUBMITTED: A Client MAY request transition to SUBMITTED. The ArchiveNode MUST validate that required metadata fields are present before allowing the transition.
State: SUBMITTED
Entry conditions: Depositor has indicated the submission is complete.
Observable requirements:
- The Deposition MUST be immutable to the original depositor
- All required Validators (as defined by the SubmissionProfile) MUST be executed
- ValidationRuns MUST be created and linked to the Deposition
Transition to UNDER_REVIEW: The ArchiveNode MAY transition to UNDER_REVIEW when:
- All required SemanticGuarantees have passing ValidationRuns, OR
- The SubmissionProfile requires manual curation regardless of validation status
Transition to DRAFT: If validation fails and the SubmissionProfile allows resubmission, the ArchiveNode MAY allow a curator to request changes, returning the Deposition to DRAFT with feedback.
State: UNDER_REVIEW
Entry conditions: The Deposition is awaiting or undergoing human review.
Permitted operations:
- A curator MAY instantiate Curation Tools to inspect the data
- Curators MAY modify metadata or files (via Curation Tools) to fix issues
- The curator MAY add annotations or comments
Transition to APPROVED: The curator MAY approve the Deposition. Before approval, the ArchiveNode MUST verify that all required SemanticGuarantees are satisfied.
Transition to DRAFT: The curator MAY request changes from the depositor.
State: APPROVED
Entry conditions: A curator has approved the Deposition.
Observable requirement: The ArchiveNode MUST create a Record from the approved Deposition. This is a terminal state for the Deposition.
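Taken together, the state descriptions above define a small transition table. The sketch below encodes only which edges exist. It does not check the conditions attached to each edge (required metadata, passing ValidationRuns, curator role), and the names ALLOWED_TRANSITIONS and can_transition are illustrative.

```python
from typing import Dict, Set

# Edges taken from the state descriptions above; APPROVED is terminal.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "DRAFT": {"SUBMITTED"},
    "SUBMITTED": {"UNDER_REVIEW", "DRAFT"},
    "UNDER_REVIEW": {"APPROVED", "DRAFT"},
    "APPROVED": set(),
}


def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle contains an edge from `current` to `target`."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown Deposition state: {current!r}")
    return target in ALLOWED_TRANSITIONS[current]
```

An implementation would typically consult such a table first and then evaluate the edge-specific conditions before persisting the new status.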
Validation Gate
Before a Deposition can transition to APPROVED, the following invariant MUST hold:
For every SemanticGuarantee in the SubmissionProfile where required: true, there MUST exist a ValidationRun for that guarantee with status: "pass".
Implementations MAY cache ValidationRuns and reuse them if the Deposition data has not changed. Implementations MUST re-run Validators if files or metadata have been modified since the last run.
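The gate can be expressed as a set difference. The sketch below assumes each ValidationRun carries a guarantee field holding the SemanticGuarantee SRN (as in the List Validation Runs response example) and that stale runs have already been discarded. The function name is hypothetical.

```python
from typing import Dict, List


def unmet_required_guarantees(profile: Dict, validation_runs: List[Dict]) -> List[str]:
    """Return the required guarantee SRNs that have no passing ValidationRun.

    An empty list means the Validation Gate invariant holds.
    """
    passed = {
        run["guarantee"] for run in validation_runs if run.get("status") == "pass"
    }
    return [
        requirement["guarantee_srn"]
        for requirement in profile.get("guarantees", [])
        if requirement.get("required") and requirement["guarantee_srn"] not in passed
    ]
```

Returning the unmet SRNs rather than a bare boolean lets the ArchiveNode explain to the curator why approval was refused.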
Record Lifecycle
Records are immutable after creation; only their status changes. They have a simpler lifecycle than Depositions:
State: PUBLIC
The Record is openly accessible. This is the default state for newly created Records.
State: EMBARGOED
The Record exists but is not publicly discoverable until a specified date. Embargoes are optional and implementation-specific.
State: WITHDRAWN
The Record has been retracted. The metadata remains visible (with a “WITHDRAWN” marker), but files are no longer accessible. Withdrawals MUST include a reason in the metadata.
Registries
OSA Registries provide authoritative, versioned collections of reusable protocol resources.
Registry Responsibilities
A Registry is a service that:
- Stores Schemas, SemanticGuarantees, CurationToolRefs, and SubmissionProfiles
- Resolves SRNs to their resource definitions
- Enforces versioning using Semantic Versioning 2.0
- Maintains immutability (entries are append-only; versions cannot be modified after publication)
Registry Discovery
Every ArchiveNode MUST publish a Node Document at /.well-known/osa-node.json that includes:
{
"node_id": "imperial-mat-sci",
"registries": [
"https://registry.osa.org",
"https://imperial-mat-sci.ac.uk/osa-registry"
]
}
When resolving an SRN, clients SHOULD:
- Check if the SRN’s node-id matches a known registry
- Query that registry’s resolution endpoint
- Fall back to the Global Registry (urn:osa:osa-registry)
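The resolution order can be sketched as a function that returns the registries to try, in order. Two assumptions are made here that the specification does not state: the client keeps a mapping from node-id to registry base URL, and the Global Registry lives at the URL used in the example Node Document. The resolution endpoint itself is not defined in this section, so the sketch stops at choosing registries.

```python
from typing import Dict, List

# Assumption: base URL borrowed from the example Node Document above.
GLOBAL_REGISTRY_URL = "https://registry.osa.org"


def registry_candidates(srn: str, known_registries: Dict[str, str]) -> List[str]:
    """Return registry base URLs to query for `srn`, most specific first."""
    parts = srn.split(":")
    if len(parts) < 5 or parts[0] != "urn" or parts[1] != "osa":
        raise ValueError(f"not an OSA SRN: {srn!r}")
    node_id = parts[2]
    candidates: List[str] = []
    if node_id in known_registries:
        candidates.append(known_registries[node_id])
    if GLOBAL_REGISTRY_URL not in candidates:
        candidates.append(GLOBAL_REGISTRY_URL)  # fallback
    return candidates
```

A client would query each candidate in turn and stop at the first one that resolves the SRN.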
Registry Types
Global Registry
: The canonical registry at urn:osa:osa-registry, maintained by the OSA governance body. Contains broadly applicable Schemas, Guarantees, and Tools.
Local Registries
: Individual ArchiveNodes or institutions MAY maintain their own registries for domain-specific resources. Local registry SRNs use the node’s own node-id as authority.
Registry Entries
All registry entries MUST include:
- srn: The resource’s identifier
- version: Semantic version (e.g., 1.0.0)
- published_at: ISO 8601 timestamp
- schema: Resource-specific schema (e.g., SubmissionProfile structure)
Governance
Changes to Global Registry entries (especially breaking changes to core Schemas or Profiles) MUST follow the OSA Enhancement Proposal (OEP) process (see §Extensibility).
ArchiveNode HTTP API
This section defines the HTTP API that conforming ArchiveNodes MUST implement.
General Conventions
Base URL: All endpoints are relative to the ArchiveNode’s base URL (e.g., https://archive.example.org/api/v1).
Authentication: Requests MUST include a Bearer token in the Authorization header:
Authorization: Bearer <token>
Content Type: Request and response bodies MUST use application/json unless otherwise specified.
Error Responses: Errors MUST return appropriate HTTP status codes and a JSON body:
{
"error": "error_code",
"message": "Human-readable description"
}
Deposition Endpoints
Create Deposition
POST /depositions
Creates a new Deposition in DRAFT state.
Request Body:
{
"profile": "urn:osa:osa-registry:profile:crystallography@v1.0.0"
}
Response (201 Created):
{
"srn": "urn:osa:example-archive:dep:abc123",
"status": "DRAFT",
"profile": "urn:osa:osa-registry:profile:crystallography@v1.0.0",
"metadata": {},
"files": [],
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T10:30:00Z"
}
Get Deposition
GET /depositions/{id}
Retrieves a Deposition by its local ID.
Response (200 OK): Full Deposition object (as above).
Update Deposition Metadata
PATCH /depositions/{id}
Updates metadata fields. Only valid in DRAFT state (or UNDER_REVIEW if the requester is a curator).
Request Body:
{
"metadata": {
"title": "Crystal Structure of Protein X",
"authors": ["Alice", "Bob"]
}
}
Response (200 OK): Updated Deposition object.
Upload File
POST /depositions/{id}/files
Uploads a file to the Deposition.
Request: multipart/form-data with a file field.
Response (201 Created):
{
"name": "data.cif",
"size": 1048576,
"checksum": "sha256:abcdef123456...",
"uploaded_at": "2024-01-15T10:35:00Z"
}
Delete File
DELETE /depositions/{id}/files/{filename}
Removes a file from the Deposition. Only valid in DRAFT state.
Response (204 No Content)
Submit for Review
POST /depositions/{id}/actions/submit
Transitions the Deposition to SUBMITTED state, triggering validation.
Response (200 OK):
{
"status": "SUBMITTED",
"message": "Validation in progress"
}
List Validation Runs
GET /depositions/{id}/validations
Returns all ValidationRuns for this Deposition.
Response (200 OK):
{
"validations": [
{
"guarantee": "urn:osa:osa-registry:guarantee:iso8601-dates",
"status": "pass",
"executed_at": "2024-01-15T10:36:00Z",
"messages": ["All date fields are valid ISO 8601"]
}
]
}
Record Endpoints
List Records
GET /records
Returns a paginated list of public Records.
Query Parameters:
- page: Page number (default: 1)
- per_page: Results per page (default: 20, max: 100)
Response (200 OK):
{
"records": [
{
"srn": "urn:osa:example-archive:rec:xyz789@v1",
"status": "PUBLIC",
"metadata": { "title": "..." },
"published_at": "2024-01-15T12:00:00Z"
}
],
"pagination": {
"page": 1,
"per_page": 20,
"total": 150
}
}
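A consumer that wants every Record can walk the pages until the pagination object shows the end has been reached. The sketch below keeps the HTTP call behind a caller-supplied function so the paging logic stands on its own. The names iter_records and fetch_page are hypothetical; fetch_page is assumed to return the decoded JSON body shown above.

```python
from typing import Callable, Dict, Iterator


def iter_records(
    fetch_page: Callable[[int, int], Dict], per_page: int = 20
) -> Iterator[Dict]:
    """Yield every Record summary, requesting one page at a time."""
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["records"]
        pagination = body["pagination"]
        seen = pagination["page"] * pagination["per_page"]
        # Stop on an empty page too, in case `total` shrinks between requests.
        if not body["records"] or seen >= pagination["total"]:
            break
        page += 1
```

With the requests library, fetch_page could be as small as a function that calls GET /records with page and per_page as query parameters and returns resp.json().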
Get Record
GET /records/{id}
Retrieves a specific Record by local ID. If no version is specified, returns the latest version.
GET /records/{id}@{version}
Retrieves a specific version of a Record.
Response (200 OK): Full Record object.
Download Record File
GET /records/{id}/files/{filename}
Downloads a file from a Record.
Response (200 OK): File contents (with appropriate Content-Type and Content-Disposition headers).
Curation Endpoints
List Available Tools
GET /depositions/{id}/tools
Returns Curation Tools available for this Deposition (as defined by its SubmissionProfile).
Response (200 OK):
{
"tools": [
{
"srn": "urn:osa:osa-registry:tool:ngl-viewer@1.2.0",
"title": "3D Molecule Viewer",
"capabilities": ["read-only"]
}
]
}
Start Curation Session
POST /depositions/{id}/sessions
Starts a Curation Tool session (launches OCI container and returns proxy endpoint).
Request Body:
{
"tool_srn": "urn:osa:osa-registry:tool:ngl-viewer@1.2.0"
}
Response (201 Created):
{
"session_id": "sess_abc123",
"status": "provisioning",
"proxy_endpoint": "/curation/sessions/sess_abc123",
"expires_at": "2024-01-15T14:00:00Z"
}
Access Curation Tool
GET /curation/sessions/{session_id}/{path}
Proxies requests to the running Curation Tool container.
The ArchiveNode MUST:
- Verify the requester owns the session
- Forward requests to the container’s web server
- Rewrite paths according to OSAP_BASE_PATH
Stop Curation Session
DELETE /curation/sessions/{session_id}
Terminates the Curation Tool container.
Response (204 No Content)
ViewNode Protocol
A ViewNode provides read-only, indexed access to Records from one or more ArchiveNodes.
Responsibilities
A conforming ViewNode MUST:
- Fetch Records from ArchiveNodes via their HTTP API
- Index metadata to enable search
- Expose a Search API (defined below)
- Update its index when new Records are published
ViewNodes MAY:
- Maintain local copies of Record files
- Compute derived metadata (e.g., thumbnails, text extracts)
- Aggregate Records from multiple ArchiveNodes
Search API
Search Records
GET /search
Searches indexed Records.
Query Parameters:
- q: Query string (implementation-specific syntax)
- filters: JSON-encoded filter object (optional)
- guarantees: Comma-separated list of SemanticGuarantee SRNs (returns only Records satisfying all listed guarantees)
- page: Page number (default: 1)
- per_page: Results per page (default: 20, max: 100)
Response (200 OK):
{
"results": [
{
"srn": "urn:osa:example-archive:rec:abc123@v1",
"title": "Crystal Structure of Protein X",
"published_at": "2024-01-15T12:00:00Z",
"archive_node": "https://example-archive.org",
"guarantees": [
"urn:osa:osa-registry:guarantee:iso8601-dates",
"urn:osa:osa-registry:guarantee:valid-metadata"
]
}
],
"pagination": {
"page": 1,
"per_page": 20,
"total": 42
}
}
Example with guarantee filtering:
GET /search?q=protein+structures&guarantees=urn:osa:osa-registry:guarantee:iso8601-dates,urn:osa:osa-registry:guarantee:valid-cif
Returns only protein structure Records whose provenance lists both the ISO 8601 date guarantee and the CIF format guarantee.
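A request line like the one above can be assembled with Python's standard library. This is a minimal sketch; build_search_path is a hypothetical helper, and colons and commas are left unescaped only to reproduce the example's readable form (percent-encoding them is equally valid).

```python
from typing import Iterable
from urllib.parse import urlencode


def build_search_path(
    query: str, guarantees: Iterable[str] = (), page: int = 1, per_page: int = 20
) -> str:
    """Build the path and query string for GET /search."""
    params = {"q": query}
    guarantee_list = list(guarantees)
    if guarantee_list:
        # The API expects one comma-separated parameter, not repeated keys.
        params["guarantees"] = ",".join(guarantee_list)
    if page != 1:
        params["page"] = str(page)
    if per_page != 20:
        params["per_page"] = str(per_page)
    return "/search?" + urlencode(params, safe=":,")


print(build_search_path(
    "protein structures",
    [
        "urn:osa:osa-registry:guarantee:iso8601-dates",
        "urn:osa:osa-registry:guarantee:valid-cif",
    ],
))
```

The printed path matches the example request above, with the space in the query encoded as a plus sign.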
Get Indexed Record
GET /records/{srn}
Retrieves a Record by its full SRN (URL-encoded).
Example: GET /records/urn%3Aosa%3Aexample-archive%3Arec%3Aabc123%40v1
Response (200 OK): Full Record object, with additional field:
{
"srn": "urn:osa:example-archive:rec:abc123@v1",
...,
"source_archive": "https://example-archive.org/api/v1"
}
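Because an SRN contains colons and an at sign, it has to be percent-encoded in full before it is placed in the path. A small sketch using the standard library; record_path is a hypothetical helper name:

```python
from urllib.parse import quote, unquote


def record_path(srn: str) -> str:
    """Return the ViewNode path for a Record, with the SRN fully percent-encoded."""
    # safe="" ensures ":" and "@" are encoded as well.
    return "/records/" + quote(srn, safe="")


path = record_path("urn:osa:example-archive:rec:abc123@v1")
print(path)
# Decoding the final segment recovers the original SRN.
print(unquote(path.rsplit("/", 1)[-1]))
```

The first line printed is the encoded path from the example above; the second is the original SRN.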
Synchronization
ViewNodes MUST implement one of the following strategies:
Pull-based: Periodically poll ArchiveNodes for new Records via GET /records.
Push-based: ArchiveNodes MAY notify ViewNodes via webhooks (implementation-specific).
Federation protocol: This version does not define a standardized sync protocol; a future version of this spec may add one.
Validator Contract
This section defines the execution contract for Validator OCI containers.
Purpose
Validators are headless, automated programs that test datasets against SemanticGuarantees. They run in sandboxed OCI containers and communicate via files.
Container Requirements
Packaging
Validators MUST be packaged as OCI-compliant container images (compatible with Docker, Podman, etc.).
Entrypoint
The container’s entrypoint MUST:
- Read input data from $OSAP_IN
- Perform validation checks
- Write results to $OSAP_OUT/result.json
- Exit with code 0 whether the data passes or fails; the status field in result.json, not the exit code, carries the verdict
Environment Variables
The ArchiveNode MUST inject:
| Variable | Description |
|---|---|
| OSAP_IN | Path to input directory containing metadata.json and data files |
| OSAP_OUT | Path to output directory (writable) |
Input Format
The $OSAP_IN directory contains:
- metadata.json: The Deposition’s metadata object
- One or more data files (as uploaded by the depositor)
Output Format
Validators MUST write $OSAP_OUT/result.json:
{
"status": "pass",
"messages": [
"Checked 500 rows, all valid.",
"No missing required fields."
]
}
Required fields:
- status: Either "pass" or "fail"
- messages: Array of human-readable strings (diagnostic info)
Optional fields:
- errors: Array of specific error objects (structure is validator-defined)
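To make the contract concrete, here is a minimal sketch of a Validator entrypoint in Python. The check itself (metadata must contain a non-empty title) is purely hypothetical and stands in for a real domain rule; only the use of OSAP_IN, OSAP_OUT, metadata.json, and result.json comes from this section.

```python
import json
import os
import sys
from pathlib import Path


def run_validator(in_dir: Path, out_dir: Path) -> dict:
    """Read metadata.json, apply one illustrative check, write result.json."""
    metadata = json.loads((in_dir / "metadata.json").read_text(encoding="utf-8"))
    # Hypothetical rule, standing in for a real SemanticGuarantee test.
    if metadata.get("title"):
        result = {"status": "pass", "messages": ["Metadata contains a non-empty title."]}
    else:
        result = {"status": "fail", "messages": ["Metadata is missing a title."]}
    (out_dir / "result.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result


# Run only when the ArchiveNode has injected both variables.
if __name__ == "__main__" and "OSAP_IN" in os.environ and "OSAP_OUT" in os.environ:
    run_validator(Path(os.environ["OSAP_IN"]), Path(os.environ["OSAP_OUT"]))
    sys.exit(0)  # exit code 0 even when the status is "fail"
```

Packaged in an OCI image with this script as the entrypoint, the container would satisfy the input, output, and exit-code rules above.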
Example Validation Run
# ArchiveNode prepares input
$ ls $OSAP_IN
metadata.json data.csv
# ArchiveNode runs container
$ docker run --rm \
    --network none \
    -v /path/to/input:/input:ro \
    -v /path/to/output:/output \
    -e OSAP_IN=/input \
    -e OSAP_OUT=/output \
    myregistry.io/osa/csv-validator:1.0.0
# Validator writes result
$ cat $OSAP_OUT/result.json
{
"status": "pass",
"messages": ["All 1000 rows validated successfully"]
}
Sandboxing
ArchiveNodes MUST run Validators with:
- No network access (no outbound connections)
- Read-only input ($OSAP_IN mounted read-only)
- Limited resources (CPU/RAM limits)
- Isolated filesystem (no access to host filesystem beyond input/output)
ArchiveNodes SHOULD set execution timeouts (e.g., 10 minutes) to prevent runaway validators.
Error Handling
If the Validator container:
- Exits with non-zero code → Treat as status: "fail" with message “Validator crashed”
- Fails to write result.json → Treat as status: "fail" with message “No result produced”
- Times out → Treat as status: "fail" with message “Validation timeout exceeded”
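These rules can be sketched as a single function that turns a finished container into a result object. Two points are assumptions of this sketch rather than requirements of the contract: the timeout check takes precedence over the exit code (a container killed for running too long usually also exits non-zero), and a result.json that exists but is not valid JSON is not handled.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def _failure(message: str) -> Dict:
    return {"status": "fail", "messages": [message]}


def interpret_outcome(exit_code: Optional[int], timed_out: bool, out_dir: Path) -> Dict:
    """Map a finished (or killed) Validator container to a result object."""
    if timed_out:
        return _failure("Validation timeout exceeded")
    if exit_code != 0:
        return _failure("Validator crashed")
    result_file = out_dir / "result.json"
    if not result_file.is_file():
        return _failure("No result produced")
    # Normal case: trust the status the Validator wrote.
    return json.loads(result_file.read_text(encoding="utf-8"))
```

The returned object has the same shape in every branch, so the ArchiveNode can store it as a ValidationRun without special cases.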
Curation Tool Contract
This section defines the execution contract for Curation Tool OCI containers.
Purpose
Curation Tools are interactive web applications that allow human curators to inspect, annotate, and modify Depositions. Unlike Validators, they are long-running and expose HTTP interfaces.
Container Requirements
Packaging
Curation Tools MUST be packaged as OCI-compliant container images.
Web Server
The container MUST run a web server listening on the port specified in its CurationToolRef.default_port (e.g., 8080).
Environment Variables
The ArchiveNode MUST inject:
| Variable | Description |
|---|---|
| OSAP_DEPOSITION_ID | The SRN of the Deposition being curated |
| OSAP_API_URL | The internal URL of the ArchiveNode’s API (e.g., http://localhost:5000/api/v1) |
| OSAP_API_TOKEN | A temporary Bearer token with write access to the Deposition |
| OSAP_BASE_PATH | The public path prefix (e.g., /curation/sessions/sess_abc123) |
Base Path Handling
All assets (HTML, CSS, JS, images) MUST be served relative to $OSAP_BASE_PATH.
Example: If the tool serves a stylesheet at /style.css, and OSAP_BASE_PATH=/curation/sessions/sess_abc123, the client must access it at:
https://archive.example.org/curation/sessions/sess_abc123/style.css
Many web frameworks support base path configuration (e.g., Flask’s APPLICATION_ROOT, Express’s app.use(basePath, ...)).
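The mapping between public paths and container paths is a prefix operation in both directions. The sketch below shows the two directions as plain functions; the names to_public_path and to_container_path are hypothetical, and real deployments would normally rely on their framework's or proxy's base-path support instead.

```python
def to_public_path(container_path: str, base_path: str) -> str:
    """Prefix a path served by the tool with the public base path."""
    return base_path.rstrip("/") + "/" + container_path.lstrip("/")


def to_container_path(public_path: str, base_path: str) -> str:
    """Strip the public base path, as a proxy would before forwarding a request."""
    base = base_path.rstrip("/")
    if public_path == base:
        return "/"
    if not public_path.startswith(base + "/"):
        raise ValueError(f"{public_path!r} is outside {base_path!r}")
    return public_path[len(base):]


BASE = "/curation/sessions/sess_abc123"
print(to_public_path("/style.css", BASE))
print(to_container_path(BASE + "/style.css", BASE))
```

The first call prints the public path from the stylesheet example above; the second recovers /style.css.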
Data Access
Reading Data
Curation Tools MAY access Deposition data via:
- API calls to $OSAP_API_URL using $OSAP_API_TOKEN
- Mounted files (if the ArchiveNode mounts data at /data read-only)
Modifying Data
To modify the Deposition (e.g., fix metadata, delete files), the tool MUST:
- Make API calls to $OSAP_API_URL (e.g., PATCH /depositions/{id})
- Include Authorization: Bearer $OSAP_API_TOKEN
Tools MUST NOT attempt to write state to the container’s local filesystem. Any data written locally will be lost when the session ends.
Security
Token Scope
The provided OSAP_API_TOKEN:
- MUST grant read access to the Deposition being curated
- MUST grant write access ONLY if the tool’s capabilities includes read-write
- MUST expire when the curation session ends
- SHOULD be limited to the specific Deposition (not all Depositions)
Isolation
Curation Tool containers MUST be isolated:
- Network restrictions: Outbound internet access SHOULD be blocked unless explicitly required
- Ephemeral storage: Any data written to the container’s filesystem MUST be discarded when the session ends
- Resource limits: CPU/RAM limits SHOULD be enforced
Example Curation Session
# ArchiveNode starts container
$ docker run --rm -d \
-p 8080:8080 \
-e OSAP_DEPOSITION_ID=urn:osa:example:dep:abc123 \
-e OSAP_API_URL=http://host.docker.internal:5000/api/v1 \
-e OSAP_API_TOKEN=eyJhbGc... \
-e OSAP_BASE_PATH=/curation/sessions/sess_xyz \
myregistry.io/osa/molecule-viewer:1.2.0
# ArchiveNode proxies requests
# User visits: https://archive.example.org/curation/sessions/sess_xyz
# ArchiveNode forwards to: http://localhost:8080/ (with path rewriting)
Client Requirements
A Client is any application that interacts with ArchiveNodes or ViewNodes. This includes web applications, command-line tools, libraries, and scripts.
Required Behaviors
Conforming Clients MUST:
- Use SRNs for resource references: When referencing Depositions, Records, or Profiles, use full SRNs (not local IDs alone).
- Authenticate via Bearer tokens: Include Authorization: Bearer <token> in all requests to ArchiveNodes.
- Handle standard HTTP status codes:
  - 401 Unauthorized → Prompt for authentication
  - 403 Forbidden → Insufficient permissions
  - 404 Not Found → Resource does not exist
  - 422 Unprocessable Entity → Validation errors (check response body for details)
- Respect rate limits: If an ArchiveNode returns 429 Too Many Requests, back off exponentially.
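Exponential backoff on 429 responses can be sketched as follows. The base delay, cap, and retry count are arbitrary choices for illustration, and the helper names are hypothetical. The request is passed in as a callable returning a (status, body) pair so that the retry logic is independent of any HTTP library.

```python
import time
from typing import Any, Callable, Iterator, Tuple


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield delays that double on each attempt, never exceeding `cap` seconds."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send` until it stops returning 429 or the retries are used up."""
    for delay in backoff_delays(retries):
        status, body = send()
        if status != 429:
            return status, body
        sleep(delay)
    return send()  # final attempt; the caller sees whatever it returns
```

Production clients often add random jitter to each delay and honor a Retry-After header when the server sends one; both are omitted here to keep the sketch deterministic.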
Optional Behaviors
Clients MAY:
- Cache Record metadata locally (but SHOULD check ETag or Last-Modified headers)
- Support multiple ArchiveNodes simultaneously
- Implement retry logic for transient failures (5xx errors)
Example: Submitting a Dataset
import requests
API_BASE = "https://archive.example.org/api/v1"
TOKEN = "your-bearer-token"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
# 1. Create Deposition
resp = requests.post(
    f"{API_BASE}/depositions",
    headers=HEADERS,
    json={"profile": "urn:osa:osa-registry:profile:crystallography@v1.0.0"},
)
resp.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error body
deposition = resp.json()
# The Deposition SRN in the response has no version suffix, so the last segment is the local ID
dep_id = deposition["srn"].split(":")[-1]
# 2. Upload file
with open("data.cif", "rb") as f:
    requests.post(
        f"{API_BASE}/depositions/{dep_id}/files",
        headers=HEADERS,
        files={"file": f},
    ).raise_for_status()
# 3. Update metadata
requests.patch(
    f"{API_BASE}/depositions/{dep_id}",
    headers=HEADERS,
    json={"metadata": {"title": "My Crystal Structure"}},
).raise_for_status()
# 4. Submit
requests.post(
    f"{API_BASE}/depositions/{dep_id}/actions/submit",
    headers=HEADERS,
).raise_for_status()
Security & Privacy
Authentication
ArchiveNodes MUST support Bearer token authentication. Tokens SHOULD be obtained via an external OIDC-compatible identity provider.
ArchiveNodes MAY support additional authentication methods (e.g., API keys, mTLS) but MUST support Bearer tokens for interoperability.
Authorization
ArchiveNodes MUST enforce:
- Depositor isolation: Users can only read/modify Depositions they created (unless they have curator privileges)
- Curator privileges: Curators can view and modify any Deposition in UNDER_REVIEW state
- Public read access: Records in PUBLIC state SHOULD be readable without authentication
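The rules above can be read as a small access-control predicate. The sketch below is one possible interpretation, in which curator access is limited to Depositions in the UNDER_REVIEW state; the User and Deposition types, their fields, and the "DRAFT" state used in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    is_curator: bool = False


@dataclass(frozen=True)
class Deposition:
    owner_id: str
    state: str  # lifecycle state name, e.g. "UNDER_REVIEW"


def can_access_deposition(user: User, dep: Deposition) -> bool:
    """Depositor isolation with a curator exception for UNDER_REVIEW."""
    if user.user_id == dep.owner_id:
        return True
    return user.is_curator and dep.state == "UNDER_REVIEW"


def can_read_record_anonymously(record_state: str) -> bool:
    """Records in PUBLIC state should be readable without authentication."""
    return record_state == "PUBLIC"
```

For example, a curator who is not the owner would be allowed to open a Deposition in UNDER_REVIEW but not one in an assumed "DRAFT" state.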
Node Identity
Each ArchiveNode MUST have a globally unique NodeID and MUST publish a Node Document at:
https://{domain}/.well-known/osa-node.json
Example:
{
  "node_id": "imperial-mat-sci",
  "version": "1.0.0",
  "api_base": "https://imperial-mat-sci.ac.uk/osa/api/v1",
  "registries": [
    "https://registry.osa.org",
    "https://imperial-mat-sci.ac.uk/osa-registry"
  ]
}
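A Client discovering an ArchiveNode might build the well-known URL and check the document roughly as follows. This is a sketch: the set of expected fields is taken from the example above, and treating all four as mandatory is an assumption, as are the function names.

```python
import json

WELL_KNOWN_PATH = "/.well-known/osa-node.json"

# Fields that appear in the example Node Document. Whether every one
# of them is mandatory is an assumption made for this sketch.
EXPECTED_FIELDS = ("node_id", "version", "api_base", "registries")


def node_document_url(domain: str) -> str:
    """Return the well-known Node Document URL for a domain."""
    return f"https://{domain}{WELL_KNOWN_PATH}"


def parse_node_document(text: str) -> dict:
    """Parse a Node Document and check that the expected fields exist."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("Node Document must be a JSON object")
    missing = [name for name in EXPECTED_FIELDS if name not in doc]
    if missing:
        raise ValueError(f"Node Document missing fields: {', '.join(missing)}")
    return doc
```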
Curation Tool Security
Proxy Authentication
The ArchiveNode’s proxy endpoint (/curation/sessions/{session_id}) MUST verify that the requesting user is the owner of the session.
Tool Isolation
Curation Tools MUST be isolated:
- Network restrictions: Outbound internet access SHOULD be blocked
- Ephemeral storage: Any data written to the container’s filesystem MUST be discarded when the session ends
- No persistent state: Tools MUST NOT maintain state between sessions
CSRF Protection
Because Curation Tools are proxied on the ArchiveNode’s domain, they share the same origin as the main application. ArchiveNodes MUST:
- Enforce SameSite=Strict on session cookies
- Validate CSRF tokens on all state-changing operations
- Use separate session tokens for curation (not the main user session)
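A minimal sketch of these three measures using only the Python standard library is shown below. The cookie name osa_curation_session and the function names are hypothetical; the Secure and HttpOnly attributes are common hardening choices added here and are not requirements stated above.

```python
import hmac
import secrets
from http.cookies import SimpleCookie


def curation_session_cookie(session_token: str, path: str) -> str:
    """Build a Set-Cookie header value for a dedicated curation session."""
    cookie = SimpleCookie()
    cookie["osa_curation_session"] = session_token  # hypothetical name
    morsel = cookie["osa_curation_session"]
    morsel["path"] = path
    morsel["secure"] = True
    morsel["httponly"] = True
    morsel["samesite"] = "Strict"
    return morsel.OutputString()


def new_csrf_token() -> str:
    """Generate an unpredictable token to bind to the curation session."""
    return secrets.token_urlsafe(32)


def csrf_token_valid(expected: str, submitted: str) -> bool:
    """Compare tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The curation session token is generated separately from the main user session, so a compromised Curation Tool never sees the user's primary credentials.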
Data Privacy
ArchiveNodes SHOULD:
- Encrypt data at rest and in transit (TLS 1.3+)
- Provide mechanisms for embargoing sensitive data
- Support metadata redaction for withdrawn Records
- Log all access to private Depositions for audit purposes
Conformance
Conformance Classes
ArchiveNode
A conforming ArchiveNode MUST:
- Implement all endpoints in §ArchiveNode HTTP API
- Enforce the Deposition lifecycle (§Lifecycles)
- Execute Validators according to §Validator Contract
- Proxy Curation Tools according to §Curation Tool Contract
- Support SRN resolution (§Identifiers)
- Publish a Node Document at /.well-known/osa-node.json
ViewNode
A conforming ViewNode MUST:
- Implement the Search API (§ViewNode Protocol)
- Index Records from at least one ArchiveNode
- Index the guarantees field from Record provenance to enable quality-based filtering
- Support filtering by SemanticGuarantees via the guarantees query parameter
- Resolve SRNs (§Identifiers)
- Return results in the specified JSON format
Validator
A conforming Validator MUST:
- Be packaged as an OCI container
- Read input from $OSAP_IN
- Write result.json to $OSAP_OUT with required fields
- Exit with code 0
- Operate without network access
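As an illustration, a Validator entrypoint meeting the list above might look like the sketch below. It assumes that $OSAP_IN and $OSAP_OUT are directories, and the fields written to result.json are placeholders: the required fields are defined in §Validator Contract, not here. The empty-file check stands in for real domain logic.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def run_validator(in_dir: Path, out_dir: Path) -> dict:
    """Flag empty input files and write the outcome to result.json."""
    files = sorted(p for p in in_dir.rglob("*") if p.is_file())
    empty = [str(p.relative_to(in_dir)) for p in files
             if p.stat().st_size == 0]
    # Placeholder field names; see the Validator Contract for the
    # fields a conforming result.json must contain.
    result = {
        "status": "fail" if empty else "pass",
        "files_checked": len(files),
        "empty_files": empty,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "result.json").write_text(json.dumps(result, indent=2))
    return result


def main(env: Mapping[str, str] = os.environ) -> int:
    """Container entrypoint: read locations from the environment."""
    run_validator(Path(env["OSAP_IN"]), Path(env["OSAP_OUT"]))
    # The conformance list asks for exit code 0; in this sketch,
    # validation findings are reported through result.json instead.
    return 0
```

The container image would invoke main() and pass its return value to sys.exit.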
Curation Tool
A conforming Curation Tool MUST:
- Be packaged as an OCI container
- Run a web server on the specified port
- Serve assets relative to $OSAP_BASE_PATH
- Use $OSAP_API_TOKEN for all state-changing operations
- Not write persistent state to local disk
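A Curation Tool could read its configuration from the environment along these lines. The variable names match those passed in the container invocation shown earlier; the ToolConfig type and its helpers are hypothetical, and sending the token as a Bearer header is an assumption based on the ArchiveNode authentication rules.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class ToolConfig:
    deposition_id: str
    api_url: str
    api_token: str
    base_path: str

    def asset_url(self, relative: str) -> str:
        """Resolve an asset path under the proxied base path."""
        return f"{self.base_path}/{relative.lstrip('/')}"

    def auth_headers(self) -> Dict[str, str]:
        """Headers for state-changing calls back to the ArchiveNode."""
        return {"Authorization": f"Bearer {self.api_token}"}


def load_tool_config(env: Mapping[str, str]) -> ToolConfig:
    """Read the OSAP_* variables, normalizing trailing slashes."""
    return ToolConfig(
        deposition_id=env["OSAP_DEPOSITION_ID"],
        api_url=env["OSAP_API_URL"].rstrip("/"),
        api_token=env["OSAP_API_TOKEN"],
        base_path=env["OSAP_BASE_PATH"].rstrip("/"),
    )
```

In a running container, the tool would call load_tool_config(os.environ) once at startup.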
Client
A conforming Client MUST:
- Use SRNs for resource references
- Authenticate via Bearer tokens
- Handle standard HTTP status codes
Compliance Testing
The OSA project maintains a test suite at https://github.com/open-science-archive/compliance-tests.
Implementations MUST pass all tests in the relevant conformance class to claim OSA compliance.
Version Compatibility
This specification is version 0.0.4 (draft).
Breaking changes (as defined by Semantic Versioning 2.0) will increment the major version. Implementations SHOULD advertise their supported spec versions in the Node Document.
Extensibility
OSA Enhancement Proposals (OEPs)
Changes to this specification, the Global Registry, or standard contracts MUST follow the OEP process:
- Draft: Author submits a proposal to the OSA governance repository
- Community Review: 14-day public comment period
- Revision: Author addresses feedback
- Last Call: 7-day final review
- Accepted: Governance committee votes (requires 2/3 majority)
Accepted OEPs are versioned and published at https://oeps.osa.org/.
Namespaced Extensions
Implementations MAY add custom fields to protocol resources using namespaced keys:
{
  "metadata": {
    "title": "My Dataset",
    "x-institution-grant-id": "GRANT-12345",
    "x-institution-internal-id": "abc-xyz-789"
  }
}
Extension keys MUST:
- Start with x- followed by a unique namespace identifier
- Use lowercase and hyphens (e.g., x-my-org-field-name)
- Not conflict with standardized keys
Extensions MUST NOT:
- Change the semantics of required protocol fields
- Break interoperability (other implementations MUST be able to ignore unknown fields)
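The key rules above could be checked with a small helper such as the following sketch. The regular expression is one reading of "lowercase and hyphens"; permitting digits is an assumption, and the function names are hypothetical. The second helper shows how a consumer might separate standardized fields from extensions while ignoring anything else it does not recognize.

```python
import re
from typing import Dict, Set, Tuple

# "x-" followed by hyphen-separated lowercase segments.
# Allowing digits inside segments is an assumption of this sketch.
EXTENSION_KEY = re.compile(r"x-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_extension_key(key: str) -> bool:
    """Return True if the key follows the namespaced extension form."""
    return EXTENSION_KEY.fullmatch(key) is not None


def split_metadata(metadata: Dict[str, object],
                   known_keys: Set[str]) -> Tuple[dict, dict]:
    """Split metadata into standardized fields and extension fields.

    Keys that are neither known nor well-formed extensions are dropped,
    which mirrors the rule that unknown fields can be ignored.
    """
    standard = {k: v for k, v in metadata.items() if k in known_keys}
    extensions = {k: v for k, v in metadata.items()
                  if k not in known_keys and is_extension_key(k)}
    return standard, extensions
```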
Future Directions
Topics under consideration for future versions:
- Federation protocol: Standardized push-based synchronization between ArchiveNodes and ViewNodes
- Access control policies: Fine-grained permissions (e.g., group-based access, time-limited embargoes)
- Provenance chains: Tracking derived datasets and lineage across Records
- Binary attachment format: Standardized packaging for Record exports (e.g., BagIt integration)