1. ISA Abstract Model

This ISA specification defines an Abstract Model of the metadata framework. The ISA Abstract Model has been implemented in two format specifications, ISA-Tab and ISA-JSON, both of which have supporting tools and services associated with them. The format specifications are also available for additional tooling to take advantage of ISA-formatted content.

The concept map below shows the ISA objects/entities and their relation to one another:

[Figure: Concept map showing ISA objects/entities and their relationships.]

Note

The concept "ontology reference" depicted above combines the Ontology Annotation and Ontology Source concepts described below.

1.1. Investigation, Study, Assay

The ISA model consists of three core entities to capture experimental metadata:
  • Investigation
  • Study
  • Assay

An Investigation contains all the information needed to understand the overall goals and means used in an experiment; experimental steps (or sequences of events) are described in Study and Assay objects. Each Investigation may have one or more Study objects associated with it, and each Study may have one or more Assay objects.
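The containment described above can be sketched as nested records. This is only an illustration: the class and field names below are hypothetical and are not taken from ISA-Tab, ISA-JSON, or any ISA tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # Hypothetical minimal Assay: only a measurement type label.
    measurement_type: str


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


# Illustrative values only.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("gene expression profiling"),
                Assay("protein identification"),
            ],
        )
    ],
)

# Walk the hierarchy: Investigation -> Study -> Assay.
assay_count = sum(len(study.assays) for study in investigation.studies)
```

Keeping the Assay list on the Study, rather than on the Investigation, mirrors the statement that Assay objects belong to a Study.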

1.1.1. Investigation

An Investigation is intended to:

  1. record metadata relating to a given investigation
  2. link related Study objects under an Investigation (this only becomes necessary when two or more Study objects need to be grouped)

An Investigation is used to record metadata describing the investigation context, such as the title and description of the investigation, as well as related people and scholarly publications. Study and Assay objects are grouped within an Investigation to record other metadata within the relevant contexts.

An Investigation SHOULD record the following:

Property | Datatype | Description
Identifier | String | An identifier or an accession number provided by a repository. This SHOULD be locally unique.
Title | String | A concise name given to the investigation.
Description | String | A textual description of the investigation.
Submission Date | Representation of an ISO 8601 date | The date on which the investigation was reported to the repository.
Public Release Date | Representation of an ISO 8601 date | The date on which the investigation was released publicly.
Publications | A list of Publication | A list of Publications relating to the investigation.
Contacts | A list of Contact | A list of Contacts relating to the investigation.

1.1.2. Study

A Study is a central concept containing information on the subject under study, its characteristics and any treatments applied.

A Study contains contextualising information for one or more Assay objects. Metadata about the study design, study factors used, and study protocols are recorded in Study objects, as well as information similar to that recorded for an Investigation, including the title and description of the study and related people and scholarly publications.

A Study SHOULD record the following:

Property | Datatype | Description
Identifier | String | An identifier or an accession number provided by a repository. This SHOULD be locally unique.
Title | String | A concise name given to the study.
Description | String | A textual description of the study.
Submission Date | Representation of an ISO 8601 date | The date on which the study was reported to the repository.
Public Release Date | Representation of an ISO 8601 date | The date on which the study was released publicly.
Publications | A list of Publication | A list of Publications relating to the study.
Contacts | A list of Contact | A list of Contacts relating to the study.
Design Type | Ontology Annotation | A classifier of the study based on the overall experimental design, e.g. cross-over design or parallel group design.
Factor Name | String | The name of one factor used in the Study and/or Assay files. A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly.
Factor Type | Ontology Annotation | A classification of this factor into categories.

In a Study object we record the provenance of biological samples, from source material through a collection process to sample material, represented with directed acyclic graphs (directed graphs with no loops/cycles). The pattern of nodes usually consists of a source material node, followed by a sample collection process node, followed by a sample material node.

For example:

(source material)->(sample collection)->(sample material)

These study graphs MAY split and pool depending on how the samples are collected.

In a splitting example, multiple samples might be derived from the same source:

(source material 1)->(sample collection)->(sample material 1)
(source material 1)->(sample collection)->(sample material 2)

In a pooling example, multiple sources may be used to create a single sample:

(source material 1)->(sample collection)->(sample material 1)
(source material 2)->(sample collection)->(sample material 1)
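One way to make the splitting and pooling patterns concrete is to treat each line of the notation above as a (source, process, sample) triple and count fan-out and fan-in. The sketch below is an assumption-laden illustration; the function names are hypothetical and the triple encoding is not prescribed by the ISA model.

```python
from collections import defaultdict

# Each triple mirrors one line of the notation above: (source, process, sample).
splitting = [
    ("source material 1", "sample collection", "sample material 1"),
    ("source material 1", "sample collection", "sample material 2"),
]
pooling = [
    ("source material 1", "sample collection", "sample material 1"),
    ("source material 2", "sample collection", "sample material 1"),
]


def split_sources(paths):
    """Return sources that lead to more than one sample."""
    samples_by_source = defaultdict(set)
    for source, _process, sample in paths:
        samples_by_source[source].add(sample)
    return sorted(s for s, samples in samples_by_source.items() if len(samples) > 1)


def pooled_samples(paths):
    """Return samples that are derived from more than one source."""
    sources_by_sample = defaultdict(set)
    for source, _process, sample in paths:
        sources_by_sample[sample].add(source)
    return sorted(s for s, sources in sources_by_sample.items() if len(sources) > 1)
```

Here `split_sources(splitting)` reports `source material 1`, and `pooled_samples(pooling)` reports `sample material 1`.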

1.1.3. Assay

An Assay represents a test performed either on material taken from a subject or on a whole initial subject, producing qualitative or quantitative measurements.

An Assay groups descriptions of provenance of sample processing for related tests. Each test typically follows the steps of one particular experimental workflow described by a particular protocol.

Assay-related metadata includes descriptions of the measurement type and technology used, and a link to the study protocol applied. Where an assay produces data files, links to the data are recorded here.

An Assay SHOULD record the following:

Property | Datatype | Description
Measurement Type | Ontology Annotation | An Ontology Annotation to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification).
Technology Type | Ontology Annotation | An Ontology Annotation to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry.
Technology Platform | String | The manufacturer and platform name, e.g. Bruker AVANCE, of the technology used.

In an Assay we record the provenance of biological samples, from sample material through an experimental workflow, represented with directed acyclic graphs. Assay graphs usually follow the pattern of a sample material, followed by a series of process and material/data nodes.

For example, to show a sample that goes through some extraction process (e.g. nucleic acid extraction) through to producing some sequenced data, we might produce something like:

(sample material)->(extraction process)->(extract)->(sequencing process)->(raw data file)

As with study graphs, splitting and pooling can occur where appropriate in assay graphs.
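To illustrate how provenance can be followed through an assay graph, the example path above can be stored as an edge list and traversed breadth-first. This is a minimal sketch; the `downstream` helper is hypothetical and the edge-list encoding is just one possible representation.

```python
from collections import deque

# The example assay path above, one (from, to) pair per arrow.
edges = [
    ("sample material", "extraction process"),
    ("extraction process", "extract"),
    ("extract", "sequencing process"),
    ("sequencing process", "raw data file"),
]


def downstream(start, edges):
    """Return every node reachable from start, in breadth-first order."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


derived = downstream("sample material", edges)
```

The same traversal works unchanged when the graph splits or pools, because each node is visited at most once.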

1.1.4. Study and Assay graphs

Experimental graphs relating to Study and Assay objects are made up of specific types of nodes.

Experimental graphs MUST be directed and acyclic (i.e. MUST NOT contain loops/cycles).
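The acyclicity requirement can be checked mechanically. The sketch below assumes the graph is available as (from, to) pairs and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the `is_acyclic` name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed graph given as (from, to) pairs has no cycle."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)
    try:
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


valid = [("source", "sample collection"), ("sample collection", "sample")]
invalid = valid + [("sample", "source")]  # edge back to the start creates a cycle
```

A graph for which `is_acyclic` returns `False` would not satisfy the MUST requirement above.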

All nodes in Study and Assay graphs MUST be uniquely identifiable. User-defined identifiers MAY also be used.

Experimental graphs MUST be composed of the following node types:

Material nodes

Material nodes can be used as a generic structure to describe materials consumed or produced during an experimental workflow. Material nodes SHOULD record the following:

Property | Datatype | Description
Characteristics | A list of Characteristic | A list of material characteristics that may be qualitative or quantitative in description. Qualitative values MAY be Ontology Annotations, while quantitative values MAY be qualified with a Unit definition.
Material Type | Ontology Annotation | An Ontology Annotation describing the material.

Source nodes are a special kind of Material node and are considered as the starting biological material used in a study. Source nodes SHOULD be followed by a Process node describing a sample collection process, and SHOULD only appear in Study graphs.

Sample nodes are a special kind of Material node and represent major outputs resulting from a protocol application. Sample nodes in the Study graphs SHOULD be preceded by a Process node describing a sample collection process. Sample nodes in the Assay graphs SHOULD be followed by a Process node and SHOULD NOT be preceded by any node.
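The Sample node recommendations for Assay graphs can be expressed as a small check. Because they are SHOULD-level rules, the sketch below reports warnings instead of raising errors. The function name, the node-kind labels, and the (from, to) edge encoding are all assumptions made for illustration.

```python
def sample_warnings_for_assay(node_kinds, edges):
    """List Sample nodes in an Assay graph that break the SHOULD rules above.

    node_kinds maps node id -> "sample", "material", "data" or "process";
    edges is a list of (from_id, to_id) pairs.
    """
    warnings = []
    for node, kind in node_kinds.items():
        if kind != "sample":
            continue
        incoming = [src for src, dst in edges if dst == node]
        outgoing = [dst for src, dst in edges if src == node]
        if incoming:
            warnings.append(f"{node}: Sample SHOULD NOT be preceded by any node")
        if not any(node_kinds.get(dst) == "process" for dst in outgoing):
            warnings.append(f"{node}: Sample SHOULD be followed by a Process node")
    return warnings


kinds = {"sample 1": "sample", "extraction": "process", "extract": "material"}
good_edges = [("sample 1", "extraction"), ("extraction", "extract")]
bad_edges = [("extraction", "sample 1")]  # Sample has a predecessor and no follower
```

With `good_edges` the check returns an empty list; with `bad_edges` it returns one warning per violated recommendation.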

Data nodes

Data nodes represent outputs resulting from a protocol application that corresponds to some process that produces data, typically in the form of data files. Data nodes SHOULD record the following:

Property | Datatype | Description
File name | String | A file name or full path referencing a data file produced by the related process that MAY be packaged with, or is accessible via, the ISA reference implementation content.

Data nodes SHOULD be preceded by a Process node describing a data-producing process, such as NMR scanning or DNA sequencing.

Process nodes

Process nodes represent the application of a protocol to some input material (e.g. a Source) to produce some output (e.g. a Sample).

Process nodes SHOULD record the following:

Property | Datatype | Description
Parameter Values | A list of Parameter Value | The values taken by parameters when applying a protocol. A protocol description in the Study SHOULD declare the required parameters; the values actually applied are recorded here.
Performer | String | Name of the operator who carried out the protocol. This allows operator effects to be taken into account and can be part of quality control data tracking.
Date | Representation of an ISO 8601 date | The date on which a protocol is performed. This allows day effects to be taken into account and can be part of quality control data tracking.

Process nodes SHOULD be preceded by zero or more Material or Data nodes, and followed by zero or more Material or Data nodes.
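The relationship between declared protocol parameters and the values recorded on a Process node can be sketched as follows. The `Process` class, its field names, and the `missing_parameters` helper are hypothetical; the date is parsed with the standard library's `date.fromisoformat`, which accepts the `YYYY-MM-DD` form of ISO 8601.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Process:
    """Hypothetical record of one protocol application."""
    protocol_name: str
    parameter_values: Dict[str, str] = field(default_factory=dict)
    performer: Optional[str] = None
    performed_on: Optional[date] = None


def missing_parameters(declared: List[str], process: Process) -> List[str]:
    """Return declared parameter names that have no recorded value."""
    return [name for name in declared if name not in process.parameter_values]


# Illustrative values only.
extraction = Process(
    protocol_name="nucleic acid extraction",
    parameter_values={"incubation time": "30 min"},
    performer="A. Operator",
    performed_on=date.fromisoformat("2016-03-14"),
)

unrecorded = missing_parameters(["incubation time", "temperature"], extraction)
```

In this example `unrecorded` contains `temperature`, the one declared parameter without a value.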

1.2. Ontology Annotation

For a given value, an Ontology Annotation SHOULD qualify this value with an accession number taken from an Ontology Source.

An Ontology Annotation SHOULD record the following:

Property | Datatype | Description
Accession Number | String or URI | The accession number or reference from the Ontology Source associated with the selected term.

1.3. Ontology Source

An Ontology Source describes the resource from which the value of an Ontology Annotation is derived. An Ontology Source SHOULD be referenced by an Ontology Annotation. An Ontology Source should contain enough information to ascertain its provenance.

An Ontology Source SHOULD record the following:

Property | Datatype | Description
Name | String | The name of the source of a term; i.e. the source controlled vocabulary or ontology. These names will be used to reference the Ontology Source from an Ontology Annotation.
File | String | A file name or a URI of an official resource.
Version | String | The version number of the Ontology Source, to support term tracking.
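The name-based link between an Ontology Annotation and its Ontology Source can be sketched as a lookup. The class names, the `source_name` field, and all values below are hypothetical placeholders, not real ontology terms or part of any ISA format specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OntologySource:
    name: str
    file: str
    version: str


@dataclass(frozen=True)
class OntologyAnnotation:
    value: str
    accession_number: str
    source_name: str  # refers to OntologySource.name


def resolve_source(
    annotation: OntologyAnnotation, sources: Dict[str, OntologySource]
) -> Optional[OntologySource]:
    """Look up the Ontology Source an annotation refers to, or None if unknown."""
    return sources.get(annotation.source_name)


# Placeholder values only.
example_source = OntologySource(
    name="EXAMPLE", file="http://example.org/example.owl", version="1.0"
)
sources = {example_source.name: example_source}
annotation = OntologyAnnotation(
    value="example term",
    accession_number="EXAMPLE:0000001",
    source_name="EXAMPLE",
)
```

Returning `None` for an unknown name lets a caller decide how to treat an annotation whose source has not been declared.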

1.4. Unit

A Unit is used to classify dimensional data and accompanies the values it qualifies.

A Unit SHOULD be implemented as an Ontology Annotation.

1.5. Publication

A Publication SHOULD record the following:

Property | Datatype | Description
PubMed ID | Representation of a PubMed ID | The PubMed IDs of the described publication(s) associated with this investigation.
DOI | Representation of a DOI | A Digital Object Identifier (DOI) for that publication (where available).
Author List | A list of Strings | The list of authors associated with that publication.
Title | String | The title of the publication associated with the investigation.
Status | Ontology Annotation | An Ontology Annotation describing the status of that publication (e.g. submitted, in preparation, published).

1.6. Contact

A Contact SHOULD record the following:

Property | Datatype | Description
Name | String | The name of a person.
Email | Representation of an email | The email address of a person.
Phone | Representation of a phone number | The telephone number of a person.
Address | Multi-line string | The address of a person.
Affiliation | String | The organizational affiliation of a person.
Roles | A list of Ontology Annotations | Ontology Annotations to classify the roles performed by this person in the context of an Investigation or Study.