Schema for a generic (data) provenance annotation

This schema builds on the Thing schema and specializes it to capture the key provenance concepts Agent, Activity, and Entity, and their relationships. It does so by drawing on a subset of the PROV data model (PROV-DM).

The central paradigm of the schema is the qualified relation pattern. The three key concepts derived from Thing (Agent, Activity, and Entity) build on the Thing slot relations (a symmetric property) for inline declaration of related things. They extend the basic means of qualifying a relationship (via is_characterized_by) with provenance-specific slots, and with qualified_relations for declaring the roles of related things.

Specifically, the latter approach takes the place of the various qualified* properties defined in PROV. Rather than supporting association classes like prov:Derivation or prov:Generation, such relationships are annotated by linking to things via slots like was_derived_from or was_generated_by, and by characterizing the relationship via qualified_relations and a (custom) Role declaration.
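As a minimal sketch of this pattern, the following Python snippet builds a plain-dictionary record for an Entity that links to an Agent via was_attributed_to and then characterizes that link via qualified_relations. The slot names (id, was_attributed_to, qualified_relations, object, had_roles) come from this schema. The ex: identifiers, the role identifiers, the helper roles_of, and the choice to reference related things by CURIE rather than inlining them are assumptions made for illustration only.

```python
# Hypothetical record; every identifier under the "ex:" prefix is made up.
dataset = {
    "id": "ex:dataset-1",
    # Plain link to the related agent.
    "was_attributed_to": ["ex:jane"],
    # Qualification of that link: which roles the agent had.
    "qualified_relations": [
        {
            "object": "ex:jane",
            "had_roles": ["ex:author", "ex:contact"],
        }
    ],
}


def roles_of(record, object_id):
    """Collect all roles declared for `object_id` in a record's qualified relations."""
    roles = []
    for relationship in record.get("qualified_relations", []):
        if relationship["object"] == object_id:
            roles.extend(relationship["had_roles"])
    return roles


print(roles_of(dataset, "ex:jane"))
```

A single Relationship carries both roles here, which mirrors the idea that a relationship is between two things only but may be annotated with multiple roles.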

The schema is entity-centric, meaning that entity properties are preferred over activity properties expressing the same (inverse) relationship. For example, an entity's was_generated_by is implemented, but not an activity's generated.
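Because only the entity-side slot exists, a consumer that needs the activity-side view has to compute it. The sketch below shows one way to do that over plain-dictionary records. The record layout, the ex: identifiers, and the helper name generated_by_activity are hypothetical and not part of the schema.

```python
from collections import defaultdict

# Hypothetical entity records; only the slot names id and
# was_generated_by are taken from the schema.
entities = [
    {"id": "ex:raw.csv", "was_generated_by": ["ex:acquisition"]},
    {"id": "ex:clean.csv", "was_generated_by": ["ex:cleaning"]},
    {"id": "ex:summary.csv", "was_generated_by": ["ex:cleaning"]},
]


def generated_by_activity(entity_records):
    """Invert was_generated_by: map each activity id to the entity ids it generated."""
    generated = defaultdict(list)
    for entity in entity_records:
        for activity_id in entity.get("was_generated_by", []):
            generated[activity_id].append(entity["id"])
    return dict(generated)


print(generated_by_activity(entities))
```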

With Person, Organization, and SoftwareAgent, the three essential Agent types included in PROV are supported. However, these classes provide no additional slots over their base class. Their purpose is only semantic differentiation.

A custom CURIE prefix email: is defined and emitted so that an email address can serve as a schema identifier (e.g., email:me@example.com).
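A small sketch of how such identifiers might be produced and taken apart in client code follows. The helper names email_curie and split_curie are hypothetical, the address check is deliberately crude, and nothing is assumed here about the URI that the email: prefix expands to.

```python
def email_curie(address):
    """Return an `email:` CURIE for a plain email address."""
    local, sep, domain = address.partition("@")
    # Crude sanity check only; this is not a full email address validator.
    if not (local and sep and domain):
        raise ValueError(f"not an email address: {address!r}")
    return f"email:{address}"


def split_curie(curie):
    """Split a CURIE into its prefix and reference at the first colon."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, reference


agent_id = email_curie("me@example.com")
print(agent_id)
print(split_curie(agent_id))
```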

The schema definition is available as

URI: https://concepts.datalad.org/s/prov/unreleased

Name: prov-schema

Schema Diagram

erDiagram
    Agent {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Activity {
        W3CISO8601 ended_at
        W3CISO8601 started_at
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Entity {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Person {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Organization {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    SoftwareAgent {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    ThingMixin {
        uriorcurie schema_type
        uriorcurie type
    }
    ValueSpecificationMixin {
        uriorcurie range
        string value
    }
    AttributeSpecification {
        uriorcurie schema_type
        uriorcurie type
        uriorcurie range
        string value
    }
    Property {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Statement {
    }
    Thing {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    ValueSpecification {
        uriorcurie range
        string value
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Identifier {
        uriorcurie creator
        string notation
        string schema_agency
    }
    DOI {
        uriorcurie creator
        string notation
        string schema_agency
    }
    Role {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    Relationship {
    }
    Location {
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }
    InstanteneousEvent {
        W3CISO8601 at_time
        uriorcurie id
        uriorcurie schema_type
        uriorcurie type
    }

    Agent ||--}o Agent : "acted_on_behalf_of"
    Agent ||--|o Location : "at_location"
    Agent ||--}o Identifier : "identifiers"
    Agent ||--}o Relationship : "qualified_relations"
    Agent ||--}o Thing : "relations"
    Agent ||--}o AttributeSpecification : "has_attributes"
    Agent ||--}o Statement : "is_characterized_by"
    Activity ||--|o Location : "at_location"
    Activity ||--}o Identifier : "identifiers"
    Activity ||--}o Relationship : "qualified_relations"
    Activity ||--}o Agent : "was_associated_with"
    Activity ||--}o Activity : "was_informed_by"
    Activity ||--}o Thing : "relations"
    Activity ||--}o AttributeSpecification : "has_attributes"
    Activity ||--}o Statement : "is_characterized_by"
    Entity ||--}o Identifier : "identifiers"
    Entity ||--}o Relationship : "qualified_relations"
    Entity ||--}o Agent : "was_attributed_to"
    Entity ||--}o Entity : "was_derived_from"
    Entity ||--}o Activity : "was_generated_by"
    Entity ||--}o Thing : "relations"
    Entity ||--}o AttributeSpecification : "has_attributes"
    Entity ||--}o Statement : "is_characterized_by"
    Person ||--}o Agent : "acted_on_behalf_of"
    Person ||--|o Location : "at_location"
    Person ||--}o Identifier : "identifiers"
    Person ||--}o Relationship : "qualified_relations"
    Person ||--}o Thing : "relations"
    Person ||--}o AttributeSpecification : "has_attributes"
    Person ||--}o Statement : "is_characterized_by"
    Organization ||--}o Agent : "acted_on_behalf_of"
    Organization ||--|o Location : "at_location"
    Organization ||--}o Identifier : "identifiers"
    Organization ||--}o Relationship : "qualified_relations"
    Organization ||--}o Thing : "relations"
    Organization ||--}o AttributeSpecification : "has_attributes"
    Organization ||--}o Statement : "is_characterized_by"
    SoftwareAgent ||--}o Agent : "acted_on_behalf_of"
    SoftwareAgent ||--|o Location : "at_location"
    SoftwareAgent ||--}o Identifier : "identifiers"
    SoftwareAgent ||--}o Relationship : "qualified_relations"
    SoftwareAgent ||--}o Thing : "relations"
    SoftwareAgent ||--}o AttributeSpecification : "has_attributes"
    SoftwareAgent ||--}o Statement : "is_characterized_by"
    ThingMixin ||--}o AttributeSpecification : "has_attributes"
    ThingMixin ||--}o Statement : "is_characterized_by"
    AttributeSpecification ||--|| Property : "predicate"
    AttributeSpecification ||--}o AttributeSpecification : "has_attributes"
    AttributeSpecification ||--}o Statement : "is_characterized_by"
    Property ||--}o Thing : "relations"
    Property ||--}o AttributeSpecification : "has_attributes"
    Property ||--}o Statement : "is_characterized_by"
    Statement ||--|| Thing : "object"
    Statement ||--|| Property : "predicate"
    Thing ||--}o Thing : "relations"
    Thing ||--}o AttributeSpecification : "has_attributes"
    Thing ||--}o Statement : "is_characterized_by"
    ValueSpecification ||--}o Thing : "relations"
    ValueSpecification ||--}o AttributeSpecification : "has_attributes"
    ValueSpecification ||--}o Statement : "is_characterized_by"
    Role ||--}o Thing : "relations"
    Role ||--}o AttributeSpecification : "has_attributes"
    Role ||--}o Statement : "is_characterized_by"
    Relationship ||--|| Thing : "object"
    Relationship ||--}| Role : "had_roles"
    Location ||--}o Identifier : "identifiers"
    Location ||--}o Relationship : "qualified_relations"
    Location ||--}o Thing : "relations"
    Location ||--}o AttributeSpecification : "has_attributes"
    Location ||--}o Statement : "is_characterized_by"
    InstanteneousEvent ||--}o Identifier : "identifiers"
    InstanteneousEvent ||--}o Relationship : "qualified_relations"
    InstanteneousEvent ||--}o Thing : "relations"
    InstanteneousEvent ||--}o AttributeSpecification : "has_attributes"
    InstanteneousEvent ||--}o Statement : "is_characterized_by"

Classes

Class Description
AttributeSpecification An attribute is conceptually a thing, but it requires no dedicated identifier (id). Instead, it is linked to a Thing via its has_attributes slot and declares a predicate on the nature of the relationship.
Identifier An identifier is a label that uniquely identifies an item in a particular context. Some identifiers are globally unique. All identifiers are unique within the scope of their issuing agency.
        DOI Digital Object Identifier (DOI; ISO 26324), an identifier system governed by the DOI Foundation, where individual identifiers are issued by one of several registration agencies.
Relationship An association class for characterizing the relation between two things with the role(s) the object had with respect to the subject. A relationship is always between two things only, but can be annotated with multiple roles (for example, a person having an author role with respect to a dataset, and also being the legally responsible contact for it).
Statement An RDF statement that links a predicate (a Property) with an object (a Thing) to the subject to form a triple. A Statement is used to qualify a relation to a Thing referenced by its identifier. For specifying a qualified relation to an attribute that has no dedicated identifier, use an AttributeSpecification.
Thing The most basic, identifiable item. In addition to the slots that are common between a Thing and an AttributeSpecification (see ThingMixin), two additional slots are provided. The id slot takes the required identifier for a Thing. The relations slot allows for the inline specification of other Thing instances. Such a relation is unqualified (and symmetric), and should be further characterized via a Statement (see is_characterized_by). From a schema perspective, the relations slot allows for building self-contained, structured documents (e.g., a JSON object) with arbitrarily complex information on a Thing.
        Activity An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
        Agent Something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
                Organization A social or legal institution such as a company, a society, or a university.
                Person Person agents are people, alive, dead, or fictional.
                SoftwareAgent Running software.
        Entity A physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
        InstanteneousEvent A moment of a transition from one particular state of the world to another.
        Location A location can be an identifiable geographic place (ISO 19112), but it can also be a non-geographic place such as a directory, row, or column. As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, and so forth.
        Property An RDF property, a Thing used to define a predicate, for example in a Statement.
        Role A role is the function of a resource or agent with respect to another resource, in the context of resource attribution or resource relationships.
        ValueSpecification A Thing that is a value of some kind. This class can be used to describe an outcome of a measurement, a factual value or constant, or other qualitative or quantitative information with an associated identifier. If no identifier is available, an AttributeSpecification can be used within the context of an associated Thing (has_attributes).
ThingMixin Mix-in with the common interface of Thing and AttributeSpecification. This interface enables type specifications (rdf:type) for things and attributes. This is complemented by the schema_type slot (also rdf:type) that serves as a type designator for specialized schema classes, to enable targeted validation and data transformation. A thing or attribute can be further described with statements on qualified relations to other things (is_characterized_by), or inline attributes (has_attributes).
ValueSpecificationMixin Mix-in for a (structured) value specification. Two slots are provided to define a (literal) value (value) and its type (range).
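The difference between a Statement and an AttributeSpecification can be illustrated with a small Python sketch. The slot names (id, is_characterized_by, has_attributes, predicate, object, value, range) are taken from the schema. The ex: and xsd: identifiers, the choice to reference properties by CURIE, and the describe helper are assumptions for illustration and say nothing about how the schema is meant to be serialized.

```python
# Hypothetical record; all "ex:" identifiers are made up.
sample = {
    "id": "ex:sample-1",
    # Statement: the object is a Thing that has its own identifier.
    "is_characterized_by": [
        {"predicate": "ex:storedIn", "object": "ex:freezer-7"},
    ],
    # AttributeSpecification: no dedicated identifier, literal value inline.
    "has_attributes": [
        {"predicate": "ex:hasMass", "value": "12.5", "range": "xsd:decimal"},
    ],
}


def describe(record):
    """Render the statements and attributes of a record as triple-like strings."""
    lines = []
    for statement in record.get("is_characterized_by", []):
        lines.append(f"{record['id']} {statement['predicate']} {statement['object']}")
    for attribute in record.get("has_attributes", []):
        lines.append(
            f"{record['id']} {attribute['predicate']} "
            f"\"{attribute['value']}\"^^{attribute['range']}"
        )
    return lines


for line in describe(sample):
    print(line)
```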

Slots

Slot Description
acted_on_behalf_of Assign the authority and responsibility for carrying out a specific activity ...
at_location Associate the subject with a location
at_time Time at which an instantaneous event takes place or took place
creator An agent responsible for making an entity
ended_at End is when an activity is deemed to have been ended by some trigger
had_roles The function of an entity or agent with respect to another entity or resource
has_attributes Declares a relation that associates a Thing (or another attribute) with an ...
id Globally unique identifier of a metadata object, such as a Thing
identifiers An unambiguous reference to the subject within a given context
is_characterized_by Qualifies relationships between a subject Thing and an object Thing with ...
notation String of characters such as "T58:5" or "30:4833" used to uniquely identify a...
object Reference to a Thing within a Statement
predicate Reference to a Property within a Statement
qualified_relations Characterizes the relationship or role of an entity with respect to the subje...
range Declares that the value of a Thing or AttributeSpecification are instance...
relations Declares an unqualified relation of the subject Thing to another Thing
schema_agency Name of the agency that issued an identifier
schema_type Type designator of a schema element for validation and schema structure handl...
started_at Start is when an activity is deemed to have been started by some trigger
type State that the subject is an instance of a particular RDF class
value Value of a thing
was_associated_with An activity association is an assignment of responsibility to an agent for an...
was_attributed_to Attribution is the ascribing of an entity to an agent
was_derived_from Derivation is a transformation of an entity into another, an update of an ent...
was_generated_by Generation is the completion of production of a new entity by an activity
was_informed_by Communication is the exchange of an entity by two activities, one activity us...
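As an illustration of how the started_at and ended_at slots of an Activity might be consumed, the sketch below computes a duration and rejects a record that ends before it starts. It assumes both values are full date-times with a UTC offset. The W3CISO8601 type may also permit reduced-precision values, which this sketch does not handle. The ordering check is a sanity check a consumer might apply, not a constraint stated by the schema, and the record and helper name are hypothetical.

```python
from datetime import datetime

# Hypothetical activity record; the "ex:" identifier is made up.
activity = {
    "id": "ex:cleaning",
    "started_at": "2024-05-01T09:00:00+00:00",
    "ended_at": "2024-05-01T09:30:00+00:00",
}


def activity_duration(record):
    """Return ended_at - started_at, rejecting records that end before they start."""
    started = datetime.fromisoformat(record["started_at"])
    ended = datetime.fromisoformat(record["ended_at"])
    if ended < started:
        raise ValueError(f"{record['id']} ends before it starts")
    return ended - started


print(activity_duration(activity))
```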

Enumerations

Enumeration Description

Types

Type Description
Boolean A binary (true or false) value
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
Ncname Prefix part of CURIE
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri a complete URI
Uriorcurie a URI or a CURIE
W3CISO8601 W3C variant/subset of ISO 8601 for specifying date(times)

Subsets

Subset Description