Schema for a generic data distribution record
This schema is centered on the Distribution class for describing concrete data distributions, such as an individual file, an archive of files, or a directory of files.
The schema builds on the elements and principles of the Thing and Provenance schemas, and extends them with elements from the DCAT vocabulary.
Through the joint set of included concepts and properties, this schema supports the description of:
- data versions and composition
- data access methods
- data access rights and policies
- related resources, including topics, data types/formats
- provenance of data and related entities
Importantly, all this information can be represented using the Distribution class as a structural container. Hence this schema is particularly suitable for systems that (only) support attaching metadata to data objects.
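
To make the "structural container" idea concrete, here is a minimal sketch of a single Distribution record built as a plain Python dictionary and serialized to JSON. The slot names (`id`, `name`, `version`, `byte_size`, `media_type`, `download_url`, `license`, `keyword`, `checksum`, `algorithm`, `digest`) are taken from the Slots table of this schema. Everything else is an assumption made for illustration: the `example.org` identifiers and URLs, the file content, and the exact nesting of the checksum object may differ from what the schema actually requires.

```python
import hashlib
import json

# Stand-in for the bytes of the file being described (illustrative only).
payload = b"subject,score\nsub-01,42\n"

# Hypothetical record. Slot names come from the Slots table; the identifiers,
# URLs, and the nesting of the checksum object are assumptions.
distribution = {
    "id": "https://example.org/distributions/scores-v1",
    "name": "scores.csv",
    "version": "1.0.0",
    "byte_size": len(payload),
    "media_type": "text/csv",
    "download_url": "https://example.org/files/scores.csv",
    "license": "https://example.org/licenses/cc-by-4.0",
    "keyword": ["example", "scores"],
    "checksum": [
        {
            "algorithm": "https://example.org/algorithms/sha256",
            "digest": hashlib.sha256(payload).hexdigest(),
        }
    ],
}


def to_json(record: dict) -> str:
    """Serialize a record deterministically, e.g. for storage next to the data."""
    return json.dumps(record, indent=2, sort_keys=True)


print(to_json(distribution))
```

A system that can only attach one metadata document to a data object could store the output of `to_json` as that document. Validation against the actual schema is not shown here.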
For more information, see the general documentation and the concrete examples on the documentation pages of individual classes. Some noteworthy examples are:
- data type annotation
- data format annotation
- dataset as an outcome of a study
- access to a Distribution
- dataset version in the form of a Git commit
- git-annex remote as a DataService
The schema is available as
URI: https://concepts.datalad.org/s/distribution/unreleased
Name: distribution-schema
Schema Diagram
Classes
Class | Description |
---|---|
AttributeSpecification | An attribute is conceptually a thing, but it requires no dedicated identifier (id). Instead, it is linked to a Thing via its has_attributes slot and declares a predicate on the nature of the relationship. |
Checksum | A Checksum is a value that allows the integrity of the contents of a file to be checked. Even small changes to the content of the file will change its checksum. This class allows the results of a variety of checksum and cryptographic message digest algorithms to be represented. |
DistributionPart | An association class for attaching additional information to a hasPart relationship. |
Identifier | An identifier is a label that uniquely identifies an item in a particular context. Some identifiers are globally unique. All identifiers are unique within the scope of their issuing agency. |
DOI | Digital Object Identifier (DOI; ISO 26324), an identifier system governed by the DOI Foundation, where individual identifiers are issued by one of several registration agencies. |
QualifiedAccess | An association class for attaching additional information to an access_service relationship between a dcat:Distribution and a dcat:DataService. |
Relationship | An association class for characterizing the relation between two things with the role(s) the object had with respect to the subject. A relationship is always between two things only, but can be annotated with multiple roles (for example, a person having an author role with respect to a dataset and also being the legally responsible contact for it). |
Statement | An RDF statement that links a predicate (a Property) with an object (a Thing) to the subject to form a triple. A Statement is used to qualify a relation to a Thing referenced by its identifier. For specifying a qualified relation to an attribute that has no dedicated identifier, use an AttributeSpecification. |
Thing | The most basic, identifiable item. In addition to the slots that are common between a Thing and an AttributeSpecification (see ThingMixin), two additional slots are provided. The id slot takes the required identifier for a Thing. The relations slot allows for the inline specification of other Thing instances. Such a relation is unqualified (and symmetric), and should be further characterized via a Statement (see is_characterized_by). From a schema perspective, the relations slot allows for building self-contained, structured documents (e.g., a JSON object) with arbitrarily complex information on a Thing. |
Activity | An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities. |
Agent | Something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. |
Organization | A social or legal institution such as a company, a society, or a university. |
Person | Person agents are people, alive, dead, or fictional. |
SoftwareAgent | Running software. |
Entity | A physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. |
Distribution | A specific representation of data, which may come in the form of a single file, or an archive or directory of many files, may be standalone or part of a dataset. |
LicenseDocument | A legal document giving official permission to do something with a resource. |
Resource | Resource published or curated by a single agent. |
DataService | A collection of operations that provides access to one or more distributions or data processing functions. |
InstanteneousEvent | A moment of a transition from one particular state of the world to another. |
Location | A location can be an identifiable geographic place (ISO 19112), but it can also be a non-geographic place such as a directory, row, or column. As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, and so forth. |
Property | An RDF property, a Thing used to define a predicate, for example in a Statement. |
Role | A role is the function of a resource or agent with respect to another resource, in the context of resource attribution or resource relationships. |
ValueSpecification | A Thing that is a value of some kind. This class can be used to describe an outcome of a measurement, a factual value or constant, or other qualitative or quantitative information with an associated identifier. If no identifier is available, an AttributeSpecification can be used within the context of an associated Thing (has_attributes). |
ThingMixin | Mix-in with the common interface of Thing and AttributeSpecification. This interface enables type specifications (rdf:type) for things and attributes. This is complemented by the schema_type slot (also rdf:type) that serves as a type designator for specialized schema classes, to enable targeted validation and data transformation. A thing or attribute can be further described with statements on qualified relations to other things (is_characterized_by), or inline attributes (has_attributes). |
ValueSpecificationMixin | Mix-in for a (structured) value specification. Two slots are provided to define a (literal) value (value) and its type (range). |
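
The difference between a Statement (a qualified relation to another identified Thing) and an AttributeSpecification (an inline attribute without its own identifier) can be sketched with plain dictionaries. This is only an illustration under assumptions: the slot names `id`, `is_characterized_by`, `predicate`, `object`, `has_attributes`, and `value` come from this schema, while the `ex:` identifiers, the dictionary layout, and the helper functions `statement_triples` and `attribute_pairs` are hypothetical and not part of any published API.

```python
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical Thing record; all "ex:" identifiers are made up.
dataset = {
    "id": "ex:dataset-1",
    # Statements: qualified relations to other Things, referenced by identifier.
    "is_characterized_by": [
        {"predicate": "ex:fundedBy", "object": "ex:grant-42"},
        {"predicate": "ex:describedIn", "object": "ex:paper-7"},
    ],
    # Attributes: values without a dedicated identifier of their own.
    "has_attributes": [
        {"predicate": "ex:sampleCount", "value": "12"},
    ],
}


def statement_triples(thing: dict) -> Iterator[Triple]:
    """Yield one (subject, predicate, object) triple per Statement."""
    subject = thing["id"]
    for statement in thing.get("is_characterized_by", []):
        yield (subject, statement["predicate"], statement["object"])


def attribute_pairs(thing: dict) -> Iterator[Tuple[str, str]]:
    """Yield one (predicate, literal value) pair per inline attribute."""
    for attribute in thing.get("has_attributes", []):
        yield (attribute["predicate"], attribute["value"])


for triple in statement_triples(dataset):
    print(triple)
for pair in attribute_pairs(dataset):
    print(pair)
```

In this sketch the subject of every triple is the `id` of the enclosing Thing, which mirrors the class description: a Statement supplies only the predicate and the object, and the record it is attached to supplies the subject.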
Slots
Slot | Description |
---|---|
access_service | A data service that gives access to a distribution |
access_url | URL that gives access to the subject |
acted_on_behalf_of | Assign the authority and responsibility for carrying out a specific activity ... |
address | Physical address of the subject, such as a postal address, a bibliographic lo... |
affiliation | An organization that an agent is affiliated with |
algorithm | The algorithm or rules to follow to compute a score, an effective method expr... |
at_location | Associate the subject with a location |
at_time | Time at which an instantaneous event takes place or took place |
byte_size | The size of a distribution in bytes |
checksum | The checksum property provides a mechanism that can be used to verify that th... |
contact_point | Relevant contact information for the subject |
creator | An agent responsible for making an entity |
date_modified | Date on which the resource was (last) changed, updated or modified |
date_published | Date on which the resource was published |
digest | Lower case hexadecimal encoded checksum digest value produced using a specifi... |
distribution | An available distribution of a resource |
download_url | URL that gives direct access to the subject in the form of a downloadable fil... |
download_url_template | A URL template with placeholders enclosed in braces ({example}) |
email | Email address associated with an entity |
ended_at | End is when an activity is deemed to have been ended by some trigger |
endpoint_description | A description of the services available via the end-points, including their o... |
endpoint_url | The root location or primary endpoint of a service (a Web-resolvable IRI) |
format | The file format of a distribution |
had_roles | The function of an entity or agent with respect to another entity or resource |
has_attributes | Declares a relation that associates a Thing (or another attribute) with an ... |
has_part | A related resource that is included either physically or logically in the des... |
id | Globally unique identifier of a metadata object, such as a Thing |
identifiers | An unambiguous reference to the subject within a given context |
is_characterized_by | Qualifies relationships between a subject Thing and an object Thing with ... |
is_distribution_of | Inverse property of dcat:distribution |
is_part_of | A related resource that is included either physically or logically in the des... |
is_version_of | A related resource of which the described resource is a version |
keyword | One or more keywords or tags describing the resource |
landing_page | A Web page that can be navigated to in a Web browser to gain access to a reso... |
license | A legal document under which the resource is made available |
license_text | A copy of the actual text of a license reference, file or snippet that is ass... |
media_type | The media type of a distribution as defined by IANA |
name | Name of a thing |
notation | String of characters such as "T58:5" or "30:4833" used to uniquely identify a... |
object | Reference to a Thing within a Statement |
predicate | Reference to a Property within a Statement |
qualified_access | Link to a description of an access_service relationship with `dcat:DataServi... |
qualified_part | Qualifies a hasPart relationship with another entity |
qualified_relations | Characterizes the relationship or role of an entity with respect to the subje... |
range | Declares that the value of a Thing or AttributeSpecification are instance... |
relations | Declares an unqualified relation of the subject Thing to another Thing |
schema_agency | Name of the agency that issued an identifier |
schema_type | Type designator of a schema element for validation and schema structure handl... |
started_at | Start is when an activity is deemed to have been started by some trigger |
type | State that the subject is an instance of a particular RDF class |
value | Value of a thing |
version | Version indicator (name or identifier) of a resource |
was_associated_with | An activity association is an assignment of responsibility to an agent for an... |
was_attributed_to | Attribution is the ascribing of an entity to an agent |
was_derived_from | Derivation is a transformation of an entity into another, an update of an ent... |
was_generated_by | Generation is the completion of production of a new entity by an activity |
was_informed_by | Communication is the exchange of an entity by two activities, one activity us... |
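
Two of the slots above lend themselves to a short worked example: `download_url_template` (placeholders enclosed in braces) and `digest` (a lower case hexadecimal checksum value). The sketch below is hedged in several ways. The placeholder names `dataset` and `path` are hypothetical, since this page does not list which placeholders are defined. Using Python's `str.format` for expansion is a convenient stand-in rather than the documented expansion rule. Passing a `hashlib` algorithm name is a simplification, because the `algorithm` slot may well hold an identifier rather than such a name.

```python
import hashlib


def expand_template(template: str, **values: str) -> str:
    """Fill brace-enclosed placeholders in a URL template (assumed semantics)."""
    return template.format(**values)


def matches_digest(content: bytes, digest: str, algorithm: str = "sha256") -> bool:
    """Compare content against a lower case hexadecimal digest value."""
    # hexdigest() returns lower case hexadecimal digits, matching the
    # encoding described for the digest slot.
    return hashlib.new(algorithm, content).hexdigest() == digest


# Hypothetical template and placeholder names.
url = expand_template(
    "https://example.org/{dataset}/files/{path}",
    dataset="ds001",
    path="scores.csv",
)
print(url)

content = b"subject,score\nsub-01,42\n"
recorded = hashlib.sha256(content).hexdigest()
print(matches_digest(content, recorded))
```

The comparison is deliberately strict: an upper case rendering of the same digest does not match, which is consistent with the slot description requiring lower case encoding.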
Enumerations
Enumeration | Description |
---|---|
Types
Type | Description |
---|---|
Boolean | A binary (true or false) value |
Curie | a compact URI |
Date | a date (year, month and day) in an idealized calendar |
DateOrDatetime | Either a date or a datetime |
Datetime | The combination of a date and time |
Decimal | A real number with arbitrary precision that conforms to the xsd:decimal speci... |
Double | A real number that conforms to the xsd:double specification |
EmailAddress | RFC 5322 compliant email address |
Float | A real number that conforms to the xsd:float specification |
HexBinary | hex-encoded binary data |
Integer | An integer |
Jsonpath | A string encoding a JSON Path |
Jsonpointer | A string encoding a JSON Pointer |
Ncname | Prefix part of CURIE |
Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model |
NonNegativeInteger | A non-negative integer |
Objectidentifier | A URI or CURIE that represents an object in the model |
Sparqlpath | A string encoding a SPARQL Property Path |
String | A character string |
Time | A time object represents a (local) time of day, independent of any particular... |
Uri | a complete URI |
Uriorcurie | a URI or a CURIE |
W3CISO8601 | W3C variant/subset of ISO 8601 for specifying date(times) |
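
As a rough illustration of what a few of these types constrain, the following sketch checks candidate values for HexBinary, NonNegativeInteger, and DateOrDatetime. The function names are hypothetical, and the checks are approximations: Python's `fromisoformat` parsers accept a set of ISO 8601 strings that is not guaranteed to coincide with what the schema's own validators accept.

```python
import re
from datetime import date, datetime

# Hex-encoded binary data: zero or more pairs of hexadecimal digits.
_HEX_PAIRS = re.compile(r"(?:[0-9A-Fa-f]{2})*")


def is_hex_binary(text: str) -> bool:
    """Return True if text consists only of complete hexadecimal digit pairs."""
    return _HEX_PAIRS.fullmatch(text) is not None


def is_non_negative_integer(value: object) -> bool:
    """Return True for integers >= 0; booleans are rejected explicitly."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_date_or_datetime(text: str) -> bool:
    """Return True if text parses as an ISO 8601 date or datetime."""
    for parser in (date.fromisoformat, datetime.fromisoformat):
        try:
            parser(text)
            return True
        except ValueError:
            continue
    return False


print(is_hex_binary("0fa3"), is_non_negative_integer(1024), is_date_or_datetime("2024-01-31"))
```

The explicit rejection of booleans matters in Python because `bool` is a subclass of `int`, so `True` would otherwise pass as the integer 1.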
Subsets
Subset | Description |
---|---|