About the things schema

This schema aims to define a lean data structure that can express arbitrarily detailed information in a relatively flat structure. It achieves this by following a few basic principles:

  • every "thing" is required to have one unique identifier
  • linking instead of nesting
  • schema class type designator
  • qualified relationships

Each of these aspects is detailed in the following sections.

Every thing must have an identifier

The Thing class, the main class of the schema, has an id slot, which takes a URI or CURIE. This slot is required, hence any instance of such a class must have a URI or CURIE identifier specified to be valid. This is a strong requirement, and an essential pillar for the schema, which will become clear in the following sections.

What if there is no identifier?

For many types of "things" identification systems have been developed (countries, institutions, publications, researchers, etc.). However, not every conceivable "thing" comes with a (globally) unique identifier. In such cases it can help to define an arbitrary, but dedicated namespace, and assign unique identifiers in that space.

A concrete example could be a research consortium and its contributors. Typically, such a consortium has a website with a dedicated domain. If the domain is example.org, and the name of a contributor is Elena Piscopia, a suitable identifier could be https://example.org/contributors/elena-piscopia. If a webpage on that person would be available at this address, the identifier would even resolve to more (human-readable) information, but this is not required.

With this approach, identifiers for anything and everything can be generated. However, this approach also requires to establish a process that guarantees that no two different "things" are assigned the same identifier.

What if there is more than one identifier?

The one-identifier-and-one-identifier-only requirement only applies to the mechanics of the things schema and the id slot of the Thing class in particular. Beyond that, there is no constraint on the nature and number of identifiers associated with a Thing. Such identifiers are attributes of a Thing and can be expressed as such. The schema documentation contains an example showing how this can be done via the has_attributes slot of a Thing. For schema development, the identifiers extension schema is worth a look. It provides a dedicated slot and class for representing identifiers.

Linking, not nesting

In order to keep the structure of data records reasonably shallow, relationships between things are declared by referencing another thing's identifier, rather than by inlining a Thing record. Making this possible is the reason for having a required id slot in the Thing class.

There is only one exception to this rule: the relations slot. Here, other Thing records can be inlined. The purpose of this slot is to largely avoid the need for a top-level, array-like data structure that can hold any number of data records. Using the relations slot, it is possible to represent arbitrarily rich information in a monolithic data structure.

There is one other case of record-inlining in the things schema (besides few association helper classes): attributes. Attributes are information that are not things (which would have an identifier), but nevertheless describe a Thing. An example of such an attribute is the mass of a physical object. A dedicated AttributeSpecification class defines how attributes are expressed.

Type designator slot

The relations slot for inlining Thing records has a range of Thing (expects records of type Thing). This implies a limitation on the validation of such records. For example, a Thing may actually be a InventoryItem, using a data model defined in a derived schema. Such an InventoryItem may define additional slots with their own constraints, and it should be possible to perform a targeted validation of such records.

For this purpose, Thing provides a schema_type slot. This slots takes an identifier of a schema class (including classes in derived schemas), which provide the effective data model for validation.

Qualified relationships

Qualified relation is an essential pattern used by the things schema, and also its derivatives and extensions. Within the things schema it is used by the is_characterized_by slot (and to some degree also for has_attributes) for characterizing the relationship between things. The relationship between two things is qualified via an inline Statement that assigns a predicate to the relationship between a subject-thing, and a related object-thing. See the example for a topic annotation for a concrete demo.