id: https://concepts.datalad.org/s/things-files/v1
name: things-files-schema
version: 1.0.0
status: eunal:concept-status/DRAFT
title: Concepts related to files for use with the things schema
description: |
  Files are modeled as a special case of a `Distribution`, a manifestation
  of some information in the form of a concrete electronic file. Files
  have a format, a size in bytes, and one or more checksums. They do not,
  however, have a particular file name. The file name is considered contextual
  information, captured by an entity containing the file (such as an archive,
  or a directory). With this approach, one and the same `File` object
  can be reused by multiple "containers" without duplication.
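
  As an illustration, instance data under this model might look roughly as
  follows. This is a minimal sketch, not normative. It assumes that `pid` is
  the identifier slot inherited from the things schema and that
  `spdx:checksumAlgorithm_sha256` is an acceptable reference for the checksum
  algorithm. All `https://example.org/` identifiers, the byte size, and the
  digest value are hypothetical placeholders. The same JPEG `File` is recorded
  once. A camera directory and an archive both reference it, and each assigns
  its own name through the `locator` key of a `NamedFilePart`.

  ```yaml
  # Hypothetical instance data; identifiers, size, and digest are placeholders.
  # Assumption: `pid` is the identifier slot inherited from the things schema.
  - pid: https://example.org/files/image
    media_type: image/jpeg
    byte_size: 1048576
    checksums:
      - creator: spdx:checksumAlgorithm_sha256  # assumed algorithm reference
        notation: 0123456789abcdef  # placeholder, not a real SHA-256 digest
    part_of:
      - https://example.org/files/camera-dcim
      - https://example.org/files/party-archive
  # A directory on a camera: names the image by its original file name.
  - pid: https://example.org/files/camera-dcim
    parts:
      DCIM/20250825_102385.jpg:
        object: https://example.org/files/image
  # An archive: names the very same `File` differently, without duplicating it.
  - pid: https://example.org/files/party-archive
    media_type: application/zip
    parts:
      me_at_the_party.jpg:
        object: https://example.org/files/image
  ```

  Because `locator` is declared as the key of `NamedFilePart`, `parts` is
  written here as a mapping from locator to part. A list of parts with explicit
  `locator` values is assumed to be an equivalent serialization.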

  More information may be available on the schema's [about page](about).

  The schema definition is available as

  - [JSON-LD context](../v1.context.jsonld)
  - [LinkML YAML](../v1.yaml)
  - [LinkML YAML (static/resolved)](../v1.static.yaml)
  - [OWL TTL](../v1.owl.ttl)
  - [SHACL TTL](../v1.shacl.ttl)

  Upcoming changes to this schema may be available in an [(unreleased)
  development version](../../things-files/unreleased).

comments:
  - ALL CONTENT HERE IS UNRELEASED AND MAY CHANGE ANY TIME

license: CC-BY-4.0

prefixes:
  dash: http://datashapes.org/dash#
  dcat: http://www.w3.org/ns/dcat#
  dcterms: http://purl.org/dc/terms/
  dlschemas: https://concepts.datalad.org/s/
  dlthings: https://concepts.datalad.org/s/things/v2/
  eunal: http://publications.europa.eu/resource/authority/
  fabio: http://purl.org/spar/fabio/
  foaf: http://xmlns.com/foaf/0.1/
  linkml: https://w3id.org/linkml/
  obo: http://purl.obolibrary.org/obo/
  owl: http://www.w3.org/2002/07/owl#
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  skos: http://www.w3.org/2004/02/skos/core#
  spdx: http://spdx.org/rdf/terms#
  xsd: http://www.w3.org/2001/XMLSchema#

default_prefix: dlthings

emit_prefixes:
  - dlthings
  - rdf
  - rdfs
  - skos
  - xsd

imports:
  - dlschemas:things-distributions/v1

slots:
  byte_size:
    description: >-
      The size of the subject in bytes.
    range: NonNegativeInteger
    exact_mappings:
      - dcat:byteSize

  checksums:
    description: >-
      The checksum property provides a mechanism that can be used to verify
      that the contents of a file or package have not changed.
    range: Checksum
    multivalued: true
    inlined: true
    inlined_as_list: true
    exact_mappings:
      - spdx:checksum

  format:
    description: >-
      The file format of a distribution.
    range: Thing
    exact_mappings:
      - dcterms:format
    notes:
      - When the type of the distribution is defined by IANA, `media_type` should be used instead.

  media_type:
    description: >-
      The media type of a distribution, as defined by IANA.
    range: string
    examples:
      - value: text/csv
    see_also:
      - https://www.iana.org/assignments/media-types
    exact_mappings:
      - dcat:mediaType

classes:
  Checksum:
    is_a: Identifier
    description: >-
      A Checksum is a value that allows the integrity of the contents of a
      file to be checked. Even small changes to the content of the file will
      change its checksum. This class allows the results of a variety of
      checksum and cryptographic message digest algorithms to be represented.
    exact_mappings:
      - spdx:Checksum
    slot_usage:
      creator:
        description: >-
          Identifies the software agent (algorithm) used to produce the subject
          `Checksum`.
        required: true
        exact_mappings:
          - spdx:algorithm
      notation:
        description: >-
          Lowercase, hexadecimal-encoded checksum digest value.
        range: HexBinary
        required: true
        exact_mappings:
          - spdx:checksumValue

  File:
    is_a: Distribution
    description: >-
      A specific representation of a data item in the form of
      an electronic file. The concept of a file here is aligned
      with the broad UNIX philosophy of "everything is a file".
      An archive, a disk image, and a directory are all valid
      instances of a file.
    slots:
      - byte_size
      - checksums
      - format
      - media_type
      - parts
      - part_of
    slot_usage:
      distribution_of:
        multivalued: true
      parts:
        range: NamedFilePart
        inlined: true
      part_of:
        range: File
        multivalued: true
    comments:
      - There is no `name` property or similar, because the focus is on the
        identity of the distribution content, not how it might be named
        in a particular context. For example, an image in JPEG format
        might be named "20250825_102385.jpg" on a camera and
        "me_at_the_party.jpg" elsewhere, but it would be the exact same
        image. Expressing naming in some context should be done within
        the scope of the containing entity (see `parts` and `NamedFilePart`).
    broad_mappings:
      - fabio:DigitalManifestation

  NamedFilePart:
    description: >-
      An association class for attaching a `locator` as additional information to a
      `hasPart` relationship between two files.
    slots:
      - locator
      - roles
      - object
    slot_usage:
      locator:
        description: A relative path in POSIX notation.
        key: true
        # ensure no leading slash
        pattern: '^[^/]+.*$'
      object:
        # This must be required if we want to support key-based
        # access to the named distribution parts. This also
        # implies that we cannot support listings without content
        # identifiers.
        required: true
        range: File
