Create your first Custom Metadata Model#

Experimental Feature

The described features, configuration and APIs are work in progress.

This tutorial guides you through creating your first custom metadata model for piveau, using piveau Profile and a custom SHACL file. The model is called DCAT-AP-Simple, a trimmed version of DCAT-AP with only the essential properties. When you have finished this tutorial, you will have a working piveau backend that can be used to build Open Data portals with a very simple and compact metadata schema.

Prerequisites#

  • Basic knowledge of piveau, SHACL and DCAT
  • A running development instance of piveau hub-repo and hub-search (see Quick Start)

Setup and Configuration#

  • To get started, have your local hub-repo and hub-search setup ready. Make sure you start with a vanilla project with empty Elasticsearch and Virtuoso instances. For now, shut down the services.
  • Create a new directory dcat-ap-simple outside of the working directories of piveau. For this tutorial we assume the path is: /home/alice/dcat-ap-simple.
  • Set the following configuration for hub-repo AND hub-search (usually in the config.json or via environment variables):
{
    "PIVEAU_FEATURE_FLAGS": {
        "piveau_profile": true
    },
    "PIVEAU_PROFILE": {
        "type": "directory",
        "path": "/home/alice/dcat-ap-simple"
    }
}
  • The feature flag piveau_profile activates the feature.
  • In PIVEAU_PROFILE you declare that the profile is provided in a directory on your disk, and you pass the path to that directory.

Info

If you already have other feature flags enabled, merge these settings into your existing JSON object.

  • For now there is nothing more to do in your piveau installation.
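If you keep your configuration as JSON, the merge mentioned in the info box can be sketched in Python as follows. This is only an illustration and not part of piveau: the helper name merge_profile_settings and the flag some_other_flag are hypothetical, and only the keys PIVEAU_FEATURE_FLAGS and PIVEAU_PROFILE come from the configuration above.

```python
import json


def merge_profile_settings(existing, profile_dir):
    """Return a copy of `existing` with the piveau Profile settings merged in."""
    merged = dict(existing)
    # Copy the flag object so that flags which are already enabled are preserved.
    flags = dict(merged.get("PIVEAU_FEATURE_FLAGS", {}))
    flags["piveau_profile"] = True
    merged["PIVEAU_FEATURE_FLAGS"] = flags
    merged["PIVEAU_PROFILE"] = {"type": "directory", "path": profile_dir}
    return merged


# "some_other_flag" is a hypothetical placeholder for a flag you already use.
existing = {"PIVEAU_FEATURE_FLAGS": {"some_other_flag": True}}
merged = merge_profile_settings(existing, "/home/alice/dcat-ap-simple")
print(json.dumps(merged, indent=4))
```

The function returns a new object and leaves the existing configuration untouched, so you can compare both before writing the result back to config.json.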

Create the SHACL File#

The SHACL file is the single source of truth and the core of your custom data model.

Adding Prefixes and Meta Information#

  • Go to the directory dcat-ap-simple, create a Turtle file dcat-ap-simple.ttl and open it in your favorite IDE. (piveau Profile currently supports only Turtle as RDF format.)
  • Add the prefixes and some meta information to the file (detailed explanations are given in the numbered annotations):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix pv: <https://piveau.eu/ns/voc#> . # (1)!

pv:DCAT_AP_Simple # (2)!
    a pv:PiveauProfile ;
    pv:profileVersion "1" ; # (3)!
    dct:title "A simple DCAT-AP profile for piveau"@en ;
    dct:description "Based and inspired by DCAT-AP 2.1.1"@en ;
    foaf:maker [
        foaf:mbox <mailto:alice@wonderland.com> ;
        foaf:name "Alice" ;
    ] ;
    owl:versionInfo "0.0.1" . # (4)!
  1. Always use this namespace for piveau.
  2. This meta information is important for helping others to understand the purpose of your profile.
  3. Currently only version 1 of the piveau Profile feature exists.
  4. Here you can set the version of your profile. This helps you keep track of changes.

Adding the Catalog Shape#

  • Now add a shape for a catalog to dcat-ap-simple.ttl. Catalog is one of the core classes of DCAT-AP.
  • A shape is essentially a list of the properties you want to store, together with instructions on how each property is handled.
  • Your minimal catalog metadata lets you store the title, the description (both in multiple languages) and the publisher of the catalog.
  • Please refer to the numbered annotations for detailed explanations.
dcatap:Catalog_Shape # (1)!
    a sh:NodeShape ;    
    sh:name "Catalog"@en ;
    sh:property [ # (2)!
        pv:mappingClass "SimpleMultiLangTitle" ; # (3)!
        pv:mappingName "title" ; # (4)!
        sh:minCount 1 ; # (5)!
        sh:nodeKind sh:Literal ;
        sh:path dct:title ; # (6)!
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "SimpleMultiLang" ;
        pv:mappingName "description" ;
        sh:minCount 1 ;
        sh:nodeKind sh:Literal ; # (7)!
        sh:path dct:description ;
        sh:severity sh:Violation ; # (8)!
    ] ;
    sh:property [
        pv:mappingClass "Agent" ;  # (9)!
        pv:mappingName "publisher" ;
        sh:maxCount 1 ; # (11)!
        sh:minCount 1 ;
        sh:path dct:publisher ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        sh:path dcat:dataset ;
        sh:severity sh:Violation ;
        sh:description "Required property for piveau base functionality." ; # (10)!
    ] ;
    sh:property [
        pv:mappingClass "StandardText" ;
        pv:mappingName "id" ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:targetClass dcat:Catalog .
  1. This is the URI of your shape. You will need the full form (http://data.europa.eu/r5r/Catalog_Shape) later as a reference to it.
  2. sh:property adds a property to your model.
  3. pv:mappingClass defines how the property is processed and indexed. In this case, SimpleMultiLangTitle is a field that supports multiple languages and autocomplete.
  4. This is the name of the property in the search service.
  5. Setting sh:minCount to 1 makes this property mandatory.
  6. This is the source of the property in RDF: the value is taken from dct:title.
  7. The sh:nodeKind value is currently not used and is only informative.
  8. The sh:severity value is currently not used and is only informative.
  9. The Agent class parses and indexes a foaf:Agent.
  10. Some properties are mandatory because piveau requires them to function correctly.
  11. If you set sh:maxCount to 1, the property will not be an array.

You can find detailed information about the properties here.

Adding the Dataset Shape#

  • Now you add another shape to define the metadata model of a dataset.
  • As you see, the structure repeats - some details are highlighted in the comments:
dcatap:Dataset_Shape
    a sh:NodeShape ;
    sh:name "Dataset"@en ;
    sh:property [
        pv:mappingClass "SimpleMultiLangTitle" ;
        pv:mappingName "title" ;
        sh:minCount 1 ;
        sh:nodeKind sh:Literal ;
        sh:path dct:title ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "SimpleMultiLang" ;
        pv:mappingName "description" ;
        sh:minCount 1 ;
        sh:nodeKind sh:Literal ;
        sh:path dct:description ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "Keywords" ;  # (1)!
        pv:mappingName "keywords" ;
        sh:nodeKind sh:Literal ;
        sh:path dcat:keyword ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "Language" ;
        pv:mappingName "language" ;
        sh:path dct:language ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "Agent" ;
        pv:mappingName "creator" ;
        sh:maxCount 1 ;
        sh:path dct:creator ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "ContactPoint" ;
        pv:mappingName "contact_point" ;
        sh:path dcat:contactPoint ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "DateTime" ;
        pv:mappingName "issued" ;
        sh:maxCount 1 ;
        sh:path dct:issued ;
        sh:severity sh:Violation ;
        sh:shape dcatap:DateOrDateTimeDataType_Shape ;
    ] ;
    sh:property [
        pv:mappingClass "Nested" ; # (2)!
        pv:mappingLink dcatap:Distribution_Shape ; # (3)!
        pv:mappingName "distributions" ;
        sh:path dcat:distribution ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "Theme" ;
        pv:mappingName "categories" ;
        sh:path dcat:theme ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "StandardText" ;
        pv:mappingName "id" ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:property [
        pv:mappingClass "Nested" ;
        pv:mappingName "catalog_record" ;
        pv:mappingProperty [
            pv:mappingClass "DateTime" ;
            pv:mappingName "issued" ;
            sh:maxCount 1 ;
        ] ;
        pv:mappingProperty [
            pv:mappingClass "DateTime" ;
            pv:mappingName "modified" ;
            sh:maxCount 1 ;
        ] ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:property [
        pv:mappingClass "Nested" ;
        pv:mappingName "catalog" ;
        pv:mappingProperty [
            pv:mappingClass "StandardText" ;
            pv:mappingName "id" ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
        ] ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:property [
        sh:maxCount 1 ;
        pv:mappingClass "SpatialResource" ;
        pv:mappingName "country" ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:property [
        pv:mappingClass "Metrics" ;
        pv:mappingName "quality_meas" ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:targetClass dcat:Dataset .
  1. The mapping class Keywords allows you to index keywords.
  2. Nested is a very important class: it lets you link to other shapes. Here it links to the distribution shape, which is introduced in the next section.
  3. pv:mappingLink defines the connection. In the search service, the referenced model is embedded.

Adding the Distribution Shape#

  • You finalize the model with the shape for distributions.
  • This shape was referenced in the dataset shape above.
dcatap:Distribution_Shape
    a sh:NodeShape ;
    sh:property [
        pv:mappingClass "StandardText" ;
        pv:mappingName "id" ;
        sh:maxCount 1 ;
        sh:description "Required property for piveau base functionality." ;
    ] ;
    sh:property [
        pv:mappingClass "License" ;
        pv:mappingName "license" ;
        sh:maxCount 1 ;
        sh:path dct:license ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "Format" ;
        pv:mappingName "format" ;
        sh:maxCount 1 ;
        sh:path dct:format ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "StandardDisabled" ;
        pv:mappingName "access_url" ;
        sh:minCount 1 ;
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:path dcat:accessURL ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "SimpleMultiLang" ;
        pv:mappingName "title" ;
        sh:nodeKind sh:Literal ;
        sh:path dct:title ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "StandardDisabled" ;
        pv:mappingName "download_url" ;
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:path dcat:downloadURL ;
        sh:severity sh:Violation ;
    ] ;
    sh:property [
        pv:mappingClass "SimpleMultiLang" ;
        pv:mappingName "description" ;
        sh:nodeKind sh:Literal ;
        sh:path dct:description ;
        sh:severity sh:Violation ;
    ] ;
    sh:targetClass dcat:Distribution .
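
With three shapes in one file, it can help to review which search fields you have declared. The following Python sketch lists every pv:mappingName together with its pv:mappingClass, grouped by shape. It is a plain text scan under simplifying assumptions, not a Turtle parser and not a piveau tool: it assumes that shape names start at the beginning of a line and that each property block states pv:mappingClass before pv:mappingName, as in the listings above. The function name list_mappings is hypothetical. Fields declared in nested pv:mappingProperty blocks are reported flat under the enclosing shape.

```python
import re
from collections import defaultdict

PAIR = re.compile(r'pv:(mappingClass|mappingName)\s+"([^"]+)"')
SUBJECT = re.compile(r'^([A-Za-z][\w-]*:[\w-]+)')


def list_mappings(turtle_text):
    """Return {shape: [(mappingName, mappingClass), ...]} in file order."""
    fields = defaultdict(list)
    subject = None
    pending_class = None
    for line in turtle_text.splitlines():
        # A non-indented prefixed name starts a new shape (or other subject).
        if line and not line[0].isspace() and not line.startswith("@"):
            match = SUBJECT.match(line)
            if match:
                subject = match.group(1)
        for kind, value in PAIR.findall(line):
            if kind == "mappingClass":
                pending_class = value
            elif pending_class is not None:
                fields[subject].append((value, pending_class))
                pending_class = None
    return dict(fields)


sample = '''
dcatap:Distribution_Shape
    a sh:NodeShape ;
    sh:property [
        pv:mappingClass "StandardText" ;
        pv:mappingName "id" ;
    ] ;
    sh:property [
        pv:mappingClass "License" ;
        pv:mappingName "license" ;
    ] ;
    sh:targetClass dcat:Distribution .
'''
for shape, pairs in list_mappings(sample).items():
    print(shape, pairs)
```

To check your real file, pass the content of dcat-ap-simple.ttl to list_mappings instead of the sample string.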

Create the piveau.json#

Now you create the entry point of your simple profile.

  • Create a file in the dcat-ap-simple directory with the name piveau.json and the following content:
{
  "version": "1",
  "id": "dcat-ap-simple",
  "core": [
    {
      "id": "dataset",
      "description": "A simple representation of a dataset",
      "path": "dcat-ap-simple.ttl",
      "name": "dataset",
      "shapeUri": "http://data.europa.eu/r5r/Dataset_Shape"
    },
    {
      "id": "catalog",
      "description": "A simple representation of a catalogue",
      "path": "dcat-ap-simple.ttl",
      "name": "catalog",
      "shapeUri": "http://data.europa.eu/r5r/Catalog_Shape"
    }
  ]
}
  • This file connects your SHACL file and the shapes it contains with the core entities (catalogs and datasets) of piveau.
  • The value of the path key needs to be set to your SHACL file.
  • The shapeUri has to match the full URI of the shape in the SHACL file - do not use the prefix here.
  • It is also possible to separate the shapes into multiple SHACL files.
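
Because shapeUri must be the full URI, a mismatch between piveau.json and the prefixed names in the SHACL file is easy to introduce. The following Python sketch expands the prefixed subject names of a Turtle text and reports every shapeUri that has no match. It is a simplified text-based check and not a piveau feature: it assumes one SHACL file, shape names at the beginning of a line, and @prefix declarations in the form used above. The function names are hypothetical.

```python
import json
import re

PREFIX = re.compile(r'@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.')
SUBJECT = re.compile(r'^([A-Za-z][\w-]*):([\w-]+)', re.MULTILINE)


def declared_subjects(turtle_text):
    """Expand prefixed names that start a line into full URIs."""
    prefixes = dict(PREFIX.findall(turtle_text))
    return {
        prefixes[prefix] + local
        for prefix, local in SUBJECT.findall(turtle_text)
        if prefix in prefixes
    }


def missing_shapes(piveau_json_text, turtle_text):
    """Return the shapeUri values from piveau.json that the Turtle text does not declare."""
    subjects = declared_subjects(turtle_text)
    entries = json.loads(piveau_json_text)["core"]
    return [e["shapeUri"] for e in entries if e["shapeUri"] not in subjects]


demo_ttl = (
    "@prefix dcatap: <http://data.europa.eu/r5r/> .\n"
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n\n"
    "dcatap:Catalog_Shape\n"
    "    a sh:NodeShape .\n"
)
demo_json = json.dumps({
    "core": [
        {"id": "catalog", "shapeUri": "http://data.europa.eu/r5r/Catalog_Shape"},
        {"id": "dataset", "shapeUri": "http://data.europa.eu/r5r/Dataset_Shape"},
    ]
})
# The demo Turtle declares only the catalog shape, so the dataset shape is reported.
print(missing_shapes(demo_json, demo_ttl))
```

For the tutorial files, read piveau.json and dcat-ap-simple.ttl from the profile directory and pass their contents to missing_shapes; an empty list means every shapeUri was found.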

Start the Services#

Now you are all set up and ready to start hub-repo and hub-search and test your data model.

  • Start your databases, hub-repo and hub-search.
  • You will notice the following log entries in both services:
INFO  io.piveau.profile.ProfileLoader - Loaded piveau profile 'dcat-ap-simple'
INFO  i.p.h.search.util.index.IndexManager - Loaded shape successfully for dataset
INFO  i.p.h.search.util.index.IndexManager - Loaded shape successfully for catalogue
  • After successfully launching the services, you can browse to http://localhost:8081/profile (both services, hub-repo and hub-search, offer this endpoint) to get information about the installed profile.

Profile Endpoint
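
If you prefer a script over the browser, the endpoint can also be read with the Python standard library. This is a minimal sketch: the function names are hypothetical, the hub-search port 8083 is taken from the test requests later in this tutorial, and the response is returned as raw text because its format is not described here.

```python
import urllib.request


def profile_url(base_url):
    """Build the profile endpoint URL for a hub-repo or hub-search base URL."""
    return base_url.rstrip("/") + "/profile"


def fetch_profile(base_url, timeout=5):
    """Return the raw response body of the profile endpoint as text."""
    with urllib.request.urlopen(profile_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Requires running services, for example:
#   print(fetch_profile("http://localhost:8081"))  # hub-repo
#   print(fetch_profile("http://localhost:8083"))  # hub-search
print(profile_url("http://localhost:8081"))
```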

Testing Everything#

Now it is time to test how the new profile affects your piveau instance.

The Search Schema#

Elasticsearch Mapping

  • Here you will find all your properties under the names defined in pv:mappingName.
  • If you look at the OpenAPI of hub-search, you will find that the model of the dataset also matches the profile:

OpenAPI

Info

Since hub-repo is schema agnostic, you will not find changes there.
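
To compare the Elasticsearch mapping with your pv:mappingName values without a browser, you can read the mapping over HTTP. The sketch below uses the Elasticsearch "get mapping" endpoint (GET /_mapping) and lists the top-level field names per index. It assumes Elasticsearch 7 or later, where the response has the form index → mappings → properties, and it assumes Elasticsearch listens on http://localhost:9200, which is the Elasticsearch default and not stated in this tutorial. The function names and the index name in the sample are hypothetical.

```python
import json
import urllib.request


def mapped_fields(mapping_response):
    """Return {index name: sorted top-level field names} from a GET /_mapping response."""
    return {
        index: sorted(body.get("mappings", {}).get("properties", {}))
        for index, body in mapping_response.items()
    }


def fetch_mappings(es_url="http://localhost:9200", timeout=5):
    """Fetch the mappings of all indices (requires a running Elasticsearch)."""
    with urllib.request.urlopen(es_url.rstrip("/") + "/_mapping", timeout=timeout) as response:
        return json.load(response)


# Offline illustration with a hand-written response; "dataset_index" is a made-up index name.
sample_response = {
    "dataset_index": {
        "mappings": {"properties": {"title": {}, "description": {}, "keywords": {}}}
    }
}
print(mapped_fields(sample_response))
```

Against a running instance, call mapped_fields(fetch_mappings()) and check that the listed names match the pv:mappingName values of your shapes.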

Creating Test Data#

Below you find concrete instances and requests to create a catalog and a dataset that match the defined data model.

Simple Catalog Example#

  • Create a catalog like this:
PUT http://localhost:8081/catalogues/simple-catalog
Content-Type: text/turtle
X-API-Key: {{api-key}}


@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .

<https://piveau.io/id/catalogue/dcat-ap-simple-catalog>
    a                  dcat:Catalog ;
    dct:type           "dcat-ap" ;
    dct:title          "DCAT-AP simple Example Catalogue"@en ;
    dct:description    "This catalog holds examples DCAT-AP simple Datasets"@en ;
    dct:publisher      <https://piveau.io/def/publisher/piveau> .

<https://piveau.io/def/publisher/piveau>
    a               foaf:Organization ;
    foaf:homepage   <https://piveau.io> ;
    foaf:mbox       <mailto:info@piveau.de> ;
    foaf:name       "Piveau" .
  • If you now query hub-search (http://localhost:8083/catalogues/simple-catalog), you see an actual instance of the simple metadata model:

Catalog Result
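
The PUT request above can also be sent from a script. The following Python sketch builds the same request with the standard library, using the URL and headers from the request listing. The function names are hypothetical, and the API key value is a placeholder that you replace with the key of your hub-repo instance.

```python
import urllib.request


def build_put_request(url, turtle, api_key):
    """Build a PUT request that carries a Turtle document and the API key header."""
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle", "X-API-Key": api_key},
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code (raises on 4xx and 5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request does not touch the network; only send() does.
request = build_put_request(
    "http://localhost:8081/catalogues/simple-catalog",
    turtle="# paste the catalog Turtle from above here\n",
    api_key="your-api-key",
)
print(request.get_method(), request.full_url)
```

Call send(request) once hub-repo is running and the Turtle placeholder has been replaced with the catalog from the listing.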

Simple Dataset Example#

  • In the same way, you can create an instance of a dataset:
PUT http://localhost:8081/catalogues/simple-catalog/datasets/origin?originalId=simple-dataset
Content-Type: text/turtle
X-API-Key: {{api-key}}


@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix vcard:  <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .


<https://piveau.io/set/data/dcat-ap-simple-dataset>
    a                   dcat:Dataset ;
    dct:title           "This is a DCAT-AP simple dataset"@en ;
    dct:description     "This is a description of a DCAT-AP simple dataset"@en ;
    dct:language        <http://publications.europa.eu/resource/authority/language/ENG> ;
    dct:creator         <https://piveau.io/def/creator/piveau> ;
    dcat:theme          <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
    dct:issued          "2024-07-31T00:00:00"^^xsd:dateTime ;
    dcat:contactPoint               [   a              vcard:Individual ;
                                    vcard:hasEmail <mailto:john@doe.de> ;
                                    vcard:fn       "John Doe" ;
                                    vcard:hasAddress [ 
                                        vcard:street-address "John Doe Str." ;
                                        vcard:locality "Berlin" ;
                                        vcard:postal-code "12345" ;
                                        vcard:country-name "Germany"
                                    ] ;
                                    vcard:hasTelephone "0049123456789" ;
                                    vcard:hasURL <http://www.johndoe.de> ;
                                    vcard:hasOrganizationName "John Doe Inc." ] ;
    dcat:distribution   <https://piveau.io/set/distribution/1> ;
    dcat:keyword        "piveau"@en, "opendata"@en .

<https://piveau.io/set/distribution/1>
    a                               dcat:Distribution ;
    dct:title                       "Example Distribution "@en ;
    dct:description                 "Example Distribution Description"@en ;
    dcat:accessURL                  <https://myactualdata.com/file> ;
    dcat:downloadURL                <https://myactualdata.com/downdload/file.csv>  ;
    dct:license                     [   
                                        a   dct:LicenseDocument ;
                                        skos:prefLabel "My License" ;
                                        dct:title "This is my custom License" ;
                                        skos:exactMatch  "my-license"
                                    ] ;
    dct:format                      <http://publications.europa.eu/resource/authority/file-type/CSV> .

<https://piveau.io/def/creator/piveau>
    a               foaf:Organization ;
    foaf:homepage   <https://piveau.io> ;
    foaf:mbox       <mailto:info@piveau.de> ;
    foaf:name       "Piveau" .
  • You can query it from here: http://localhost:8083/datasets/simple-dataset
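
In this example, the originalId passed to hub-repo is also the identifier under which the dataset is queried in hub-search. The following Python sketch builds both URLs from the same values so that they stay in sync. The function names are hypothetical, and the URL patterns are taken only from the two requests shown above.

```python
from urllib.parse import quote, urlencode


def dataset_put_url(repo_url, catalog_id, original_id):
    """URL used to create or update a dataset in hub-repo, as in the PUT request above."""
    query = urlencode({"originalId": original_id})
    return f"{repo_url.rstrip('/')}/catalogues/{quote(catalog_id, safe='')}/datasets/origin?{query}"


def dataset_search_url(search_url, dataset_id):
    """URL used to read the indexed dataset from hub-search."""
    return f"{search_url.rstrip('/')}/datasets/{quote(dataset_id, safe='')}"


put_url = dataset_put_url("http://localhost:8081", "simple-catalog", "simple-dataset")
get_url = dataset_search_url("http://localhost:8083", "simple-dataset")
print(put_url)
print(get_url)
```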

Conclusion#

Now you are able to create your own metadata model for piveau, install it and test it. You can use the DCAT-AP-Simple example as a starting point for a bigger and more complex model. For further inspiration, you can also look at the built-in SHACL file that powers piveau by default.