
Data Catalogues Done Right#

piveau is an open source metadata catalogue solution. It is highly scalable and covers the essential life cycle of your metadata: harvesting, storage and quality assurance.

piveau was designed and developed around Semantic Web technologies, the W3C standard DCAT and the European standard for Open Data DCAT-AP. It closes the gap between formal metadata specifications and their application in production. piveau puts a strong emphasis on Open Data and is a leading solution for public administrations and non-profit organizations to publish interoperable and flexible metadata catalogues.

Background#

Datasets#

It is customary in data management to divide data into individual chunks, so-called datasets. A dataset holds data about a certain topic, for example the demographic development of a country over a certain period of time, or the number of people who used a city's public transportation system during the last months. A dataset contains two things:

  • information about the data itself ("metadata"), such as the time the dataset was created or changed, a title and a description
  • distributions, which contain the actual data, usually as XLS, CSV or other file formats
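
The two parts of a dataset can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names and example URLs are assumptions made for this example and are not part of piveau.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Distribution:
    """One concrete representation of the data, e.g. a CSV download."""
    access_url: str   # hypothetical field name
    file_format: str  # hypothetical field name


@dataclass
class Dataset:
    """Metadata about the data plus the distributions that carry it."""
    title: str
    description: str
    created: date
    modified: date
    distributions: list[Distribution] = field(default_factory=list)


# Example values are invented for illustration only.
ridership = Dataset(
    title="Public transport ridership",
    description="Monthly passenger counts for a city's public transport system.",
    created=date(2024, 1, 15),
    modified=date(2024, 6, 1),
    distributions=[
        Distribution("https://example.org/ridership.csv", "CSV"),
        Distribution("https://example.org/ridership.xls", "XLS"),
    ],
)

# The same data is offered in two file formats.
formats = [d.file_format for d in ridership.distributions]
print(formats)
```

The metadata lives on the dataset itself, while each distribution only points to one downloadable form of the data.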

DCAT-AP#

One of the most widely adopted standards for describing datasets is DCAT, together with its extension, the DCAT Application Profile for data portals in Europe (DCAT-AP). The latter adds metadata fields and mandatory property ranges, making it suitable for use with Open Data management platforms.
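
To make this more concrete, here is a minimal sketch of a dataset description that uses DCAT and Dublin Core terms, written as a JSON-LD-style Python dictionary. The dataset, its URLs and the `missing_properties` helper are invented for this example. The list of required properties is a simplified assumption and does not replace validation against the DCAT-AP specification.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Hypothetical dataset; identifiers and URLs are placeholders.
dataset = {
    "@context": {"dcat": DCAT, "dct": DCT},
    "@id": "https://example.org/dataset/ridership",
    "@type": "dcat:Dataset",
    "dct:title": "Public transport ridership",
    "dct:description": "Monthly passenger counts.",
    "dcat:distribution": [
        {
            "@id": "https://example.org/dataset/ridership/csv",
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": "https://example.org/ridership.csv"},
            # Format given as a URI from a controlled vocabulary
            # rather than as a free-text string.
            "dct:format": {
                "@id": "http://publications.europa.eu/resource/authority/file-type/CSV"
            },
        }
    ],
}

# Simplified assumption about which properties must be present.
REQUIRED_DATASET = ("dct:title", "dct:description")
REQUIRED_DISTRIBUTION = ("dcat:accessURL",)


def missing_properties(ds: dict) -> list[str]:
    """Return the required properties that are absent from the description."""
    missing = [p for p in REQUIRED_DATASET if p not in ds]
    for i, dist in enumerate(ds.get("dcat:distribution", [])):
        missing += [
            f"distribution[{i}].{p}"
            for p in REQUIRED_DISTRIBUTION
            if p not in dist
        ]
    return missing


print(json.dumps(dataset, indent=2))
print(missing_properties(dataset))
```

A real deployment would typically rely on an RDF library and SHACL-based validation instead of a hand-written check like this one.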

piveau Components#

piveau is based on a microservice architecture and a custom pipeline system, which enables flexible and scalable composition of features.

piveau hub#

Hub is the central component for storing and registering the data. Its persistence layer consists of a Virtuoso triplestore as the principal database, Elasticsearch as the indexing server and MongoDB for storing binary files.

piveau consus#

Consus is responsible for acquiring data from various sources and data providers. This includes scheduling, transformation and harmonization.

piveau metrics#

Metrics is responsible for creating and maintaining comprehensive quality information and feeding it back to the Hub.

piveau pipeline (PPL)#

The piveau pipeline can be thought of as a data processing chain that is described by a plain JSON document containing a list of segments. Each segment corresponds to a step performed by one of the piveau services. Every segment includes at least meta-information that targets the respective service and defines the consecutive service(s). The entire descriptor is passed from service to service as state information.
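
The idea of a descriptor that travels from service to service can be sketched in a few lines. Everything in this example is hypothetical: the field names (`service`, `next`, `config`, `log`), the service names and the `handle` and `run` functions are assumptions chosen for illustration and do not reflect the actual piveau pipeline schema or API.

```python
import json

# Hypothetical pipe descriptor; NOT the real piveau schema.
descriptor = {
    "id": "example-pipe",
    "segments": [
        {"service": "importer", "next": ["transformer"],
         "config": {"source": "https://example.org/catalogue"}},
        {"service": "transformer", "next": ["exporter"], "config": {}},
        {"service": "exporter", "next": [], "config": {}},
    ],
    "log": [],  # state that accumulates as the descriptor travels
}


def handle(service_name: str, payload: str) -> tuple[list[str], str]:
    """Simulate one service: read its segment, record state, name successors."""
    pipe = json.loads(payload)
    segment = next(s for s in pipe["segments"] if s["service"] == service_name)
    pipe["log"].append(service_name)
    return segment["next"], json.dumps(pipe)


def run(pipe: dict, start: str) -> dict:
    """Pass the serialised descriptor along the chain, starting at `start`."""
    payload = json.dumps(pipe)
    queue = [start]
    while queue:
        service = queue.pop(0)
        successors, payload = handle(service, payload)
        queue.extend(successors)
    return json.loads(payload)


result = run(descriptor, "importer")
print(result["log"])
```

Because each simulated service receives the complete descriptor, it needs no state of its own beyond what arrives in the JSON document.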

How is piveau used?#

The piveau codebase is licensed under Apache 2.0 and can be found in our central GitLab repository.