Data Catalogues Done Right#
piveau is an open source metadata catalogue solution. It is highly scalable and covers the essential life cycle of your metadata: harvesting, storage and quality assurance.
piveau was designed and developed around Semantic Web technologies, the W3C standard DCAT and the European standard for Open Data DCAT-AP. It closes the gap between formal metadata specifications and their application in production. piveau puts a strong emphasis on Open Data and is a leading solution for public administrations and non-profit organizations to publish interoperable and flexible metadata catalogues.
Background#
Datasets#
It is customary in data management to divide data into individual chunks, so-called datasets. A dataset holds data about a certain topic. This could be, for example, the demographic development of a country over a certain period of time, or the number of people who used a city's public transportation system during the last few months. A dataset contains two things:
- information about the data itself ("metadata"), such as the time the dataset was created or changed, a title and a description
- distributions, which contain the actual data and are mostly provided as XLS, CSV or other file formats
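The two parts of a dataset can be sketched as a small data structure. This is a minimal illustration only: the class and field names (`Dataset`, `Distribution`, `access_url`, `file_format`) and the example URLs are assumptions made for this sketch and are not taken from piveau.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Distribution:
    """One concrete representation of the actual data, e.g. a CSV file."""
    access_url: str   # hypothetical field name
    file_format: str  # hypothetical field name


@dataclass
class Dataset:
    """Metadata about the data plus the distributions that carry it."""
    title: str
    description: str
    created: date
    modified: date
    distributions: list[Distribution] = field(default_factory=list)


# Example values are invented for illustration.
population = Dataset(
    title="Population development 2010-2020",
    description="Yearly population figures for an example country.",
    created=date(2021, 1, 15),
    modified=date(2021, 6, 1),
    distributions=[
        Distribution("https://example.org/population.csv", "CSV"),
        Distribution("https://example.org/population.xls", "XLS"),
    ],
)

# The same dataset is offered in two file formats.
formats = [d.file_format for d in population.distributions]
print(formats)
```

Keeping the distributions as a list inside the dataset mirrors the description above: the metadata describes the dataset once, while the data itself may be available in several formats.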
DCAT-AP#
One of the most widely adopted standards for describing datasets is DCAT, together with its extension, the DCAT Application Profile for data portals in Europe (DCAT-AP). The latter adds metadata fields and mandatory property ranges, making it suitable for use with Open Data management platforms.
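To give an impression of what such a description looks like, the following sketch builds a dataset record as JSON-LD using terms from the DCAT and Dublin Core vocabularies. The helper names (`describe_dataset`, `missing_properties`) and the example URLs are assumptions for illustration and are not part of piveau. The `REQUIRED` tuple covers only the title and description, which DCAT-AP marks as mandatory for datasets; the full profile defines many more constraints, so this is not a validator.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Properties this sketch treats as required for a dataset (deliberately incomplete).
REQUIRED = ("dct:title", "dct:description")


def describe_dataset(uri, title, description, distributions):
    """Build a JSON-LD dictionary for one dataset and its distributions."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": url},
                # Simplification: DCAT-AP expects a controlled-vocabulary
                # URI here; a plain string keeps the sketch short.
                "dct:format": fmt,
            }
            for url, fmt in distributions
        ],
    }


def missing_properties(dataset):
    """Return the required properties that are absent or empty."""
    return [prop for prop in REQUIRED if not dataset.get(prop)]


record = describe_dataset(
    "https://example.org/dataset/population",
    "Population development 2010-2020",
    "Yearly population figures for an example country.",
    [("https://example.org/population.csv", "CSV")],
)
print(json.dumps(record, indent=2))
print(missing_properties(record))
```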
piveau Components#
piveau is based on a microservice architecture and a custom pipeline system, which allows features to be composed in a flexible and scalable way.
piveau hub#
Hub is the central component for storing and registering the data. Its persistence layer consists of a Virtuoso triplestore as the principal database, Elasticsearch as the indexing server and MongoDB for storing binary files.
piveau consus#
Consus is responsible for the data acquisition from various sources and data providers. This includes scheduling, transformation and harmonization.
piveau metrics#
Metrics is responsible for creating and maintaining comprehensive quality information and feeding it back to the Hub.
piveau pipeline (PPL)#
The piveau pipeline can be imagined as a data processing chain, described by a plain JSON document with a list of segments. These segments correspond to steps that are performed by the piveau services. Every segment includes at least meta-information that targets the respective service and defines the consecutive service(s). The entire descriptor is passed from service to service as state information.
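The idea of a descriptor that travels from service to service can be sketched as follows. This is not the actual piveau pipe schema: the field names (`meta`, `service`, `next`, `processed`), the service names and the in-process `handle` function are all hypothetical stand-ins chosen for illustration. Real services would receive the descriptor over the network rather than through a function call.

```python
import copy

# Hypothetical descriptor: a list of segments, each naming its target
# service and the service(s) that follow it.
pipe = {
    "id": "example-pipe",
    "segments": [
        {"meta": {"service": "importing", "next": ["transforming"]}, "processed": False},
        {"meta": {"service": "transforming", "next": ["exporting"]}, "processed": False},
        {"meta": {"service": "exporting", "next": []}, "processed": False},
    ],
}


def handle(service_name, descriptor):
    """Stand-in for one service: mark its own segment as processed
    and report which services should receive the descriptor next."""
    descriptor = copy.deepcopy(descriptor)  # leave the caller's copy untouched
    for segment in descriptor["segments"]:
        if segment["meta"]["service"] == service_name:
            segment["processed"] = True
            return descriptor, segment["meta"]["next"]
    raise KeyError(f"no segment targets service {service_name!r}")


def run(descriptor, first_service):
    """Pass the whole descriptor along the chain, starting at first_service."""
    queue = [first_service]
    visited = []
    while queue:
        name = queue.pop(0)
        descriptor, successors = handle(name, descriptor)
        visited.append(name)
        queue.extend(successors)
    return descriptor, visited


final_state, order = run(pipe, "importing")
print(order)
```

Because every service receives the complete descriptor, each one can see which segments have already been processed without consulting a central coordinator.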
How is piveau used?#
The piveau codebase is licensed under Apache 2.0 and can be found in our central GitLab repository.