User Guide#

piveau-hub offers two methods of interaction:

  • A comprehensive RESTful API that offers all functionalities of piveau-hub.
  • A Single-Page-Application (piveau-hub-ui) that is mainly designed for the end user.

Using the API#

The main API and interaction point is offered by the piveau-hub-repo service. You can find the OpenAPI documentation on the base route of the service (e.g. http://localhost:8081, if you followed the quick start guide). The documentation describes all available endpoints and interaction methods. To get you started, we will create a catalogue and a dataset as an example. This requires that piveau-hub is running and that you have the API key at hand. You can find the key in the configuration of piveau-hub-repo.

Create a catalogue#

A catalogue is a container for organising and managing your datasets. Every dataset must belong to a catalogue; a dataset cannot exist outside of one.

PUT http://localhost:8081/catalogues/example-catalog
Content-Type: text/turtle
Authorization: {{api-key}}

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap";
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language  <http://publications.europa.eu/resource/authority/language/ENG> .

After a successful request, you can find your catalogue here:

GET http://localhost:8081/catalogues/example-catalog

The catalogue will also be indexed by the piveau-hub-search service:

GET http://localhost:8083/catalogues/example-catalog
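
The request above could be prepared from a script roughly as follows. This is a minimal sketch using only the Python standard library; the helper name build_catalogue_request and the placeholder key "your-api-key" are illustrative assumptions, not part of piveau-hub. The Turtle body declares the dcat: prefix so that it parses on its own.

```python
import urllib.request

REPO_URL = "http://localhost:8081"  # quick start default used in this guide

CATALOGUE_TURTLE = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap" ;
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> .
"""


def build_catalogue_request(catalogue_id: str, turtle: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates a catalogue."""
    return urllib.request.Request(
        url=f"{REPO_URL}/catalogues/{catalogue_id}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle", "Authorization": api_key},
    )


# "your-api-key" is a placeholder; use the key from your piveau-hub-repo configuration.
request = build_catalogue_request("example-catalog", CATALOGUE_TURTLE, "your-api-key")

# To send it against a running piveau-hub-repo:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from sending makes it easy to inspect the URL, headers and body before anything reaches the service.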

Super Catalogue#

You can use the DCAT-AP fields dct:hasPart and dct:isPartOf to organise your catalogues within a designated super catalogue. A super catalogue should include a dct:hasPart field that lists all the catalogues belonging to it. Similarly, a catalogue should have a dct:isPartOf field that links it to its super catalogue, if applicable. Note that:

  • A super catalogue can contain N catalogues.
  • A super catalogue cannot contain other super catalogues.
  • A super catalogue cannot contain datasets.
  • Each catalogue can only belong to 1 super catalogue.
  • A catalogue does not have to belong to a super catalogue.

The diagram below illustrates the rules listed above:

graph TD

SC(Europe) --> C1(Germany)
SC(Europe) --> C2(UK)

C1(Germany) --> D1(Berlin)
C1(Germany) --> D2(NRW)
C2(UK) --> D3(Wales)
C2(UK) --> D4(Northern Ireland)
C3(North Pole) --> D5(Bears)
C3(North Pole) --> D6(Penguins)

SC ~~~ C3

subgraph Super Catalogue
  SC
end
subgraph Catalogues
  C1
  C2
  C3
end
subgraph Datasets
  D1
  D2
  D3
  D4
  D5
  D6
end

classDef green fill:#9f6,stroke:#333,stroke-width:2px;
classDef orange fill:#f96,stroke:#333;
classDef white fill:#fff,stroke:#333;
class SC green
class C1 orange
class C2 orange
class C3 orange
class D1 white
class D2 white
class D3 white
class D4 white
class D5 white
class D6 white
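
If you maintain the dct:hasPart relations yourself, a small local check can catch violations of the rules above before you upload anything. The sketch below is a hypothetical client-side helper (check_super_catalogues and its input shape are assumptions for illustration); it says nothing about how piveau-hub itself validates these relations.

```python
def check_super_catalogues(has_part: dict[str, list[str]]) -> list[str]:
    """Return human-readable rule violations for a dct:hasPart mapping.

    `has_part` maps each super catalogue ID to the catalogue IDs it lists
    via dct:hasPart. Catalogues without a super catalogue simply do not appear.
    """
    problems = []
    parent_of: dict[str, str] = {}
    for super_id, members in has_part.items():
        for member in members:
            # Rule: a super catalogue cannot contain other super catalogues.
            if member in has_part:
                problems.append(f"{super_id} contains super catalogue {member}")
            # Rule: each catalogue can only belong to one super catalogue.
            if member in parent_of and parent_of[member] != super_id:
                problems.append(
                    f"{member} belongs to both {parent_of[member]} and {super_id}"
                )
            parent_of.setdefault(member, super_id)
    return problems
```

For the diagram above, check_super_catalogues({"europe": ["germany", "uk"]}) reports no problems, while listing "germany" under a second super catalogue would be flagged.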

Create a dataset#

Now that the catalogue exists, we can add a dataset to it.

PUT http://localhost:8081/datasets/example-dataset?catalogue=example-catalog
Content-Type: text/turtle
Authorization: {{api-key}}

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.io/set/data/example-dataset>
    a dcat:Dataset ;
    dct:title "Example Dataset 2"@en ;
    dct:description "This is an example Dataset" ;
    dct:issued "2015-08-28T00:00:00"^^xsd:dateTime ;
    dcat:distribution <https://example.io/set/distribution/1> .

<https://example.io/set/distribution/1>
    a dcat:Distribution ;
    dcat:accessURL <http://a-csv-file.com> ;
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV>  ;
    dct:title "Example Distribution 1" .

After a successful request, you can find your dataset here:

GET http://localhost:8081/catalogues/example-catalog/datasets/example-dataset

Important

In the hub-repo, a dataset is always scoped to the catalogue it belongs to. In most cases you will therefore need to pass the catalogue ID in your requests.

The dataset will also be indexed by the piveau-hub-search service:

GET http://localhost:8083/datasets/example-dataset
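
Because datasets in the hub-repo are scoped to a catalogue, both the write and the read URL carry the catalogue ID, just in different places. The following sketch only assembles the two URLs shown above; the function names are illustrative assumptions.

```python
from urllib.parse import quote, urlencode

REPO_URL = "http://localhost:8081"  # quick start default used in this guide


def dataset_put_url(dataset_id: str, catalogue_id: str) -> str:
    """URL for creating a dataset: the catalogue is a query parameter."""
    query = urlencode({"catalogue": catalogue_id})
    return f"{REPO_URL}/datasets/{quote(dataset_id, safe='')}?{query}"


def dataset_get_url(dataset_id: str, catalogue_id: str) -> str:
    """URL for reading a dataset: the catalogue is part of the path."""
    return (
        f"{REPO_URL}/catalogues/{quote(catalogue_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}"
    )
```

Using the same catalogue ID in both helpers avoids the easy mistake of creating a dataset in one catalogue and then looking for it under a differently spelled one.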

Using the UI#

Work in progress...

Using the search API#

Searching#

To do a simple full-text search, you can use the q parameter. In the following example, we search for "health":

GET http://localhost:8083/search?q=health

If you would like to do an autocomplete search, you can enable the autocomplete parameter. The administrator can configure which field is used for autocomplete, for example title. In the following example, we do an autocomplete search in the catalogues index:

GET http://localhost:8083/search?filter=catalogue&q=hea&autocomplete=true

Note: The filter parameter must be set for this parameter. The index must have a configured autocomplete field. The filter parameter is explained in the facets section.
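
The two request shapes from this section can be generated by one small helper. This is a sketch under the assumption that only q, filter and autocomplete are involved; search_url is a hypothetical name, and the check simply mirrors the note that autocomplete needs the filter parameter.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def search_url(q: str, index: Optional[str] = None, autocomplete: bool = False) -> str:
    """Build a full-text or autocomplete search URL."""
    if autocomplete and index is None:
        # Mirrors the note above: autocomplete requires the filter parameter.
        raise ValueError("autocomplete requires the filter parameter")
    params = {"q": q}
    if index is not None:
        params["filter"] = index
    if autocomplete:
        params["autocomplete"] = "true"
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For example, search_url("hea", index="catalogue", autocomplete=True) yields the autocomplete request shown above, with the parameters in a different order.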

Pagination#

By default, a search query returns 10 results per page. To change the number of results per page, use the limit parameter, which accepts values up to 1000. To go through the pages, use the page parameter.

Page counting starts at 0, so to access the first page with 15 results per page you call:

GET http://localhost:8083/search?page=0&limit=15

To access the second page you would increase page by one:

GET http://localhost:8083/search?page=1&limit=15

By default, the search service returns the first page, i.e. page is set to 0.

Paging with page and limit is bounded by the maximum result window, which the administrator can configure and increase on demand. If page*limit is higher than the maximum result window, the search service won't return any results.
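
A client can check these bounds before sending a request. The sketch below assumes a maximum result window of 10000 purely as a placeholder, since the real value depends on your administrator's configuration; page_url is a hypothetical helper, and page is zero-based as in the requests above.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide
MAX_LIMIT = 1000  # upper bound for `limit` stated in this guide


def page_url(page: int, limit: int = 10, max_result_window: int = 10_000) -> str:
    """Build a paginated search URL, rejecting requests the service would leave empty.

    `max_result_window` is an assumed placeholder; ask your administrator
    for the value configured in your deployment.
    """
    if page < 0:
        raise ValueError("page must be 0 or greater")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if page * limit > max_result_window:
        raise ValueError("page*limit exceeds the maximum result window")
    return f"{SEARCH_URL}?{urlencode({'page': page, 'limit': limit})}"
```

Failing early on the client side makes it obvious why a deep page would come back empty, instead of leaving you to guess from an empty result list.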

If you want to iterate through all the search results for a particular query, you should use the scroll API. The scroll API takes a snapshot of the current search result list. When you paginate with page and limit instead, the contents of the pages can change between requests because records are indexed or deleted in the meantime.

To use the scroll API, you must set the scroll parameter:

GET http://localhost:8083/search?scroll=true
{
  "result": {
    "count": 1634095,
    "scrollId": "FGluY2x1ZGVfY29udGV4dF91...",
    "results": [...]
  }
}

Afterwards, pass the scrollId to the scroll API:

GET http://localhost:8083/scroll?scrollId=FGluY2x1ZGVfY29udGV4dF91...

Every time you call the scroll API with this ID, you get another page of results.
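
Iterating over a complete result set could then look like the sketch below. It rests on two assumptions that this guide does not confirm: that the scroll endpoint answers with the same result.results and result.scrollId structure as the initial search, and that an empty results list marks the end. The fetch function is injectable so the loop can be tried out without a running service.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

SEARCH_BASE = "http://localhost:8083"  # quick start default used in this guide


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def scroll_all(query: dict, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every result for `query`, following the scrollId page by page."""
    params = {**query, "scroll": "true"}
    body = fetch(f"{SEARCH_BASE}/search?{urlencode(params)}")
    while True:
        result = body["result"]
        # Assumption: an empty page means the snapshot is exhausted.
        if not result["results"]:
            return
        yield from result["results"]
        scroll_id = result.get("scrollId")
        if scroll_id is None:
            return
        body = fetch(f"{SEARCH_BASE}/scroll?{urlencode({'scrollId': scroll_id})}")
```

A call such as `for record in scroll_all({"q": "health"}): ...` would then walk the whole snapshot lazily, requesting the next page only when the current one has been consumed.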

Sorting#

If you want to sort the search results, you can specify the order using the sort parameter. By default, the search service sorts by relevance. Relevance is measured by a score. To return the score, you can enable the showScore parameter:

GET http://localhost:8083/search?q=health&showScore=true

You can sort either by relevance or by a field. In both cases you must specify whether the sorting is ascending (asc) or descending (desc).

In the following example, sorting is primarily by relevance in descending order, so the most relevant search results come first. "But hey, there is a second and a third sort?" Yes, you can define secondary sorting. In this example, it means that whenever two search results have the same score, they will be sorted in descending order according to the modified field. And if they also have the same modified value, they will be sorted by their English title in ascending order.

GET http://localhost:8083/search?sort=relevance+desc,modified+desc,title.en+asc
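
The sort value is a comma-separated list of "field direction" pairs, where the space is written as + in the URL. A small helper can assemble it from a list of pairs; sort_url is a hypothetical name, and the direction check only reflects the asc/desc values named in this section.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def sort_url(criteria: list[tuple[str, str]]) -> str:
    """Build a search URL from (field, direction) pairs, most significant first."""
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction for {field}: {direction}")
    sort_value = ",".join(f"{field} {direction}" for field, direction in criteria)
    # safe="," keeps the commas readable; spaces are encoded as "+".
    return f"{SEARCH_URL}?{urlencode({'sort': sort_value}, safe=',')}"


url = sort_url([("relevance", "desc"), ("modified", "desc"), ("title.en", "asc")])
```

The order of the pairs is the order of precedence, so putting relevance first keeps the default ranking and uses the remaining fields only as tie-breakers.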

Facets#

Facets are an important concept in searching. They are used for aggregation and filtering. To enable facets in the search result, you must specify the filter parameter. The filter parameter specifies which search index is searched. As each search index has its own structure, the indices provide different facets. In the following example, you would search in the datasets index.

GET http://localhost:8083/search?filter=dataset
{
  "result": {
    "index": "dataset",
    "count": 1616281,
    "facets": [
      ...
      {
        "id": "country",
        "title": "Provenance",
        "items": [
          {
            "count": 601663,
            "id": "de",
            "title": "Germany"
          },
          ...
        ]
      }
    ],
    "results": [...]
  }
}

In the example above you can see a part of a facet. A facet always contains an id, a title and a list of items, sorted by count in descending order. Each item contains an id, a count and a title. Counting the items of a facet is called aggregation. Aggregation allows you to see how the data is distributed and to get an overview of the entire search index. It can also guide you if you don't know exactly what you are looking for: the counts give you an idea of how you might want to filter.

If you don't need the aggregation, you can disable it. Disabling the aggregation removes the facets from the search result, which saves the time needed to compute them.

GET http://localhost:8083/search?filter=dataset&aggregation=false

If you only want some of the facets returned, you can use the aggregationAllFields and aggregationFields parameters:

GET http://localhost:8083/search?filter=dataset&aggregationAllFields=false&aggregationFields=country,format

By default, the aggregation counts decrease the more you narrow down the search results by searching and filtering. If you want the aggregation to be unaffected by the current query and filters, you can enable the globalAggregation parameter:

GET http://localhost:8083/search?filter=dataset&q=health&globalAggregation=true

To filter the search results by facets, you can use the facets parameter. In the following example, we filter for all records that are located in Germany and have a distribution in CSV or PDF format. The values for filtering are selected from the respective facet objects.

GET http://localhost:8083/search?filter=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}

By default, the selected values within a facet are combined with OR, and the facets themselves are combined with AND. To change this, use the facetOperator parameter (values within a facet) and the facetGroupOperator parameter (across facets). The defaults are facetOperator=OR and facetGroupOperator=AND.

In the following example, we set facetOperator=AND. Then, we filter for all records that are located in Germany and have a distribution in CSV and PDF format.

GET http://localhost:8083/search?filter=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND

In the following example, we set facetOperator=AND and facetGroupOperator=OR. Then, we filter for all records that are located in Germany or have a distribution in CSV and PDF format.

GET http://localhost:8083/search?filter=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND&facetGroupOperator=OR

The following table lists all possible filter combinations for this example:

| | facetOperator=OR | facetOperator=AND |
|---|---|---|
| facetGroupOperator=OR | Filter for all records that are located in Germany or have a distribution in CSV or PDF format. | Filter for all records that are located in Germany or have a distribution in CSV and PDF format. |
| facetGroupOperator=AND | Filter for all records that are located in Germany and have a distribution in CSV or PDF format. (Default) | Filter for all records that are located in Germany and have a distribution in CSV and PDF format. |
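
To make the four combinations concrete, the sketch below models them locally. The matches function is only an illustration of the semantics described above, applied to hypothetical in-memory records that hold their facet values as plain lists; it is not how the search service is implemented. facet_search_url shows one way to URL-encode the JSON value of the facets parameter.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def facet_search_url(facets: dict, facet_operator: str = "OR",
                     facet_group_operator: str = "AND") -> str:
    """Build a facet-filtered search URL; the facets value is JSON, URL-encoded."""
    params = {
        "filter": "dataset",
        "facets": json.dumps(facets, separators=(",", ":")),
        "facetOperator": facet_operator,
        "facetGroupOperator": facet_group_operator,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def matches(record: dict, facets: dict, facet_operator: str = "OR",
            facet_group_operator: str = "AND") -> bool:
    """Local model of the filter semantics (illustrative, not the service's code)."""
    within = all if facet_operator == "AND" else any        # values inside one facet
    across = all if facet_group_operator == "AND" else any  # between facets
    return across(
        within(value in record.get(facet, []) for value in wanted)
        for facet, wanted in facets.items()
    )
```

With facets = {"format": ["CSV", "PDF"], "country": ["de"]}, a German record that only has a CSV distribution matches under the defaults, but no longer matches once facetOperator is set to AND while facetGroupOperator stays AND.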

More filtering#

Filter by date#

If you would like to filter by date, you can use the minDate and maxDate parameters. Your input must comply with ISO 8601. The administrator can configure which field is used for filtering; common ones are issued, modified or temporal. In the following example, we filter by a minimum and maximum date:

GET http://localhost:8083/search?filter=dataset&minDate=2023-04-01T22:00:00.000Z&maxDate=2024-03-31T00:00:00.000Z

Note: The filter parameter must be set for these parameters. The index must have a configured date field.
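
When the dates come from application code, they need to be rendered in the ISO 8601 form used above. The sketch below converts timezone-aware datetime objects to UTC with millisecond precision and a trailing Z; the helper names are assumptions, and other ISO 8601 variants may be accepted by the service as well.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def iso_utc(moment: datetime) -> str:
    """Format an aware datetime as e.g. 2023-04-01T22:00:00.000Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def date_filter_url(min_date: datetime, max_date: datetime) -> str:
    """Build a dataset search URL restricted to the given date range."""
    if min_date > max_date:
        raise ValueError("min_date must not be later than max_date")
    params = {
        "filter": "dataset",
        "minDate": iso_utc(min_date),
        "maxDate": iso_utc(max_date),
    }
    # safe=":" keeps the timestamps readable in the URL.
    return f"{SEARCH_URL}?{urlencode(params, safe=':')}"
```

Converting to UTC explicitly avoids surprises when the local timezone differs from the one the records were indexed with.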

Filter by DQV#

If you would like to filter by a data quality value, you can use the minScoring and maxScoring parameters. The administrator can configure which field is used for filtering, for example quality_meas.scoring. In the following example, we filter by a minimum and maximum scoring (the values are illustrative):

GET http://localhost:8083/search?filter=dataset&minScoring=100&maxScoring=300

Note: The filter parameter must be set for these parameters. The index must have a configured dqv field.

Filter by country data#

If you would like to filter by country data, you can enable the countryData parameter. The administrator can configure which codes do not belong to a country, e.g. eu and io. In the following example, we filter by country data:

GET http://localhost:8083/search?filter=dataset&countryData=true

Note: The filter parameter must be set for these parameters. The index must have a configured country field.

Filter by data services#

If you would like to filter by data services, you can enable the dataServices parameter. The administrator can configure which field is used for filtering, for example distributions.access_service. In the following example, we filter by data services:

GET http://localhost:8083/search?filter=dataset&dataServices=true

Note: The filter parameter must be set for these parameters. The index must have a configured data service field.

Filter by bounding box#

If you would like to filter by bounding box, you can use the bounding box parameters. The administrator can configure which field is used for filtering, for example spatial. In the following example, we filter by a bounding box:

GET http://localhost:8083/search?filter=dataset&bboxMinLon=50&bboxMaxLon=60&bboxMinLat=45&bboxMaxLat=50

For the longitude, bboxMinLon must be smaller than bboxMaxLon, and both must be between -180 and 180. For the latitude, bboxMinLat must be smaller than bboxMaxLat, and both must be between -90 and 90.

Note: The filter parameter must be set for these parameters. The index must have a configured spatial field.
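
The range rules for the bounding box are easy to get wrong, so a client may want to validate them before calling the service. This is a minimal sketch; bbox_url is a hypothetical helper that only encodes the constraints stated above.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def bbox_url(min_lon: float, max_lon: float, min_lat: float, max_lat: float) -> str:
    """Build a dataset search URL restricted to a bounding box."""
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("need -180 <= bboxMinLon < bboxMaxLon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("need -90 <= bboxMinLat < bboxMaxLat <= 90")
    params = {
        "filter": "dataset",
        "bboxMinLon": min_lon,
        "bboxMaxLon": max_lon,
        "bboxMinLat": min_lat,
        "bboxMaxLat": max_lat,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"
```

Calling bbox_url(50, 60, 45, 50) reproduces the request shown above, while swapped or out-of-range coordinates raise a ValueError before any request is made.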

Filter by vocabulary#

To search all vocabulary indices, you must set filter=vocabulary. In the following example, we search in all vocabulary indices:

GET http://localhost:8083/search?filter=vocabulary

If you would like to search in a specific vocabulary, you can set the vocabulary parameter.

GET http://localhost:8083/search?filter=vocabulary&vocabulary=access-right

Note: All indexed vocabularies can be listed via GET http://localhost:8083/vocabularies

Reduce#

If you would like to reduce the search results to a set of fields, you can use the includes parameter. In the following example, we only return id and title.

GET http://localhost:8083/search?includes=id,title

Fields#

Select fields#

By default, all fields are searched. If you want to select certain fields for searching, you can use the fields parameter. In the following example, we only search in title and keywords.

GET http://localhost:8083/search?fields=title,keywords

Boost fields#

By default, all fields are weighted equally during a search. The administrator can change this default behaviour by giving certain fields a boost. If you want to specify a different weighting, you can use the boost parameter. In the following example, we set a custom boost for title and keywords.

GET http://localhost:8083/search?boost.title=10&boost.keyword=3
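
Each boost is passed as its own boost.<field> query parameter. A helper can expand a mapping of field names to weights into that form; boosted_search_url is a hypothetical name, and the field names are taken from the request above rather than from a verified list of index fields.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"  # quick start default used in this guide


def boosted_search_url(q: str, boosts: dict[str, int]) -> str:
    """Build a search URL with one boost.<field> parameter per entry."""
    params: dict[str, object] = {"q": q}
    for field, weight in boosts.items():
        if weight <= 0:
            raise ValueError(f"boost for {field} must be positive")
        params[f"boost.{field}"] = weight
    return f"{SEARCH_URL}?{urlencode(params)}"
```

Keeping the weights in one mapping makes it straightforward to experiment with different rankings, for example together with showScore=true to see how the scores change.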