Conciseness and lack of contradictions are indicators of data quality. This also applies to file types and their internal structure. To check their integrity, the following steps are performed:
Each distribution is checked for a download URL. If this property is missing, the download is attempted from the access URL instead.
The actual media type of the downloaded file is determined.
If the media type is JSON, CSV, TSV, XML, or RDF, a syntax check is performed.
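The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's implementation: the distribution is assumed to be a plain dict with hypothetical downloadURL and accessURL keys, the media type is guessed by a simple, hypothetical content-sniffing heuristic, and only the formats the Python standard library can parse (JSON, CSV, TSV, XML) are checked. RDF is left out because it would require a dedicated RDF parser.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Optional

# Media types this sketch can syntax-check with the standard library only.
SUPPORTED = {"application/json", "application/xml",
             "text/csv", "text/tab-separated-values"}


def select_url(distribution: dict) -> Optional[str]:
    # Step 1: prefer the download URL; fall back to the access URL if it is missing.
    # The dict keys are hypothetical stand-ins for dcat:downloadURL / dcat:accessURL.
    return distribution.get("downloadURL") or distribution.get("accessURL")


def sniff_media_type(content: bytes) -> Optional[str]:
    # Step 2 (hypothetical heuristic): guess the media type from the content itself.
    text = content.lstrip().decode("utf-8", errors="replace")
    if text.startswith(("{", "[")):
        return "application/json"
    if text.startswith("<"):
        return "application/xml"
    first_line = text.splitlines()[0] if text else ""
    if "\t" in first_line:
        return "text/tab-separated-values"
    if "," in first_line:
        return "text/csv"
    return None  # unknown type: no syntax check is attempted


def is_syntactically_valid(content: bytes, media_type: Optional[str]) -> Optional[bool]:
    # Step 3: syntax check. None means "not checked" (this includes RDF here).
    if media_type not in SUPPORTED:
        return None
    try:
        text = content.decode("utf-8")
        if media_type == "application/json":
            json.loads(text)
        elif media_type == "application/xml":
            ET.fromstring(text)
        else:
            delimiter = "\t" if media_type == "text/tab-separated-values" else ","
            rows = csv.reader(io.StringIO(text), delimiter=delimiter, strict=True)
            # Rows with differing column counts are treated as a syntax error here.
            if len({len(row) for row in rows if row}) > 1:
                return False
    except (ValueError, ET.ParseError, csv.Error):
        return False
    return True
```

The actual download is left out so the functions can be exercised without network access; in practice the bytes could be fetched with something like urllib.request.urlopen(select_url(distribution)).read() before being passed to the other two functions.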
Two DQV measurements are derived from the above:
Does the specified media type (DCTerms.format) match the actual file format?
Is the file syntactically valid?
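One possible way to express these two measurements as DQV data is sketched below, using plain (subject, predicate, object) tuples. The DQV terms (dqv:QualityMeasurement, dqv:computedOn, dqv:isMeasurementOf, dqv:value) come from the W3C Data Quality Vocabulary; the metric IRIs, the measurement IRI scheme and the alias table that maps declared format labels such as "CSV" to media types are hypothetical placeholders. Declared formats given as vocabulary IRIs are not handled in this sketch.

```python
from typing import List, Optional, Tuple

DQV = "http://www.w3.org/ns/dqv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Hypothetical metric IRIs: the service's actual metric identifiers are not given.
FORMAT_MATCH_METRIC = "urn:example:metric:formatMatchesDeclaration"
SYNTAX_VALID_METRIC = "urn:example:metric:syntacticallyValid"

# Hypothetical alias table mapping declared format labels to media types.
FORMAT_ALIASES = {
    "json": "application/json",
    "xml": "application/xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}

Triple = Tuple[str, str, object]


def normalize_format(declared: str) -> str:
    # Labels found in the alias table become media types; anything else is kept as-is.
    key = declared.strip().lower()
    return FORMAT_ALIASES.get(key, key)


def dqv_measurements(distribution: str, declared_format: Optional[str],
                     detected_type: Optional[str],
                     syntax_valid: Optional[bool]) -> List[Triple]:
    # Measurement 1: does the declared format match the detected media type?
    matches = (declared_format is not None
               and normalize_format(declared_format) == detected_type)
    results = [(FORMAT_MATCH_METRIC, matches)]
    # Measurement 2: only emitted when a syntax check was actually performed.
    if syntax_valid is not None:
        results.append((SYNTAX_VALID_METRIC, syntax_valid))

    triples: List[Triple] = []
    for i, (metric, value) in enumerate(results):
        m = f"{distribution}#measurement-{i}"  # hypothetical IRI scheme
        triples += [
            (m, RDF_TYPE, DQV + "QualityMeasurement"),
            (m, DQV + "computedOn", distribution),
            (m, DQV + "isMeasurementOf", metric),
            # Typed literal represented as (lexical form, datatype IRI).
            (m, DQV + "value", ("true" if value else "false", XSD_BOOLEAN)),
        ]
    return triples
```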
This service is a pipe module and can therefore be orchestrated within a pipe. The Format-Checker accepts RDF datasets, including their distributions, as payload. It either augments an existing metrics graph or creates a new one if none is provided.
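The augment-or-create behaviour can be sketched as below. The payload shape (a dict with a "dataset" entry and an optional "metrics" entry holding a set of triples) is a hypothetical stand-in for the actual pipe payload format, which is not specified here.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def merge_metrics(existing: Optional[Set[Triple]],
                  new_measurements: Iterable[Triple]) -> Set[Triple]:
    # Augment the incoming metrics graph, or start an empty one if none was sent.
    graph: Set[Triple] = set(existing) if existing is not None else set()
    graph.update(new_measurements)
    return graph


def handle_payload(payload: dict, new_measurements: Iterable[Triple]) -> dict:
    # Hypothetical payload shape: {"dataset": ..., "metrics": optional set of triples}.
    result = dict(payload)  # shallow copy; the dataset entry is passed on unchanged
    result["metrics"] = merge_metrics(payload.get("metrics"), new_measurements)
    return result
```

Copying the incoming graph rather than mutating it leaves the input payload untouched, which can make a pipe step easier to retry; the actual module may handle this differently.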