Overview • Quick start • Debug • Containers
Requirements:
ckan-mqa
offers a Docker Compose solution for performing Metadata Quality Assessment (MQA) on both CKAN endpoints and the European Data Portal catalogs. MQA is a crucial process to ensure the accuracy, completeness, and reliability of metadata, enhancing the overall data interoperability and accessibility.
This Docker Compose configuration enhances a Python MQA software 1 to integrates the powerful MQA toolset seamlessly with CKAN endpoints and European Data Portal catalogs, enabling users to perform in-depth assessments of metadata quality effortlessly. The setup provides an efficient way to run comprehensive quality checks on various metadata attributes, including data relevance, schema compliance, data format consistency, and adherence to standard vocabularies.
Note
It can be tested with an open data portal of the CKAN type such as: mjanez/ckan-docker2
The MQA measures the quality of various indicators, each indicator is explained in the tables below. The results of the checks are stored as Data Quality Vocabulary (DQV) . DQV is a specification of the W3C that is used to describe the quality of a dataset.
Dimension | Maximal points |
---|---|
Findability | 100 |
Accessibility | 100 |
Interoperability | 110 |
Reusability | 75 |
Contextuality | 20 |
Sum | 405 |
The dimensions are derived from the FAIR principles:
-
Findability The following table describes the metrics that help people and machines in finding datasets. A maximum of 100 points can be scored in this area.
-
Accessibility The following table describes which metrics are used to determine whether access to the data referenced by the distributions is guaranteed. A maximum of 100 points can be scored in this area.
-
Interoperability The following table describes the metrics used to determine whether a distribution is considered interoperable. According to the assumption 'identical content with several distributions', only the distribution with the highest number of points is used to calculate the points. A maximum of 110 points can be scored in this area
-
Reusability The following table describes which metrics are used to check the reusability of the data. A maximum of 75 points can be scored in this area.
-
Contextuality The following table show some light weight properties, that provide more context to the user. A maximum of 20 points can be scored in this area.
The final rating happens via four rating groups. The mapping of the points to the rating category is shown in the table below. The representation of the rating in the MQA is expressed exclusively via the rating categories. This enables providers to achieve the highest rating even with a slight deduction of points.
Rating | Range of points |
---|---|
Excellent | 351 - 405 |
Good | 221 – 350 |
Sufficient | 121 – 220 |
Bad | 0 - 120 |
Dimension | Indicator/property | Count | Population | Percentage | Points | Weight |
---|---|---|---|---|---|---|
Findability | dcat:keyword | 46 | 46 | 1.0 | 30.0 | 30 |
Findability | dcat:theme | 46 | 46 | 1.0 | 30.0 | 30 |
Findability | dct:spatial | 42 | 46 | 0.91 | 18.26 | 20 |
Findability | dct:temporal | 0 | 46 | 0.0 | 0 | 20 |
Accessibility | dcat:accessURL code=200 | 255 | 255 | 1.0 | 50.0 | 50 |
Accessibility | dcat:downloadURL | 0 | 255 | 0.0 | 0 | 20 |
Accessibility | dcat:downloadURL code=200 | 0 | 255 | 0.0 | 0 | 30 |
Interoperability | dct:format | 255 | 255 | 1.0 | 20.0 | 20 |
Interoperability | dcat:mediaType | 255 | 255 | 1.0 | 10.0 | 10 |
Interoperability | dct:format/dcat:mediaType from vocabulary | 378 | 510 | 0.74 | 7.41 | 10 |
Interoperability | dct:format non-proprietary | 131 | 255 | 0.51 | 10.27 | 20 |
Interoperability | dct:format machine-readable | 252 | 255 | 0.99 | 19.76 | 20 |
Interoperability | DCAT-AP compliance | 0 | 46 | 0.0 | 0 | 30 |
Reusability | dct:license | 255 | 255 | 1.0 | 20.0 | 20 |
Reusability | dct:license from vocabulary | 245 | 255 | 0.96 | 9.61 | 10 |
Reusability | dct:accessRights | 46 | 46 | 1.0 | 10.0 | 10 |
Reusability | dct:accessRights from vocabulary | 0 | 46 | 0.0 | 0 | 5 |
Reusability | dcat:contactPoint | 46 | 46 | 1.0 | 20.0 | 20 |
Reusability | dct:publisher | 46 | 46 | 1.0 | 10.0 | 10 |
Contextuality | dct:rights | 255 | 255 | 1.0 | 5.0 | 5 |
Contextuality | dcat:byteSize | 0 | 255 | 0.0 | 0 | 5 |
Contextuality | dct:issued | 46 | 46 | 1.0 | 5.0 | 5 |
Contextuality | dct:modified | 46 | 46 | 1.0 | 5.0 | 5 |
Total points | Rating: Good | 0.69 | 280.31 | 405 |
First copy the .env.example
template as .env
and configure by changing the CKAN_CATALOG_URL
, as well as the DCAT-AP Profile version (DCATAP_FILES_VERSION
), if needed.
cp .env.example .env
Custom ennvars:
CKAN_CATALOG_URL
: URL of the CKAN catalog to be downloaded (i.e.http://localhost:5000/catalog.rdf?q=organization:test
).APP_DIR
: Path to the application folder in Docker.TZ
: Timezone.DCATAP_FILES_VERSION
: DCAT-AP version (Avalaibles: 2.0.1, 2.1.0, 2.1.1).UPDATE_VOCABS
: Update vocabs from the EU Publications Office at start (True
orFalse
).CKAN_METADATA_TYPE
: CKAN Metadata elements type:ckan_uris
for GeoDCAT-AP schema with all elements described by URIs (e.g.dct:format
= http://publications.europa.eu/resource/authority/file-type/XML) orckan
if used a CKAN default schema with label metadata elements (e.g.dct:format
= "XML").
To deploy the environment, docker compose
will build the latest image (ghcr.io/mjanez/ckan-mqa:latest
).
git clone https://github.com/mjanez/ckan-mqa
cd ckan-mqa
docker compose up --build
# Or detached mode
docker compose up -d --build
Note:
Deploy the dev (local build)docker-compose.dev.yml
with:docker compose -f docker-compose.dev.yml up --build
Note:
If needed, to build a specific container simply run:docker build -t target_name xxxx/
Dependencies:
python3 -m pip install --user pipx
pipx install pdm
pdm install --no-self
Run:
pdm run python ckan2mqa/ckan2mqa.py
- Build and run container.
- Attach Visual Studio Code to container
- Start debugging on
ckan2mqa.py
Python file (Debug the currently active Python file
).
List of containers:
Repository | Type | Docker tag | Size | Notes |
---|---|---|---|---|
python 3.11 | base image | python/python:3.11-slim |
45.57 MB | - |
Repository | Type | Docker tag | Size | Notes |
---|---|---|---|---|
mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:v*.*.* |
264 MB | Tag version. |
mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:latest |
264 MB | Latest stable version. |
mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:main |
264 MB | Dev version. |
Copyright (c) the respective contributors. It is open and licensed under the GNU Affero General Public License (AGPL) v3.0 whose full text may be found at: http://www.fsf.org/licensing/licenses/agpl-3.0.html
Footnotes
-
Program to test MQA evaluation: Javier Nogueras ([email protected]), Javier Lacasta ([email protected]), Manuel Ureña ([email protected]), F. Javier Ariza ([email protected]), Héctor Ochoa Ortiz ([email protected]). Trafair Project 2020. ↩
-
A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles. ↩