
Licences KG generation pipeline

This project includes the resources used to build the Polifonia Licences KG, which contains licence information for the third-party resources reused in the project.

In what follows, fx refers to the command line java -jar sparql-anything-<version>.jar.
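One way to make fx available in a POSIX shell is a small wrapper function. This is a sketch, not part of the project: the variable SPARQL_ANYTHING_JAR is an assumption and must be set to the path of the downloaded sparql-anything-<version>.jar.

```shell
# Hypothetical wrapper: SPARQL_ANYTHING_JAR is an assumed variable that must
# hold the path to the downloaded SPARQL Anything jar.
# If it is unset, ${...:?...} stops with the error message below.
fx() {
  java -jar "${SPARQL_ANYTHING_JAR:?set SPARQL_ANYTHING_JAR to the SPARQL Anything jar}" "$@"
}
```

With export SPARQL_ANYTHING_JAR=/path/to/sparql-anything-<version>.jar in place, the commands on this page run unchanged, for example fx -q queries/harvest-dalicc.sparql -f TTL -o knowledgegraph/dalicc.ttl.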

Knowledge Graph Construction

Dalicc licence descriptions

We reuse a catalogue of machine-readable licences from the Dalicc project, harvested into knowledgegraph/dalicc.ttl with:

fx -q queries/harvest-dalicc.sparql -f TTL -o knowledgegraph/dalicc.ttl

External dataset licences

This part of the dataset comes from a survey of the datasets reused in the Polifonia project. The following command converts the survey spreadsheet in data/ into RDF:

fx -q queries/datasets-kg.sparql -f TTL -o knowledgegraph/datasets-licences.ttl

MusoW licences

This part of the knowledge graph includes a snapshot of the musoW KG, stored in knowledgegraph/musow.ttl and downloaded with:

fx -q queries/download-musow.sparql -f TTL -o knowledgegraph/musow.ttl

musoW licence annotations are aligned to Dalicc entities. The alignments are stored in knowledgegraph/musow-alignments.ttl and are used to generate the musow-licences part of the KG:

fx -q queries/musow-licences.sparql -f TTL -o knowledgegraph/musow-licences.ttl

In addition, the musoW licence annotations are complemented with metadata obtained from experiments in extracting licence information from web resources and linking it with the help of LLMs. The results are included in knowledgegraph/musow-licences-llm.ttl.
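The construction steps above can be chained in a single shell function, run from the repository root. This is a minimal sketch under assumptions: build_kg and SPARQL_ANYTHING_JAR are hypothetical names, the steps run in the order this page lists them, and the function stops at the first failing step. It does not regenerate knowledgegraph/musow-alignments.ttl or knowledgegraph/musow-licences-llm.ttl, since this page gives no command for them; it only checks that the alignments file is present, because the musow-licences step uses it.

```shell
# Sketch (hypothetical names): rebuild the generated parts of the Licences KG.
# SPARQL_ANYTHING_JAR is an assumed variable holding the path to the
# SPARQL Anything jar; run the function from the repository root.
build_kg() {
  # musow-alignments.ttl is a stored input of the musow-licences step,
  # so fail early if it is missing.
  if [ ! -f knowledgegraph/musow-alignments.ttl ]; then
    echo "missing knowledgegraph/musow-alignments.ttl" >&2
    return 1
  fi
  # Each entry is <query name>:<output name>, in the order listed on this page.
  for step in harvest-dalicc:dalicc datasets-kg:datasets-licences \
      download-musow:musow musow-licences:musow-licences; do
    query=${step%%:*}  # part before the colon
    out=${step#*:}     # part after the colon
    java -jar "$SPARQL_ANYTHING_JAR" -q "queries/$query.sparql" \
      -f TTL -o "knowledgegraph/$out.ttl" || return 1
  done
}
```

Running the steps in a fixed order keeps the musoW snapshot and the alignments in place before the musow-licences query reads them.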

Queries

List of can, cannot, and must terms for each dataset

fx -q queries/terms-view.sparql -l knowledgegraph/

Statistics of datasets / actions

fx -q queries/datasets-by-licence.sparql -l knowledgegraph/

fx -q queries/terms-stats.sparql -l knowledgegraph/
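To keep the query results rather than print them, the three queries can be run in a loop that writes each result to a CSV file. This is a sketch that assumes a hypothetical run_queries function, the SPARQL_ANYTHING_JAR variable, a results/ output directory (none of which come from the project), and that the three queries are SELECT queries, since CSV output only applies to those. As in the commands above, -l loads the RDF files in knowledgegraph/; -f CSV and -o select the output format and file.

```shell
# Hypothetical helper: run each analysis query listed above against the
# knowledgegraph/ folder and save its result as results/<query name>.csv.
# SPARQL_ANYTHING_JAR and results/ are assumptions, not project conventions.
run_queries() {
  mkdir -p results || return 1
  for name in terms-view datasets-by-licence terms-stats; do
    java -jar "$SPARQL_ANYTHING_JAR" -q "queries/$name.sparql" \
      -l knowledgegraph/ -f CSV -o "results/$name.csv" || return 1
  done
}
```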