Licences KG generation pipeline
This project includes resources for the Polifonia Licences KG, which describes the licences of the third-party resources reused by the project.
In what follows, fx refers to the following command line: java -jar sparql-anything-<version>-.jar
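One way to make fx available in a shell session is a small wrapper function. This is a minimal sketch, not part of the project: the SPARQL_ANYTHING_JAR variable and the default file name sparql-anything.jar are hypothetical placeholders, so point the variable at the jar you actually downloaded.

```shell
# Sketch of an `fx` wrapper around the SPARQL Anything jar.
# Assumption: SPARQL_ANYTHING_JAR is a hypothetical variable holding the jar path;
# the fallback name "sparql-anything.jar" is a placeholder, not a real release name.
fx() {
  # Forward all arguments unchanged to the jar.
  java -jar "${SPARQL_ANYTHING_JAR:-sparql-anything.jar}" "$@"
}
```

With the function defined, the commands in this document can be pasted as written, for example after running export SPARQL_ANYTHING_JAR=/path/to/your/jar.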
Knowledge Graph Construction
Dalicc licence descriptions
We reuse a catalogue of machine-readable licences from the Dalicc project. The following command harvests it into knowledgegraph/dalicc.ttl:
fx -q queries/harvest-dalicc.sparql -f TTL -o knowledgegraph/dalicc.ttl
Generate external-datasets-licences.ttl
This part of the dataset comes from a survey of the datasets reused in the Polifonia project. The following command converts the spreadsheet in data/ to the RDF file:
fx -q queries/datasets-kg.sparql -f TTL -o knowledgegraph/datasets-licences.ttl
musoW licences
This part of the knowledge graph includes a snapshot of the musoW KG, stored in knowledgegraph/musow.ttl and downloaded with:
fx -q queries/download-musow.sparql -f TTL -o knowledgegraph/musow.ttl
musoW licence annotations are aligned to Dalicc entities; the alignments are stored in knowledgegraph/musow-alignments.ttl.
These alignments are used to generate the musow-licences part of the KG:
fx -q queries/musow-licences.sparql -f TTL -o knowledgegraph/musow-licences.ttl
In addition, musoW licence annotations are complemented with metadata obtained by experimenting with LLMs to extract and link licence information from web resources. The results are included in the file knowledgegraph/musow-licences-llm.ttl
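The construction commands above can be run in one go. The following is a minimal sketch under stated assumptions: build_kg is a hypothetical helper name, the steps run in the order they appear in this document, and fx falls back to a wrapper using the hypothetical SPARQL_ANYTHING_JAR variable if it is not already defined.

```shell
# Define fx only if the caller has not already done so.
# Assumption: SPARQL_ANYTHING_JAR and its default are hypothetical placeholders.
command -v fx >/dev/null 2>&1 || fx() {
  java -jar "${SPARQL_ANYTHING_JAR:-sparql-anything.jar}" "$@"
}

# build_kg (hypothetical name): run each construction query in document order.
build_kg() {
  mkdir -p knowledgegraph || return 1
  # Each line pairs a query in queries/ with its output file in knowledgegraph/.
  while read -r query output; do
    echo "queries/$query -> knowledgegraph/$output" >&2
    # </dev/null keeps the command from consuming the list on stdin;
    # stop at the first failing step.
    fx -q "queries/$query" -f TTL -o "knowledgegraph/$output" </dev/null || return 1
  done <<'EOF'
harvest-dalicc.sparql dalicc.ttl
datasets-kg.sparql datasets-licences.ttl
download-musow.sparql musow.ttl
musow-licences.sparql musow-licences.ttl
EOF
}
```

Calling build_kg from the repository root regenerates the four files. The alignment file and the LLM results are not produced by these commands and must already be in knowledgegraph/.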
Queries
List of can, cannot, and must terms for each dataset
fx -q queries/terms-view.sparql -l knowledgegraph/
Statistics of datasets / actions
fx -q queries/datasets-by-licence.sparql -l knowledgegraph/
fx -q queries/terms-stats.sparql -l knowledgegraph/
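The queries above read the RDF files in knowledgegraph/ through the -l option, so a missing file could silently change the results. As a hedged sketch, the following check verifies that the files named in this document are present before querying; check_kg_files is a hypothetical helper, and the file list is taken from the sections above rather than from the queries themselves.

```shell
# check_kg_files (hypothetical name): report which expected KG files are missing.
# Assumption: the expected files are exactly those named in this document.
check_kg_files() {
  dir="${1:-knowledgegraph}"
  missing=0
  for f in dalicc.ttl datasets-licences.ttl musow.ttl musow-alignments.ttl \
           musow-licences.ttl musow-licences-llm.ttl; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 when every file exists, 1 otherwise.
  return "$missing"
}
```

A typical use is to guard a query with the check, as in check_kg_files knowledgegraph && fx -q queries/terms-view.sparql -l knowledgegraph/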