Component id	meetups-knowledge-graph
Type	KnowledgeGraph
Name	MEETUPS Knowledge Graph
Description	MEETUPS Knowledge Graph module with data on historical meetups and related to MEETUPS Pilot
Work package	WP4
Pilot	MEETUPS
Project	polifonia-project
Resource	https://github.com/polifonia-project/meetups-knowledge-graph/
Release date	20/07/2022
Release number	v0.1
Doi	10.5281/zenodo.7924618
Changelog	https://github.com/polifonia-project/meetups-knowledge-graph/releases/tag/v0.1
Licence	CC-BY_v4
Copyright	Copyright (c) 2023 MEETUPS @ The Open University
Contributors	Alba Morales Tirado Enrico Daga Jason Carvalho
Related components	Informed by: MEETUPS Ontology (Ontology) MEETUPS Corpus collection (Software) Generated by: MEETUPS - Historical meetups identification (Software) Persona: Ortenz (Persona) David (Persona) Sophie Reuses: SPARQL Anything Source Code (Software) SPARQL Anything Command Line (CLITool)
Bibliography	Main publication: Morales Tirado, A., Carvalho, J., Mulholland, P. and Daga, E., 2023. Musical Meetups: a Knowledge Graph approach for Historical Social Network Analysis.
Credits	https://github.com/albamoralest https://github.com/enridaga

View this file on Github Download as Schema.org

MEETUPS Knowledge Graph

Musical Meetups Knowledge Graph (MMKG)

The MEETUPS knowledge graph contains data about historical encounters of people in the musical world in Europe from c. 1800 to c. 1945.

Meetups SPARQL Endpoint

https://polifonia.kmi.open.ac.uk/meetups/sparql/

Knowledge Graph description

The KG was built using data collected from Wikipedia. A total of 33309 biographies of musical artist’s web pages in English were collected [1]. We apply knowledge extraction techniques and methods for text processing to recognise, classify and link the entities that are part of a historical meetup, particularly: people, places, time expressions and themes [3].

The MMKG data contains evidence that describes historical meetups according to Meetups Ontology [2]. A historical meetup – mtp:Meetup, is derived from evidence within a biography – mtp:hasEvidenceText. Mentions of at least one or more participants and places are represented by the mtp:Participant and mtp:Location class, respectively. Each mention is an entity (mtp:hasEntity) extracted and linked to DBpedia or Wikidata (Section 4 gives details on the extraction process). To represent the time when the meetup took place, we use the mtp:TimeExpression class. It is composed of start time:hasBeginning and end time:hasEnd dates as well as the text from where it was compiled. Lastly, the purpose of the encounter (mtp:Purpose) is captured by one of the following meetups types: mtp:BusinessCareer, mtp:PersonalLife, mtp:Coincidence, mtp:Education, mtp:PublicCelebration or mtp:MusicMaking.

MMKG is one of the main components of the MEETUPS Pilot. Here are the links to important components related to the KG.

The KG answers the CQs formalised as part of the Meetups Ontology knowledge requirements.

Ortenz

What places did musician Z visited in her career?
Where did she perform?
Where did she live?
Did musician X and performer Y ever meet? Where, when, and why?
In what context the meeting happened?
What is the nature of the event?
Was it a celebration, a festival, a private event?
Was it a religious or a secular event?
Who paid to support the event?
What is the provenance of the event attendees? What and how they happened to be there?
Did they travel to reach the place?
Were they invited? Was the meeting accidental?
How can we characterize the relation among the participants?
Was there a power relation? (e.g., Patreon / Musician)

SPARQL queries developed to answer a set of CQs

Statistics:

To obtain these statistics we build a series of queries available here

queries/statistics_query.sparql

-------------------------------
| key                 | value |
===============================
| "Biographies"       | 33309 |
| "Meetups"           | 45812 |
| "Persons mentioned" | 49170 |
| "Subjects"          | 16748 |
| "Places mentions"   | 7107  |
| "Time expressions"  | 51120 |
-------------------------------

It is possible to use SPARQL Anything to generate the same statistics using the CSV-generated

$ fx -q queries/statistics.sparql -l data/meetups/

KG quick use

The KG is available in TTL and N-quad format.

TTL format: data/meetups_triples
N-quad format: data/meetups_quads

Each file in the repository is named after the DBpedia identifier (E.g., 10085.ttl, meaning it contains information regarding Edward Elgar). Triples can be directly uploaded into any non-relational database.

For a list of all the biographies processed please see list-of-biographies.csv. Column “s” is the DBpedia resource and column “id” the unique identifier give by DBpedia

------------------------------------------------------------
| s	| id	|
============================================================
| "http://dbpedia.org/resource/Edward_Elgar"	| 10085 |
| "http://dbpedia.org/resource/Clara_Schumann"	| 45181 |
| "http://dbpedia.org/resource/Yehudi_Menuhin"	| 57520 |
------------------------------------------------------------

Repository structure

data: contains the KG data in TTL and N-quad format
diagrams: illustrative diagrams of the knowledge extraction process
queries: SPARQL queries for statistics, evaluation, KG construction, miscellaneous

KG construction

To build the KG from scratch we use SPARQL Anything and CONSTRUCT mappings to create triples for each biography. The data used to generate the KG is the evidence (text) that describes an encounter, the entities (people and places), temporal expressions and the purpose of the encounter extracted from a piece of evidence. All this data is the result of the execution of the knowledge construction pipeline detailed in Meetups Pilot.

Pre-requirements

Download and install the SPARQL Anything command line from the project release page. SPARQL Anything requires Java >= 11. We used the sparql-anything-0.8.1 version
Download the CONSTRUCT mapping scripts stored in this repository
- queries/generate-meetups-1.sparql
- queries/generate-meetups-2.sparql
- queries/generate-meetups-3.sparql
Download the list of biographies stored in this repository
- data/list-of-biographies.csv
Download the CSV files generated as output of the Knowledge extraction pipeline repository. These four folders:
- meetupsAnnotations - meetupsFastCorefOutputPP
  - meetupsTimeExpressions
  - extractedMeetupsTypes

Commands

Part1. Generates triples about the meetup evidence, people and purpose of the meetup.

java -jar sparql-anything-0.8.1.jar -q generate-meetups-1.sparql -i list-of-biographies.csv -p "data/meetups_triples/part1/?fileId.ttl" -f TTL

Part2. Generates triples about the meetup places

java -jar sparql-anything-0.8.1.jar -q generate-meetups-2.sparql -i list-of-biographies.csv -p "data/meetups_triples/part2/?fileId.ttl" -f TTL

Part3. Generates triples about the meetup temporal expressions

java -jar sparql-anything-0.8.1.jar -q generate-meetups-3.sparql -i list-of-biographies.csv -p "data/meetups_triples/part3/?fileId.ttl" -f TTL

Last step: generate meetups_quads from meetups_triples and list-of-biographies.csv:

fx -q queries/generate_nq_each.sparql -v data/list-of-biographies.csv -f NQ -p "./data/meetups_quads/?id.nq"

These commands build the whole KG, if you like to build a specific list of biographies from the available ones, just compile a new list-of-biographies.csv file and run the scripts.

*Note: be reminded to change any directory names/structure to reflect your settings.

Meetups KG extraction

KG extraction pipeline

MELODY data stories

To be released in the next deliverable

Acknowledgements

This work was supported by the EU’s Horizon Europe research and innovation programme within the Polifonia project (grant agreement N. 101004746).