This tool is part of the MEETUPS pilot and processes text from the biographies of music personalities. It uses DBpedia Spotlight to identify and annotate possible entity mentions in the input text. This step is essential for identifying two of the four main elements that define a meetup: people (who participated) and place (where). Together with the time (when) the meeting happened and the event that took place (what), these elements make up a complete historical meetup data point.
@book{albamoralest_daga_2023, title={polifonia-project/meetups_pilot: v0.2}, url={https://zenodo.org/record/7875353}, DOI={10.5281/ZENODO.7875353}, abstractNote={To be published in the next ecosystem release}, publisher={Zenodo}, author={Albamoralest and Daga, Enrico}, year={2023}, month={Apr} }
MEETUPS identification of people and places is a tool developed using Python and Jupyter Notebook. It uses DBpedia Spotlight to identify and annotate possible entity mentions in the input text. This step extracts two of the four main elements that define a meetup, people (who participated) and place (where); the other two are time (when) and event (what).
The implementation is divided into two Jupyter notebooks:
The first notebook queries DBpedia Spotlight, processes the JSON responses, and stores them locally.
It takes as input the corpus of music personalities generated by the MEETUPS cleaning component: https://github.com/polifonia-project/meetups_pilot/blob/main/01_CleaningText.ipynb
Use DBpedia Spotlight to identify and annotate entity mentions from input text.
Retrieve and process the JSON responses from DBpedia Spotlight.
Store responses for later processing.
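As a rough illustration of these three steps, the sketch below queries DBpedia Spotlight once per text and caches the JSON response on disk. It is not the notebook's actual code: the endpoint URL, the `confidence` value, the hash-based file naming, and the function names `fetch_annotations` and `annotate_cached` are assumptions made for this example. Only the `cacheSpotlightResponse/` directory name comes from this README.

```python
import hashlib
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: the public DBpedia Spotlight endpoint for English text.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
# Directory name taken from this README.
CACHE_DIR = Path("cacheSpotlightResponse")


def fetch_annotations(text, confidence=0.5):
    """Send text to DBpedia Spotlight and return the parsed JSON response."""
    data = urllib.parse.urlencode(
        {"text": text, "confidence": confidence}
    ).encode("utf-8")
    request = urllib.request.Request(
        SPOTLIGHT_URL, data=data, headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def annotate_cached(text, cache_dir=CACHE_DIR, fetch=fetch_annotations):
    """Return the annotations for text, reusing a stored response if present."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per text, keyed by its SHA-1 hash.
    key = hashlib.sha1(text.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    response = fetch(text)
    cache_file.write_text(
        json.dumps(response, ensure_ascii=False), encoding="utf-8"
    )
    return response
```

Passing the fetch function as a parameter keeps the caching logic testable without network access, and storing the raw response means the second notebook can be rerun without querying the service again.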
The second notebook uses the responses from DBpedia Spotlight to capture data of people and places:
Use the DBpedia Spotlight responses stored by the first notebook.
Search for three types of entities:
http://dbpedia.org/ontology/Person
http://dbpedia.org/ontology/MusicalArtist
http://dbpedia.org/ontology/Place
Classify the first two types as “people” and the third as “place”.
Store the results in extractedEntitiesPersonPlaceOnly/
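The classification step can be sketched as follows. This is a minimal example, not the notebook's implementation. It assumes the usual shape of a Spotlight JSON response: a `Resources` list whose items carry `@URI`, `@surfaceForm`, and a comma-separated `@types` string, with DBpedia classes written either with the `DBpedia:` prefix or as full ontology URIs. The function names and the keys of the output records are hypothetical.

```python
# DBpedia ontology classes named in this README, grouped by output label.
PEOPLE_TYPES = {"Person", "MusicalArtist"}
PLACE_TYPES = {"Place"}
# Assumption: a class may appear with either of these prefixes.
ONTOLOGY_PREFIXES = ("http://dbpedia.org/ontology/", "DBpedia:")


def dbpedia_classes(types_field):
    """Extract bare DBpedia class names from a comma-separated type string."""
    classes = set()
    for raw in types_field.split(","):
        raw = raw.strip()
        for prefix in ONTOLOGY_PREFIXES:
            if raw.startswith(prefix):
                classes.add(raw[len(prefix):])
    return classes


def classify(resource):
    """Return "people", "place", or None for a single annotated resource."""
    classes = dbpedia_classes(resource.get("@types", ""))
    if classes & PEOPLE_TYPES:
        return "people"
    if classes & PLACE_TYPES:
        return "place"
    return None


def extract_people_places(response):
    """Keep only the resources classified as people or place."""
    results = []
    for resource in response.get("Resources", []):
        label = classify(resource)
        if label is not None:
            results.append(
                {
                    "uri": resource.get("@URI"),
                    "mention": resource.get("@surfaceForm"),
                    "type": label,
                }
            )
    return results
```

Resources with no matching class are dropped, so the output contains people and places only, matching the name of the `extractedEntitiesPersonPlaceOnly/` directory.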
Code location:
|_ 02_queryDbpedia.ipynb
|_ 02_Identify_PP.ipynb
Index data location
Data input:
|_ indexedSentences/
DBpedia Spotlight annotations:
|_ cacheSpotlightResponse/
People and places annotation
Data output:
|_ extractedEntitiesPersonPlaceOnly/
|_ README_people_places_identification.md