Component id | meetups-corpus |
---|---|
Type | Corpus |
Name | MEETUPS Corpus |
Description | This repository contains a corpus of Wikipedia biographies of people in the music scene in Europe |
Work package | |
Pilot | |
Project | polifonia-project |
Resource | https://github.com/polifonia-project/meetups_corpus_collection/ |
Release date | 20/07/2022 |
Release number | v1.0 |
Licence | |
Contributors | |
Related components | Generated by: |
MEETUPS Corpus collection
Collecting Wikipedia pages of people in the music scene in Europe
Details of dataset
SPARQL queries retrieve authors’ names and dbo:wikiPageID values from the DBpedia SPARQL endpoint (https://dbpedia.org/sparql)
Query filters:
Categories: <http://dbpedia.org/resource/Category:Music_people>
<http://dbpedia.org/resource/Category:People
Location:
sparqlQueryResults/query.sparql
Query results:
sparqlQueryResults/Q<1>_sparql.csv
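As an illustration of the retrieval step, the following sketch builds a query for one category and sends it to the DBpedia endpoint. It is not the contents of sparqlQueryResults/query.sparql: the query shape, the use of `dct:subject` and `rdfs:label`, the English-label filter, and the function names `build_query` and `run_query` are all assumptions. It also matches only direct members of the category, whereas the actual query may traverse subcategories.

```python
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
MUSIC_PEOPLE = "http://dbpedia.org/resource/Category:Music_people"


def build_query(category_iri: str, limit: int = 100) -> str:
    """Build a SELECT query for resources filed directly under one category."""
    # Hypothetical query shape; the repository's query.sparql may differ.
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?person ?name ?pageId WHERE {{
  ?person dct:subject <{category_iri}> ;
          dbo:wikiPageID ?pageId ;
          rdfs:label ?name .
  FILTER (lang(?name) = "en")
}}
LIMIT {limit}
""".strip()


def run_query(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Send the query over HTTP GET and return the response body as CSV text."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for CSV through content negotiation (SPARQL 1.1 Protocol).
    request = urllib.request.Request(url, headers={"Accept": "text/csv"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `run_query(build_query(MUSIC_PEOPLE, limit=5))` would return a small CSV with `person`, `name` and `pageId` columns, provided the endpoint is reachable.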
Dataset:
Location:
dataset/
Format:
Text files .txt
Name convention:
<Author_wikiPageID>.txt
Total biographies collected:
33,309 authors’ Wikipedia pages
Summary of total biographies collected:
sparqlQueryResults/TOTAL_download_biography.csv
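A minimal sketch of how the dataset layout above could be read from Python. It assumes that each file name is the author's wikiPageID followed by `.txt` and that the files are UTF-8 encoded; neither is confirmed by this README, and the helper names `biography_path` and `load_biography` are hypothetical.

```python
from pathlib import Path


def biography_path(page_id, dataset_dir="dataset"):
    """Return the expected path of one biography, e.g. dataset/12345.txt."""
    # Assumption: the file stem is the wikiPageID alone.
    return Path(dataset_dir) / f"{page_id}.txt"


def load_biography(page_id, dataset_dir="dataset"):
    """Read one biography as text (UTF-8 is an assumption)."""
    return biography_path(page_id, dataset_dir).read_text(encoding="utf-8")
```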
Meetups pilot sample: 1,002 biographies
Random selection of biographies: sampleBiographies.py
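The random selection can be sketched as follows. This is not the code of sampleBiographies.py, whose implementation is not shown here: the function name `sample_biographies`, the `seed` parameter and the default sample size (reading the pilot sample figure as one thousand and two) are assumptions made for illustration.

```python
import random
from pathlib import Path


def sample_biographies(dataset_dir="dataset", sample_size=1002, seed=None):
    """Pick sample_size distinct .txt files from dataset_dir at random."""
    # Sort first so that a fixed seed gives the same sample on every run.
    files = sorted(Path(dataset_dir).glob("*.txt"))
    if sample_size > len(files):
        raise ValueError(
            f"requested {sample_size} files but only {len(files)} are available"
        )
    rng = random.Random(seed)
    # random.Random.sample draws without replacement.
    return rng.sample(files, sample_size)
```

Passing a fixed `seed` makes the selection reproducible, which helps when the same pilot sample has to be regenerated later.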