| Component id | meetups-corpus |
|---|---|
| Type | Corpus |
| Name | MEETUPS Corpus |
| Description | This repository contains a corpus of Wikipedia biographies of people in the music scene in Europe |
| Work package | |
| Pilot | |
| Project | polifonia-project |
| Resource | https://github.com/polifonia-project/meetups_corpus_collection/ |
| Release date | 20/07/2022 |
| Release number | v1.0 |
| Licence | |
| Contributors | |
| Related components | |
MEETUPS Corpus collection
Collecting Wikipedia pages of people in the music scene in Europe
Details of the dataset
SPARQL queries retrieve the authors' names and their dbo:wikiPageID values from the DBpedia SPARQL endpoint (https://dbpedia.org/sparql).
Query filters:
Categories: <http://dbpedia.org/resource/Category:Music_people>, <http://dbpedia.org/resource/Category:People>
Query location: sparqlQueryResults/query.sparql
Query results: sparqlQueryResults/Q<1>_sparql.csv
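A minimal Python sketch of how such a query could be sent to the DBpedia endpoint. The actual query lives in sparqlQueryResults/query.sparql and is not reproduced in this README, so the template below is hypothetical: it assumes category membership is modelled with dct:subject, keeps only English rdfs:label values, and assumes the endpoint honours a `format=text/csv` parameter. The names `build_query` and `build_request_url` are illustrative. The sketch only builds the request URL; the fetch is left commented out because it needs network access.

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint, as named in this README.
ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical query template (the real one is in sparqlQueryResults/query.sparql).
# Assumes category membership is expressed with dct:subject.
# Doubled braces are literal braces once str.format() has been applied.
QUERY_TEMPLATE = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?person ?name ?pageID WHERE {{
  ?person dct:subject <{category}> ;
          rdfs:label ?name ;
          dbo:wikiPageID ?pageID .
  FILTER (lang(?name) = "en")
}}
LIMIT {limit} OFFSET {offset}
"""


def build_query(category: str, limit: int = 10000, offset: int = 0) -> str:
    """Fill the hypothetical template for one DBpedia category and result page."""
    return QUERY_TEMPLATE.format(category=category, limit=limit, offset=offset)


def build_request_url(query: str) -> str:
    """Encode a GET request asking the endpoint for CSV results (assumed format)."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "text/csv"})


if __name__ == "__main__":
    url = build_request_url(
        build_query("http://dbpedia.org/resource/Category:Music_people", limit=100)
    )
    print(url)
    # Fetching the CSV requires network access:
    # from urllib.request import urlopen
    # with urlopen(url) as resp:
    #     csv_text = resp.read().decode("utf-8")
```

Paging with LIMIT/OFFSET keeps each response small, which matters because public endpoints cap result size. Without an ORDER BY clause, however, SPARQL does not guarantee a stable ordering between pages, so a query paged this way for real would normally add one.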
Dataset:
Location: dataset/
Format: plain-text files (.txt)
Naming convention: <Author_wikiPageID>.txt
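A small sketch of how a file name following this convention could be split back into its parts. It assumes the page ID is the run of digits after the last underscore, so author names that themselves contain underscores stay intact; `parse_biography_filename` and `BiographyFile` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple


class BiographyFile(NamedTuple):
    author: str
    page_id: int


def parse_biography_filename(path: str) -> BiographyFile:
    """Split '<Author>_<wikiPageID>.txt' into author and numeric page ID.

    Assumption: the page ID follows the *last* underscore, so underscores
    inside the author part are preserved.
    """
    p = Path(path)
    if p.suffix != ".txt":
        raise ValueError(f"not a .txt file: {path}")
    author, sep, page_id = p.stem.rpartition("_")
    # Require ASCII digits so int() cannot fail on other Unicode digit characters.
    if not sep or not author or not (page_id.isascii() and page_id.isdigit()):
        raise ValueError(f"does not match <Author_wikiPageID>.txt: {path}")
    return BiographyFile(author, int(page_id))
```

For a hypothetical file dataset/Jane_Doe_123456.txt this returns `BiographyFile(author='Jane_Doe', page_id=123456)`.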
Total biographies collected: 33,309 authors' Wikipedia pages
Summary of collected biographies: sparqlQueryResults/TOTAL_download_biography.csv
MEETUPS pilot sample: 1,002 biographies
Random biographies are selected with sampleBiographies.py
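sampleBiographies.py itself is not shown in this README, so the following is only a hypothetical sketch of drawing a 1,002-file random sample from dataset/. It samples without replacement using a seeded `random.Random`, so the same seed reproduces the same selection. The function name, the `seed` parameter and copying the chosen files into a separate directory are assumptions, not the script's documented behaviour.

```python
import random
import shutil
from pathlib import Path
from typing import List


def sample_biographies(src_dir: str, dst_dir: str, k: int = 1002, seed: int = 0) -> List[Path]:
    """Copy k randomly chosen .txt biographies from src_dir into dst_dir (hypothetical)."""
    # Sort first so the draw depends only on the seed, not on directory listing order.
    files = sorted(Path(src_dir).glob("*.txt"))
    if k > len(files):
        raise ValueError(f"asked for {k} files but only {len(files)} exist")
    # Sampling without replacement: no biography is picked twice.
    chosen = random.Random(seed).sample(files, k)
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification times.
    return [Path(shutil.copy2(f, out / f.name)) for f in chosen]
```

For example, `sample_biographies("dataset", "sample")` would copy 1,002 biographies into a hypothetical sample/ directory. Fixing the seed is a design choice that lets the pilot sample be regenerated exactly from the full corpus.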