About the dataset

Corpus title: Ceol Rince na hÉireann

Source: Black, B 2020, The Bill Black Irish tune archive homepage, viewed 5 January 2021.

Contents: 1,195 traditional Irish dance tunes, represented in MIDI and ABC Notation.

Between 1963 and 1999, the Irish State publishers Oifig an tSoláthair and An Gúm issued five printed volumes of tunes from the collections of Breandán Breathnach (1912-1985) under the series title Ceol Rince na hÉireann (Dance Music of Ireland, hereafter CRÉ). The five volumes of CRÉ contain 1,208 traditional tunes, a subset of Breathnach’s more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black and made freely available online via his personal website. The addition of alternative tune versions, together with variation in the numbering of unique melodies, has resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research; for example, it forms part of a larger aggregated corpus used in the Tunepal Music Information Retrieval app. We have created a new cleaned and annotated version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia’s FONN music pattern analysis toolkit.

NOTE: Please see corpus_demo.ipynb for a Jupyter notebook exploring the corpus data.

Deliverable 3.3 of the Polifonia project will describe the context and research in more detail. It will be published on CORDIS.

About corpus pre-processing methodology

Bill Black’s ABC version of the CRÉ collection has been manually edited and annotated, and converted to MIDI. This work included:

  • Removal of alternative tune versions, so that the ABC collection more accurately reflects the original print collection.
  • Removal of non-valid ABC notation characters.
  • Editing of repeat markers to ensure accurate MIDI output.
  • Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in roots.csv, which is used to derive key-invariant secondary feature sequence data from the MIDI files.
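
The idea of a key-invariant feature can be illustrated with a minimal sketch. It assumes that key invariance is obtained by expressing each MIDI note number as a pitch class relative to the tune's root; the function name `to_key_invariant` and the example notes are hypothetical, and FONN's own implementation may differ.

```python
def to_key_invariant(midi_pitches, root):
    """Express MIDI note numbers as pitch classes relative to the root.

    `root` is a chromatic pitch class (C=0 through B=11), as stored in roots.csv.
    This is an illustrative assumption, not FONN's actual code.
    """
    if not 0 <= root <= 11:
        raise ValueError("root must be a chromatic pitch class in 0..11")
    return [(pitch - root) % 12 for pitch in midi_pitches]


# Hypothetical opening notes of a tune rooted on D (root = 2): D4, F#4, A4, D5
relative = to_key_invariant([62, 66, 69, 74], root=2)
print(relative)  # [0, 4, 7, 0]
```

Under this assumption, the same melodic figure transposed to G (notes 67, 71, 74, 79 with root 7) yields the same sequence, which is what allows tunes in different keys to be compared.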

Description of the data

corpus/
  -MIDI/
    -1,195 monophonic MIDI (.mid) files, one representing each tune.
  -abc/
    -1 ABC Notation corpus file (.abc) containing scores for all 1,195 tunes.
  -roots.csv
  -README.md
  -LICENSE.md
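
As a quick sanity check on the single ABC corpus file, the number of tunes it contains can be counted. The sketch below relies on the ABC convention that each tune begins with an `X:` reference number field; the function name `count_abc_tunes` and the two sample tunes are invented for illustration and are not taken from the corpus.

```python
import io


def count_abc_tunes(abc_lines):
    """Count tunes by counting X: (reference number) fields, one per tune."""
    return sum(1 for line in abc_lines if line.startswith("X:"))


# Invented two-tune sample standing in for the real .abc corpus file.
sample = io.StringIO(
    "X:1\nT:Example Tune A\nM:6/8\nK:D\nDFA dfa|\n\n"
    "X:2\nT:Example Tune B\nM:4/4\nK:G\nGABc d2d2|\n"
)
print(count_abc_tunes(sample))  # 2
```

To run this on the real data, pass an open file handle for the .abc file in corpus/abc/ instead of the sample.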

  • The corpus directory contains roots.csv, this README.md, and a LICENSE.md file, alongside the MIDI and abc subdirectories.

  • roots.csv holds two columns, with one row for each MIDI file in the corpus:

    • ‘title’: MIDI file name (tune title)
    • ‘root’: expert-assigned root note of each melody, represented as a chromatic pitch class (i.e. an integer value from C=0 through B=11).
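
The two-column layout described above can be read with Python's standard csv module. This is a minimal sketch, assuming the file has a header row naming the ‘title’ and ‘root’ columns; the helper `load_roots`, the note-name lookup table, and the sample rows are hypothetical and not part of FONN.

```python
import csv
import io

# Note names for display only; sharps are an arbitrary spelling choice.
PITCH_CLASS_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def load_roots(csv_file):
    """Return a {title: root} mapping from an open roots.csv-style file."""
    roots = {}
    for row in csv.DictReader(csv_file):
        root = int(row["root"])
        if not 0 <= root <= 11:
            raise ValueError(f"root out of range for {row['title']!r}: {root}")
        roots[row["title"]] = root
    return roots


# Invented sample rows standing in for the real roots.csv.
sample = io.StringIO("title,root\nExample Tune A,2\nExample Tune B,7\n")
roots = load_roots(sample)
for title, root in roots.items():
    print(title, root, PITCH_CLASS_NAMES[root])
```

For the real file, open it with `open("corpus/roots.csv", newline="", encoding="utf-8")` and pass the handle to `load_roots`.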

  • To convert the corpus from ABC Notation to MIDI format, download the corpus data and run the FONN abc_ingest.py script. See the FONN README.md for further information.

  • To extract feature sequence data from the MIDI corpus, download the corpus data and run the FONN setup_corpus.py script. See the FONN README.md for further information.

Attribution

If you use the code in this repository, please cite this software as follows:

@software{diamond_fonn_2022,
	address = {Galway, Ireland},
	title = {{FONN} - {FOlk} {N}-gram {aNalysis}},
	shorttitle = {{FONN}},
	url = {https://github.com/polifonia-project/folk_ngram_analysis},
	publisher = {National University of Ireland, Galway},
	author = {Diamond, Danny and Shahid, Abdul and McDermott, James},
	year = {2022},
}

License

This work is licensed under CC BY 4.0, https://creativecommons.org/licenses/by/4.0/