About dataset

Corpus title: Ceol Rince na hÉireann

Source: Black, B 2020, The Bill Black Irish tune archive homepage, viewed 5 January 2021.

Contents: 1,224 traditional Irish dance tunes, each of which is represented as a monophonic MIDI file.

Between 1963 and 1999, Irish State publishing companies Oifig an tSoláthair and An Gúm issued five printed volumes of tunes from the collections of Breandán Breathnach (1912-1985) under the series title Ceol Rince na hÉireann (Dance Music of Ireland, hereafter CRÉ). The five volumes of CRÉ contain 1,208 traditional tunes, a subset of Breathnach’s more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black, and made freely available online via his personal website. The addition of alternative tune versions and variation in the numbering of unique melodies have resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research work: for example, it makes up part of a larger aggregated corpus used in the Tunepal Music Information Retrieval app. We have created a new cleaned and annotated MIDI version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia’s FONN music pattern analysis toolkit.

NOTE: Please see corpus_stats.ipynb for a Jupyter notebook exploring the corpus data.

Deliverable 3.2 of the Polifonia project will describe the context and research in more detail. It will be published on Cordis.

About corpus pre-processing methodology

Bill Black’s ABC version of the CRÉ collection has been manually edited and annotated, and converted to MIDI. This work included:

  • Removal of alternative tune versions, so that the ABC collection more accurately reflects the original print collection.
  • Removal of non-valid ABC notation characters.
  • Editing of repeat markers to ensure accurate MIDI output.
  • Conversion to MIDI via EasyABC software.
  • Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in the file roots.csv, which is used to derive key-invariant secondary feature sequence data from the MIDI files.
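As an illustration of why the root note matters, the sketch below shows one simple way a pitch sequence could be made key-invariant: each MIDI note number is expressed as a chromatic interval above the tune's root pitch class. This is a minimal sketch under stated assumptions, not necessarily the method used by the FONN toolkit; the function name `to_key_invariant` and the example note values are hypothetical.

```python
from typing import List


def to_key_invariant(midi_pitches: List[int], root: int) -> List[int]:
    """Map MIDI note numbers to chromatic intervals (0-11) above the root.

    `root` is a chromatic pitch class as described for roots.csv
    (C=0 through B=11). The modulo operation discards octave information,
    so this is only one possible key-invariant representation.
    """
    if not 0 <= root <= 11:
        raise ValueError(f"root must be in 0..11, got {root}")
    return [(pitch - root) % 12 for pitch in midi_pitches]


# Hypothetical fragment: D4, F#4, A4, D5 in a tune rooted on D (root = 2).
fragment_in_d = [62, 66, 69, 74]
# The same fragment transposed so that it is rooted on G (root = 7).
fragment_in_g = [67, 71, 74, 79]

# Both versions reduce to the same interval sequence relative to the root.
print(to_key_invariant(fragment_in_d, 2))
print(to_key_invariant(fragment_in_g, 7))
```

Because both fragments yield the same sequence, melodies notated in different keys can be compared directly once the root is known.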

Description of the data

  • 1,224 monophonic MIDI files (.mid)

Each melody in the corpus is represented as a monophonic MIDI file, named per the melody title. There are 1,224 files in total, stored in the ./MIDI directory.

The corpus root directory contains a roots.csv file, this readme, and a file. roots.csv holds two columns, with one row per MIDI file in the corpus:

  • ‘title’: MIDI file title.
  • ‘root’: expert-assigned root note of each melody, represented as a chromatic pitch class (i.e. an integer value from C=0 through B=11).
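For readers who want to inspect the root annotations directly, the following sketch loads roots.csv with Python's standard library and cross-checks it against the MIDI directory. It assumes the column headers are exactly `title` and `root`, and that each title matches a MIDI file name with or without the `.mid` extension; both are assumptions about the file layout rather than confirmed details. The helper names `load_roots` and `missing_midi_files` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_roots(csv_path: str) -> Dict[str, int]:
    """Read roots.csv into a {title: root pitch class} dictionary."""
    roots: Dict[str, int] = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            root = int(row["root"])
            # Roots are chromatic pitch classes: C=0 through B=11.
            if not 0 <= root <= 11:
                raise ValueError(f"invalid root {root} for {row['title']!r}")
            roots[row["title"]] = root
    return roots


def missing_midi_files(roots: Dict[str, int], midi_dir: str) -> List[str]:
    """Return titles listed in roots.csv that have no .mid file in midi_dir."""
    present = {path.stem for path in Path(midi_dir).glob("*.mid")}
    missing = []
    for title in roots:
        # Assumption: a title may or may not carry the .mid extension.
        stem = title[:-4] if title.lower().endswith(".mid") else title
        if stem not in present:
            missing.append(title)
    return sorted(missing)


# Example usage, assuming the script is run from the corpus root directory:
#   roots = load_roots("roots.csv")
#   print(len(roots), "annotated tunes")
#   print(missing_midi_files(roots, "MIDI"))
```

If the corpus is intact, the second call should return an empty list, since every MIDI file is described as having one row in roots.csv.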


To extract feature sequence data from the MIDI corpus, please download the corpus data and run setup_corpus.main() from the folk_ngram_analysis component. Please see the folk_ngram_analysis readme for further information.


  • Danny Diamond
  • Dr. Abdul Shahid Khattak
  • Dr. James McDermott
  • Dr. Mathieu d’Aquin


This project is licensed under the MIT License - see file for details