Tycho Brahe Parsed Corpus of Historical Portuguese

Citation »

Galves, Charlotte; Andrade, Aroldo Leal de; and Faria, Pablo (2017, December). Tycho Brahe Parsed Corpus of Historical Portuguese. URL: /corpus/texts/psd.zip.

Presentation »

The Tycho Brahe Parsed Corpus of Historical Portuguese is an electronic corpus of texts written in Portuguese by authors born between 1380 and 1978.

At present, 88 texts (3,544,628 words) are available for research, with a linguistic annotation system in two stages: part-of-speech tagging (58 texts, a total of 2,280,819 words) and syntactic annotation (27 texts, a total of 1,234,323 words).

The Corpus has been built within the projects:


Acknowledgments »

We are grateful to the following institutions and individuals:

  • Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP 04/03643-0, "Rhythmic Patterns, Parameter Setting and Language Change, Phase II".
  • CNPq, 485999/2007-2, "Rhythmic Patterns, Prosodic Domains and Probabilistic Modelling in Portuguese Corpora".
  • Anthony Kroch and Beatrice Santorini, for their inspiration and constant support.
  • Fábio Kepler, for allowing us to use his part-of-speech tagger in our work.
  • Dan Bikel, for allowing us to use his Penn dissertation parser in our work.
