Tycho Brahe Parsed Corpus of Historical Portuguese

Galves, Charlotte, and Pablo Faria. 2010. Tycho Brahe Parsed Corpus of Historical Portuguese. URL: http://www.tycho.iel.unicamp.br/~tycho/corpus/en/index.html.

The Tycho Brahe Parsed Corpus of Historical Portuguese is an electronic corpus of texts written in Portuguese by authors born between 1380 and 1881.

At present, 76 texts (3,303,196 words) are available for research, with linguistic annotation carried out in two stages: part-of-speech tagging (44 texts, 1,956,460 words in total) and syntactic annotation (20 texts, 877,247 words in total).

The Corpus has been built within the projects funded by the two agencies listed first below. We are grateful to the following institutions and individuals:

  • Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP 04/03643-0, "Rhythmic Patterns, Parameter Setting and Language Change, Phase II".
  • CNPq, 485999/2007-2, "Rhythmic Patterns, Prosodic Domains and Probabilistic Modelling in Portuguese Corpora".
  • Anthony Kroch and Beatrice Santorini, for their inspiration and constant support.
  • Fábio Kepler, for allowing us to use his part-of-speech tagger in our work.
  • Dan Bikel, for allowing us to use the parser he developed for his dissertation at Penn in our work.

