Tycho Brahe Parsed Corpus of Historical Portuguese

Citation »

Galves, Charlotte; Andrade, Aroldo Leal de; and Faria, Pablo (2017, December). Tycho Brahe Parsed Corpus of Historical Portuguese. URL: http://www.tycho.iel.unicamp.br/~tycho/corpus/texts/psd.zip.

Presentation »

The Tycho Brahe Parsed Corpus of Historical Portuguese is an electronic corpus of texts written in Portuguese by authors born between 1380 and 1881.

At present, 76 texts (3,302,811 words) are available for research, with a linguistic annotation system in two stages: part-of-speech tagging (44 texts, a total of 1,962,176 words) and syntactic annotation (27 texts, a total of 1,234,323 words).

The Corpus has been built within the projects:

Acknowledgments »

We are grateful to the following institutions and individuals:

  • Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP 04/03643-0, "Rhythmic Patterns, Parameter Setting and Language Change, Phase II".
  • CNPq, 485999/2007-2, "Rhythmic Patterns, Prosodic Domains and Probabilistic Modelling in Portuguese Corpora".
  • Anthony Kroch and Beatrice Santorini, for the inspiration and constant support.
  • Fábio Kepler for allowing us to use his part-of-speech tagger for our work.
  • Dan Bikel for allowing us to use his Penn dissertation parser for our work.

Other corpora »

Access to Texts

[ Computational tools page ]
[ Ordered Lists Catalog ]
[ Query POS files with CorpusSearch ]

Download Complete Corpus
(compressed .zip files):

[ Complete Corpus, syntactic annotation (latest version) ]
[ Complete Corpus, syntactic annotation (Galves & Faria 2010 version) ]
[ Complete Corpus, POS tagging ]
[ Complete Corpus, no annotation ]

Edition Guidelines

[ Texts Presentation ]
[ Complete Edition Manual ]
[ Syntactic and POS Tagging Annotation Manuals ]
