Tycho Brahe Parsed Corpus of Historical Portuguese

Citation »

Galves, Charlotte, and Pablo Faria. 2010. Tycho Brahe Parsed Corpus of Historical Portuguese. URL: http://www.tycho.iel.unicamp.br/~tycho/corpus/en/index.html.

Presentation »

The Tycho Brahe Parsed Corpus of Historical Portuguese is an electronic corpus of texts written in Portuguese by authors born between 1380 and 1845.

At present, 64 texts (2,769,403 words) are available for research, with a two-stage linguistic annotation system: part-of-speech tagging (33 texts, a total of 1,485,943 words) and syntactic annotation (16 texts, a total of 671,694 words).

The Corpus has been built within the projects:


Acknowledgments »

We are grateful to the following institutions and individuals:

  • Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP 04/03643-0, "Rhythmic Patterns, Parameter Setting and Language Change, Phase II".
  • CNPq, 485999/2007-2, "Rhythmic Patterns, Prosodic Domains and Probabilistic Modelling in Portuguese Corpora".
  • Anthony Kroch and Beatrice Santorini, for the inspiration and constant support.
  • Fábio Kepler, for allowing us to use his part-of-speech tagger in our work.
  • Dan Bikel, for allowing us to use his Penn dissertation parser in our work.

Other corpora »

Access to Texts

[ Computational tools page ]
[ Ordered Lists Catalog ]
[ Query POS files with CorpusSearch ]

Download Complete Corpus
(compressed .zip files):

[ Complete Corpus, syntactic annotation ]
[ Complete Corpus, POS tagging ]
[ Complete Corpus, no annotation ]

Edition Guidelines

[ Texts Presentation ]
[ Complete Edition Manual ]
[ Syntactic and POS Tagging Annotation Manuals ]
