The project Rhythmic Patterns, Parameter Setting and Language Change, Phase II, is the continuation of the earlier project of the same name (http://www.tycho.iel.unicamp.br/). Its first goal is to broaden and consolidate the Tycho Brahe Parsed Corpus of Historical Portuguese (http://www.tycho.iel.unicamp.br/~tycho/corpus), built during that earlier project, in the following directions:

  • Diversification of the types, periods and places of production of the texts. The extended Corpus will contain texts by Portuguese authors born during the 14th and 15th centuries, texts whose authorship is uncertain or unknown, non-literary texts, and texts produced in Brazil.
  • Syntactic parsing of the texts using the system developed during the first phase. By the end of the project, the Corpus will contain two million syntactically annotated words.
  • Restructuring of the Corpus in XML, in accordance with international standards for corpus encoding.

The second goal of the project is to use the revised Corpus to deepen the study of Middle Portuguese, the intermediate phase between Old Portuguese and the modern varieties of the language, with the following questions in mind:

  • What are the grammatical features of Middle Portuguese?
  • What is its trajectory in time?
  • What is the role of prosodic change in the emergence of Modern European Portuguese?

The project is anchored in one of the main research lines of modern linguistics, whose goal is to understand what drives linguistic change and how this change proceeds over time. As in the previous project, the emphasis is on the little-studied interaction between rhythm and syntax in language change. Moreover, we face the methodological challenge of detecting prosodic patterns in written texts. Finally, by describing and analyzing the language state that gave rise to the two main modern varieties of Portuguese, we lay the groundwork for a comparative history of Brazilian and Modern European Portuguese.

To attain these goals, we will combine qualitative analysis with quantitative methods, bringing together generative grammar theory and prosodic phonology, on the one hand, and descriptive statistics and stochastic modeling, on the other. Such an interdisciplinary approach aligns the project with leading frameworks in language change studies, which combine the use of large amounts of data (made possible by advances in computer science) with solidly grounded theoretical research. Finally, by bringing together scholars from varied backgrounds (syntacticians, phonologists, statisticians, probabilists, computer scientists and computational linguists) and fostering the exchange of knowledge among them, the project may contribute to a renewed outlook for the field.

The core team of the project will work in close collaboration with other research groups. In Brazil, the main ones are the Stochastic Behavior, Critical Phenomena and Rhythmic Pattern Identification in Natural Languages project (PRONEX/FAPESP) and the Lácio-Web Project of the Interinstitutional Nucleus of Computational Linguistics (NILC, USP-São Carlos). Moreover, the Tycho Brahe Corpus is hosted by, and receives computational support from, the IME-USP network. Outside Brazil, the project is affiliated with the pioneering Penn-Helsinki Parsed Corpus of Middle English project, coordinated by Anthony Kroch. This places us within the international body of work on large annotated historical corpora.

In addition to regular meetings of the core team of researchers, their students and their closest collaborators, the project aims to organize national and international workshops, following the tradition of the previous project.

