Rhythmic patterns, parameter setting, and language change

Summary of the Project

The primary goal of the project is to model the relationship between prosody and syntax in the process of language change which led from Classical Portuguese to Modern European Portuguese. Beyond the specific results of the linguistic and mathematical research which will be developed within this project, it will also produce:

the Tycho Brahe Parsed Corpus of Historical Portuguese, consisting of texts written by Portuguese authors born between 1550 and 1850. This electronic corpus, developed along the lines of the Penn-Helsinki Parsed Corpus of Middle English, will be hosted on the network of the Institute of Mathematics and Statistics of the University of São Paulo and will be available to scholars for educational and research purposes. Our goal is to reach 1,000,000 words within four years;
a Comparative Tagged Corpus of Spoken Modern European Portuguese and Brazilian Portuguese, consisting of categorized recorded registers from speakers of both dialects. Our aim is to develop a statistically representative corpus of the distinctive prosodic aspects of Modern European Portuguese and Brazilian Portuguese.

The basic hypotheses of the project are the following:

the syntactic change, which occurred in Portuguese at the beginning of the 19th century, was driven by a previous prosodic change, which took place during the 18th century and affected the rhythmic pattern of the spoken language;
for the purposes of this research, the prosody of Classical Portuguese is identical to the prosody of Brazilian Portuguese;
written texts reflect their authors' rhythmic patterns, through lexical and syntactic choices driven by prosodic considerations which are not affected by the norm.

This project adopts the Principles and Parameters approach to syntactic theory which has been developed by N. Chomsky and collaborators. The relation between syntax and prosody at the interface between grammar and the Articulatory-Perceptual performance system will be modelled by the Thermodynamic Formalism. The statistical analysis of historical texts will be based on the theoretical tools developed by A. Kroch and collaborators. Besides the regression models used in the statistical analysis of historical texts, the analysis of the phonetic data will require statistical modelling and inference of the underlying stochastic processes. The organisation of the Tycho Brahe Corpus will follow the steps taken for the Penn-Helsinki Parsed Corpus of Middle English. In particular, automatic morphological and syntactic parsers will be developed for Portuguese.
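
As an illustration of the kind of regression model mentioned above, the following minimal sketch fits an S-shaped (logistic) curve to the proportion of an innovative construction over time. It assumes that the regression models are logistic, which is a common choice in quantitative studies of syntactic change but is not stated in this summary. All names (fit_logistic, TRUE_SLOPE, TRUE_MIDPOINT) and all numbers are hypothetical: the data are synthetic, generated from the curve itself, and are not results of the project.

```python
import math


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(t, slope, intercept):
    """Logistic curve: expected proportion of the new form at time t."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))


def fit_logistic(years, proportions):
    """Fit logit(p) = intercept + slope * year by ordinary least squares.

    This is a simplification: it fits a straight line to the log-odds
    rather than maximising a binomial likelihood on raw token counts.
    """
    ys = [logit(p) for p in proportions]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


# Synthetic, noise-free data (hypothetical values chosen only for the demo).
TRUE_SLOPE = 0.04        # assumed rate of change, per year
TRUE_MIDPOINT = 1750.0   # assumed year at which both forms are equally frequent
years = [1550, 1600, 1650, 1700, 1750, 1800, 1850]
proportions = [logistic(t, TRUE_SLOPE, -TRUE_SLOPE * TRUE_MIDPOINT) for t in years]

slope, intercept = fit_logistic(years, proportions)
midpoint = -intercept / slope  # year at which the fitted proportion is 0.5

print(f"slope = {slope:.4f} per year, midpoint = {midpoint:.1f}")
```

In a real analysis the proportions would come from counts in the parsed corpus, grouped for example by author birth date, and the slope would be compared across syntactic contexts.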

This is a multi-disciplinary project, involving several scientific domains. Consequently, the team of researchers working on this project includes syntacticians, phonologists, phoneticians, specialists in the history of Portuguese, statistical physicists, probabilists, statisticians and computer scientists.

In order to develop this research, we should:

provide a detailed account of the clitic placement changes in texts written by Portuguese authors born between 1550 and 1850, describing the grammars in use.
provide a detailed account of phonetic aspects relevant for the identification of rhythmic patterns in Modern European Portuguese as well as in Brazilian Portuguese.
develop a mathematical-linguistic model for the notion of rhythmic patterns.
provide a formal model for acquisition, relating syntax and prosody at the interface of grammar and the Articulatory-Perceptual performance system.
develop a methodology to detect the rhythmic patterns of spoken language in written texts.

This project aims to provide an account of the evolution over time of the rhythmic patterns of Portuguese as detected in historical written texts. This will make it possible to verify the hypothesis that the syntactic change from Classical to Modern European Portuguese was the result of a previous prosodic change, and to date both changes. Furthermore, apart from providing a better understanding of the linguistic phenomena, the use of mathematical formalism in modelling them may also lead to new results in stochastic processes and statistics.

The execution of the project includes:

-several working sessions involving researchers of the project, including visits of the researchers to the various centers in which the project will be developed.

-two general meetings of evaluation and synthesis, involving all the researchers of the main team of the project and collaborators, in 1999 and 2001.

-four workshops on specific sub-topics of the project, attended by researchers and collaborators. Two of them will take place in 1998 and two in 2000. In 1998 the workshops will be held in August and December and will focus on specific problems of the modelling of the historical and phonetic data, respectively.

-installing computational equipment and speech analysis tools necessary for implementing the corpora and for the statistical and acoustic analyses of the data. This equipment will complement the existing technical resources of the IME-USP and IEL-UNICAMP networks and of the Forensic Phonetics Laboratory of the Forensic Medicine Department of the FCM-UNICAMP.