Recently, several papers, starting with Ramus, Nespor and Mehler (1999), henceforth RNM, gave evidence that simple statistics of the speech signal can discriminate between different rhythmic classes. In the present paper, we propose a new approach to the problem of finding acoustic correlates of the rhythmic classes. Its main ingredient is a rough measure of sonority defined directly from the spectrogram of the signal. The major advantage of this approach is that it can be implemented entirely automatically, with no need for prior hand-labelling of the acoustic signal. Applied to the same linguistic samples considered in RNM, it produces the same clusters, corresponding to the three conjectured rhythmic classes. The resulting statistics strongly suggest that rhythmic class discrimination can be based entirely on a measure of obstruency present in the signal.
Large annotated speech corpora are a critical component of research in prosody. The classification of languages according to their speech rhythm, for example, requires a large number of annotated sentences produced by different speakers in different languages. We have developed {\it Vocale}, a tool for the semi-automatic annotation of vocalic and consonantal parts of speech, because recent models have identified these units as reliable acoustic correlates of speech rhythm. {\it Vocale} is based on relative entropy and uses additional classifiers, such as energy and length, for the annotation of vowels and consonants. It runs on the Praat speech analysis facilities and produces a Praat label file as output. {\it Vocale} is open source software and is available to the scientific community at http://www.ime.usp.br/$\sim$tycho/tipal/prosody/vocale/.
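As a reminder of the quantity involved, relative entropy (Kullback-Leibler divergence) between two normalized spectral frames can be sketched as below. The abstract does not say how {\it Vocale} applies it, so the function names, the toy frame values and the smoothing constant `eps` are all assumptions made for illustration only.

```python
import math

def normalize(frame):
    """Scale non-negative spectral magnitudes so that they sum to 1."""
    total = sum(frame)
    return [x / total for x in frame]

def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats.

    eps is an assumed smoothing constant that avoids log(0) when a bin of q is empty.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Hypothetical 4-bin spectra: one dominated by low frequencies, one by high.
low_heavy = normalize([8.0, 4.0, 2.0, 1.0])
high_heavy = normalize([1.0, 2.0, 4.0, 8.0])

same = relative_entropy(low_heavy, low_heavy)        # identical frames
different = relative_entropy(low_heavy, high_heavy)  # dissimilar frames
```

Identical frames give a divergence of zero, while frames with different spectral shapes give a positive value, which is what makes the quantity usable as a measure of spectral change.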
Typical postlexical interface phenomena, like secondary (rhythmic) stress, can be successfully modeled by OT analyses, which predict optimally stressed outputs from a set of possible inputs and a hierarchically ranked set of constraints. This paper presents an OT analysis of secondary stress in European and Brazilian Portuguese. Based on this analysis, a computer program, sotaq, has been developed, which allows proposed constraint hierarchies for both varieties of Portuguese to be tested automatically against large corpora. Test results are presented, showing that suitable hierarchies generate the secondary stresses of both varieties.
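The evaluation step of a generic OT analysis can be sketched as follows. This is not the sotaq program or the constraint set of the paper: the stress encoding (`0` unstressed, `2` secondary, `1` primary), the candidate generator and the three constraints `no_clash`, `no_lapse` and `stress_initial` are hypothetical, chosen only to show how re-ranking the same constraints selects a different output.

```python
from itertools import product

def candidates(n_syllables, primary):
    """All patterns with primary stress fixed at index `primary` and
    optional secondary stress on each pretonic syllable (assumed generator)."""
    tail = "1" + "0" * (n_syllables - primary - 1)
    return ["".join(marks) + tail for marks in product("02", repeat=primary)]

def no_clash(pattern):
    """Hypothetical constraint: one violation per pair of adjacent stressed syllables."""
    return sum(a != "0" and b != "0" for a, b in zip(pattern, pattern[1:]))

def no_lapse(pattern):
    """Hypothetical constraint: one violation per pair of adjacent unstressed syllables."""
    return sum(a == "0" and b == "0" for a, b in zip(pattern, pattern[1:]))

def stress_initial(pattern):
    """Hypothetical constraint: one violation if the first syllable is unstressed."""
    return 1 if pattern[0] == "0" else 0

def evaluate(cands, ranked_constraints):
    """Return the candidates whose violation profile is lexicographically
    smallest under the given ranking (highest-ranked constraint first)."""
    profiles = {c: tuple(con(c) for con in ranked_constraints) for c in cands}
    best = min(profiles.values())
    return [c for c in cands if profiles[c] == best]

# A five-syllable word with primary stress on the fourth syllable.
cands = candidates(5, 3)
winner_a = evaluate(cands, [no_clash, no_lapse, stress_initial])
winner_b = evaluate(cands, [no_clash, stress_initial, no_lapse])
```

Under the first ranking the alternating pattern `02010` wins; promoting `stress_initial` above `no_lapse` selects `20010` instead. Testing a proposed hierarchy against a corpus then amounts to comparing such winners with attested forms.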
In the linguistic literature it has been conjectured that natural languages are divided into rhythmic classes (cf. Abercrombie 1967, Pike 1945, among others). For half a century no reliable phonetic evidence was presented to support this claim. Recently Ramus, Nespor and Mehler (1999), from now on denoted by RNM, and Grabe and Low (2000) gave evidence that simple statistical properties of the speech signal can discriminate between different rhythmic classes. Following the approach introduced in RNM, Frota and Vigário (2000, 2001) analyzed data from Brazilian Portuguese and European Portuguese (henceforth BP and EP, respectively). The present report discusses statistical issues raised by RNM and introduces new tools for analyzing acoustic data. First, we computed the sample statistics proposed in RNM for EP and BP. We then performed an exploratory analysis of the EP and BP data, focusing on the effect of the last vocalic interval of each sentence on the sample statistics, and show that this interval has a strong influence on them. Finally, we analyzed the data using a parametric probabilistic model. This model refines the exploratory analysis: it is parsimonious and allows us to test our hypotheses statistically. As an application, we show that the standard deviation of the model for consonantal intervals clusters the languages considered in RNM, together with BP and EP, into groups that correspond precisely to the rhythmic class hypothesis. More precisely, the data support the hypothesis that the languages form three clusters: English-Polish-Dutch-EP, French-Italian-Spanish-Catalan-BP and Japanese.
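The sample statistics proposed in RNM are the proportion of vocalic intervals (%V) and the standard deviations of vocalic and consonantal interval durations (ΔV and ΔC). A minimal sketch of their computation for one labelled sentence is given below, together with a helper that removes the last vocalic interval so that its influence can be inspected. The function names, the use of the population standard deviation and the interval durations are assumptions for illustration, not data or code from the report.

```python
from statistics import pstdev

def rnm_statistics(intervals):
    """Compute %V, delta_V and delta_C for one sentence.

    intervals: list of (label, duration_in_seconds), label "V" or "C".
    Population standard deviation is an assumed choice.
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_V": 100.0 * sum(vocalic) / total,
        "delta_V": pstdev(vocalic),
        "delta_C": pstdev(consonantal),
    }

def drop_last_vocalic(intervals):
    """Return a copy of the sentence without its last vocalic interval."""
    for i in range(len(intervals) - 1, -1, -1):
        if intervals[i][0] == "V":
            return intervals[:i] + intervals[i + 1:]
    return list(intervals)

# Hypothetical sentence with a lengthened final vowel.
sentence = [("C", 0.08), ("V", 0.10), ("C", 0.12),
            ("V", 0.06), ("C", 0.10), ("V", 0.20)]

full = rnm_statistics(sentence)
trimmed = rnm_statistics(drop_last_vocalic(sentence))
```

In this toy sentence, removing the long final vowel lowers both %V and ΔV while leaving ΔC unchanged, which illustrates why the last vocalic interval deserves separate attention.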