Modeling Spanish Mood Choice in Belief Statements.

Robinson, Jason R.

This work develops a computational methodology, new to linguistics, that empirically evaluates competing linguistic theories of Spanish verbal mood choice by using computational techniques to learn mood and other hidden linguistic features from Spanish belief statements found in corpora. The machine-learned probabilistic linguistic models are then evaluated in terms of mood prediction for sentences such as (1): (1) No creo que Juan es / sea alto. "no believe that Juan is.Indicative / is.Subjunctive tall." "I don't believe that Juan is tall." Most native speakers note subtle differences in meaning between the indicative and subjunctive versions. While there is abundant research on subjunctive licensors (see Portner 1999, 2011) and on whether the same licensing property is universally applicable to all subjunctive contexts, there is little work explaining what the Spanish mood morpheme contributes to the overall meaning. Villalta (2007, 2010) proposes gradability as a key linguistic feature that licenses mood in Spanish. This dissertation extends her theory into a broader semantic process. This theory and other competing semantic theories were translated into probabilistic graphical models (see Koller & Friedman 2011). Large amounts of relevant text were collected, formatted, statistically summarized and ranked, yielding a rich analysis of the Spanish mood choice problem. The models were then trained on the data through common data sampling techniques so that the probabilities of the hidden linguistic features could be computationally learned. The resultant probabilistic models for each linguistic theory then predicted mood in 6,282 sentences. Multiple models achieved accuracy scores between 80% and 84%. The extended Villalta model significantly outscored the regular Villalta model, but did not significantly outscore three other semantic models.
Nevertheless, this model was also shown to accurately predict scalar implicatures and metalinguistic negation. With a sentence like (1), the model correctly predicted whether Juan was "short, almost tall, just plain tall or very tall" 69% of the time. When the set of these four alternatives was reduced to three (not tall, just plain tall and very tall), the accuracy was 81%, suggesting an innovative application of theoretical linguistics to Opinion Mining efforts (cf. Liu 2012). [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]