Emulating Language Acquisition with Stochastic Gradient Descent: A New Approach to Modeling Phonotactics

dc.contributor.advisor Pustejovsky, James
dc.contributor.author Freyer, Frederic Jason
dc.date.accessioned 2017-05-22T20:24:54Z
dc.date.available 2017-05-22T20:24:54Z
dc.date.issued 2017
dc.identifier.uri http://hdl.handle.net/10192/33888
dc.description.abstract We present a phonotactic learning system that achieves strong performance in modeling gradience in phonotactic judgments by combining a natural class-based approach (following Albright 2009) with a learning algorithm that focuses more strongly than past models on emulating human acquisition. It has long been recognized (e.g. Scholes 1956) that phonotactic restrictions in languages are not binary, but rather span a full spectrum between complete acceptability and complete unacceptability. Several experiments have verified that English speakers prefer, for example, /pɹ/ to /ʃɹ/ in syllable onsets, though both are legal; and /mɹ/ to /vm/, though both are illegal. Previous approaches to the computational modeling of phonotactics have been notably successful at learning hard constraints, but less so at learning gradient judgments (Coleman and Pierrehumbert 1997, Hayes and Wilson 2008, Albright 2009). Learning in the present model is done by stochastic gradient descent (Bottou 2010), in which every word to which the model is exposed (representing a word that a learner hears) nudges the acceptability values for features extracted from the word very slightly upward. This kind of model represents a much more restricted learning environment than past models have used: the model has access to only one word at a time, and performs only basic arithmetic calculations. We show that it is possible to substantially replicate phonotactic acceptability judgments, crucially including gradience, despite these restrictions, and using only about 1 million words of training data. One million words represent only a month or two of infant speech exposure (Hart and Risley 2003), which further suggests that infants could plausibly learn phonotactics at a younger age than has been established in existing literature (Mehler et al. 2009, Jusczyk et al. 1994, among others).
Finally, we illustrate how the feature values learned by such a model can be used to compute phonotactic similarity between languages, a useful typological measure.
dc.description.sponsorship Brandeis University, Graduate School of Arts and Sciences
dc.format.mimetype application/pdf
dc.language English
dc.language.iso eng
dc.publisher Brandeis University
dc.relation.ispartofseries Brandeis University Theses and Dissertations
dc.rights Copyright by Frederic Freyer 2017
dc.subject linguistics
dc.subject phonotactics
dc.subject gradient descent
dc.subject machine learning
dc.subject phonology
dc.subject computational linguistics
dc.title Emulating Language Acquisition with Stochastic Gradient Descent: A New Approach to Modeling Phonotactics
dc.type Thesis
dc.contributor.department Graduate Program in Computational Linguistics
dc.degree.name MA
dc.degree.level Masters
dc.degree.discipline Computational Linguistics
dc.degree.grantor Brandeis University, Graduate School of Arts and Sciences
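
The learning procedure described in the abstract (one word at a time, basic arithmetic only, each word nudging feature values slightly upward) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes segment bigrams as features in place of the natural class-based features the thesis uses, a squared-error loss toward a target value of 1, and a word score equal to the mean feature value. The names `bigram_features`, `OnlinePhonotacticLearner`, `observe`, and `score` are hypothetical.

```python
from collections import defaultdict


def bigram_features(segments):
    """Return adjacent-segment pairs of a word, with '#' marking word edges.

    Assumption: plain segment bigrams stand in for the natural class-based
    features used in the thesis.
    """
    padded = ["#"] + list(segments) + ["#"]
    return list(zip(padded, padded[1:]))


class OnlinePhonotacticLearner:
    """Toy online learner: sees one word at a time, uses only arithmetic."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate
        # Acceptability value per feature; unseen features start at 0.
        self.weights = defaultdict(float)

    def observe(self, segments):
        """Take one stochastic gradient step for a single heard word.

        Assumed per-feature loss: 0.5 * (1 - w) ** 2, whose gradient is
        -(1 - w), so the descent step is w + learning_rate * (1 - w).
        """
        for feature in set(bigram_features(segments)):
            w = self.weights[feature]
            self.weights[feature] = w + self.learning_rate * (1.0 - w)

    def score(self, segments):
        """Gradient acceptability: mean value of the word's features."""
        features = bigram_features(segments)
        return sum(self.weights.get(f, 0.0) for f in features) / len(features)


if __name__ == "__main__":
    learner = OnlinePhonotacticLearner(learning_rate=0.1)
    # Hypothetical toy exposure: one onset is heard far more often.
    for _ in range(10):
        learner.observe(["p", "ɹ", "a"])
    learner.observe(["ʃ", "ɹ", "a"])
    for word in (["p", "ɹ", "a"], ["ʃ", "ɹ", "a"], ["v", "m", "a"]):
        print("".join(word), round(learner.score(word), 3))
```

Because each update moves a value only a fraction of the way toward 1, frequently heard features end up with higher values than rare ones, and unattested features stay at 0. In this toy setting that is what yields graded rather than binary scores.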