Applying Machine Learning to Usage of Aspect Markers in Chinese Text

dc.contributor.advisor Pustejovsky, James
dc.contributor.author Entrikin, Russell
dc.date.accessioned 2012-06-12T16:30:32Z
dc.date.available 2012-06-12T16:30:32Z
dc.date.issued 2012
dc.identifier.uri http://hdl.handle.net/10192/74
dc.description.abstract One of the most difficult issues for learners of Chinese is understanding how temporal information is marked in discourse. Like Indo-European languages, Chinese uses explicit temporal expressions, temporal adverbs, the ordering of words and verb phrases, and pragmatics to communicate temporal relationships. However, Chinese lacks temporal inflection on verbs. An aspect marker may follow a verb, but crucially, these markers are considered optional in many contexts, and their usage varies across domains (e.g. written language, spoken language, official broadcast news). While every finite verb in English is temporally marked in some way, most verbs in most Chinese discourse are not aspectually marked. This scarcity of positive examples makes it extremely hard for learners to understand when aspect markers are licensed. Here, we explore the viability of using corpus linguistics techniques to create a kind of "discourse grammar" checker for Chinese text, which learners of Chinese can use to find errors in their own usage of aspect markers. We use a corpus-based machine learning approach to train a classifier on the usage of aspect markers and attempt to use this classifier to correctly posit aspect markers in unseen text. We discuss the capabilities and limits of our system, and how the optional and subjective nature of aspect marker placement blurs the distinction among hits, false positives, and false negatives, making evaluation difficult. We also sketch an annotation schema that would support a discourse-based aspect marker checking tool for Chinese.
dc.description.sponsorship Brandeis University, Graduate School of Arts and Sciences
dc.format.mimetype application/pdf
dc.language English
dc.language.iso eng
dc.publisher Brandeis University
dc.relation.ispartofseries Brandeis University Theses and Dissertations
dc.rights Copyright by Russell Entrikin 2012
dc.subject Chinese
dc.subject Mandarin
dc.subject machine learning
dc.subject aspect markers
dc.title Applying Machine Learning to Usage of Aspect Markers in Chinese Text
dc.type Thesis
dc.contributor.department Department of Computer Science
dc.degree.name MA
dc.degree.level Masters
dc.degree.discipline Computer Science
dc.degree.grantor Brandeis University, Graduate School of Arts and Sciences