Abstract:
Sentiment analysis has been a rapidly growing research area since the advent of Web 2.0, when social networking, blogging, tweeting, web applications, and online shopping began to gain ever more popularity. The large amount of data from product reviews, blog posts, tweets, and customer feedback makes it necessary to automatically identify and classify sentiments from these sources. This can potentially benefit not only businesses and organizations that need market intelligence but also individuals who are interested in purchasing or comparing products online. Sentiment analysis is performed at various levels of granularity, from the feature level to the document level. Supervised, semi-supervised, and unsupervised learning, as well as topic modeling techniques, have been applied to these problems. In this thesis, I explore linguistic features and structures unique to Chinese in a machine-learning context and experiment with document-level sentiment analysis using three Chinese corpora. Results from different feature sets and classifiers are reported in terms of accuracy and show the effectiveness of the current approach compared with traditional machine-learning methods.