
The manuscript with the above title has been accepted for publication in the ACM Transactions on Asian Language Information Processing.
Abstract:
Anaphora resolution is one of the most difficult tasks in NLP. The ability to identify non-referential pronouns before attempting an anaphora resolution task would be significant, since the system would not have to attempt resolving such pronouns and hence end up with fewer errors. In addition, the number of non-referential pronouns has been found to be non-trivial in many domains. The task of detecting non-referential pronouns could also be incorporated into a part-of-speech tagger or a parser, or treated as an initial step in semantic interpretation. In this paper, I describe a machine learning method for identifying non-referential pronouns in an annotated sub-segment of the Penn Arabic Treebank using three different feature settings. I achieve an accuracy of 97.22% with 52 different features extracted from a small window size of -5/+5 tokens surrounding each potentially non-referential pronoun.
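As a rough illustration of the -5/+5 window mentioned in the abstract, here is a minimal Python sketch that collects the tokens surrounding a candidate pronoun. It is not the feature set from the paper: the 52 features are not listed here, and the function name `window_features`, the `<PAD>` symbol, and the placeholder tokens are all assumptions made for the example.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical filler for positions that fall outside the sentence


def window_features(tokens: List[str], index: int, size: int = 5) -> Dict[str, str]:
    """Map each relative offset in [-size, +size], excluding 0, to the token found there."""
    features: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue  # skip the candidate pronoun itself
        pos = index + offset
        inside = 0 <= pos < len(tokens)
        features[f"tok[{offset:+d}]"] = tokens[pos] if inside else PAD
    return features


# Placeholder tokens; a real run would use treebank tokens and their annotations.
tokens = ["w1", "w2", "PRON", "w4", "w5", "w6", "w7", "w8", "w9"]
feats = window_features(tokens, index=2)
```

Each candidate pronoun yields one such dictionary, which could then be vectorized and passed to any classifier. Padding keeps the number of features fixed even when the pronoun sits near the start or end of a sentence.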