Krister Lindén
PUBLICATIONS, University of Helsinki, Department of General Linguistics, No. 37
P.O. Box 9, FIN-00014 University of Helsinki, Finland
WORD SENSE DISCOVERY AND DISAMBIGUATION
© Krister Lindén
ISBN 952-10-2471-2 (bound)
ISBN 952-10-2472-0 (PDF)
Helsinki 2005
Helsinki University Press

Suit the action to the word, the word to the action.
William Shakespeare (c. 1600-01), Hamlet iii

Abstract

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses).
If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses.
If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we ﬁnd synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we ﬁnd translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we ﬁrst discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at ﬁnding structure in the lists by discovering groups of similar words, e.g., synonym sets.
In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to deﬁne, collect and represent contexts. We discuss how to evaluate the trained context classiﬁers and discovered word sense classiﬁcations, and ﬁnally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris’ hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes.
One could say that the work on this thesis has its roots in my childhood when I was 9 years old. I had already made two signiﬁcant decisions in my life. I wanted to be a scientist and I intended to invent a speech translation device so that people wouldn’t have to learn foreign languages. At that time PCs and mobile phones were still unknown. I already spoke two languages and was learning English at school, but it was the prospect of having to move to Finland and learn a fourth and radically different language, Finnish, that begot these lofty ideas.
Language Technology as a subject had not yet been invented when I began my studies at the Computer Science Department at Helsinki University. After graduation I joined the Department of Linguistics, where I was involved in an English-to-Finnish Machine Translation project under the supervision of Dr. Lauri Carlson. His vast knowledge of both applied and formal linguistics combined with his down-to-earth remarks paved my way to linguistics. During that period Dr. Kimmo Koskenniemi became the first professor of Computational Linguistics in Finland, and Prof. Fred Karlsson was Head of the Linguistics Department leading a project on constraint grammar. This environment was tremendously inspiring and their ideas and views on morphology, surface syntax, constraint grammar, and translation I will forever carry with me.
I was, however, thrown into the business world of Language Technology, where I had the opportunity to participate in the start-up of a company called Lingsoft, which I headed for a number of years, before I went on to be its Chief Technology Ofﬁcer. At Lingsoft I took part in a range of interesting projects.
When doing language technology for information retrieval, I was fortunate to meet Prof. Kalervo Järvelin at Tampere University (UTA). During a project on the Finnish dictionary, Perussanakirja, I met Dr. Krista Lagus, now at the Helsinki University of Technology (HUT). Both were later to become supervisors of this Ph.D. thesis. As one of my projects at Lingsoft I also designed and supervised the implementation of a Finnish speech recognition system.
By that time I had taken part in the development of all the necessary components for the speech translation device I had set out to create in childhood. Why then a dissertation on word senses? Well, after having seen all the components, I ...
PREFACE AND ACKNOWLEDGEMENTS vi
During my time at the Graduate School of Language Technology, I had the privilege to cooperate with graduate students from three different universities.
Foremost among those have been Mathias Creutz at HUT, Jussi Piitulainen at HU, and Heikki Keskustalo at UTA. Together with them I was able to make some of the ideas materialize into publications.
As a complement to the intellectual work, I have enjoyed folk dancing several times a week. So much so that I now also hold a degree as a folk dancing instructor. I am grateful to the folk dancers at Arbetets Vänner and Brage for providing me with relaxing, playful, but also challenging and stimulating environments for folk dancing. Merry motions always bring about a good mood.
I am also deeply indebted to my parents, Stig and Eva, without whose unfailing belief in my capabilities, and without whose decision long ago to move back to Finland, this thesis might never have happened, and to my sister, Lilian, for many discussions on the meaning of everything, and finally, to my partner in life, Juhani, without whose delicious food and good-natured support I would have felt much lonelier.
Word sense discovery and disambiguation are the essence of communication in a natural language. Discovery corresponds to growing or acquiring a vocabulary. Disambiguation is the basis for understanding. These processes are also key components of language evolution and development. In this work we will restrict ourselves to the core processes of word sense discovery and disambiguation in text-based computer applications.
We will try to demonstrate that word sense discovery and disambiguation are two sides of the same coin: you cannot have one without ﬁrst having the other. The resolution of this paradox requires some form of external reference. For humans the reference is provided by the world and the language community we live in.
Since we are dealing with computer programs analyzing text, we will refer to written representations of language communities, i.e., text corpora and machine-readable dictionaries.
In the introduction we outline the processes involved in word sense discovery and disambiguation and brieﬂy touch on some of the main problems common to both. We then outline the organization of the work and give an account of the author’s contributions.
CHAPTER 1. INTRODUCTION 2
1.1 Word Sense Disambiguation

Word sense disambiguation is the task of selecting the appropriate senses of a word in a given context. An excellent survey of the history of ideas used in word sense disambiguation is provided by Ide and Veronis (1998). Word sense disambiguation is an intermediate task which is necessary in order to accomplish some other natural language processing task, e.g.,
• translation selection in machine translation,
• eliminating irrelevant hits in information retrieval,
• analyzing the distribution of predefined categories in thematic analysis,
• part-of-speech tagging, prepositional phrase attachment and parsing space restriction in grammatical analysis,
• phonetization of words in speech synthesis and homophone discrimination in speech recognition, and
• spelling correction, case changes and lexical access in text processing.
Word sense disambiguation (WSD) involves the association of a given word in a text or discourse with a deﬁnition or meaning which is distinguishable from other meanings potentially attributable to that word. The task therefore necessarily involves two steps according to Ide and Veronis (1998). The ﬁrst step is to determine all the different senses for every word relevant to the text or discourse under consideration, i.e., to choose a sense inventory, e.g., from the lists of senses in everyday dictionaries, from the synonyms in a thesaurus, or from the translations in a translation dictionary.
The second step involves a means to assign the appropriate sense to each occurrence of a word in context. All disambiguation work involves matching the context of an instance of the word to be disambiguated either with information from external knowledge sources or with contexts of previously disambiguated instances of the word. For both of these sources we need preprocessing or knowledge-extraction procedures representing the information as context features. For some disambiguation tasks, there are already well-known procedures such as morpho-syntactic disambiguation and therefore WSD has largely focused on distinguishing senses among homographs belonging to the same syntactic category.
However, it is useful to recognize that a third step is also involved: the computer needs to learn how to associate a word sense with a word in context using either machine learning or manual creation of rules or metrics.
It is the third step, and especially its machine learning aspect, that is the focus of this work. Unless the associations between word senses and context features are given explicitly in the form of rules by a human being, the computer will need to use machine learning techniques to infer the associations from some training material. In order to avoid confusion, we will speak of manually¹ created disambiguation techniques as a separate category and only divide the machine learning techniques into the subcategories of supervised, semi-supervised and unsupervised.
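The matching of a new context against previously disambiguated instances can be illustrated with a minimal Python sketch. It is not the method of the publications: the bag-of-words window, the overlap score, and the names `context_features`, `train` and `disambiguate` are all assumptions made for illustration, as are the toy sense labels.

```python
from collections import Counter


def context_features(tokens, target, window=3):
    """Bag-of-words features from a symmetric window around each target occurrence."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            feats.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return feats


def train(labelled_examples, target, window=3):
    """Accumulate one feature profile per sense from sense-tagged contexts."""
    profiles = {}
    for tokens, sense in labelled_examples:
        profiles.setdefault(sense, Counter()).update(
            context_features(tokens, target, window)
        )
    return profiles


def disambiguate(tokens, target, profiles, window=3):
    """Choose the sense whose profile shares the most feature mass with the context."""
    feats = context_features(tokens, target, window)

    def overlap(sense):
        return sum(min(count, profiles[sense][word]) for word, count in feats.items())

    # Sorting makes ties resolve deterministically (alphabetically first sense).
    return max(sorted(profiles), key=overlap)


# Hypothetical sense-tagged training contexts for the word "bank".
training = [
    ("she sat on the bank of the river".split(), "bank/shore"),
    ("he deposited money in the bank today".split(), "bank/finance"),
]
profiles = train(training, "bank")
print(disambiguate("money in the bank earns interest".split(), "bank", profiles))
```

Here the training contexts play the role of the previously disambiguated instances, and the learned profiles stand in for the associations between senses and context features that the third step has to acquire.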
1.2 Word Sense Discovery

Word sense discovery is defined as the task of learning what senses a word may have in different contexts. Word sense discovery is what lexicographers do by profession. Automated word sense discovery on a large scale in order to build a thesaurus has a much shorter history. Some of the first attempts were made in the 1960s by Karen Spärck Jones (1986). As sufficiently large corpora and efficient computers have become available, several attempts to automate the process have been undertaken.
In lexicography, when building mono- and multilingual dictionaries as well as thesauruses and ontologies, word sense discovery is regarded as a preprocessing stage (Kilgarriff et al., 2004; Kilgarriff and Tugwell, 2001). In various applications, it is seen as a part of the lexical acquisition and adaptation process, e.g., in
• translation discovery when training statistical machine translation systems,
• synonym discovery for information retrieval,
• document clustering providing a domain analysis,
• detecting neologisms² or rare uses of words in part-of-speech tagging and grammatical analysis,
• discovering ontological relations³ for terminologies,
• named entity recognition, and
• automated discovery of morphology and syntax.

¹ In the word sense disambiguation literature, notably SENSEVAL-2 (2001), manually created metrics or disambiguation rules are referred to as unsupervised. From a machine learning point of view, this is perhaps technically correct because no final automated training was used to improve the performance with a training corpus. According to the same reasoning, e.g., a manually designed wide-coverage parser would be an unsupervised method from a machine learning point of view.
² Basic lexical acquisition is done all the time in most natural language applications. Often it is simply dismissed as part of the preprocessing heuristics for neologisms, i.e., new words or out-of-vocabulary items.
³ Ontological relations are: type and subtype (isa), part-of and whole, etc.
Word sense discovery involves the grouping of words by their contexts into labeled sets of related words. This task, too, can be seen as consisting of three steps. The first step is to determine the groups of related words in context, i.e., to create a context clustering. It involves calculating the similarity of the word contexts to be clustered, or using similarity information from external knowledge sources.
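As a rough illustration of this first step, the following Python sketch builds co-occurrence vectors for words, compares them with cosine similarity, and groups words greedily. The similarity measure, the window size, the threshold, and the names `context_vector`, `cosine` and `cluster` are assumptions chosen for brevity; they are not taken from the publications.

```python
import math
from collections import Counter


def context_vector(corpus, word, window=2):
    """Count the words that co-occur with `word` within a symmetric window."""
    vec = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def cluster(words, vectors, threshold=0.5):
    """Greedy grouping: a word joins the first cluster with a similar enough member."""
    clusters = []
    for w in words:
        for group in clusters:
            if any(cosine(vectors[w], vectors[m]) >= threshold for m in group):
                group.append(w)
                break
        else:
            clusters.append([w])
    return clusters


# Hypothetical toy corpus; real corpora would be far larger.
corpus = [
    "we drink hot coffee every morning".split(),
    "we drink hot tea every morning".split(),
    "they drive a fast car to work".split(),
    "they drive a fast truck to work".split(),
]
words = ["coffee", "tea", "car", "truck"]
vectors = {w: context_vector(corpus, w) for w in words}
print(cluster(words, vectors))
```

The resulting groups are still unlabeled; naming them is the concern of the second and third steps.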
The second step is to determine a suitable inventory of word sense labels.
There is no well-established convention for labeling the context-clustered word groups. The predeﬁned labels are typically taken from sense descriptors in everyday dictionaries, labels in thesauruses and ontologies, or translations in a translation dictionary. The labeling varies according to purpose: in terminology mining the ontological relations are frequently used, in thesaurus discovery thesaurus relations are often used, and in statistical machine translation the translations are suitable labels of word clusters.
In word sense discovery, the third step involves a way to learn how to associate a word sense label with a word cluster using either machine learning or manually created rules or metrics.
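The association of labels with clusters can be sketched with a manually created metric. The Python example below assumes a hypothetical label inventory that maps each label to a set of descriptor words and picks the label with the highest Jaccard overlap; the function name `label_cluster` and the inventory contents are invented for illustration, and a learned association would replace the hand-written score.

```python
def label_cluster(cluster, inventory):
    """Return the label whose descriptor words overlap the cluster most, or None."""
    members = set(cluster)
    best_label, best_score = None, 0.0
    # Sorted iteration keeps the result deterministic when scores tie.
    for label, descriptors in sorted(inventory.items()):
        descriptors = set(descriptors)
        union = members | descriptors
        score = len(members & descriptors) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical inventory, e.g. drawn from thesaurus categories.
inventory = {
    "beverage": {"coffee", "tea", "juice"},
    "vehicle": {"car", "truck", "bus"},
}
print(label_cluster(["coffee", "tea"], inventory))
print(label_cluster(["car", "truck"], inventory))
```

A cluster with no overlap with any descriptor set is left unlabeled (`None`), which in practice would signal either a gap in the inventory or a newly discovered sense.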