Manuel BARBERA (Turin, Italy). Complex lexical units and their morphosyntactic treatment in the Corpus Taurinense. 2000, Vol. V-2, pp. 57-70
Corpus Taurinense (CT) is the POS-tagged version of the ItalAnt Corpus, an electronic corpus of Old Italian texts (between 1251 and 1300). In this article we describe the approach followed in CT for the annotation of multiword units (MWUs). In our work, an MWU is a set of two or more graphic words which (also) receive an overall POS tag because the set stands in paradigmatic relation with a one-word lexical unit with the same POS. Our POS tagging confirms that most Modern Italian compound conjunctions were not yet lexicalised at that time. The order of the components is already that of Modern Italian, but they can still be interrupted by occasional elements.
Paul BOGAARDS (Leiden, Netherlands). Collocational information in dictionaries. 1997, Vol. II-1, pp. 31-42
Hausmann (1979, 1984) proposes an interesting theory concerning the categorization and the internal analysis of collocations. What could seem less clear is the practical importance of this theory for the treatment of collocations in dictionaries. In this paper, Hausmann's theory will be discussed and tested in an experiment where learners of French and Dutch were asked to indicate where they would look in the dictionary whenever they did not know the French equivalent of a series of Dutch collocations.
Christiane FELLBAUM (Princeton, United States). Transforming a Corpus into a Lexical Resource. The Berlin Idiom Project. 2005, Vol. X-2, pp. 49-62
We discuss the goals and methods of the lexicographic project "Collocations and Idioms in German" at the Berlin-Brandenburg Academy of Sciences. A very large corpus is tagged and parsed to enable flexible searches for target structures, more specifically German verb phrase idioms. On the basis of relevant tokens, an extensive linguistic-lexicographic analysis is performed and recorded on a set of structured forms, which comprise a kind of digital dictionary entry for the target structure. For transparency and future research, each recorded linguistic-lexicographic phenomenon is linked with appropriate corpus tokens. The resulting resource, which combines an exhaustive description of the idioms' properties with corpus tokens, allows for multiple search types.
François MANIEZ (Lyon 2). Automatic retrieval of intentionally modified proverbs in the American press. 2000, Vol. V-2, pp. 19-32
Reference to a well-known proverb or phrase by altering one of its components is a widespread phenomenon in the British and American press. Since such modifications can impede a non-native speaker's understanding of newspaper or magazine articles, a system that could identify them and refer learners of English as a second language to the original wording of such expressions might be useful in the design of an on-line comprehension assistant. Using a database of 10,500 titles from an American news magazine, we analyze the various types of modifications that come into play in the use of shared cultural references. By comparing our database with 800 English proverbs, we test various ways in which such modifications can be automatically detected.
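To make the idea of comparing titles against a proverb list concrete, here is a minimal Python sketch. It is not the method used in the article: it assumes the simplest possible case, a title that matches a proverb word for word except for exactly one substituted word. The function names and the sample data are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def one_word_substitution(title_tokens, proverb_tokens):
    """Return (original_word, new_word) if the two token lists have the
    same length and differ in exactly one position, else None."""
    if len(title_tokens) != len(proverb_tokens):
        return None
    diffs = [(p, t) for p, t in zip(proverb_tokens, title_tokens) if p != t]
    return diffs[0] if len(diffs) == 1 else None


def find_modified_proverbs(titles, proverbs):
    """Pair each title with every proverb it alters by a single word."""
    hits = []
    for title in titles:
        for proverb in proverbs:
            diff = one_word_substitution(tokenize(title), tokenize(proverb))
            if diff is not None:
                hits.append((title, proverb, diff))
    return hits


# Hypothetical sample data, for illustration only.
proverbs = ["A rolling stone gathers no moss", "Too many cooks spoil the broth"]
titles = ["Too many crooks spoil the broth", "Markets rally on jobs data"]

for title, proverb, (old, new) in find_modified_proverbs(titles, proverbs):
    print(f"{title!r} alters {proverb!r}: {old} -> {new}")
```

A sketch like this would miss insertions, deletions, and proverbs embedded in longer titles; handling those would need some form of approximate matching.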
Denis MAUREL (Tours). An electronic dictionary for proper names. 1997, Vol. II-1, pp. 101-111
Following on from plain word lists and conventional electronic dictionaries, the relational electronic dictionary of the Prolex project is based on the relational model as defined in database theory. It is represented as a finite-state transducer, which allows quick browsing and efficient data compaction.
Agnès TUTIN (Grenoble 3). Encoding collocations in a formal lexicon for NLP. 1997, Vol. II-1, pp. 43-58
In this paper, we examine the "Dictionnaire Explicatif et Combinatoire" (DEC) developed by I. Mel'čuk's team from the perspective of NLP. We deal here with collocations (e.g. 'heavy smoker', 'to take a walk'), which are considered restricted cooccurrences. We first present a few characteristics of these lexical associations and the way they are encoded in the DEC by means of syntagmatic lexical functions. We then examine their syntactic properties, hypothesizing that these properties are not fully idiosyncratic but that a large number of generalizations can be made within a collocative entry. Lastly, focusing on a subset of emotion nouns, we investigate the feasibility of this proposal by proposing an encoding that accounts for the generalizations.
Agnès TUTIN (Grenoble 3). On the necessity of collocation dictionaries. 2005, Vol. X-2, pp. 31-48
This paper compares two French dictionaries of collocations (the Dictionnaire des cooccurrents by Beauchesne (2001) and the Lexique Actif du Français (Mel'čuk and Polguère, in preparation)) with electronic versions of two well-known French dictionaries: the Petit Robert Electronique (version 2.1) and the Trésor de la Langue Française Informatisé. Our goal is to evaluate to what extent specialised dictionaries are really better suited than general monolingual dictionaries to representing collocations for educational purposes. Three parameters are examined: the access to collocations in dictionaries, the quantity of data and the linguistic treatment of collocations.
Agnès TUTIN (Grenoble 3). Regular and irregular collocations. Towards a typology of the collocation phenomenon. 2002, Vol. VII-1, pp. 7-25
Collocations are an essential phenomenon in lexical combinatorics but are often fuzzily defined. In this paper, we aim to define this concept clearly and propose parameters accounting for the degree of idiomaticity. We then present some syntactic characteristics and some semantic processes that yield colourful collocations. Finally, we distinguish collocations from closely related concepts.