Text analysis

  • Jean-Pierre DESCLÉS (Paris 4)
    Extraction d'informations de corpus composés de textes techniques
    (Information retrieval from corpora of technical texts)
    1997, Vol. II-2, pp. 19-33

    Technical texts present interesting and so far little-researched linguistic characteristics. This article describes a research project, carried out by a multidisciplinary group of linguists and computer scientists, which aims at devising and realising prototypes of computer programmes for extracting information from technical texts. As concrete examples illustrate, this research has led to computer programmes whose results take the form either of networks between concepts or of phrases taken from the analysed texts, accompanied, if necessary, by automatically assigned semantic information.

  • Nathalie GASIGLIA (Lille 3)
    Faire coopérer deux concordanciers-analyseurs pour optimiser les extractions en corpus
    (The co-operation of Cordial Analyser and Unitex for optimising corpus extractions)
    2004, Vol. IX-1, pp. 45-62

    A well-delimited linguistic study (a semantico-syntactic analysis of the use of the verbs donner 'to give' and passer 'to pass' in the language of soccer) provides a useful framework for developing a quality reflection on the documentary resources that can form an instructive, concentrated electronic corpus, and for introducing the notion of a 'thematic corpus of high efficiency'. To explore a corpus constructed in this way, two tools that generate concordances and provide syntactic analyses, Cordial Analyseur and Unitex, are put to the test. The description of their shared functionality and specific features, as well as of their weak points, led me to formulate an original proposal: to make these two tools work together so that their complementarity, used strategically, makes it possible to formulate fairly complex searches that draw on proven analysis reliability and on the capacity to mark every identified element in the generated concordances with XML tags.

  • Sarah LEROY (Paris X-Nanterre)
    Extraire sur patrons : allers et retours entre analyse linguistique et repérage automatique
    (Extraction on patterns: two-way traffic between linguistic analysis and automated identification)
    2004, Vol. IX-1, pp. 25-43

    We present here an automated identification of proper name antonomasia in tagged texts. First, we compare manual and computed identification, describing the system's inner workings as well as the methods and tools we used; we point out that the automated process is the more reliable of the two. After showing how the capabilities and limits of automated location can influence linguistic work, we compare this rather old (2000) work with new tools now available to linguists, e.g. the ability to query a subset of tagged texts in the Frantext database.

  • François MANIEZ (Lyon 2)
    Le repérage par traitement automatique du défigement lexical des proverbes dans la presse américaine
    (Automatic retrieval of intentionally modified proverbs in the American press)
    2000, Vol. V-2, pp. 19-32

    Reference to a well-known proverb or phrase by altering one of its components is a widespread phenomenon in the British and American press. Since such modifications can impede a non-native speaker's understanding of newspaper or magazine articles, a system that could identify them and refer learners of English as a second language to the original wording of such expressions might be useful in the design of an on-line comprehension assistant. Using a database of 10,500 titles from an American news magazine, we analyze the various types of modifications that come into play in the use of shared cultural references. By comparing our database with 800 English proverbs, we test various ways in which such modifications can be automatically detected.
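
    To make the comparison of titles with a proverb list more concrete, here is a minimal sketch in Python. It is not the method of the article, whose detection strategies are not detailed in this abstract. It assumes one simple strategy: a title is flagged when a run of its words matches a proverb except for a small number of substituted words. The function names, the `max_changes` parameter and the sample data are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_altered_proverb(title, proverbs, max_changes=1):
    """Return (proverb, substitutions) for the first proverb that some
    window of the title matches except for 1..max_changes substituted
    words, or None if there is no such proverb.

    Assumption for illustration only: an alteration is a word-for-word
    substitution, so insertions and deletions are not detected.
    """
    words = tokens(title)
    for proverb in proverbs:
        target = tokens(proverb)
        n = len(target)
        # Slide a window of the proverb's length over the title.
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            diffs = [(t, w) for t, w in zip(target, window) if t != w]
            if 1 <= len(diffs) <= max_changes:
                return proverb, diffs
    return None


# Hypothetical sample data, not taken from the article's database.
PROVERBS = [
    "A rolling stone gathers no moss",
    "The early bird catches the worm",
]

result = find_altered_proverb("Why the early bird catches the flight", PROVERBS)
```

    With this sample data, `result` pairs the proverb "The early bird catches the worm" with the substitution of "flight" for "worm". Very short proverbs would produce many false matches under this strategy, so a real system would presumably need a minimum length or additional filters.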