2004-1 Linguistique et informatique : nouveaux défis
(Linguistics and information technology: new challenges)
This issue is available online in full on the Cairn portal.
  • Benoît HABERT (Paris X-Nanterre)
    Outiller la linguistique : de l'emprunt de techniques aux rencontres de savoirs
    (To tool up linguistics: from borrowing techniques to the meeting of knowledge)
    pp. 5-24

    In itself, linguistic research does not require specific tools. However, linguistic descriptions and models would benefit from relying more often on NLP (Natural Language Processing) tools and resources and on computer science methods. The possible outcome depends on the type of interaction chosen between NLP, computer science and linguistics. A synergy between paradigms and methodologies would be more fruitful than a mere import of techniques.

  • Sarah LEROY (Paris X-Nanterre)
    Extraire sur patrons : allers et retours entre analyse linguistique et repérage automatique
    (Extraction on patterns: two-way traffic between linguistic analysis and automated identification)
    pp. 25-43

    We present here an automated identification of proper name antonomasia in tagged texts. First, we compare manual and automated identification, describing the workings of the system as well as the methods and tools we used; we point out that the automated process is the more reliable of the two. After showing how the capabilities and limits of automated identification can influence linguistic work, we compare this rather old (2000) work with new tools now available to linguists, e.g. the ability to query a subset of tagged texts in the Frantext database.

  • Nathalie GASIGLIA (Lille 3)
    Faire coopérer deux concordanciers-analyseurs pour optimiser les extractions en corpus
    (The co-operation of Cordial Analyser and Unitex for optimising corpus extractions)
    pp. 45-62

    A well-delimited linguistic study - a semantico-syntactic analysis of the use of the verbs donner 'to give' and passer 'to pass' in the language of soccer - provides a useful framework for reflecting on the quality of the documentary resources that can make up an instructive, concentrated electronic corpus, and for introducing the notion of a 'high-efficiency thematic corpus'. To explore a corpus constructed in this way, two tools that generate concordances and provide syntactic analyses, Cordial Analyseur and Unitex, are put to the test. The description of their shared functionality and specific features, as well as of their weak points, led me to formulate an original proposal: to make these two tools work together so that the strategic use of their complementarity makes it possible to formulate fairly complex searches, relying on proven analysis reliability and on the capacity to mark every identified element in the generated XML-tagged concordances.

  • Augusta MELA (Montpellier 3)
    Linguistes et « talistes » peuvent coopérer : repérage et analyse des gloses
    (Linguists and NLP specialists may work together: location and analysis of glosses)
    pp. 63-82

    This paper relates to a collective linguistic research project on the word and its gloss. Just like definitions, glosses capture 'the spoken experience of the meaning'. In French texts, this metalinguistic activity is signalled by words such as c'est-à-dire, ou, signifier. These markers can clarify the nature of the semantic relationship between two words: specification with au sens, equivalence with ou, c'est-à-dire, naming with dit, baptisé, hyponymy with en particulier, comme, hypernymy with et/ou autre(s), etc. Glosses can be located automatically thanks to both their markers and the features of their configurations. This paper describes the implementation of an automatic retriever, using 'ou glosses' such as 'un magazine électronique, ou webzine' and a data-processing environment 'for linguists', namely the textual database Frantext and its Stella query language interpreter.

  • Céline VAGUER (Paris X-Nanterre)
    Constitution d'une base de données : les emplois de dans marquant la « coïncidence »
    (Creating a database: the different usages of 'dans' in marking simultaneity)
    pp. 83-97

    The setting-up of a database from which a corpus and associated information (syntactic, semantic, etc.) are derived is not a natural undertaking in non-computational linguistics. This article sets out to present how such a technique can be exploited within the context of a research project focussing on the French preposition dans.

  • Serge HEIDEN & Alexei LAVRENTIEV (ENS LSH Lyon)
    Ressources électroniques pour l'étude des textes médiévaux : approches et outils
    (Electronic aids in studying medieval texts: methods and tools)
    pp. 99-118

    Two approaches to the development of medieval text corpora can be distinguished among the projects carried out over the past few decades. The first consists of digitizing modern critical editions, and the second is concerned with the production of precise diplomatic transcriptions of manuscripts, often directly linked to photographs of the originals. These approaches are in fact complementary rather than contradictory, as they make it possible for scholars to choose between the quantity (representativeness) and the quality (accuracy and richness) of the data depending on the goals of their research. For both types of corpora, the challenges of XML-TEI encoding are considered in relation to the tools used to process and analyze them. Many methodological problems that arise from creating and processing medieval text corpora also concern other types of linguistic corpora.

  • Valérie BEAUDOUIN (France Télécom R & D)
    Mètre en règles
    (Metrics in rules)
    pp. 119-137

    Metrical and rhythmic aspects are examined on an 80,000-verse corpus, analysed with computational linguistics tools. We propose a cumulative experimental approach that consists in building a verse pattern from a series of features (morpho-syntactic, stress, rhyme, etc.). Features may characterize units of different levels (syllable, hemistich, verse, etc.) and are brought out by different tools, but all are integrated in a single database. Thus we can verify classic metrical rules and hypotheses. We also document new regularities, for example in stress patterns, and we test some new hypotheses about links between traits and patterns. This empirical approach to a large corpus, beyond the verification of hypotheses, may lead to the construction of grounded theories.

Book reviews
  • Mécanique des signes et langage des sciences, by Y. Bréchet, P. Jarry & F. Letoublon
    reviewed by M. Arrivé
    pp. 138-139
  • La variation sociale en français, by F. Gadet
    reviewed by M. Debrock
    pp. 140-141
  • Le français en Tunisie, by H. Naffati & A. Queffélec
    reviewed by L. Abouda
    pp. 142-143