• Mathieu VALETTE & Monique SLODZIAN (ATILF / INaLCO)
    Sémantique des textes et Recherche d’Information
    (Text semantics and Information retrieval)
    2008, Vol. XIII-1, pp. 119-133

    The aim of this paper is to set out some of the proposals of text semantics for information retrieval - more specifically for content-based text classification. We begin by assessing the contribution of linguistics to information retrieval through natural language processing techniques. This gives us an opportunity to review what has been achieved and to examine standard linguistic approaches to information retrieval. In particular, we focus on the slow emergence of text-level considerations as the web expands. We intend to show that the growing attention paid to text linguistics comes at a critical juncture in the evolution of information retrieval on the web, and that text categorisation departs from traditional approaches. The second and third parts go into greater detail and examine how text linguistics can be applied to information retrieval. We first lay out the methods used in a project aimed at filtering racist web texts; we then introduce some of the current research in textual data analysis, which is likely to improve the methodology of information retrieval in the near future.