For thirty years, natural language processing has relied increasingly on machine learning, yielding technological progress across an ever-growing number of applications. These advances have been made possible by the availability of corpora as training material, the development of the evaluation paradigm (shared tasks), and the creation of infrastructures for technology evaluation. However, corpora must be large enough to be representative of linguistic reality, and neither the required resources nor the means to produce them exist for all languages. Experiments have shown that the size of the training material can substitute for algorithmic complexity or expert knowledge, but have we reached the limit of progress with this kind of approach?
