French Literary Fiction

From Roland to Conan: One thousand years of French literary fictions (1050-1920)

Jean-Baptiste Camps, Pierre-Carl Langlais, Olivier Morin, Nicolas Baumard

This project sits at the intersection of two major trends in computational literary analysis: the creation and documentation of large literary corpora, and the analysis of literary genre and discourse through machine learning classification. Compared with earlier French literary corpora, such as Théâtre classique (Fièvre, 2007) or the French corpus of the European Literary Text Collection (Odebrecht et al., 2021), French novels make up a massive body of texts (80,000 works from before the 20th century are registered in the French National Library), a large share of which is non-canonical and poorly documented. Text mining techniques make it possible to explore and document large digitized corpora with little editorial work. Classification is not used simply as a cataloguing tool: its limitations can themselves shed light on how genres develop and on the intertextual interplay between one genre and another.

The French fiction corpus was initially assembled by bringing together three different collections:

1. A collection of medieval fictions and chansons de geste (1050-1450).

2. A collection of printed fictions from Gallica, covering the modern period and the 19th century.

3. A new collection comprising most of the digitized fictions from the early modern period (1450-1700).