La question de la normalisation des écrits scolaires pour leur traitement automatique. Le cas de l’omission de mots
(The question of normalizing school writing for automatic processing: the case of word omission)

Martina Ponton Barletta

Corpus 26 (26) (2025)

Abstract

This paper addresses the noise caused by word omissions in a corpus of school writing, with a view to facilitating the automatic processing of these texts. Although a normalization step makes them easier to process, some passages remain hard to interpret, particularly where the writer has left out words. This contribution proposes three automatic or semi-automatic solutions to the problem. The first replaces each omission with a "mask" token, xxx. The second is semi-automatic: each morphosyntactic category proposed during normalization is replaced by the corresponding "prototypical word." The third uses the FlauBERT language model to "reconstruct" the most probable token in the text. All three methods are evaluated quantitatively, and the results of the third, which proved the most effective in this study, are also presented qualitatively.
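The three substitution strategies described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a hypothetical notation in which normalization marks each omitted word as `<OMIT:CAT>` (CAT being the proposed morphosyntactic category), and the table of prototypical words is invented for the example. For the third strategy, the language model is abstracted as a `predict` callable so the sketch runs without downloading a model; the commented FlauBERT adapter is an untested assumption based on the Hugging Face `transformers` fill-mask pipeline.

```python
import re
from typing import Callable, Dict, List

# Hypothetical notation (assumption): normalization marks each omitted word
# as <OMIT:CAT>, where CAT is the morphosyntactic category it proposed.
OMISSION = re.compile(r"<OMIT:([A-Z]+)>")


def fill_mask_token(text: str, mask: str = "xxx") -> str:
    """Method 1: replace every omission with a fixed mask token."""
    return OMISSION.sub(mask, text)


# Hypothetical "prototypical word" per category, for illustration only.
PROTOTYPES: Dict[str, str] = {"DET": "le", "NOUN": "chose", "VERB": "fait", "PREP": "de"}


def fill_prototype(text: str, prototypes: Dict[str, str] = PROTOTYPES,
                   fallback: str = "xxx") -> str:
    """Method 2: replace each omission by the prototypical word of its category.

    Categories missing from the table fall back to the method-1 mask token.
    """
    return OMISSION.sub(lambda m: prototypes.get(m.group(1), fallback), text)


def fill_with_lm(text: str, predict: Callable[[str, str], List[str]],
                 mask: str = "<mask>") -> str:
    """Method 3 (sketch): mask every omission and let a masked LM fill each slot.

    `predict(masked_text, mask)` must return one predicted word per occurrence
    of `mask`, in left-to-right order.
    """
    n = len(OMISSION.findall(text))
    if n == 0:
        return text
    guesses = predict(OMISSION.sub(mask, text), mask)
    if len(guesses) != n:
        raise ValueError(f"expected {n} predictions, got {len(guesses)}")
    it = iter(guesses)
    # Substitute the predictions back into the omission slots, in order.
    return OMISSION.sub(lambda _m: next(it), text)


# Untested adapter (assumption): a FlauBERT-backed `predict` using the
# Hugging Face `transformers` fill-mask pipeline.
#
#   from transformers import pipeline
#   fill = pipeline("fill-mask", model="flaubert/flaubert_base_cased")
#
#   def flaubert_predict(masked: str, mask: str) -> List[str]:
#       out = fill(masked.replace(mask, fill.tokenizer.mask_token))
#       if out and isinstance(out[0], dict):  # a single mask yields a flat list
#           out = [out]
#       return [candidates[0]["token_str"].strip() for candidates in out]
```

Filling all masks in one pass predicts each slot independently; filling them one at a time, left to right, would let later predictions condition on earlier ones, at the cost of one model call per omission.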

Keywords

NLP, children corpus, normalization, morphosyntactic analysis

DOI: 10.4000/1364v

Added to PhilPapers: 2025-01-28