Abstract
The legal industry is characterized by dense and complex documents, which require automated processing methods to manage and analyze large volumes of data. Traditional methods for extracting legal information depend heavily on substantial quantities of annotated data during the training phase. However, a question arises: how can information be extracted effectively in contexts where annotated data is scarce or unavailable? This study investigates Large Language Models (LLMs) as a transformative solution for legal term extraction, presenting a novel approach to overcome the constraints associated with the need for extensive annotated datasets. We explored methods such as prompt engineering and fine-tuning to enhance LLM performance, and evaluated four LLMs (GPT-4, Miqu-1-70b, Mixtral-8x7b, and Mistral-7b) against rule-based and BERT-based baselines under limited annotated data availability. We implemented and assessed our methodologies using Luxembourg's traffic regulations as a case study. Our findings underscore the capacity of LLMs to handle legal term extraction successfully, emphasizing the benefits of their one-shot and zero-shot learning capabilities in reducing reliance on annotated data, reaching an F1 score of 0.690. Moreover, our study sheds light on best practices for employing LLMs in legal information processing, offering insights into challenges and limitations, including issues related to term boundary extraction.