Abstract
Automated classification of legal documents has been the subject of extensive research in recent years. However, it remains a challenging task for long documents, since it is difficult for a model to identify the information most relevant to classification. In this paper, we propose a two-stage supervised learning approach for the classification of petitions, a type of legal document that requests a court order. The proposed approach combines a word-level encoder–decoder Seq2Seq deep neural network, such as a Bidirectional Long Short-Term Memory (BiLSTM) or a Bidirectional Encoder Representations from Transformers (BERT) model, with a document-level Support Vector Machine classifier. To address the challenges posed by lengthy legal documents, the approach places a human in the loop, whose task is to localize and tag relevant segments of text for the word-level training stage, which dramatically reduces the dimension of the document classifier's input vector. We validated our approach experimentally on a real-world dataset comprising 270 intermediate petitions, which were carefully annotated by specialists from the 15th civil unit of the State of Alagoas, Brazil. Our results show that both the BiLSTM and the BERT-Convolutional Neural Network variants achieved an accuracy of up to 95.49% and outperformed baseline classifiers based on the Term Frequency–Inverse Document Frequency text vectorizer. The proposed approach is currently being used to automate work at the aforementioned justice unit, thereby increasing its efficiency in handling repetitive tasks.
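To make the dimensionality argument concrete, the following is a minimal sketch of the hand-off between the two stages, not the authors' implementation. It assumes the word-level stage emits one tag per token; the tag names `REL` and `O`, the two toy petitions, and the helper functions are all hypothetical and chosen only for illustration. The trained BiLSTM or BERT tagger is replaced by hard-coded tags, and the document-level Support Vector Machine is omitted: the sketch only shows how keeping tagged segments shrinks the vector that such a classifier would receive.

```python
from collections import Counter

# Hypothetical output of the word-level stage: (token, tag) pairs.
# "REL" marks tokens inside a segment tagged as relevant, "O" marks the rest.
# In the paper these tags would come from a trained tagger; here they are
# hard-coded, made-up examples.
doc_a = [("The", "O"), ("plaintiff", "O"), ("hereby", "O"),
         ("requests", "REL"), ("seizure", "REL"), ("of", "REL"),
         ("assets", "REL"), ("as", "O"), ("stated", "O"), ("above", "O")]
doc_b = [("In", "O"), ("view", "O"), ("of", "O"), ("the", "O"),
         ("above", "O"), ("we", "O"), ("request", "REL"),
         ("suspension", "REL"), ("of", "REL"), ("the", "REL"),
         ("case", "REL")]
corpus = [doc_a, doc_b]


def all_tokens(tagged_tokens):
    """Return every token of a document, lowercased."""
    return [token.lower() for token, _ in tagged_tokens]


def relevant_tokens(tagged_tokens, relevant_tag="REL"):
    """Return only the tokens that the word-level stage tagged as relevant."""
    return [token.lower() for token, tag in tagged_tokens if tag == relevant_tag]


def build_vocabulary(token_lists):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({token for tokens in token_lists for token in tokens})


def bag_of_words(tokens, vocabulary):
    """Return a count vector with one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Vocabulary (and therefore input dimension) without and with segment filtering.
full_vocab = build_vocabulary([all_tokens(doc) for doc in corpus])
reduced_vocab = build_vocabulary([relevant_tokens(doc) for doc in corpus])

# Document vectors that a document-level classifier would be trained on.
reduced_vectors = [bag_of_words(relevant_tokens(doc), reduced_vocab)
                   for doc in corpus]

print("full dimension:", len(full_vocab))
print("reduced dimension:", len(reduced_vocab))
```

On a real corpus the vectors would more plausibly be TF-IDF weights or learned embeddings rather than raw counts, but the effect is the same: the classifier sees only the vocabulary of the tagged segments instead of the vocabulary of the whole petition.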