Licensing and Usage Rights of Language Data in Machine Translation

In Helena Moniz & Carla Parra Escartín (eds.), Towards Responsible Machine Translation: Ethical and Legal Considerations in Machine Translation. Springer Verlag. pp. 49-69 (2023)

Abstract

Machine translation (MT) is special in that it relies heavily on data. In rule-based MT, an engine performs the translation task by using language resources such as dictionaries and grammar rules, usually written by experts but sometimes learned from monolingual or bilingual text. Corpus-based (statistical and, more recently, neural) MT leverages large amounts of monolingual and sentence-aligned bilingual text. Clearly, the MT programs that use these data are creative works that may be protected by copyright, but this chapter focuses on the data. Human labour, and therefore creative authorship, is present in all forms of MT data: monolingual text has been authored, parallel text has been translated and aligned, and rules and dictionaries have been written by experts. Since its conception centuries ago, copyright has protected the livelihoods of authors by regulating how copies of these data can be used and how works derived from them are used and published, through instruments such as licences. While the case of dictionaries and grammars used in rule-based MT is reasonably clear, since they are purposely written for one or another language-processing application, the monolingual and parallel text used in MT was not created with MT in mind. This has led some authors to ask whether authors and translators should receive additional compensation for this unintended use of their work to generate new value downstream. This chapter gives an overview of the different sources of data used in MT, discussing authorship across the steps of creating, curating and transforming those data for use with MT, and determining which implicit and explicit licensing schemes apply to them and how they work. It also describes the controversy surrounding the use of published works to generate new, initially unintended, value through translation technologies, and the various ways in which copyright issues are addressed.

Analytics

Added to PP
2023-04-13