Abstract
Data Transfer Service (DTS) is an open-source project that is developing a document-centric message model for describing a bulk data transfer activity, with an accompanying set of loosely coupled and platform-independent components for brokering the transfer of data between a wide range of (potentially incompatible) storage resources as scheduled, fault-tolerant batch jobs. The architecture scales from small embedded deployments on a single computer to large distributed deployments through an expandable ‘worker-node pool’ controlled through message-orientated middleware. Data access and transfer efficiency are maximized through the strategic placement of worker nodes at or between particular data sources/sinks. The design is inherently asynchronous, and, when third-party transfer is not available, it side-steps the bandwidth, concurrency and scalability limitations associated with buffering bytes directly through intermediary client applications. It aims to address geographical–topological deployment concerns by allowing service hosting to be either centralized (as part of a shared service) or confined to a single institution or domain. Established design patterns and open-source components are coupled with a proposal for a document-centric and open-standards-based messaging protocol. As part of the development of the message protocol, a bulk data copy activity document is proposed for the first time