The DataSpaces framework addresses data-intensive application workflows, which typically transform the data they manage, and often reduce it, before it can be processed by consumer applications or services. For example, a code-coupled application may require only subsets of the data, sorted and processed.
An initial solution we explored consisted of embedding predefined data transformation operations in the staging area [3], to better utilize its CPU resources and transform the data before it is shipped to the consumer. This approach requires a priori knowledge of the processing, as well as of the data structures and data representation.
ActiveSpaces is a data management framework that explores an alternate paradigm: it allows application developers to programmatically define data-processing routines and to dynamically deploy and execute them in the staging area at runtime, rather than moving the data to the processing codes. ActiveSpaces builds on the concept of a staging area, and specifically on the DataSpaces [2] framework, which overlays the abstraction of an associative, virtual shared space on the staging area. Applications, which may run on remote and heterogeneous systems, can insert and retrieve data objects at runtime using semantically meaningful descriptors (e.g., geometric regions in a discretized application domain).
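The associative insert/retrieve model described above can be illustrated with a small, single-process sketch. Everything in it is an assumption made for illustration: the names `toy_space`, `toy_put`, and `toy_get` are hypothetical and are not the DataSpaces or ActiveSpaces API, the descriptor is reduced to a variable name plus a one-dimensional index range, and the space lives in local memory rather than on remote staging nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of an associative space; NOT the DataSpaces API. */
#define MAX_OBJECTS 16
#define MAX_ELEMS   64

typedef struct {
    char   name[32];          /* variable name, part of the descriptor */
    int    lb, ub;            /* inclusive 1-D index range of this object */
    double data[MAX_ELEMS];
} space_object;

typedef struct {
    space_object objects[MAX_OBJECTS];
    size_t       count;
} toy_space;

/* Insert values for indices [lb, ub] of variable `name`; 0 on success. */
int toy_put(toy_space *s, const char *name, int lb, int ub,
            const double *values)
{
    int n = ub - lb + 1;
    if (s->count >= MAX_OBJECTS || n <= 0 || n > MAX_ELEMS) return -1;
    if (strlen(name) >= sizeof s->objects[0].name) return -1;
    space_object *o = &s->objects[s->count++];
    strcpy(o->name, name);
    o->lb = lb;
    o->ub = ub;
    memcpy(o->data, values, (size_t)n * sizeof *values);
    return 0;
}

/* Copy every stored element of `name` inside [lb, ub] into out[i - lb].
   Returns the number of elements copied. */
int toy_get(const toy_space *s, const char *name, int lb, int ub,
            double *out)
{
    int found = 0;
    for (size_t k = 0; k < s->count; k++) {
        const space_object *o = &s->objects[k];
        if (strcmp(o->name, name) != 0) continue;
        int lo = o->lb > lb ? o->lb : lb;   /* intersect the two ranges */
        int hi = o->ub < ub ? o->ub : ub;
        for (int i = lo; i <= hi; i++) {
            out[i - lb] = o->data[i - o->lb];
            found++;
        }
    }
    return found;
}
```

In this sketch a consumer asks for a region by name and index range without knowing which producer inserted which piece; a request spanning two stored objects is assembled from both.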
The ActiveSpaces framework provides (1) programming support for defining the data-processing routines, called data kernels, to be deployed and executed on the staging area, and (2) run-time mechanisms for transporting the binary codes associated with these data kernels to the staging area and executing them in parallel on the staging nodes. The programming abstractions allow an application developer to define and implement the data kernels using all constructs of the native programming language (e.g., C). The run-time mechanism enables code offloading and remote execution at the data source for HPC applications.
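As a rough illustration of the data-kernel idea, the sketch below writes a kernel as an ordinary C function and has a stand-in "run-time" apply it to each staging node's local data. The kernel signature, `staging_node`, and `run_on_staging` are hypothetical names chosen for this example. The real framework transports binary code to remote nodes and runs it in parallel, whereas this sketch simply passes a function pointer and loops sequentially within one process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: reads n local values, writes one result. */
typedef void (*data_kernel)(const double *data, size_t n, double *result);

/* A user-defined data kernel: the maximum of the local values. */
void kernel_max(const double *data, size_t n, double *result)
{
    assert(n > 0);
    double m = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] > m) m = data[i];
    *result = m;
}

/* One staging node's share of the data. */
typedef struct {
    const double *data;
    size_t        n;
} staging_node;

/* Stand-in for the run-time: run `kernel` on each node's local data.
   A sequential loop here; the text describes this step as parallel. */
void run_on_staging(const staging_node *nodes, size_t n_nodes,
                    data_kernel kernel, double *partials)
{
    for (size_t i = 0; i < n_nodes; i++)
        kernel(nodes[i].data, nodes[i].n, &partials[i]);
}
```

The application would then combine the per-node partial results itself, for example by taking the maximum of `partials`.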

The ActiveSpaces architecture contains two main components: an ActiveSpaces server and an ActiveSpaces client. The ActiveSpaces server is a stand-alone component that runs on the staging area and provides data services to user applications. The ActiveSpaces client integrates with user applications and runs on the computing nodes. These components implement the programming API, which is exposed at the application level, and the run-time system, which executes the user-defined data kernels. ActiveSpaces extends the DataSpaces framework and implements new services to apply transformations to the data in the space or to the results of a data request. These services are provided by the run-time execution system (Rexec).

Data kernels are implemented within an application and have direct knowledge of the structure of the data used in the application. Once deployed on the space, these kernels can access the data directly and manipulate it without additional support from the space, such as parameter marshalling or data decoding. After a data kernel executes, the Rexec layer of the server component returns the results to the application.
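A minimal sketch of what "direct knowledge of the structure of the data" can mean in C follows. The `particle` record, the `stored_object` wrapper, and `count_energetic` are hypothetical names for this example only. The sketch also assumes that the staging node and the application agree on the struct layout (same compiler ABI and byte order), which is an assumption of this example and not a statement about how ActiveSpaces handles heterogeneous systems.

```c
#include <assert.h>
#include <stddef.h>

/* Application-defined record; the space only ever sees its raw bytes. */
typedef struct {
    double x, y;      /* position */
    double energy;
} particle;

/* What the (hypothetical) space stores: an opaque byte range, no type info. */
typedef struct {
    const void *bytes;
    size_t      nbytes;
} stored_object;

/* Data kernel compiled with the application, so it knows the layout:
   it views the stored bytes as particles with no decoding step. */
size_t count_energetic(const stored_object *obj, double threshold)
{
    const particle *p = obj->bytes;
    size_t n = obj->nbytes / sizeof *p;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].energy > threshold) count++;
    return count;
}
```

Only the single count travels back to the application; the particle array itself stays where it is stored.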

Multiple applications collaborating at runtime can insert data in the space, and can retrieve raw or pre-processed data of interest using data kernels. For data reduction operations, ActiveSpaces can reduce the amount of data that needs to be transferred over the network. It can also reduce an application's computation time by offloading computations such as interpolation, redistribution, and reformatting, which can be executed asynchronously and in parallel on the staging area nodes. ActiveSpaces offers some benefits even in more constrained cases where execution of the data kernels is synchronous: because the number of nodes hosting the staging area is much smaller than the number of nodes running the application, it can better exploit data locality within the staging nodes.
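The network-traffic argument for reduction operations can be made concrete with a back-of-the-envelope model. The two functions below are a simplification assumed for this example (they ignore descriptors, message headers, and protocol overhead), and their names and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes sent to the consumer if it fetches the raw data and reduces locally. */
size_t bytes_if_data_moves(size_t n_elems, size_t elem_size)
{
    return n_elems * elem_size;
}

/* Bytes sent if the kernel code goes to every staging node and each node
   returns a single partial result. */
size_t bytes_if_kernel_moves(size_t n_staging_nodes, size_t kernel_code_size,
                             size_t result_size)
{
    return n_staging_nodes * (kernel_code_size + result_size);
}
```

With illustrative numbers, one million 8-byte values cost 8,000,000 bytes to move, while shipping a 4,096-byte kernel to 4 staging nodes and returning one 8-byte result from each costs 16,416 bytes under this model. The advantage shrinks as the kernel's output approaches the size of its input.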
