Institute on Knowledge and Data Management

Research Objectives

Institute leader: Domenico Talia (talia deis.unical.it), University of Calabria

The overall objective of this Institute is to further the integration of data management and knowledge discovery with GRID technologies, in order to provide knowledge-based GRIDs for the Semantic GRID and the Knowledge GRID. The Institute provides a collaborative setting for European research teams working on: distributed storage management on GRIDs; the development of knowledge techniques and tools for supporting data-intensive applications; and the integration of data and computation GRIDs with information and knowledge GRIDs. The goal is to strengthen the joint activity of research groups that today collaborate only sporadically and partially, promoting larger leading teams and supporting efforts towards standard models and tools for data and knowledge management on GRIDs and P2P systems.

This Institute has tasks in three main areas:
- Distributed Data Management: Providing infrastructures, techniques, and policies for managing storage resources in the GRID.
- Information and Knowledge Management: Developing metadata, semantic representation, and protocols for GRID service discovery, information management, and the design of knowledge-oriented GRID services.
- Data Mining and Knowledge Discovery: Design of GRID resource semantic mapping, database querying on GRIDs, and services for distributed data mining and knowledge discovery on GRIDs.

Roadmap version 3 on Knowledge and Data Management
Publications related to the Institute on Knowledge and Data Management

Research Groups

| Research Group | Leader | Participants |
|---|---|---|
| Grid Data Storage Access and Management Architecture | FORTH | FORTH, PSNC, SZTAKI, UCY, INFN, UNL |
| Storage Security | INFN | INFN, FORTH, STFC |
| GRID Data Integration Models and Architectures | UoM | UNICAL, UoM |
| Methods for Deriving GRID Trust and Security Policies for Managing VOs | CETIC | CETIC, STFC |
| Distributed Data Mining in GRIDs and P2P Systems | UNICAL | UNICAL, ISTI-CNR, UCY, ICAR-CNR |
| Adaptivity in Distributed Query and Workflow | UNCL | UoM, UNCL |

Latest Research Highlights

On Usage Control in Data Grids
CoreGRID Technical Report TR-0154 [pdf]: This paper reasons on usage control in Data Grids. First, we present a usage-based Grid authorization architecture using the functional components of current Grids, and consider the advantages of using Semantic Grid technologies for the specification of UCON subjects and objects. Then, we analyse the formal requirements for an enforcing mechanism of UCON policies, using the KAOS requirements engineering methodology with a bottom-up approach. To do so, we provide an abstract specification of an enforcement mechanism. Then, we prove that this specification is sound and complete, showing formally that it can enforce all the policies pertaining to Sandhu's UCON authorization sub-models. Using the rigorous requirements engineering methodology of KAOS, we derive the operational requirements for each sub-model, showing that each one can be enforced by the specification previously provided.

A Scalable Architecture for Discovery and Planning in P2P Service Networks
CoreGRID Technical Report TR-0152 [pdf]: The desirable global scalability of Grid systems has steered research towards the employment of the peer-to-peer (P2P) paradigm for the development of new resource discovery systems. As Grid systems mature, the requirements for such a mechanism have grown from simply locating the desired service to composing more than one service to achieve a goal. In the Semantic Grid, resource discovery systems should also be able to automatically construct any desired service that is not already present in the system by using other, already existing services. In this report, we present a novel system for the automatic discovery and composition of services, based on the P2P paradigm, with a Grid environment in mind for the application (but not limited to it). The report improves composition and discovery by exploiting a novel network partitioning scheme that decouples services belonging to different domains, and an ant-inspired algorithm that places co-used services in neighbouring peers.

A data-centric security analysis of ICGrid
CoreGRID Technical Report TR-0145 [pdf]: The Data Grid is becoming a new paradigm for eHealth systems due to its enormous storage potential using decentralized resources managed by different organizations. The storage capabilities in these novel “Health Grids” are well suited to the requirements of systems like ICGrid, which captures, stores and manages data and metadata from Intensive Care Units. However, this paradigm depends on widely distributed storage sites, and therefore requires new security mechanisms able to avoid potential leaks and to cope with modification and destruction of stored data in the presence of external or internal attacks. Particular emphasis must be put on the patient’s personal data, the protection of which is required by legislation in many countries of the European Union and the world in general. Taking into consideration the underlying data protection legislation and technological data privacy mechanisms, in this paper we identify the security issues related to ICGrid’s data and metadata after applying an analysis framework extended from our previous research on the Data Grid’s storage services. Then, we present a privacy protocol that demonstrates the use of two basic approaches (encryption and fragmentation) to protect patients’ private data stored using the ICGrid system.

Providing security to the Desktop Data Grid
CoreGRID Technical Report TR-0144 [pdf]: Volunteer Computing is becoming a new paradigm not only for the Computational Grid, but also for institutions using production-level Data Grids, because of the enormous storage potential that may be achieved at a low cost by using commodity hardware within their own computing premises. However, this novel “Desktop Data Grid” depends on a set of widely distributed and untrusted storage nodes, and therefore offers no guarantees about either the availability or the protection of the stored data. These security challenges must be carefully managed before fully deploying Desktop Data Grids in sensitive environments (such as eHealth) to cope with a broad range of storage needs, including backup and caching. In this paper we propose a cryptographic protocol able to fulfil the storage security requirements related to a generic Desktop Data Grid scenario, which were identified after applying an analysis framework extended from our previous research on the Data Grid’s storage services. The proposed protocol uses three basic mechanisms to accomplish its goal: (a) symmetric cryptography and hashing, (b) an Information Dispersal Algorithm and (c) the novel “Quality of Security” (QoSec) quantitative metric. Although the focus of this work is the associated protocol, we also present an early evaluation using an analytical model. Our results show a strong relationship between the assurance of the data at rest, the QoSec of the Volunteer Storage Client and the number of fragments required to rebuild the original file.

Distributed Data Mining in Desktop Grids
CoreGRID Technical Report TR-0141 [pdf]: Several kinds of scientific and commercial applications require the execution of a large number of independent tasks. One highly successful and low-cost mechanism for acquiring the necessary compute power for these applications is the “public-resource computing”, or “desktop Grid”, paradigm, which exploits the computational power of private computers. So far, this paradigm has not been applied to data mining applications for two main reasons. First, it is not trivial to decompose a data mining algorithm into truly independent sub-tasks. Second, the large volume of data involved makes it difficult to handle the communication costs of a parallel paradigm. In this paper, we focus on one of the main data mining problems: the extraction of closed frequent itemsets from transactional databases. We show that it is possible to decompose this problem into independent tasks, which however need to share a large volume of data. We thus introduce a data-intensive computing network, which adopts a P2P topology based on super peers with caching capabilities, aiming to support the dissemination of large amounts of information. Finally, we evaluate the execution of our data mining job on such a network.
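
For readers unfamiliar with the problem named in TR-0141, the following is a minimal sketch of what "closed frequent itemsets" means: an itemset is frequent if it appears in at least a given number of transactions, and closed if no proper superset has the same support. The function name, the brute-force enumeration and the toy transactions are illustrative assumptions. This is not the decomposition or the algorithm used in the report, and it would not scale to real databases.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force sketch (hypothetical helper, not the report's method).

    Returns a dict mapping each closed frequent itemset (as a frozenset)
    to its support count, i.e. the number of transactions containing it.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Step 1: count support for every candidate itemset and keep the frequent ones.
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support

    # Step 2: keep only itemsets with no proper superset of equal support.
    # Any such superset is itself frequent, so checking `frequent` suffices.
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


# Toy transactional database (illustrative data only).
example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
result = closed_frequent_itemsets(example, min_support=2)
```

In this toy database, `{"b"}` is frequent but not closed, because its superset `{"a", "b"}` occurs in exactly the same transactions. Candidates can be evaluated independently of each other, but each evaluation needs access to the transactions, which is the kind of data sharing the abstract refers to.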