supported by the EC IST Programme
CoreGRID European Research Network on Foundations, Software Infrastructures and Applications
for large scale distributed, GRID and Peer-to-Peer Technologies
CoreGRID Technical Report TR-0164

Metadata Ranking and Pruning for Failure Detection in Grids

The objective of Grid computing is to make processing power as accessible and easy to use as electricity and water. The last decade has seen unprecedented growth in Grid infrastructures, which now enable large-scale deployment of applications in the scientific computation domain. One of the main challenges in realizing the full potential of Grids is making these systems dependable. In this paper we present FailRank, a novel framework for integrating and ranking information sources that characterize failures in a Grid system. Once the failing sites have been ranked, they can be eliminated from the job scheduling resource pool, yielding a more predictable, dependable and adaptive infrastructure. We also present the tools we developed to evaluate the FailRank framework. In particular, we present the FailBase Repository, a 38GB corpus of state information that characterizes the EGEE Grid for one month in 2007. Such a corpus paves the way for the community to systematically uncover previously unknown patterns and rules among the multitude of parameters that can contribute to failures in a Grid environment. Additionally, we present an experimental evaluation of the FailRank system over 30 days, which shows that our framework identifies failures in 93% of the cases while fetching only 65% of the available information sources. We believe that our work constitutes another important step towards realizing adaptive Grid computing systems.
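
The rank-then-prune idea described in the abstract can be illustrated with a minimal sketch. This is not the FailRank implementation: the abstract does not specify the scoring function, so the weighted-sum score below is an assumption, and the site names, indicator names (`queue_load`, `test_failures`) and weights are hypothetical values chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical per-site failure indicators, each assumed to be normalised
# to [0, 1], where a higher value means "more likely to be failing".
Metrics = Dict[str, float]


def failure_score(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Weighted sum of indicators (an assumed scoring rule, not taken from the report).

    Indicators missing for a site contribute 0 to its score.
    """
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items())


def rank_sites(sites: Dict[str, Metrics], weights: Dict[str, float]) -> List[str]:
    """Return site names ordered from most to least failure-prone."""
    return sorted(sites, key=lambda s: failure_score(sites[s], weights), reverse=True)


def prune_pool(sites: Dict[str, Metrics], weights: Dict[str, float], k: int) -> List[str]:
    """Drop the k highest-ranked (most failure-prone) sites from the scheduling pool."""
    excluded = set(rank_sites(sites, weights)[:k])
    return [s for s in sites if s not in excluded]


# Illustrative data only; none of these values come from the FailBase corpus.
sites = {
    "site-a": {"queue_load": 0.9, "test_failures": 0.8},
    "site-b": {"queue_load": 0.2, "test_failures": 0.1},
    "site-c": {"queue_load": 0.5, "test_failures": 0.7},
}
weights = {"queue_load": 0.4, "test_failures": 0.6}

ranking = rank_sites(sites, weights)
pool = prune_pool(sites, weights, k=1)
```

Here `ranking` lists the sites from most to least failure-prone under the assumed score, and `pool` is the set of sites that would remain available to a job scheduler after the single worst-ranked site is excluded.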
Last updated: Wednesday, 23 July 2008
 
 
 
© 2009 CoreGRID Network of Excellence - European Grid Research
 