1. Understanding Grids
Grids provide a distributed system for sharing computational and data resources across multiple administrative domains. They enable high-performance computing (HPC) applications to leverage geographically distributed resources efficiently.
1.1 Grid Applications
Grid computing is well suited to computation-intensive applications such as weather modeling, biological simulations, and physics experiments. An example is the Regional Atmospheric Modeling System (RAMS), which predicts weather phenomena by running its models across many processors on a grid.
Characteristics:
- High computation-to-data ratio
- Distributed resource utilization
1.2 Grid Infrastructure
Grids consist of resources contributed by various sites, such as university clusters or supercomputing centers. A resource can be a workstation that is used only while it is otherwise idle, or a dedicated server that is available continuously.
1.2.1 Directed Acyclic Graph (DAG) for Jobs
Grid applications are structured as DAGs, where:
- Each node represents a job
- Edges indicate dependencies between jobs
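Each job is typically embarrassingly parallel, meaning it can be split into tasks that execute independently on different machines. The dependencies between jobs, by contrast, constrain the order in which jobs run. For example:
Job 0 → Job 1 and Job 2 → Job 3
Here, Job 1 and Job 2 can execute simultaneously once Job 0 finishes, and Job 3 starts only after both have completed.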
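The ordering above can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler of any particular grid system: the function name `schedule_waves` and the dictionary layout are assumptions made for this example. It groups jobs into "waves", where every job in a wave has all of its dependencies satisfied and could therefore run at the same time.

```python
def schedule_waves(deps):
    """Group jobs into waves of jobs that can run simultaneously.

    deps maps each job name to the set of jobs it depends on.
    Assumes every job mentioned as a dependency is also a key of deps.
    """
    remaining = {job: set(parents) for job, parents in deps.items()}
    waves = []
    while remaining:
        # A job is ready when none of its dependencies are still pending.
        ready = sorted(job for job, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("cycle detected: the job graph is not a DAG")
        waves.append(ready)
        for job in ready:
            del remaining[job]
        # Mark the finished jobs as satisfied for everything still waiting.
        for parents in remaining.values():
            parents.difference_update(ready)
    return waves


# The four-job example from the text.
deps = {
    "Job 0": set(),
    "Job 1": {"Job 0"},
    "Job 2": {"Job 0"},
    "Job 3": {"Job 1", "Job 2"},
}
waves = schedule_waves(deps)
for number, wave in enumerate(waves):
    print(number, wave)
```

Job 1 and Job 2 land in the same wave, which reflects that they may execute simultaneously, while Job 3 is placed in a later wave of its own.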
2. Scheduling in Grids
Efficient scheduling is crucial to optimize resource utilization and reduce execution time.
2.1 Intrasite Scheduling
Intrasite scheduling handles job scheduling within a single site. Workload management systems such as HTCondor manage task distribution, resource allocation, and fault recovery.
Key Features:
- Monitoring and restarting failed tasks
- Data staging within the site
- Task execution on idle workstations or dedicated servers
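The monitor-and-restart behavior listed above can be illustrated with a small simulation. This sketch does not use HTCondor or reproduce its interfaces. The names `run_with_restarts` and `flaky_square`, the machine names, and the round-robin placement policy are all hypothetical, chosen only to show the idea of retrying a failed task on another machine.

```python
def run_with_restarts(tasks, machines, max_attempts=3):
    """Run each task, restarting it on the next machine if it fails.

    tasks maps a task name to a callable that takes a machine name.
    Machines are picked round-robin (an assumed, simplistic policy).
    """
    results = {}
    attempts = {}
    next_machine = 0
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            machine = machines[next_machine % len(machines)]
            next_machine += 1
            try:
                results[name] = task(machine)
                attempts[name] = attempt
                break
            except RuntimeError:
                # The failure was detected; fall through and retry elsewhere.
                continue
        else:
            # Every attempt failed; record that the task produced no result.
            results[name] = None
            attempts[name] = max_attempts
    return results, attempts


def flaky_square(x):
    """Build a task that fails on 'ws1', as if its owner reclaimed it."""
    def task(machine):
        if machine == "ws1":
            raise RuntimeError("workstation reclaimed by its owner")
        return x * x
    return task


tasks = {"t0": flaky_square(2), "t1": flaky_square(3)}
results, attempts = run_with_restarts(tasks, ["ws1", "ws2"])
print(results, attempts)
```

In this run each task first lands on the unavailable workstation, fails, and succeeds on its second attempt on the other machine.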
2.2 Intersite Scheduling
Intersite scheduling manages resource allocation across multiple sites. Standard toolkits such as Globus oversee job distribution and file transfers between sites.
Key Responsibilities:
- Staging in and out of files
- Interoperability with local schedulers
- Resource delegation across federated sites
2.2.1 Globus Toolkit
The Globus Toolkit is an open-source toolkit that provides tools for intersite grid management.
Components:
- GridFTP: Transfers large datasets between sites.
- GRAM: Allocates and manages jobs across grids.
- Replica Location Service (RLS): Maps logical (user-friendly) file names to the physical locations of their replicas.
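The name-mapping role of a replica catalog can be pictured as a lookup table from one logical name to many physical locations. The sketch below is an in-memory stand-in, not the RLS client interface: the class `ReplicaCatalog`, its methods, the logical file name, and the `example.org` URLs are all hypothetical.

```python
class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name is stored at physical_url."""
        self._replicas.setdefault(logical_name, []).append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations (empty list if unknown)."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Hypothetical replicas of one dataset held at two different sites.
catalog.register("rams-run42.nc", "gsiftp://site-a.example.org/data/rams-run42.nc")
catalog.register("rams-run42.nc", "gsiftp://site-b.example.org/archive/rams-run42.nc")

locations = catalog.lookup("rams-run42.nc")
print(locations)
```

A job can refer to the file by its logical name alone, leaving the choice of which replica to transfer to the scheduler.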
3. Security in Grid Computing
Security is critical because grids federate resources across separate administrative domains. Mechanisms such as the Grid Security Infrastructure (GSI) provide user authentication and protect data integrity.
Key Features:
- Single sign-on for seamless user access
- Support for local security protocols (e.g., Kerberos, UNIX authentication)
- Delegation and third-party authentication
4. Comparing Grid and Cloud Computing
Both provide distributed computing resources but differ in control and use cases.
- Grid: Federated resources across organizations. Emphasizes computation-intensive tasks.
- Cloud: Centrally managed resources. Focuses on scalability and on-demand resource allocation.