MapReduce - DMJCCLT - dmj.one

MapReduce

1. Introduction to MapReduce

MapReduce is a programming model used to process and generate large datasets across a distributed system. It simplifies parallel computation by abstracting the processing into two primary functions: Map and Reduce.

2. Core Components of MapReduce

2.1 Map Function

The Map function processes input data, line by line, and outputs key-value pairs. Each line of input data is parsed, and meaningful keys and values are generated for further processing.


public static class MapClass extends MapReduceBase 
 implements Mapper {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) 
      throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

2.2 Reduce Function

The Reduce function aggregates values associated with a common key. It takes the intermediate key-value pairs from the Map phase and combines the values for each key to produce the final result.


public static class ReduceClass extends MapReduceBase 
 implements Reducer {
  public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) 
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

3. MapReduce Workflow

3.1 Input Data

Data is stored in the form of key-value pairs, which are processed line by line during the Map phase.

3.2 Execution Phases

3.3 Example

For a word count problem, the Map function outputs (word, 1) for each word, and the Reduce function sums up counts for each unique word.

4. Practical Applications of MapReduce

5. Sample Use Case: Mutual Followers

5.1 Problem Statement

Given a dataset of user relationships (a, b), where a follows b, find all pairs of users who mutually follow each other.

5.2 Map Function


public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) 
    throws IOException {
  String line = value.toString();
  String[] users = line.split(",");
  if (users[0].compareTo(users[1]) < 0) {
    output.collect(new Text("(" + users[0] + "," + users[1] + ")"), new IntWritable(1));
  } else {
    output.collect(new Text("(" + users[1] + "," + users[0] + ")"), new IntWritable(1));
  }
}

5.3 Reduce Function


public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) 
    throws IOException {
  int count = 0;
  while (values.hasNext()) {
    count += values.next().get();
  }
  if (count == 2) {
    output.collect(key, new IntWritable(count));
  }
}

6. Key Features of MapReduce