What is the workflow of MapReduce?

The workflow of MapReduce can be briefly described as the following steps:

  1. Split: Divide the input data into multiple small chunks; each chunk is called an input split.
  2. Mapping: Distribute the input splits to multiple Map tasks for processing. Each Map task reads its assigned split and processes the data with a user-defined map function, converting the data into <key, value> pairs.
  3. Shuffle: Partition the output of the Map tasks by key, then sort the pairs within each partition by key.
  4. Merge: Combine <key, value> pairs with the same key within each partition (an optional combiner step) to reduce the volume of data transferred over the network.
  5. Reduce: Distribute the merged <key, value> pairs to multiple Reduce tasks for processing. Each Reduce task processes its keys with a user-defined reduce function to generate output results.
  6. Merge output: Combine the output of the multiple Reduce tasks to form the final result.
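The steps above can be sketched as a minimal, single-process word-count example in Python. The input text, the `map_fn`/`reduce_fn` names, and the in-memory "shuffle" are illustrative assumptions, not the API of any real MapReduce framework:

```python
from itertools import groupby

# 1. Split: hypothetical input, already divided into three input splits.
splits = ["hello world", "hello mapreduce", "world of mapreduce"]

# 2. Mapping: a custom map function emits a <word, 1> pair per word.
def map_fn(chunk):
    return [(word, 1) for word in chunk.split()]

mapped = [pair for chunk in splits for pair in map_fn(chunk)]

# 3. Shuffle: sort by key so equal keys become adjacent, then group them.
mapped.sort(key=lambda kv: kv[0])
grouped = {key: [v for _, v in pairs]
           for key, pairs in groupby(mapped, key=lambda kv: kv[0])}

# 5. Reduce: a custom reduce function sums the counts for each key.
def reduce_fn(key, values):
    return key, sum(values)

# 6. Merge output: collect the results of all reduce calls.
result = dict(reduce_fn(k, vs) for k, vs in grouped.items())
print(result)  # {'hello': 2, 'mapreduce': 2, 'of': 1, 'world': 2}
```

In a real cluster the splits would live on different machines, the shuffle would move data over the network, and step 4 (the combiner) would pre-sum counts on each mapper before transfer; this sketch collapses all of that into one process to show the data flow.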

It is important to note that the MapReduce workflow can be customized: users write their own Map and Reduce functions to suit their requirements and tune the overall process by setting appropriate parameters, such as the number of Reduce tasks.
