Can Reducer always be reused for Combiner?

A Combiner is an optional intermediate function that runs on the map side, on the output of a Mapper, before that output is sent to the Reducers. Using a combiner has 2 primary benefits:

  1. A combiner reduces the amount of data sent across the network to the reducers, which improves network efficiency.
  2. Because less data reaches the reduce side, each reduce function has fewer records to process, which improves efficiency on the reduce side as well.

 

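The signature of a Combiner is the same as that of a Reducer, since both process the output of the Mapper. (The Combiner's output must also use the same key and value types as the Mapper's output, because the Reducer receives it in place of that output.) This gives us a great opportunity to reuse the Reducer program as the Combiner.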

But the question is: is it always a good idea to reuse the reducer program as the combiner?

Reducer for Combiner – Good use case
Let’s say we are writing a MapReduce program to calculate the maximum closing price for each symbol from a stocks dataset. The mapper program will emit the symbol as the key and the closing price as the value for each stock record in the dataset. The reducer will be called once for each stock symbol, with a list of closing prices. It will then loop through the closing prices and calculate the maximum closing price for that symbol.

Assume Mapper 1 processed 3 records for symbol ABC with closing prices – 50, 60 and 111. Let’s also assume that Mapper 2 processed 2 records for symbol ABC with closing prices – 100 and 31.

Now the reducer will receive five closing prices for symbol ABC: 50, 60, 111, 100 and 31. The job of the reducer is very simple: it will loop through all 5 closing prices and calculate the maximum closing price to be 111.

We can use the same reducer program as the combiner after each Mapper. The combiner on Mapper 1 will process 3 closing prices (50, 60 and 111) and will emit only 111, the maximum of the 3 values. The combiner on Mapper 2 will process 2 closing prices (100 and 31) and will emit only 100, the maximum of the 2 values.

Now, with the combiner in place, the reducer will only process 2 closing prices for symbol ABC, 111 from Mapper 1 and 100 from Mapper 2, and will calculate the maximum closing price as 111.

As we can see, the output is the same with and without the combiner, so in this case reusing the reducer as a combiner works with no issues. This is because the maximum of the per-mapper maximums is the same as the maximum of all the values.
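The good case can be checked with a small simulation. This is only a sketch in plain Python, not Hadoop code: `max_reducer`, `mapper_1` and `mapper_2` are illustrative names, and the lists stand in for the values each mapper emits for symbol ABC.

```python
def max_reducer(values):
    """Reduce a list of closing prices to their maximum."""
    return max(values)

# Closing prices for symbol ABC as emitted by each mapper (figures from the text).
mapper_1 = [50, 60, 111]
mapper_2 = [100, 31]

# Without a combiner: the reducer sees all five values.
without_combiner = max_reducer(mapper_1 + mapper_2)

# With the reducer reused as a combiner: each mapper's values are collapsed
# first, so the reducer sees only one value per mapper.
combined = [max_reducer(mapper_1), max_reducer(mapper_2)]
with_combiner = max_reducer(combined)

print(combined)          # [111, 100]
print(without_combiner)  # 111
print(with_combiner)     # 111
```

Both paths give 111, which matches the walkthrough above.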

Reducer for Combiner – Bad use case
Let’s say we are writing a MapReduce program to calculate the average volume for each symbol from a stocks dataset. The mapper program will emit the symbol as the key and the volume as the value for each stock record in the dataset. The reducer will be called once for each stock symbol, with a list of volumes. It will then loop through the volumes and calculate the average volume for that symbol.

Assume Mapper 1 processed 3 records for symbol ABC with volumes – 50, 60 and 111. Let’s also assume that Mapper 2 processed 2 records for symbol ABC with volumes – 100 and 31.

Now the reducer will receive five volume values for symbol ABC: 50, 60, 111, 100 and 31. The job of the reducer is very simple: it will loop through all 5 volumes and calculate the average volume to be 70.4.

(50 + 60 + 111 + 100 + 31) / 5 = 352 / 5 = 70.4

Let’s see what happens if we use the same reducer program as the combiner after each Mapper. The combiner on Mapper 1 will process 3 volumes (50, 60 and 111) and will calculate their average, 221 / 3, which is approximately 73.67.

The combiner on mapper 2 will process 2 volumes – 100 and 31 and will calculate the average volume of the 2 values which is 65.5.

Now, with the combiner in place, the reducer will only process 2 average volumes for symbol ABC, 73.67 from Mapper 1 and 65.5 from Mapper 2, and will calculate the average volume of symbol ABC as (73.67 + 65.5) / 2 ≈ 69.58. This is incorrect, as the correct average volume is 70.4. The average of the two averages ignores the fact that Mapper 1 contributed 3 records and Mapper 2 only 2.
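The same kind of simulation shows the mismatch for the average. Again, this is a plain-Python sketch rather than Hadoop code, and `average_reducer` and the two lists are illustrative names only.

```python
def average_reducer(values):
    """Reduce a list of volumes to their arithmetic mean."""
    return sum(values) / len(values)

# Volumes for symbol ABC as emitted by each mapper (figures from the text).
mapper_1 = [50, 60, 111]
mapper_2 = [100, 31]

# Without a combiner: the reducer averages all five values.
without_combiner = average_reducer(mapper_1 + mapper_2)

# With the reducer reused as a combiner: the reducer averages two
# per-mapper averages and no longer knows how many records each one covers.
combined = [average_reducer(mapper_1), average_reducer(mapper_2)]
with_combiner = average_reducer(combined)

print(round(without_combiner, 2))  # 70.4
print(round(with_combiner, 2))     # 69.58
```

The two results differ because the per-mapper averages carry no record counts, so the final average weights both mappers equally.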

So, as we can see, the Reducer cannot always be reused as the Combiner. Whenever you decide to reuse a reducer as a combiner, ask yourself this question: will my output be the same with and without the combiner?
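One rough way to ask that question in practice is to run the reduce logic both ways on sample data and compare the results. The helper below is a hypothetical sketch in plain Python (the name `same_with_and_without_combiner` is made up for illustration and is not part of Hadoop); `splits` holds the values for a single key, grouped by the mapper that emitted them.

```python
def same_with_and_without_combiner(reduce_fn, splits):
    """Return True if reusing reduce_fn as a combiner leaves the result unchanged.

    splits is a list of lists: the values for ONE key, grouped by mapper.
    """
    # Without a combiner: reduce over every value from every mapper.
    all_values = [value for split in splits for value in split]
    without_combiner = reduce_fn(all_values)

    # With the reducer as combiner: reduce each mapper's values first,
    # then reduce the per-mapper results.
    with_combiner = reduce_fn([reduce_fn(split) for split in splits])
    return without_combiner == with_combiner

def average(values):
    return sum(values) / len(values)

# Sample values for symbol ABC from the examples in this post.
splits = [[50, 60, 111], [100, 31]]

print(same_with_and_without_combiner(max, splits))      # True
print(same_with_and_without_combiner(average, splits))  # False
```

A passing check on one sample is not a proof: an average would pass this test if every mapper happened to emit the same number of records. Keep in mind as well that Hadoop does not guarantee how many times a combiner runs for a given key (it may be zero, one or several times), so the job has to produce the same output in every case.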