Four MapReduce design patterns – DZone Big Data

In my last article, I wrote about creating a simple word count program in MapReduce, along with a tutorial for running the program using Hadoop. Please go through it if you don’t know how to write a program in MapReduce.

To solve any problem in MapReduce, we have to think in terms of MapReduce. A job does not necessarily need both a map and a reduce step every time.

MapReduce Design Patterns

  • Input-Map-Reduce-Output
  • Input-Map-Output
  • Input-Multiple Maps-Reduce-Output
  • Input-Map-Combiner-Reduce-Output

Here are some real-world scenarios to help you understand when to use which design pattern.

Input-Map-Reduce-Output

If we want to do an aggregation, this pattern is used:

Scenario

Compute the total or average salary of employees by gender

Map (key, value)

Key: Gender

Value: Their salary

Reduce

Group by gender

And take the total salary for each group
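The map and reduce steps above can be sketched in plain Python. This is only a simulation of the data flow, not Hadoop code: the sample records and the helper names `map_record`, `shuffle`, `reduce_total`, and `run_job` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (employee_id, gender, salary)
EMPLOYEES = [
    (1, "F", 50_000),
    (2, "M", 40_000),
    (3, "F", 70_000),
    (4, "M", 60_000),
]


def map_record(record):
    """Map: emit one (gender, salary) pair per employee record."""
    _employee_id, gender, salary = record
    yield gender, salary


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_total(gender, salaries):
    """Reduce: total salary for one gender group."""
    return gender, sum(salaries)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_total(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(EMPLOYEES))
```

For the average instead of the total, the reduce step would divide the sum by the number of salaries in the group.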

Input-Map-Output

If we want to change only the format of the data, this pattern is used:

Scenario

Some employees have gender entries such as “Female”, “F”, “f”, or 0.

Map (key, value)

Key: Employee ID

Value: Gender ->

If Gender is Female / F / f / 0, then convert it to F;

otherwise, if Gender is Male / M / m / 1, then convert it to M
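A map-only job like this one can be sketched in plain Python as follows. The sample records and the names `normalize_gender`, `map_record`, and `run_job` are hypothetical, and passing unrecognized values through unchanged is an assumption, since the scenario does not say what to do with them.

```python
# Spellings taken from the scenario above, compared case-insensitively.
FEMALE_FORMS = {"female", "f", "0"}
MALE_FORMS = {"male", "m", "1"}


def normalize_gender(raw):
    """Convert a mixed-format gender entry to "F" or "M"."""
    token = str(raw).strip().lower()
    if token in FEMALE_FORMS:
        return "F"
    if token in MALE_FORMS:
        return "M"
    return raw  # Assumption: leave unrecognized entries unchanged.


def map_record(record):
    """Map: emit (employee_id, normalized_gender). There is no reduce step."""
    employee_id, raw_gender = record
    yield employee_id, normalize_gender(raw_gender)


def run_job(records):
    """The mapper output is the job output in a map-only job."""
    return [pair for record in records for pair in map_record(record)]


if __name__ == "__main__":
    sample = [(101, "Female"), (102, "f"), (103, 0), (104, "Male"), (105, 1)]
    print(run_job(sample))
```

Because every record is converted independently, no grouping is needed, which is why the reduce step can be skipped.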

Input-Multiple Maps-Reduce-Output

In this design pattern, our input is taken from two files that have different schemas:

Scenario

We need to find the total salary by gender, but we have two files with different schemas.

Input file 1

Gender is given as a prefix to the name

For example: Ms. Shital Katkar

Mr. Krishna Katkar

Input file 2

There is a separate column for the gender. However, the format is mixed.

For example: Female / Male, 0 / 1, F / M

Map (key, value)

Map 1 (for input 1)

We need to write a program that separates the prefix from the name and determines the gender from the prefix.

Next, prepare the key-value pair (Gender, Salary).

Map 2 (for input 2)

Here the program is simpler. Resolve the mixed format and create a key-value pair (Gender, Salary).

Reduce

Group by gender

And take the total salary for each group
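The two mappers feeding one reducer can be sketched in plain Python. This is a simulation rather than Hadoop code. The record layouts, the salaries, the treatment of "Mrs." as a female prefix, and all function names are assumptions for illustration; records whose gender cannot be determined are skipped, which is also an assumption.

```python
from collections import defaultdict

# Assumed lookup tables; the scenario names only "Ms." and "Mr." as prefixes.
PREFIX_TO_GENDER = {"ms.": "F", "mrs.": "F", "mr.": "M"}
MIXED_TO_GENDER = {"female": "F", "f": "F", "0": "F",
                   "male": "M", "m": "M", "1": "M"}

# Hypothetical input file 1: (prefixed_name, salary)
FILE1 = [("Ms. Shital Katkar", 50_000), ("Mr. Krishna Katkar", 40_000)]
# Hypothetical input file 2: (name, gender_in_mixed_format, salary)
FILE2 = [("A", "Female", 30_000), ("B", 1, 20_000), ("C", "f", 10_000)]


def map_file1(record):
    """Map 1: derive the gender from the name prefix, emit (gender, salary)."""
    full_name, salary = record
    prefix = full_name.split(maxsplit=1)[0].lower()
    gender = PREFIX_TO_GENDER.get(prefix)
    if gender is not None:
        yield gender, salary


def map_file2(record):
    """Map 2: resolve the mixed gender format, emit (gender, salary)."""
    _name, raw_gender, salary = record
    gender = MIXED_TO_GENDER.get(str(raw_gender).strip().lower())
    if gender is not None:
        yield gender, salary


def reduce_totals(pairs):
    """Reduce: group by gender and total the salaries."""
    totals = defaultdict(int)
    for gender, salary in pairs:
        totals[gender] += salary
    return dict(totals)


def run_job(file1, file2):
    """Each file goes through its own mapper; both outputs share one reducer."""
    pairs = [pair for record in file1 for pair in map_file1(record)]
    pairs += [pair for record in file2 for pair in map_file2(record)]
    return reduce_totals(pairs)


if __name__ == "__main__":
    print(run_job(FILE1, FILE2))
```

The point of the pattern is that both mappers emit the same (Gender, Salary) shape, so the reducer does not need to know which file a pair came from.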

Input-Map-Combiner-Reduce-Output

A Combiner, also known as a semi-reducer, is an optional class that works by accepting input from the Map class and then passing the output key-value pairs to the Reducer class. The purpose of the combiner is to reduce the workload of the reducer.

In a MapReduce program, roughly 20% of the work is done in the map stage, also known as the data preparation stage, which runs in parallel.

The remaining 80% of the work is done in the reduce stage, also known as the calculation stage, which is not parallel and is therefore slower than the map phase. To reduce the time, some of the work of the reduce phase can be done in the combiner phase.

Scenario

There are five departments, and we have to calculate the total salary by gender. However, there are additional rules for calculating the totals. After calculating the total by gender for each department, if that total is more than 200K, add 20K to it; otherwise, if it is more than 100K, add 10K to it.

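Table: data flow for the five input files (one file per department)

Columns: Input file | Map (parallel; Key = Gender, Value = Salary) | Combiner (parallel) | Reducer (not parallel) | Output

Department 1 to Department 5: each file is processed by its own Map and Combiner, and each of them emits one entry for Male and one entry for Female.

Reducer: receives the combiner values for each gender as a list and emits one entry for Male and one entry for Female as the output. Only fragments of these lists survive: "10,90,90,30>" for Male and "10,110,10,130,10>" for Female.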

The patterns mentioned above are basic ones. You can create your own according to the requirements of the problem.



Abdul J. Gaspar
