MapReduce in MongoDB

Ajay Singh Chouhan
Jul 16, 2021


Introduction to MongoDB

MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database, MongoDB uses JSON-like documents with optional schemas. It is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

MongoDB is a document database designed for ease of development and scaling. The official MongoDB Manual introduces its key concepts and presents the query language. It also covers operational and administrative considerations and procedures, and includes a comprehensive reference section.

Introduction to MapReduce

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

Map-Reduce is a programming model divided into two main phases, the Map phase and the Reduce phase. It is designed to process data in parallel, with the data split across multiple machines (nodes).

  1. Mapper: performs filtering and sorting
  2. Reducer: performs a summary operation (such as counting or aggregation)
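The two phases can be made concrete with a minimal, single-process sketch in plain JavaScript. It runs with Node.js, not inside MongoDB. The `partitions` array is a hypothetical stand-in for data spread across nodes. A real cluster would run the map step on each node in parallel, while this sketch simply processes the partitions one after another.

```javascript
// Single-process sketch of the MapReduce model (not MongoDB code).
// Each inner array stands in for the slice of data held by one node.
const partitions = [
  ["apple", "orange", "apple"],
  ["pear", "orange", "apple"],
];

// Map phase: turn every input record into a [key, value] pair.
function mapPhase(records) {
  return records.map((word) => [word, 1]);
}

// Shuffle step: group all values that share the same key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: summarize each key's values (here, by counting them).
function reducePhase(groups) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.reduce((a, b) => a + b, 0);
  }
  return result;
}

const mapped = partitions.flatMap(mapPhase);
const counts = reducePhase(groupByKey(mapped));
console.log(counts); // { apple: 3, orange: 2, pear: 1 }
```

The grouping step between the two phases is what lets each reducer see every value for its key, wherever the map step ran.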

Map-Reduce in MongoDB

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. To perform map-reduce operations, MongoDB provides the mapReduce database command.
Map-reduce operations use custom JavaScript functions to map, or associate, values to a key.

  • map: a JavaScript function that maps a value to a key and emits a key-value pair
  • reduce: a JavaScript function that receives a key and the array of all values emitted for that key, and reduces them to a single value
  • out: specifies where the result of the map-reduce operation is written
  • query: optional selection criteria that limit which documents are passed to the map function
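The way these four pieces fit together can be sketched as a tiny in-memory emulator. This is an illustration under stated assumptions, not MongoDB's implementation:

- `miniMapReduce` is a hypothetical name.
- `query` is a plain predicate function instead of a query document.
- `emit` is passed to `map` as a parameter. In MongoDB it is available globally inside map.
- Returning the results stands in for `out`.

```javascript
// Hypothetical in-memory emulator of the mapReduce pieces (not MongoDB code).
function miniMapReduce(docs, map, reduce, options = {}) {
  // query: a plain predicate here; the real command takes a query document.
  const query = options.query || (() => true);
  const groups = new Map();

  // emit(key, value) collects values per key (primitive keys assumed).
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // map runs once per selected document, with `this` bound to that document.
  for (const doc of docs.filter(query)) {
    map.call(doc, emit);
  }

  // reduce runs only for keys with more than one value. Results take the
  // { _id, value } shape of a map-reduce output document; returning them
  // stands in for the `out` option.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length > 1 ? reduce(key, values) : values[0];
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: total amount per customer, counting only amounts above 6.
const docs = [
  { cust: "A", amount: 10 },
  { cust: "B", amount: 5 },
  { cust: "A", amount: 7 },
];
const out = miniMapReduce(
  docs,
  function (emit) {
    emit(this.cust, this.amount);
  },
  (key, values) => values.reduce((a, b) => a + b, 0),
  { query: (doc) => doc.amount > 6 }
);
console.log(out); // [ { _id: 'A', value: 17 } ]
```

The map function is written with `function` rather than an arrow function so that `this` can be bound to each document, mirroring how MongoDB calls map.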

Data Set

db.orders.insertMany([
{ _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
{ _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])
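In this data set, each order's top-level `price` equals the sum of `qty × price` over its `items`. The plain-JavaScript snippet below copies just those fields and checks that relationship. It is a sanity check on the sample data only, not something MongoDB requires.

```javascript
// Only the fields needed for the check, copied from the data set above.
const orders = [
  { _id: 1, price: 25, items: [{ qty: 5, price: 2.5 }, { qty: 5, price: 2.5 }] },
  { _id: 2, price: 70, items: [{ qty: 8, price: 2.5 }, { qty: 5, price: 10 }] },
  { _id: 3, price: 50, items: [{ qty: 10, price: 2.5 }, { qty: 10, price: 2.5 }] },
  { _id: 4, price: 25, items: [{ qty: 10, price: 2.5 }] },
  { _id: 5, price: 50, items: [{ qty: 5, price: 10 }] },
  { _id: 6, price: 35, items: [{ qty: 10, price: 1.0 }, { qty: 10, price: 2.5 }] },
  { _id: 7, price: 25, items: [{ qty: 10, price: 2.5 }] },
  { _id: 8, price: 75, items: [{ qty: 5, price: 10 }, { qty: 10, price: 2.5 }] },
  { _id: 9, price: 55, items: [{ qty: 5, price: 1.0 }, { qty: 10, price: 2.5 }, { qty: 10, price: 2.5 }] },
  { _id: 10, price: 25, items: [{ qty: 10, price: 2.5 }] },
];

// An order is a mismatch if its price differs from the sum of its line items.
const mismatches = orders.filter((order) => {
  const itemTotal = order.items.reduce((sum, item) => sum + item.qty * item.price, 0);
  return itemTotal !== order.price;
});

console.log(mismatches.length); // 0
```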

Example Statement:

Find the average order price for each day. Perform a map-reduce operation on the orders collection that groups documents by ord_date and calculates the average price for each day:
1. Define the map function to process each input document:

  • Create a JavaScript function that maps the price to the ord_date of each document (inside the function, this refers to the current document).
  • The emit function passes each key-value pair on to the reduce phase.

var mapFun = function() {
  // this is the document being processed; emit(key, value) outputs one pair
  emit(this.ord_date, this.price);
};
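To see what the map function produces, the sketch below runs the article's mapFun in Node.js over two sample documents. The `emit` function here is a hypothetical stand-in that only records pairs, because in MongoDB the server supplies emit. Calling `mapFun.call(doc)` mimics the server binding `this` to each document.

```javascript
// Hypothetical stand-in for MongoDB's emit(): it just records each pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The article's map function, unchanged apart from formatting.
var mapFun = function () {
  emit(this.ord_date, this.price);
};

// Two documents from the data set (only the fields mapFun reads).
const sampleDocs = [
  { _id: 2, ord_date: new Date("2020-03-08"), price: 70 },
  { _id: 3, ord_date: new Date("2020-03-08"), price: 50 },
];

// Run map once per document with `this` bound to that document.
for (const doc of sampleDocs) {
  mapFun.call(doc);
}

// Two pairs, both keyed by 2020-03-08T00:00:00.000Z, with values 70 and 50.
console.log(emitted.map((p) => [p.key.toISOString(), p.value]));
```

Both pairs share the same key, so they will be handed to the reduce function together.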

2. Define the corresponding reduce function with two arguments, key and value:

  • value is an array of the prices emitted by the map function for the same ord_date (the key).
  • The reduce function returns the average of those prices.

var redFun = function(key, value) {
  // value holds every price emitted for this ord_date
  return Array.avg(value);
};
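One caveat applies to this reduce function. MongoDB may call reduce several times for the same key and feed earlier results back in (re-reduce). For that reason, its documentation requires reduce to be associative, commutative and idempotent, and to return a value of the same type as the emitted values. Averaging averages breaks that rule. The problem may not show up on a data set this small, but it matters as data grows.

The plain-JavaScript sketch below shows the difference using the three prices for 2020-03-20. It relies on a stand-in `avg` helper, because Array.avg is not available in Node.js. It also shows a re-reduce-safe pattern (an alternative, not the article's code): sum counts and totals in reduce, and divide only in a finalize step.

```javascript
// Stand-in for Array.avg, which is not available in Node.js.
const avg = (values) => values.reduce((a, b) => a + b, 0) / values.length;

// The article's reduce, written against the stand-in helper.
const redFun = (key, value) => avg(value);

// Prices the map function emits for 2020-03-20 in the data set.
const prices = [25, 75, 55];

// One reduce call over all values gives the true average (155 / 3).
const oneShot = redFun("2020-03-20", prices);

// A re-reduce: reduce part of the values, then feed that result back in.
const reReduced = redFun("2020-03-20", [redFun("2020-03-20", [25, 75]), 55]);

// Re-reduce-safe pattern: emit { count, total }, sum both fields in reduce,
// and divide only at the end (the role of mapReduce's `finalize` option).
const emittedValues = prices.map((p) => ({ count: 1, total: p }));
const sumReduce = (key, values) =>
  values.reduce(
    (acc, v) => ({ count: acc.count + v.count, total: acc.total + v.total }),
    { count: 0, total: 0 }
  );
const finalize = (key, reduced) => reduced.total / reduced.count;

const partial = sumReduce("2020-03-20", emittedValues.slice(0, 2));
const safe = finalize("2020-03-20", sumReduce("2020-03-20", [partial, emittedValues[2]]));

console.log(safe === oneShot, reReduced); // true 52.5
```

Because sumReduce returns the same { count, total } shape it receives, it gives the same answer however MongoDB splits the values between calls.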

3. Run map-reduce on the orders collection

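  • mapFun and redFun are the map and reduce functions defined above.
  • query specifies the optional selection criteria. Here { $gt: ISODate("2020-03-01") } keeps only orders placed after March 1, 2020, so the order dated 2020-03-01 is skipped.
  • out specifies where the result is written; here, a collection named output.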

db.orders.mapReduce(
  mapFun,
  redFun,
  {
    query: { ord_date: { $gt: ISODate("2020-03-01") } },
    out: "output"
  }
)

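The results are stored in the output collection. Each document has the form { _id: <ord_date>, value: <average price> } and can be viewed with db.output.find().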
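The whole run can be emulated in plain JavaScript to preview what should land in output. This is a sketch, not MongoDB, and it makes these assumptions:

- It keeps only ord_date and price from the data set.
- It mimics the $gt query with a Date comparison.
- It averages with a stand-in for Array.avg.
- It passes single-value keys through unchanged, since MongoDB does not call reduce for a key with only one value.

```javascript
// Only the fields mapFun reads, copied from the data set.
const orders = [
  { _id: 1, ord_date: new Date("2020-03-01"), price: 25 },
  { _id: 2, ord_date: new Date("2020-03-08"), price: 70 },
  { _id: 3, ord_date: new Date("2020-03-08"), price: 50 },
  { _id: 4, ord_date: new Date("2020-03-18"), price: 25 },
  { _id: 5, ord_date: new Date("2020-03-19"), price: 50 },
  { _id: 6, ord_date: new Date("2020-03-19"), price: 35 },
  { _id: 7, ord_date: new Date("2020-03-20"), price: 25 },
  { _id: 8, ord_date: new Date("2020-03-20"), price: 75 },
  { _id: 9, ord_date: new Date("2020-03-20"), price: 55 },
  { _id: 10, ord_date: new Date("2020-03-23"), price: 25 },
];

// query: { ord_date: { $gt: ISODate("2020-03-01") } } is a strict comparison,
// so order 1 (placed exactly on 2020-03-01) is left out.
const cutoff = new Date("2020-03-01");
const selected = orders.filter((o) => o.ord_date > cutoff);

// Map + group. A JS Map compares Date objects by reference, so values are
// grouped on the timestamp while the Date itself becomes the output _id.
const groups = new Map();
for (const o of selected) {
  const k = o.ord_date.getTime();
  if (!groups.has(k)) groups.set(k, { _id: o.ord_date, values: [] });
  groups.get(k).values.push(o.price);
}

// Reduce with a stand-in for Array.avg; single-value keys pass through.
const avg = (values) => values.reduce((a, b) => a + b, 0) / values.length;
const output = [...groups.values()].map(({ _id, values }) => ({
  _id,
  value: values.length > 1 ? avg(values) : values[0],
}));

// Print each day with its average, rounded to two decimals for display.
for (const doc of output) {
  const day = doc._id.toISOString().slice(0, 10);
  console.log(day, Math.round(doc.value * 100) / 100);
}
// 2020-03-08 60
// 2020-03-18 25
// 2020-03-19 42.5
// 2020-03-20 51.67
// 2020-03-23 25
```

The rounding is only for display; the stored value for 2020-03-20 would be the unrounded 155 / 3.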

******************************THE END****************************
