Let's say that we work for a big chain store. Mr. Chain-Store-CEO loves MongoDB, so we store everything in it. One of our collections in Mongo is called "purchases" and contains every single purchase at every store.
The documents in "purchases" look like this:
{
id,
item,
date_purchased,
store
}
Let's say that we are tasked with finding out which store has the most purchases. We want the output to just be [Store Name] - [Items purchased]. I will go over two ways to accomplish this task: MongoDB aggregations and MongoDB MapReduce.
MongoDB Aggregations
MongoDB aggregations work as a pipeline, meaning you can chain aggregation stages one after another, with each stage receiving the output of the previous one. For our purposes, we can write a fairly simple aggregation using this command:
db.purchases.aggregate([{$group: {_id: "$store", total: {$sum: 1}}}])
This groups the documents by the value of their "store" field and adds 1 to the group's total for every document, so "total" ends up being the number of purchases at that store. You can think of the result as a set of key-value pairs, where the store is the key and the purchase count is the value.
If we run that command as it is, it will be mostly useless to us, since the results come back in a seemingly random order. What we are interested in is a sorted result set. We can accomplish this by modifying our query to look like this:
db.purchases.aggregate([{$group:{_id:"$store", total:{$sum:1}}}, {$sort:{total : -1}}, {$limit : 10}])
This command works the same as the previous one, except the results now come back sorted in descending order of "total", as specified by "$sort:{total:-1}". We are also limiting our result set to the top 10 stores for easier reading.
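To make the three stages more concrete, here is a rough sketch of what they compute, written as plain JavaScript over an in-memory array instead of a real collection. The sample documents, the store names, and the helper names groupByStore and topStores are all made up for illustration; this is not how MongoDB implements the pipeline internally.

```javascript
// Hypothetical sample documents shaped like the "purchases" collection.
const purchases = [
  { id: 1, item: "milk",  date_purchased: "2020-01-01", store: "Downtown" },
  { id: 2, item: "bread", date_purchased: "2020-01-01", store: "Mall" },
  { id: 3, item: "eggs",  date_purchased: "2020-01-02", store: "Downtown" },
  { id: 4, item: "soap",  date_purchased: "2020-01-02", store: "Airport" },
  { id: 5, item: "tea",   date_purchased: "2020-01-03", store: "Mall" },
  { id: 6, item: "rice",  date_purchased: "2020-01-03", store: "Downtown" },
];

// Rough equivalent of {$group: {_id: "$store", total: {$sum: 1}}}:
// one entry per store, with 1 added for every document in that group.
function groupByStore(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.store, (totals.get(doc.store) || 0) + 1);
  }
  return Array.from(totals, ([store, total]) => ({ _id: store, total }));
}

// Rough equivalent of {$sort: {total: -1}} followed by {$limit: n}.
function topStores(docs, n) {
  return groupByStore(docs)
    .sort((a, b) => b.total - a.total)
    .slice(0, n);
}

const result = topStores(purchases, 10);

// Print in the [Store Name] - [Items purchased] format we were asked for.
for (const row of result) {
  console.log(`${row._id} - ${row.total}`);
}
```

Each function here takes the output of the previous one as its input, which is the same idea as chaining stages in the aggregation pipeline.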
MapReduce
MongoDB MapReduce works similarly to aggregations, except it follows the traditional MapReduce paradigm (surprise!). It divides the work into two main phases: the Map phase and the Reduce phase. Here I will explain how each of these phases works:
Map
The map phase is the first phase to execute and it goes through each document and performs a mapping function. This usually results in a key-value pair for each document that it goes through. In our particular problem, the key will be the store name, and the value will be the number 1.
{ $store : 1 }
After the map phase, we will have a lot of key-value pairs that we will then hand off to our reduce phase.
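As a small illustration of the map phase on its own, the sketch below runs a mapper over a few documents in plain JavaScript. The sample documents are made up, and the emit function here is only a stand-in that collects pairs into an array; in MongoDB, emit is provided by the server and the mapper is called with "this" bound to the current document.

```javascript
// Hypothetical sample documents shaped like the "purchases" collection.
const purchases = [
  { id: 1, item: "milk",  store: "Downtown" },
  { id: 2, item: "bread", store: "Mall" },
  { id: 3, item: "eggs",  store: "Downtown" },
];

// Stand-in for MongoDB's emit(): it just records every key-value pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The mapper: one { store: 1 } pair per document.
function mapper() {
  emit(this.store, 1);
}

// Call the mapper once per document, with `this` set to that document,
// mimicking how the map phase visits every document.
for (const doc of purchases) {
  mapper.call(doc);
}

console.log(emitted);
```

Note that the map phase does no counting yet: it produces exactly one pair per document, and the same key can appear many times.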
Reduce
The reduce phase is the second phase. It goes through the results of the map phase and combines all the values that share the same key into a single value. In our example, we will combine the results that have the same store name by adding their values together.
The following is the whole map-reduce command that we will be passing to the mongo shell:
db.purchases.mapReduce(function(){emit(this.store, 1);}, function(k,v) {return Array.sum(v)}, {out:"purchase_count"})
Here we see that we provided it a mapper function, which is:
function() {
emit(this.store, 1);
}
This emits the key-value pair that we talked about, the store being the key, and the value being "1".
We then pass it a reduce function:
function(k,v) {
return Array.sum(v)
}
Here we just sum the values that appear with the same key.
As the last argument, we pass an options object with the name of the output collection in which the results will be stored. We have called ours "purchase_count". Once the command has run, we should see our results in that collection.
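To see how the reduce phase turns the emitted pairs into the final counts, here is a rough plain-JavaScript sketch. The emitted pairs are hypothetical sample data, and the reducer uses the standard Array.prototype.reduce because Array.sum is a helper available in MongoDB's JavaScript environment, not in standard JavaScript. The sketch also checks one property worth knowing: MongoDB may call the reduce function more than once for the same key, feeding earlier results back in, so the reducer has to give the same answer either way. Summing does.

```javascript
// Hypothetical key-value pairs, as the map phase might have emitted them.
const emitted = [
  ["Downtown", 1], ["Mall", 1], ["Downtown", 1],
  ["Airport", 1], ["Mall", 1], ["Downtown", 1],
];

// Stand-in for the article's reducer; sums all values for one key.
function reducer(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Collect the values that share a key, e.g. "Downtown" -> [1, 1, 1].
const grouped = new Map();
for (const [key, value] of emitted) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// One result per key, shaped like the documents in the output
// collection: the key goes in _id and the reduced result in value.
const purchaseCount = Array.from(grouped, ([key, values]) => ({
  _id: key,
  value: reducer(key, values),
}));

// Sort descending so the busiest store comes first.
purchaseCount.sort((a, b) => b.value - a.value);
console.log(purchaseCount);

// Re-reduce check: reducing part of the values first and then reducing
// that partial result with the rest gives the same total.
const partial = reducer("Downtown", [1, 1]);
const rereduced = reducer("Downtown", [partial, 1]);
console.log(rereduced === reducer("Downtown", [1, 1, 1]));
```

With the real output collection, something along the lines of db.purchase_count.find().sort({value: -1}).limit(10) in the mongo shell should give the same kind of top-10 list we got from the aggregation.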
Congratulations! You have just completed your first Mongo aggregation and your first Mongo MapReduce job! Please leave any questions or suggestions in the comments.