By Jacobian


2015-01-23 07:31:56 8 Comments

I know that using skip to implement pagination is bad practice, because skip gets increasingly expensive as the data grows. One way around this is to page in natural order by the _id field:

//Page 1
db.users.find().limit(pageSize);
//Find the id of the last document in this page
last_id = ...

//Page 2
users = db.users.find({ '_id': { '$gt': last_id } }).limit(pageSize);

The problem is that I'm new to MongoDB and don't know the best way to get this last_id.

1 comments

@Neil Lunn 2015-01-23 08:02:21

The concept you are talking about can be called "forward paging". The name fits because, unlike the .skip() and .limit() modifiers, it cannot be used to "go back" to a previous page or to "skip" to a specific page, at least not without a great deal of effort to store "seen" or "discovered" pages. So if that type of "links to page" paging is what you want, you are best off sticking with the .skip() and .limit() approach, despite the performance drawbacks.

If it is a viable option for you to only "move forward", then here is the basic concept:

db.junk.find().limit(3)

{ "_id" : ObjectId("54c03f0c2f63310180151877"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f63310180151878"), "a" : 4, "b" : 4 }
{ "_id" : ObjectId("54c03f0c2f63310180151879"), "a" : 10, "b" : 10 }

Of course that's your first page with a limit of 3 items. Consider that now with code iterating the cursor:

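var lastSeen = null;
// Sort on _id explicitly; without a sort the return order is not guaranteed
var cursor = db.junk.find().sort({ "_id": 1 }).limit(3);

while (cursor.hasNext()) {
   var doc = cursor.next();
   printjson(doc);
   // Nothing left in this page, so remember the last _id
   if (!cursor.hasNext())
     lastSeen = doc._id;
}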

That iterates the cursor and does something with each document. When the last item in the cursor is reached, lastSeen is set to that document's _id:

ObjectId("54c03f0c2f63310180151879")

On subsequent iterations you just feed that _id value, which you keep somewhere (in the session or wherever), to the query:

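// Only documents after the last one seen, in explicit _id order
var cursor = db.junk.find({ "_id": { "$gt": lastSeen } }).sort({ "_id": 1 }).limit(3);

while (cursor.hasNext()) {
   var doc = cursor.next();
   printjson(doc);
   // Nothing left in this page, so remember the last _id
   if (!cursor.hasNext())
     lastSeen = doc._id;
}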

{ "_id" : ObjectId("54c03f0c2f6331018015187a"), "a" : 1, "b" : 1 }
{ "_id" : ObjectId("54c03f0c2f6331018015187b"), "a" : 6, "b" : 6 }
{ "_id" : ObjectId("54c03f0c2f6331018015187c"), "a" : 7, "b" : 7 }

And the process repeats over and over until no more results can be obtained.
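The whole loop can be sketched end to end without a server. This is only an illustration in plain JavaScript under stated assumptions: an in-memory array stands in for the collection, the integer _id values are made up, and the hypothetical helper findPageAfter plays the role of find({ "_id": { "$gt": lastSeen } }).sort({ "_id": 1 }).limit(pageSize).

```javascript
// In-memory stand-in for a collection; documents and integer _id values are made up.
const junk = [
  { _id: 5, a: 6 }, { _id: 1, a: 1 }, { _id: 3, a: 10 },
  { _id: 2, a: 4 }, { _id: 7, a: 2 }, { _id: 4, a: 1 }, { _id: 6, a: 7 },
];

// Hypothetical helper mimicking
// find({ _id: { $gt: lastSeen } }).sort({ _id: 1 }).limit(pageSize).
// A null lastSeen means "start from the beginning".
function findPageAfter(docs, lastSeen, pageSize) {
  return docs
    .filter((doc) => lastSeen === null || doc._id > lastSeen)
    .sort((x, y) => x._id - y._id)
    .slice(0, pageSize);
}

// Walk forward page by page until a query returns nothing.
function collectPages(docs, pageSize) {
  const pages = [];
  let lastSeen = null;
  for (;;) {
    const page = findPageAfter(docs, lastSeen, pageSize);
    if (page.length === 0) break;
    pages.push(page.map((doc) => doc._id));
    lastSeen = page[page.length - 1]._id; // remember the last _id of this page
  }
  return pages;
}

const pages = collectPages(junk, 3);
console.log(JSON.stringify(pages)); // prints [[1,2,3],[4,5,6],[7]]
```

Once a page is held as an array, the last _id is simply the _id of its final element, which is the value to carry into the next request.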

That's the basic process for a natural order such as _id. For something else it gets a bit more complex. Consider the following:

{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }
{ "_id": 1, "rank": 3 }    
{ "_id": 3, "rank": 2 }

To split that into two pages sorted by rank, what you essentially need to know is what you have "already seen", so that you can exclude those results. Looking at a first page:

var lastSeen = null;
var seenIds = [];
var cursor = db.junk.find().sort({ "rank": -1 }).limit(2);

while (cursor.hasNext()) {
   var doc = cursor.next();
   printjson(doc);
   // On a new rank, ids from the previous rank no longer need excluding.
   // Updating lastSeen here keeps later documents of the same rank from resetting the list again.
   if ( doc.rank != lastSeen ) {
       seenIds = [];
       lastSeen = doc.rank;
   }
   seenIds.push(doc._id);
}

{ "_id": 4, "rank": 3 }
{ "_id": 8, "rank": 3 }

On the next iteration you want to be less than or equal to the lastSeen "rank" score, but also exclude the documents already seen. You do this with the $nin operator:

var cursor = db.junk.find(
    { "_id": { "$nin": seenIds }, "rank": { "$lte": lastSeen } }
).sort({ "rank": -1 }).limit(2);

while (cursor.hasNext()) {
   var doc = cursor.next();
   printjson(doc);
   // On a new rank, ids from the previous rank no longer need excluding.
   // Updating lastSeen here keeps later documents of the same rank from resetting the list again.
   if ( doc.rank != lastSeen ) {
       seenIds = [];
       lastSeen = doc.rank;
   }
   seenIds.push(doc._id);
}

{ "_id": 1, "rank": 3 }    
{ "_id": 3, "rank": 2 }

How many "seenIds" you actually hold on to depends on how "granular" your results are, that is, how often that value is likely to change. In this case you can check whether the current "rank" score differs from the lastSeen value and, if so, discard the present seenIds content so it does not grow too much.
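The same idea can be tried out without a server. The sketch below is plain JavaScript under stated assumptions: an in-memory array with made-up documents stands in for the collection, the hypothetical helper findRankPage plays the role of the $nin / $lte query sorted by rank, and the hypothetical advance function carries lastSeen and seenIds from one page to the next, clearing seenIds whenever the rank changes.

```javascript
// In-memory stand-in for the collection; ranks and _id values are made up.
const ranked = [
  { _id: 4, rank: 3 }, { _id: 8, rank: 3 }, { _id: 1, rank: 3 },
  { _id: 3, rank: 2 }, { _id: 9, rank: 2 }, { _id: 2, rank: 2 },
  { _id: 7, rank: 1 },
];

// Hypothetical helper mimicking
// find({ _id: { $nin: seenIds }, rank: { $lte: lastSeen } })
//   .sort({ rank: -1 }).limit(pageSize).
// A null lastSeen means "no rank bound yet" (first page).
function findRankPage(docs, state, pageSize) {
  return docs
    .filter((doc) => !state.seenIds.includes(doc._id))
    .filter((doc) => state.lastSeen === null || doc.rank <= state.lastSeen)
    .sort((x, y) => y.rank - x.rank)
    .slice(0, pageSize);
}

// Fold one page into the paging state. When the rank changes, the ids
// collected for the previous rank are dropped, because the $lte bound
// on the new rank already excludes those documents.
function advance(state, page) {
  let { lastSeen, seenIds } = state;
  for (const doc of page) {
    if (doc.rank !== lastSeen) {
      seenIds = [];
      lastSeen = doc.rank;
    }
    seenIds = seenIds.concat(doc._id);
  }
  return { lastSeen, seenIds };
}

// Page through everything, two documents at a time.
function collectRankPages(docs, pageSize) {
  const pages = [];
  let state = { lastSeen: null, seenIds: [] };
  for (;;) {
    const page = findRankPage(docs, state, pageSize);
    if (page.length === 0) break;
    pages.push(page.map((doc) => doc._id));
    state = advance(state, page);
  }
  return pages;
}

const rankPages = collectRankPages(ranked, 2);
console.log(JSON.stringify(rankPages));
```

In this sketch seenIds never holds more than the ids sharing the current rank, so its size is bounded by the largest group of tied documents rather than by the number of pages visited.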

Those are the basic concepts of "forward paging" for you to practice and learn.

@Disposer 2015-01-23 08:26:18

@Neil-Lunn, such a full and nice explanation. I discovered something about you, It seems that you never sleep, I monitored you (kidding) and saw you are always online 24/7 ;)

@Neil Lunn 2015-01-23 08:27:59

@Disposer Mobile App and robots. Rise of the machines.
