DEV Community

Dmitry Romanoff


Working with AWS DocumentDB: Indexing, Performance, and Stats at Scale

Amazon DocumentDB is a fully managed document database service that supports MongoDB workloads. In this post, we’ll walk through a practical, large-scale indexing and performance analysis on DocumentDB: we’ll insert 100,000 documents, create indexes, collect collection statistics, and measure how updates affect index size and query time.

We'll use raw shell commands with real outputs, no shortcuts. Let’s get started.


🔧 Step 1: Insert 100,000 Documents into DocumentDB

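// Choose the database. Omit the trailing semicolon: the shell used here kept it
// as part of the name, which is why the outputs below show 'mylibrary;.books'.
use mylibrary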

// Define helper functions (basic randomizer)
function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
var formats = ['epub', 'pdf', 'txt', 'audio'];
var firstNames = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
var lastNames = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];

for (var i = 0; i < 100000; i++) {
  var author = getRandomElement(firstNames) + " " + getRandomElement(lastNames);
  var title = "Book Title " + (i + 1);
  var book = {
    author: author,
    title: title,
    genre: getRandomElement(genres),
    format: getRandomElement(formats),
    num_of_pages: getRandomInt(50, 1000),
    year_when_published: getRandomInt(1900, 2025)
  };

  db.books.insertOne(book);
}

db.books.find().count();

Output:

100000
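Calling insertOne 100,000 times costs one round trip per document. A minimal sketch, assuming you want fewer round trips, generates the same document shape and sends it with insertMany in batches. The helper names (makeBook, insertInBatches) and the batch size are hypothetical choices, not part of the original walkthrough; in the shell you would pass db.books as the collection.

```javascript
// Hypothetical sketch: build book documents and insert them in batches.
// Shell usage: insertInBatches(db.books, 100000, 1000)

function pick(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function randInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

const GENRES = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
const FORMATS = ['epub', 'pdf', 'txt', 'audio'];
const FIRST = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
const LAST = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];

// Build the i-th book with the same fields as the Step 1 loop.
function makeBook(i) {
  return {
    author: pick(FIRST) + " " + pick(LAST),
    title: "Book Title " + (i + 1),
    genre: pick(GENRES),
    format: pick(FORMATS),
    num_of_pages: randInt(50, 1000),
    year_when_published: randInt(1900, 2025)
  };
}

// Insert `total` books, sending at most `batchSize` documents per insertMany call.
// Works with db.books or any object exposing insertMany(docs).
// Returns the number of insertMany calls made.
function insertInBatches(collection, total, batchSize) {
  let calls = 0;
  for (let start = 0; start < total; start += batchSize) {
    const end = Math.min(start + batchSize, total);
    const batch = [];
    for (let i = start; i < end; i++) {
      batch.push(makeBook(i));
    }
    collection.insertMany(batch);
    calls++;
  }
  return calls;
}
```

With 1,000-document batches, the 100,000 inserts become 100 insertMany calls instead of 100,000 insertOne calls.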

πŸ” Step 2: Create Indexes on Author and Title + Year

db.books.createIndex({ author: 1 })
db.books.createIndex({ title: 1, year_when_published: 1 })
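The index names that show up in the later outputs (author_1, title_1_year_when_published_1) follow the default naming convention: each field name joined with its sort direction, all separated by underscores. A small sketch, assuming no custom `name` option was passed to createIndex(), predicts the name you would later hand to dropIndex(); defaultIndexName is a hypothetical helper.

```javascript
// Hypothetical helper: predict the default index name for a key pattern,
// e.g. { title: 1, year_when_published: 1 } -> "title_1_year_when_published_1".
// Assumes createIndex() was called without a custom `name` option.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => field + "_" + direction)
    .join("_");
}
```

In the shell, `db.books.dropIndex(defaultIndexName({ author: 1 }))` targets the same index that Step 8 drops by name.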

📊 Step 3: Get Collection Stats

db.books.stats()

Output:

{
  ns: 'mylibrary;.books',
  count: 100000,
  size: 20600000,
  storageSize: 24150016,
  nindexes: 3,
  totalIndexSize: 14106624,
  indexSizes: {
    _id_: 4079616,
    author_1: 4104192,
    title_1_year_when_published_1: 5922816
  },
  ...
}
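The raw byte counts in the stats() output are easier to compare once they are turned into ratios. A minimal sketch, using only the count, size, totalIndexSize, and indexSizes fields shown above; summarizeStats is a hypothetical helper, and it takes the sizes exactly as stats() reports them.

```javascript
// Hypothetical helper: derive ratios from a collection stats() document.
// Uses only count, size, totalIndexSize and indexSizes.
function summarizeStats(stats) {
  const shares = {};
  for (const name in stats.indexSizes) {
    // Fraction of all index bytes taken by this index.
    shares[name] = stats.totalIndexSize > 0
      ? stats.indexSizes[name] / stats.totalIndexSize
      : 0;
  }
  return {
    avgDocBytes: stats.count > 0 ? stats.size / stats.count : 0,
    indexToDataRatio: stats.size > 0 ? stats.totalIndexSize / stats.size : 0,
    indexShares: shares
  };
}

// Shell usage: printjson(summarizeStats(db.books.stats()))
```

For the output above this gives an average document size of 206 bytes and an index-to-data ratio of about 0.68, so the three indexes together take roughly two-thirds as much space as the documents.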

📌 Step 4: Print Only Index Sizes and opCounter

const stats = db.books.stats();
printjson({
  ns: stats.ns,
  indexSizes: stats.indexSizes,
  opCounter: stats.opCounter
});

Output:

{
  ns: 'mylibrary;.books',
  indexSizes: {
    _id_: 4079616,
    author_1: 4104192,
    title_1_year_when_published_1: 5922816
  },
  opCounter: {
    numDocsIns: 100000,
    numDocsUpd: 0,
    numDocsDel: 0
  }
}

📂 Step 5: Check Index Metadata

db.books.getIndexes()

Output:

[
  { key: { _id: 1 }, name: '_id_' },
  { key: { author: 1 }, name: 'author_1' },
  { key: { title: 1, year_when_published: 1 }, name: 'title_1_year_when_published_1' }
]

🔄 Step 6: Perform 10 Updates and Check opCounter Diff

const before = db.books.stats().opCounter;

db.books.find().limit(10).forEach(doc => {
  db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});

const after = db.books.stats().opCounter;

const diff = {
  numDocsIns: after.numDocsIns - before.numDocsIns,
  numDocsUpd: after.numDocsUpd - before.numDocsUpd,
  numDocsDel: after.numDocsDel - before.numDocsDel
};

printjson(diff);

Output:

{
  numDocsIns: 0,
  numDocsUpd: 10,
  numDocsDel: 0
}
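Step 6 hardcodes the three opCounter fields. A slightly more general sketch, assuming both snapshots are flat objects of numbers, diffs every key present in either snapshot and treats a missing key as 0, so a field that appears or disappears yields a number instead of NaN. diffCounters is a hypothetical helper; in the shell you would pass db.books.stats().opCounter taken before and after the updates.

```javascript
// Hypothetical helper: per-key difference between two numeric snapshots
// (for example two opCounter objects). Keys missing from one side count as 0.
function diffCounters(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  const diff = {};
  for (const key of keys) {
    diff[key] = (after[key] || 0) - (before[key] || 0);
  }
  return diff;
}
```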

📈 Step 7: Perform 1000 Updates and Check Index Growth

const beforeIndexes = db.books.stats().indexSizes;

db.books.find().limit(1000).forEach(doc => {
  db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});

const afterIndexes = db.books.stats().indexSizes;

const diffIndexes = {};
for (const key in beforeIndexes) {
  diffIndexes[key] = afterIndexes[key] - beforeIndexes[key];
}

printjson(diffIndexes);

Output:

{
  _id_: 0,
  author_1: 16384,
  title_1_year_when_published_1: 0
}
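The Step 7 loop walks only the keys of beforeIndexes, so an index created between the two snapshots would be missed. A sketch of a report that covers both sides and adds percentage growth; indexGrowth is a hypothetical helper, and the status and field names in its result are illustrative.

```javascript
// Hypothetical helper: compare two indexSizes snapshots and report the
// byte change and percentage change per index, including added or dropped ones.
function indexGrowth(before, after) {
  const names = new Set([...Object.keys(before), ...Object.keys(after)]);
  const report = {};
  for (const name of names) {
    const b = before[name];
    const a = after[name];
    if (b === undefined) {
      report[name] = { status: "added", bytes: a };
    } else if (a === undefined) {
      report[name] = { status: "dropped", bytes: -b };
    } else {
      report[name] = {
        status: "existing",
        bytes: a - b,
        // Percentage change relative to the earlier size.
        pct: b > 0 ? ((a - b) / b) * 100 : 0
      };
    }
  }
  return report;
}
```

For the Step 7 numbers, author_1 grows by 16,384 bytes, about 0.4% of its earlier 4,104,192 bytes.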

πŸ” Step 8: Rebuild author_1 Index

db.books.dropIndex("author_1")
db.books.createIndex({ author: 1 }, { background: true })
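Dropping and recreating by hand means retyping the key pattern and any options. A sketch, assuming the spec returned by getIndexes() carries the original key and options such as unique or sparse, looks the index up by name and recreates it with the same definition. rebuildIndex is a hypothetical helper that copies only the options listed in it; queries cannot use the index between the drop and the rebuild, so consider running it during a quiet period.

```javascript
// Hypothetical helper: drop an index and recreate it from its own spec.
// `collection` is db.books in the shell; any object exposing getIndexes(),
// dropIndex(name) and createIndex(keys, options) works.
function rebuildIndex(collection, name) {
  if (name === "_id_") {
    throw new Error("the _id_ index cannot be dropped");
  }
  const spec = collection.getIndexes().find(ix => ix.name === name);
  if (!spec) {
    throw new Error("no index named " + name);
  }
  // Carry over the options this sketch knows about; others are ignored.
  const options = { name: spec.name, background: true };
  for (const opt of ["unique", "sparse", "expireAfterSeconds"]) {
    if (spec[opt] !== undefined) {
      options[opt] = spec[opt];
    }
  }
  collection.dropIndex(name);
  collection.createIndex(spec.key, options);
  return options;
}

// Shell usage: rebuildIndex(db.books, "author_1")
```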

⚑ Step 9: Compare Execution Plan With and Without Index

// 1. Query using index
const indexedPlan = db.books.find({ author: "Sarah Williams" })
  .explain("executionStats");
const indexedTime = indexedPlan.executionStats.executionTimeMillis;

// 2. Query forcing collection scan
const collScanPlan = db.books.find({ author: "Sarah Williams" })
  .hint({ $natural: 1 })
  .explain("executionStats");
const collScanTime = collScanPlan.executionStats.executionTimeMillis;

// 3. Print both times and their difference
print("Execution Time With Index (ms):", indexedTime);
print("Execution Time Without Index (ms):", collScanTime);
print("Difference (Without - With):", collScanTime - indexedTime);

Output:

Execution Time With Index (ms): 3.716
Execution Time Without Index (ms): 90.578
Difference (Without - With): 86.862
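A single explain() run is noisy: the same indexed query reports 3.716 ms here and 2.208 ms in Step 10. A sketch, assuming you call it with a function that returns one timing in milliseconds, repeats the measurement and reports the median. medianTiming is a hypothetical helper; Number() is applied in case executionTimeMillis comes back as a numeric string.

```javascript
// Hypothetical helper: run a timing function several times and return the
// median, which is less sensitive to one unusually slow or fast run.
function medianTiming(measure, runs) {
  if (!(runs >= 1)) {
    throw new Error("runs must be at least 1");
  }
  const samples = [];
  for (let i = 0; i < runs; i++) {
    // Number() also accepts numeric strings.
    samples.push(Number(measure()));
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 1
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}

// Shell usage (not executed here):
// medianTiming(() => db.books.find({ author: "Sarah Williams" })
//   .explain("executionStats").executionStats.executionTimeMillis, 5);
```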

🧪 Step 10: Update Author Field and Rerun the Same Query

// 1. Query before updates
const beforePlan = db.books.find({ author: "Sarah Williams" })
  .explain("executionStats");
const beforeTime = beforePlan.executionStats.executionTimeMillis;

// 2. Perform 1000 updates that affect the `author` index
let updated = 0;
db.books.find().limit(1000).forEach(doc => {
  const newAuthor = doc.author + "_v2_" + updated;
  db.books.updateOne({ _id: doc._id }, { $set: { author: newAuthor } });
  updated++;
});

// 3. Query again after updates
const afterPlan = db.books.find({ author: "Sarah Williams" })
  .explain("executionStats");
const afterTime = afterPlan.executionStats.executionTimeMillis;

// 4. Compare timings
print("Execution Time With Index (Before Updates):", beforeTime, "ms");
print("Execution Time With Index (After Updates):", afterTime, "ms");

Output:

Execution Time With Index (Before Updates): 2.208 ms
Execution Time With Index (After Updates): 2.431 ms
Time Difference (After - Before): 0.223 ms
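Note that the renames in Step 10 can change some "Sarah Williams" documents to a new author value, so the after-update query may match fewer documents than the before-update one. The 1,000 renames are also sent as 1,000 separate updateOne calls. A sketch, assuming the documents are fetched first and that your DocumentDB version supports bulkWrite, builds the same $set operations so they can go out in one call. buildRenameOps is a hypothetical helper; the `_v2_` suffix mirrors the loop above.

```javascript
// Hypothetical helper: turn the Step 10 rename loop into bulkWrite operations.
// `docs` is an array of { _id, author } documents.
function buildRenameOps(docs) {
  return docs.map((doc, i) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { author: doc.author + "_v2_" + i } }
    }
  }));
}

// Shell usage (not executed here):
// const docs = db.books.find({}, { author: 1 }).limit(1000).toArray();
// db.books.bulkWrite(buildRenameOps(docs), { ordered: false });
```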

🧾 Conclusion

In this walkthrough, we:

  • Created 100K documents in AWS DocumentDB.
  • Built and measured the impact of indexes.
  • Used opCounter and stats() to track insert and update behavior.
  • Compared index performance vs. collection scans.
  • Saw how index size and query performance shift after updates.

This kind of instrumentation is critical for large-scale DocumentDB workloads, especially because it's a managed service where you don’t have access to low-level server controls.


✅ Tip: Use these techniques to continuously monitor, tune, and optimize your DocumentDB performance, especially when dealing with write-heavy workloads or evolving query patterns.
