Amazon DocumentDB is a fully managed document database service that supports MongoDB workloads. In this post, we'll walk through a practical, large-scale indexing and performance analysis using DocumentDB. We'll simulate inserting 100,000 documents, creating indexes, measuring stats, and analyzing the impact of updates on performance.
We'll use raw shell commands with real outputs, no shortcuts. Let's get started.
Step 1: Insert 100,000 Documents into DocumentDB
// Choose the database (no trailing semicolon: the legacy shell treats it as part of the database name)
use mylibrary
// Define helper functions (basic randomizer)
function getRandomElement(arr) {
return arr[Math.floor(Math.random() * arr.length)];
}
function getRandomInt(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
var formats = ['epub', 'pdf', 'txt', 'audio'];
var firstNames = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
var lastNames = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];
for (var i = 0; i < 100000; i++) {
var author = getRandomElement(firstNames) + " " + getRandomElement(lastNames);
var title = "Book Title " + (i + 1);
var book = {
author: author,
title: title,
genre: getRandomElement(genres),
format: getRandomElement(formats),
num_of_pages: getRandomInt(50, 1000),
year_when_published: getRandomInt(1900, 2025)
};
db.books.insertOne(book);
}
db.books.countDocuments();
Output:
100000
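Inserting documents one at a time means 100,000 round trips to the cluster. Batching with insertMany is usually much faster. Below is a minimal sketch of the batching logic; the insertMany call itself is left as a comment since it needs a live connection, and the batch size of 1,000 is an assumption you should tune for your workload:

```javascript
// Batched insert sketch: accumulate documents and flush in groups.
function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}
function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy'];
var BATCH_SIZE = 1000;   // assumed batch size
var TOTAL = 100000;

var batch = [];
var flushed = 0;         // number of insertMany calls we would issue
for (var i = 0; i < TOTAL; i++) {
  batch.push({
    title: "Book Title " + (i + 1),
    genre: getRandomElement(genres),
    num_of_pages: getRandomInt(50, 1000)
  });
  if (batch.length === BATCH_SIZE) {
    // db.books.insertMany(batch);   // one round trip per 1,000 docs
    flushed++;
    batch = [];
  }
}
if (batch.length > 0) {
  // db.books.insertMany(batch);     // flush the remainder
  flushed++;
}
console.log("batches flushed:", flushed); // 100 with these settings
```

With these settings the shell issues 100 insertMany calls instead of 100,000 insertOne calls.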
Step 2: Create Indexes on Author and Title + Year
db.books.createIndex({ author: 1 })
db.books.createIndex({ title: 1, year_when_published: 1 })
Step 3: Get Collection Stats
db.books.stats()
Output:
{
ns: 'mylibrary.books',
count: 100000,
size: 20600000,
storageSize: 24150016,
nindexes: 3,
totalIndexSize: 14106624,
indexSizes: {
_id_: 4079616,
author_1: 4104192,
title_1_year_when_published_1: 5922816
},
...
}
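A couple of useful numbers fall out of this output. With size 20,600,000 bytes across 100,000 documents, the average document is about 206 bytes, and the three indexes together take roughly two-thirds as much space as the data itself. A quick sketch of the arithmetic, using the figures pasted above:

```javascript
// Derived metrics from the stats() output above.
const stats = {
  count: 100000,
  size: 20600000,            // logical data size in bytes
  storageSize: 24150016,     // allocated storage in bytes
  totalIndexSize: 14106624   // all indexes combined, in bytes
};

const avgDocSize = stats.size / stats.count;              // bytes per document
const indexToDataRatio = stats.totalIndexSize / stats.size;

console.log("avg document size (bytes):", avgDocSize);         // 206
console.log("index/data ratio:", indexToDataRatio.toFixed(2)); // 0.68
```

Tracking the index/data ratio over time is a cheap way to notice index bloat before it becomes a storage or cache-efficiency problem.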
Step 4: Print Only Index Sizes and opCounter
const stats = db.books.stats();
printjson({
ns: stats.ns,
indexSizes: stats.indexSizes,
opCounter: stats.opCounter
});
Output:
{
ns: 'mylibrary.books',
indexSizes: {
_id_: 4079616,
author_1: 4104192,
title_1_year_when_published_1: 5922816
},
opCounter: {
numDocsIns: 100000,
numDocsUpd: 0,
numDocsDel: 0
}
}
Step 5: Check Index Metadata
db.books.getIndexes()
Output:
[
{ key: { _id: 1 }, name: '_id_' },
{ key: { author: 1 }, name: 'author_1' },
{ key: { title: 1, year_when_published: 1 }, name: 'title_1_year_when_published_1' }
]
Step 6: Perform 10 Updates and Check the opCounter Diff
const before = db.books.stats().opCounter;
db.books.find().limit(10).forEach(doc => {
db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});
const after = db.books.stats().opCounter;
const diff = {
numDocsIns: after.numDocsIns - before.numDocsIns,
numDocsUpd: after.numDocsUpd - before.numDocsUpd,
numDocsDel: after.numDocsDel - before.numDocsDel
};
printjson(diff);
Output:
{
numDocsIns: 0,
numDocsUpd: 10,
numDocsDel: 0
}
Step 7: Perform 1000 Updates and Check Index Growth
const beforeIndexes = db.books.stats().indexSizes;
db.books.find().limit(1000).forEach(doc => {
db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});
const afterIndexes = db.books.stats().indexSizes;
const diffIndexes = {};
for (const key in beforeIndexes) {
diffIndexes[key] = afterIndexes[key] - beforeIndexes[key];
}
printjson(diffIndexes);
Output:
{
_id_: 0,
author_1: 16384,
title_1_year_when_published_1: 0
}
Step 8: Rebuild the author_1 Index
db.books.dropIndex("author_1")
db.books.createIndex({ author: 1 }, { background: true })
Step 9: Compare Execution Plans With and Without an Index
// 1. Query using index
const indexedPlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const indexedTime = indexedPlan.executionStats.executionTimeMillis;
// 2. Query forcing collection scan
const collScanPlan = db.books.find({ author: "Sarah Williams" })
.hint({ $natural: 1 })
.explain("executionStats");
const collScanTime = collScanPlan.executionStats.executionTimeMillis;
// 3. Print both times and their difference
print("Execution Time With Index (ms):", indexedTime);
print("Execution Time Without Index (ms):", collScanTime);
print("Difference (Without - With):", collScanTime - indexedTime);
Output:
Execution Time With Index (ms): 3.716
Execution Time Without Index (ms): 90.578
Difference (Without - With): 86.862
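The absolute difference is useful, but the ratio makes the gap easier to compare across runs: in this run the indexed query is roughly 24x faster than the collection scan (exact timings will vary with instance size and cache state):

```javascript
// Timings taken from the explain("executionStats") output above.
const indexedTime = 3.716;   // ms, query using the author_1 index
const collScanTime = 90.578; // ms, forced collection scan via hint({ $natural: 1 })

const speedup = collScanTime / indexedTime;
console.log("speedup: " + speedup.toFixed(1) + "x"); // 24.4x
```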
Step 10: Update the Author Field and Rerun the Same Query
// STEP 1: Query before updates
const beforePlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const beforeTime = beforePlan.executionStats.executionTimeMillis;
// STEP 2: Perform 1000 updates that affect the `author` index
let updated = 0;
db.books.find().limit(1000).forEach(doc => {
const newAuthor = doc.author + "_v2_" + updated;
db.books.updateOne({ _id: doc._id }, { $set: { author: newAuthor } });
updated++;
});
// STEP 3: Query again after updates
const afterPlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const afterTime = afterPlan.executionStats.executionTimeMillis;
// STEP 4: Compare timings
print("Execution Time With Index (Before Updates):", beforeTime, "ms");
print("Execution Time With Index (After Updates):", afterTime, "ms");
print("Time Difference (After - Before):", afterTime - beforeTime, "ms");
Output:
Execution Time With Index (Before Updates): 2.208 ms
Execution Time With Index (After Updates): 2.431 ms
Time Difference (After - Before): 0.223 ms
Conclusion
In this walkthrough, we:
- Created 100K documents in Amazon DocumentDB.
- Built and measured the impact of indexes.
- Used opCounter and stats() to track insert and update behavior.
- Compared index performance vs. collection scans.
- Saw how index size and query performance can shift under load.
This kind of data instrumentation is critical when working with large-scale workloads in DocumentDB, especially since it's a managed service where you don't have access to low-level server controls.
Tip: Use these techniques to continuously monitor, tune, and optimize your DocumentDB performance, especially when dealing with write-heavy workloads or evolving query patterns.