Amazon DocumentDB is a fully managed document database service that supports MongoDB workloads. In this post, we'll walk through a practical, large-scale indexing and performance analysis using DocumentDB. We'll simulate inserting 100,000 documents, creating indexes, measuring stats, and analyzing the impact of updates on performance.
We'll use raw shell commands with real outputs, no shortcuts. Let's get started.
Step 1: Insert 100,000 Documents into DocumentDB
// Choose the database (no trailing semicolon: the legacy shell treats it as part of the database name)
use mylibrary
// Define helper functions (basic randomizer)
function getRandomElement(arr) {
return arr[Math.floor(Math.random() * arr.length)];
}
function getRandomInt(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy', 'Biography', 'History', 'Mystery', 'Romance'];
var formats = ['epub', 'pdf', 'txt', 'audio'];
var firstNames = ['John', 'Mary', 'Alice', 'Robert', 'Linda', 'Michael', 'Sarah', 'David'];
var lastNames = ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Garcia'];
for (var i = 0; i < 100000; i++) {
var author = getRandomElement(firstNames) + " " + getRandomElement(lastNames);
var title = "Book Title " + (i + 1);
var book = {
author: author,
title: title,
genre: getRandomElement(genres),
format: getRandomElement(formats),
num_of_pages: getRandomInt(50, 1000),
year_when_published: getRandomInt(1900, 2025)
};
db.books.insertOne(book);
}
db.books.countDocuments();
Output:
100000
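Inserting documents one at a time means 100,000 round trips to the cluster. Batching with insertMany is usually much faster. Below is a minimal sketch of the batching logic; the insertMany call itself is left as a comment since it needs a live connection, and the batch size of 1,000 is an assumption you should tune for your workload:

```javascript
// Batched insert sketch: accumulate documents and flush in groups.
function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}
function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var genres = ['Fiction', 'Non-Fiction', 'Science', 'Fantasy'];
var BATCH_SIZE = 1000;   // assumed batch size
var TOTAL = 100000;

var batch = [];
var flushed = 0;         // number of insertMany calls we would issue
for (var i = 0; i < TOTAL; i++) {
  batch.push({
    title: "Book Title " + (i + 1),
    genre: getRandomElement(genres),
    num_of_pages: getRandomInt(50, 1000)
  });
  if (batch.length === BATCH_SIZE) {
    // db.books.insertMany(batch);   // one round trip per 1,000 docs
    flushed++;
    batch = [];
  }
}
if (batch.length > 0) {
  // db.books.insertMany(batch);     // flush the remainder
  flushed++;
}
console.log("batches flushed:", flushed); // 100 with these settings
```

With these settings the shell issues 100 insertMany calls instead of 100,000 insertOne calls.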
Step 2: Create Indexes on Author and Title + Year
db.books.createIndex({ author: 1 })
db.books.createIndex({ title: 1, year_when_published: 1 })
Step 3: Get Collection Stats
db.books.stats()
Output:
{
ns: 'mylibrary.books',
count: 100000,
size: 20600000,
storageSize: 24150016,
nindexes: 3,
totalIndexSize: 14106624,
indexSizes: {
_id_: 4079616,
author_1: 4104192,
title_1_year_when_published_1: 5922816
},
...
}
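A couple of useful numbers fall out of this output. With size 20,600,000 bytes across 100,000 documents, the average document is about 206 bytes, and the three indexes together take roughly two-thirds as much space as the data itself. A quick sketch of the arithmetic, using the figures pasted above:

```javascript
// Derived metrics from the stats() output above.
const stats = {
  count: 100000,
  size: 20600000,            // logical data size in bytes
  storageSize: 24150016,     // allocated storage in bytes
  totalIndexSize: 14106624   // all indexes combined, in bytes
};

const avgDocSize = stats.size / stats.count;              // bytes per document
const indexToDataRatio = stats.totalIndexSize / stats.size;

console.log("avg document size (bytes):", avgDocSize);         // 206
console.log("index/data ratio:", indexToDataRatio.toFixed(2)); // 0.68
```

Tracking the index/data ratio over time is a cheap way to notice index bloat before it becomes a storage or cache-efficiency problem.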
Step 4: Print Only Index Sizes and opCounter
const stats = db.books.stats();
printjson({
ns: stats.ns,
indexSizes: stats.indexSizes,
opCounter: stats.opCounter
});
Output:
{
ns: 'mylibrary.books',
indexSizes: {
_id_: 4079616,
author_1: 4104192,
title_1_year_when_published_1: 5922816
},
opCounter: {
numDocsIns: 100000,
numDocsUpd: 0,
numDocsDel: 0
}
}
Step 5: Check Index Metadata
db.books.getIndexes()
Output:
[
{ key: { _id: 1 }, name: '_id_' },
{ key: { author: 1 }, name: 'author_1' },
{ key: { title: 1, year_when_published: 1 }, name: 'title_1_year_when_published_1' }
]
Step 6: Perform 10 Updates and Check the opCounter Diff
const before = db.books.stats().opCounter;
db.books.find().limit(10).forEach(doc => {
db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});
const after = db.books.stats().opCounter;
const diff = {
numDocsIns: after.numDocsIns - before.numDocsIns,
numDocsUpd: after.numDocsUpd - before.numDocsUpd,
numDocsDel: after.numDocsDel - before.numDocsDel
};
printjson(diff);
Output:
{
numDocsIns: 0,
numDocsUpd: 10,
numDocsDel: 0
}
Step 7: Perform 1000 Updates and Check Index Growth
const beforeIndexes = db.books.stats().indexSizes;
db.books.find().limit(1000).forEach(doc => {
db.books.updateOne({ _id: doc._id }, { $inc: { num_of_pages: 1 } });
});
const afterIndexes = db.books.stats().indexSizes;
const diffIndexes = {};
for (const key in beforeIndexes) {
diffIndexes[key] = afterIndexes[key] - beforeIndexes[key];
}
printjson(diffIndexes);
Output:
{
_id_: 0,
author_1: 16384,
title_1_year_when_published_1: 0
}
Step 8: Rebuild the author_1 Index
db.books.dropIndex("author_1")
db.books.createIndex({ author: 1 }, { background: true })
Step 9: Compare Execution Plans With and Without an Index
// 1. Query using index
const indexedPlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const indexedTime = indexedPlan.executionStats.executionTimeMillis;
// 2. Query forcing collection scan
const collScanPlan = db.books.find({ author: "Sarah Williams" })
.hint({ $natural: 1 })
.explain("executionStats");
const collScanTime = collScanPlan.executionStats.executionTimeMillis;
// 3. Print both times and their difference
print("Execution Time With Index (ms):", indexedTime);
print("Execution Time Without Index (ms):", collScanTime);
print("Difference (Without - With):", collScanTime - indexedTime);
Output:
Execution Time With Index (ms): 3.716
Execution Time Without Index (ms): 90.578
Difference (Without - With): 86.862
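The absolute difference is useful, but the ratio makes the gap easier to compare across runs: in this run the indexed query is roughly 24x faster than the collection scan (exact timings will vary with instance size and cache state):

```javascript
// Timings taken from the explain("executionStats") output above.
const indexedTime = 3.716;   // ms, query using the author_1 index
const collScanTime = 90.578; // ms, forced collection scan via hint({ $natural: 1 })

const speedup = collScanTime / indexedTime;
console.log("speedup: " + speedup.toFixed(1) + "x"); // 24.4x
```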
Step 10: Update the Author Field and Rerun the Same Query
// STEP 1: Query before updates
const beforePlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const beforeTime = beforePlan.executionStats.executionTimeMillis;
// STEP 2: Perform 1000 updates that affect the `author` index
let updated = 0;
db.books.find().limit(1000).forEach(doc => {
const newAuthor = doc.author + "_v2_" + updated;
db.books.updateOne({ _id: doc._id }, { $set: { author: newAuthor } });
updated++;
});
// STEP 3: Query again after updates
const afterPlan = db.books.find({ author: "Sarah Williams" })
.explain("executionStats");
const afterTime = afterPlan.executionStats.executionTimeMillis;
// STEP 4: Compare timings
print("Execution Time With Index (Before Updates):", beforeTime, "ms");
print("Execution Time With Index (After Updates):", afterTime, "ms");
print("Time Difference (After - Before):", afterTime - beforeTime, "ms");
Output:
Execution Time With Index (Before Updates): 2.208 ms
Execution Time With Index (After Updates): 2.431 ms
Time Difference (After - Before): 0.223 ms
Conclusion
In this walkthrough, we:
- Created 100K documents in Amazon DocumentDB.
- Built and measured the impact of indexes.
- Used opCounter and stats() to track insert and update behavior.
- Compared index performance vs. collection scans.
- Saw how index size and query performance can shift under load.
This kind of data instrumentation is critical when working with large-scale workloads in DocumentDB, especially since it's a managed service where you don't have access to low-level server controls.
Tip: Use these techniques to continuously monitor, tune, and optimize your DocumentDB performance, especially when dealing with write-heavy workloads or evolving query patterns.