Introduction to MongoDB - CSU677

MongoDB

1. Introduction to MongoDB

MongoDB is a NoSQL database that stores data in a flexible, JSON-like format called BSON (Binary JSON). Unlike traditional relational databases that use tables and rows, MongoDB uses collections and documents, making it well-suited for applications that require fast, scalable, and flexible data storage. MongoDB is popular for its ease of use, horizontal scalability, and ability to handle large volumes of unstructured or semi-structured data.

2. MongoDB Architecture

MongoDB's architecture is built around a few core concepts: databases, collections, and documents.

2.1 Database

A MongoDB instance can host multiple databases, each functioning as a high-level container for collections. Databases are logically independent of one another; how their data is laid out in files on disk depends on the storage engine and its configuration.

2.2 Collection

A collection is a group of MongoDB documents. It is the equivalent of a table in relational databases, but unlike a table, a collection does not enforce a schema by default (validation rules can be added optionally). Documents within the same collection can therefore have different fields, which allows a high degree of flexibility.
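As a rough illustration of this flexibility, the sketch below models a collection as a plain JavaScript array. The array and the fieldNames helper are made up for the example and are not part of MongoDB.

```javascript
// A collection modeled as a plain array (illustration only, not MongoDB code).
const mixedUsers = [
  { name: "Alice", age: 30 },
  { name: "Bob", email: "bob@example.com", tags: ["admin"] },
];

// Hypothetical helper: collect every distinct field name used in the collection.
function fieldNames(collection) {
  const names = new Set();
  for (const doc of collection) {
    Object.keys(doc).forEach((key) => names.add(key));
  }
  return [...names].sort();
}

console.log(fieldNames(mixedUsers)); // [ 'age', 'email', 'name', 'tags' ]
```

Both documents live side by side even though they share only the name field.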

2.3 Document

A document is the basic unit of data in MongoDB and is represented as a JSON-like object. Each document contains a set of key-value pairs, where the keys are field names, and the values can be various data types such as strings, numbers, arrays, or even other documents.

2.3.1 Example: Document Structure


{
  "_id": "507f191e810c19729de860ea",
  "name": "John Doe",
  "age": 29,
  "address": {
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL"
  },
  "interests": ["reading", "hiking", "coding"]
}

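In this example, the document represents a person with the fields _id, name, age, address, and interests. The _id field uniquely identifies the document within its collection; if you do not provide one, MongoDB generates it automatically as an ObjectId (shown above in its 24-character hexadecimal form).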

3. Installing MongoDB

To start using MongoDB, you need to install it on your system. MongoDB is available for various operating systems, including Windows, macOS, and Linux.

3.1 Installation Steps

To install MongoDB on your system, follow these general steps:

Windows: Download the MongoDB installer from the official website and follow the installation instructions.
macOS: Use Homebrew to install MongoDB: brew tap mongodb/brew and brew install mongodb-community.
Linux: Use the package manager for your distribution. For example, on Ubuntu, after adding MongoDB's official package repository: sudo apt-get install -y mongodb-org.

3.2 Verifying Installation

After installation, you can verify that MongoDB is installed correctly by checking the version of the server binary:


mongod --version

This command should display the installed version of MongoDB.

4. Basic MongoDB Operations

Once MongoDB is installed, you can start interacting with it using the MongoDB shell or a driver for a programming language such as JavaScript (Node.js) or Python. Here are some basic operations:

4.1 Inserting Documents

To insert a document into a collection, you can use the insertOne() or insertMany() methods.

4.1.1 Example: Inserting a Document


// Insert a single document
db.users.insertOne({
  "name": "Alice",
  "age": 30,
  "email": "alice@example.com"
});

// Insert multiple documents
db.users.insertMany([
  {
    "name": "Bob",
    "age": 25,
    "email": "bob@example.com"
  },
  {
    "name": "Charlie",
    "age": 35,
    "email": "charlie@example.com"
  }
]);

4.2 Querying Documents

To retrieve documents from a collection, you use the find() method, which allows you to specify query criteria.

4.2.1 Example: Querying Documents


// Find all documents in the collection
db.users.find();

// Find documents with a specific name
db.users.find({ "name": "Alice" });

// Find documents with age greater than 25
db.users.find({ "age": { $gt: 25 } });

In this example, the find() method retrieves documents that match the specified criteria, such as documents where the name is "Alice" or where the age is greater than 25.
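To make the matching rules concrete, here is a minimal in-memory sketch of how an equality condition and a $gt condition select documents. It is plain JavaScript over an array, not MongoDB's implementation, and the matches and find helpers are hypothetical names that handle only these two cases.

```javascript
// Minimal in-memory sketch of equality and $gt matching (not MongoDB's implementation).
const people = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 25 },
  { name: "Charlie", age: 35 },
];

// Hypothetical helper: true when `doc` satisfies every condition in `filter`.
// Only plain equality and the $gt operator are handled here.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object" && "$gt" in condition) {
      return doc[field] > condition.$gt;
    }
    return doc[field] === condition;
  });
}

// An empty filter matches every document, like find() with no arguments.
const find = (collection, filter = {}) =>
  collection.filter((doc) => matches(doc, filter));

console.log(find(people).length);                                   // 3
console.log(find(people, { name: "Alice" }).map((d) => d.name));    // [ 'Alice' ]
console.log(find(people, { age: { $gt: 25 } }).map((d) => d.name)); // [ 'Alice', 'Charlie' ]
```

Bob is excluded from the last result because $gt is a strict comparison: 25 is not greater than 25.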

4.3 Updating Documents

To update documents in a collection, you can use the updateOne(), updateMany(), or replaceOne() methods.

4.3.1 Example: Updating a Document


// Update a single document
db.users.updateOne(
  { "name": "Alice" },
  { $set: { "age": 31 } }
);

// Update multiple documents
db.users.updateMany(
  { "age": { $lt: 30 } },
  { $set: { "status": "young" } }
);

The updateOne() method updates the first document that matches the criteria, while the updateMany() method updates all documents that match. The $set operator sets the listed fields and leaves the rest of the document unchanged.
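The difference between the two methods can be sketched in plain JavaScript. The applySet helper below is hypothetical and only imitates the behavior described above on an in-memory array: its limit parameter plays the role of choosing updateOne (1) or updateMany (Infinity).

```javascript
// In-memory imitation of $set with updateOne / updateMany semantics (illustration only).
const accounts = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 25 },
  { name: "Dana", age: 22 },
];

// Hypothetical helper: set `fields` on matching documents, modifying at most `limit` of them.
function applySet(collection, predicate, fields, limit) {
  let modified = 0;
  for (const doc of collection) {
    if (modified >= limit) break;
    if (predicate(doc)) {
      Object.assign(doc, fields); // only the listed fields change; others are kept
      modified += 1;
    }
  }
  return modified;
}

// Like updateOne({ name: "Alice" }, { $set: { age: 31 } })
const modifiedOne = applySet(accounts, (d) => d.name === "Alice", { age: 31 }, 1);

// Like updateMany({ age: { $lt: 30 } }, { $set: { status: "young" } })
const modifiedMany = applySet(accounts, (d) => d.age < 30, { status: "young" }, Infinity);

console.log(modifiedOne, modifiedMany); // 1 2
console.log(accounts[0]);               // { name: 'Alice', age: 31 }
```

Alice keeps her name field and gains no status field, since she does not match the second filter.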

4.4 Deleting Documents

To remove documents from a collection, you can use the deleteOne() or deleteMany() methods.

4.4.1 Example: Deleting a Document


// Delete a single document
db.users.deleteOne({ "name": "Alice" });

// Delete multiple documents
db.users.deleteMany({ "age": { $gt: 30 } });

The deleteOne() method removes the first document that matches the criteria, while the deleteMany() method removes all documents that match.

5. Indexing in MongoDB

Indexes in MongoDB improve the performance of queries by reducing the amount of data that MongoDB needs to scan. Without indexes, MongoDB performs a collection scan, which can be slow for large collections. By creating indexes, you can significantly speed up query performance.
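The cost difference can be illustrated with a small in-memory sketch. Here a JavaScript Map stands in for an index; this is an assumption made for brevity, since MongoDB's actual indexes are B-tree structures, and the scanByName helper is a hypothetical name.

```javascript
// Illustration only: a Map stands in for an index; real MongoDB indexes are B-trees.
const bigCollection = [];
for (let i = 0; i < 10000; i++) {
  bigCollection.push({ _id: i, name: `user${i}` });
}

// Collection scan: examine documents one by one until a match is found.
function scanByName(collection, name) {
  let examined = 0;
  for (const doc of collection) {
    examined += 1;
    if (doc.name === name) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": a lookup structure from field value to document, built once up front.
const nameIndex = new Map(bigCollection.map((doc) => [doc.name, doc]));

const scanned = scanByName(bigCollection, "user9999");
const indexed = nameIndex.get("user9999");

console.log(scanned.examined); // 10000 documents examined by the scan
console.log(indexed._id);      // 9999, found with a single lookup
```

The trade-off is that the lookup structure must be built and kept up to date as documents change, which is why indexes are usually created only on fields that queries actually use.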

5.1 Creating Indexes

You can create indexes on fields that are frequently used in queries. MongoDB supports single-field indexes, compound indexes, text indexes, and more.

5.1.1 Example: Creating a Single-Field Index


// Create an index on the 'name' field
db.users.createIndex({ "name": 1 });

This example creates an ascending index on the name field of the users collection.

5.2 Viewing Indexes

You can view the indexes on a collection using the getIndexes() method.

5.2.1 Example: Viewing Indexes


db.users.getIndexes();

This command returns a list of all indexes on the users collection.

5.3 Dropping Indexes

If an index is no longer needed, you can remove it using the dropIndex() method.

5.3.1 Example: Dropping an Index


// Drop the index on the 'name' field
db.users.dropIndex("name_1");

This example removes the index on the name field from the users collection.
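The name "name_1" follows MongoDB's default naming convention for indexes: each indexed field and its direction are joined with underscores. A minimal sketch of that convention, using a hypothetical defaultIndexName helper that is not a MongoDB API:

```javascript
// Hypothetical helper mirroring the default index naming convention:
// "<field>_<direction>" pairs joined by "_".
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ name: 1 }));          // name_1
console.log(defaultIndexName({ name: 1, age: -1 })); // name_1_age_-1
```

If an index was created with a custom name, that name is the one to pass to dropIndex(); getIndexes() shows the name of each index.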

6. Aggregation in MongoDB

The aggregation framework in MongoDB is a powerful tool for performing data analysis and transformations. It allows you to process data and return computed results using a pipeline of stages. Each stage transforms the data, and the output of one stage is passed to the next.

6.1 Aggregation Pipeline

The aggregation pipeline consists of multiple stages, such as $match, $group, $sort, and $project. Each stage performs a specific operation on the data.

6.1.1 Example: Basic Aggregation Pipeline


db.sales.aggregate([
  { $match: { "status": "A" } },
  { $group: { _id: "$item", totalSales: { $sum: "$amount" } } },
  { $sort: { totalSales: -1 } }
]);

In this example, the pipeline filters documents with a status of "A", groups them by the item field, calculates the total sales for each item, and then sorts the results in descending order of total sales.
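To see what each stage contributes, the same three steps can be imitated on an in-memory array with plain JavaScript. The sample data is invented for the illustration, and this is a sketch of the logic only, not how MongoDB executes a pipeline.

```javascript
// Invented sample data for the illustration.
const sales = [
  { item: "pen", amount: 5, status: "A" },
  { item: "book", amount: 20, status: "A" },
  { item: "pen", amount: 7, status: "A" },
  { item: "book", amount: 50, status: "B" },
];

// $match: keep only documents with status "A".
const matched = sales.filter((doc) => doc.status === "A");

// $group: sum `amount` per `item`.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.item, (totals.get(doc.item) ?? 0) + doc.amount);
}
const grouped = [...totals].map(([item, totalSales]) => ({ _id: item, totalSales }));

// $sort: order by totalSales, largest first.
const pipelineResult = grouped.sort((a, b) => b.totalSales - a.totalSales);

console.log(pipelineResult);
// [ { _id: 'book', totalSales: 20 }, { _id: 'pen', totalSales: 12 } ]
```

The book sale with status "B" never reaches the grouping step, which is why the total for "book" is 20 rather than 70.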

6.2 Common Aggregation Stages

$match: Filters documents to pass only those that match the specified condition.
$group: Groups documents by a specified key and computes values per group with accumulators such as $sum, $avg, and $max.
$sort: Sorts documents based on a specified field in ascending or descending order.
$project: Reshapes documents by including, excluding, or computing new fields.
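Since $project does not appear in the pipeline example, here is a rough in-memory sketch of the kind of reshaping it performs: keeping some fields, dropping others, and adding a computed one. The data and the ageInMonths field are invented for the illustration.

```javascript
// Invented sample data for the illustration.
const profiles = [
  { _id: 1, name: "Alice", age: 30, email: "alice@example.com" },
  { _id: 2, name: "Bob", age: 25, email: "bob@example.com" },
];

// Roughly what a $project stage would produce if it kept `name`,
// dropped `age` and `email`, and added a computed `ageInMonths` field.
// `_id` is carried along, as $project does unless told otherwise.
const projected = profiles.map(({ _id, name, age }) => ({
  _id,
  name,
  ageInMonths: age * 12,
}));

console.log(projected[0]); // { _id: 1, name: 'Alice', ageInMonths: 360 }
```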

7. Replication in MongoDB

Replication in MongoDB provides redundancy and increases data availability. It is implemented with replica sets: groups of MongoDB instances that maintain the same data set on multiple servers, so the data remains available if one server fails.

7.1 Replica Sets

A replica set typically consists of a primary node and multiple secondary nodes:

Primary: The node that receives all write operations. There is only one primary node in a replica set at any time.
Secondary: Nodes that replicate the primary's data set. Secondary nodes can serve read operations when the client's read preference allows it, and one of them can be elected primary if the current primary fails.
Arbiter: A node that participates in elections for primary but does not hold data.
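Elections succeed only when a majority of the voting members can take part, which is why the number of voting members matters and why an arbiter is sometimes added to reach an odd count. The arithmetic can be sketched as follows; the function names are hypothetical.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be unavailable while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
// 3 members: majority 2, tolerates 1 failure
// 4 members: majority 3, tolerates 1 failure
// 5 members: majority 3, tolerates 2 failures
```

Going from three to four voting members does not improve fault tolerance, so replica sets are typically sized with an odd number of voting members.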

7.2 Setting Up a Replica Set

To set up a replica set, you need to start multiple MongoDB instances with the --replSet option and then initiate the replica set configuration.

7.2.1 Example: Initiating a Replica Set


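# Run each command in its own terminal (mongod stays in the foreground).
# The --dbpath directories must already exist.
mongod --replSet "rs0" --port 27017 --dbpath /data/db1
mongod --replSet "rs0" --port 27018 --dbpath /data/db2
mongod --replSet "rs0" --port 27019 --dbpath /data/db3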

After starting the instances, connect to one of them using the MongoDB shell and run the following command to initiate the replica set:


rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
});

This command configures the replica set with three members.

8. Sharding in MongoDB

Sharding is MongoDB's method for scaling horizontally: data is distributed across multiple servers so that, as the data set and load grow, no single machine has to store or serve all of it.

8.1 Shard Keys

Sharding in MongoDB requires a shard key, which is a field or combination of fields that determines how data is distributed across shards. Choosing the right shard key is crucial for maintaining a balanced and efficient sharded cluster.
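A toy simulation can show why the choice matters. The sketch below assigns documents to three shards by hashing a key value; the toyHash function, the modulo placement, and the sample fields are all simplifications invented for the example, since MongoDB actually distributes data in chunks of key ranges and uses its own hash function for hashed sharding.

```javascript
const SHARD_COUNT = 3;

// Toy hash: sum of character codes. Real hashed sharding works differently.
function toyHash(value) {
  return [...String(value)].reduce((sum, ch) => sum + ch.charCodeAt(0), 0);
}

// Count how many documents land on each shard for a given shard key field.
function distribution(docs, keyField) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const doc of docs) {
    counts[toyHash(doc[keyField]) % SHARD_COUNT] += 1;
  }
  return counts;
}

// Invented data: many distinct userid values, only two distinct country values.
const shardDocs = [];
for (let i = 0; i < 900; i++) {
  shardDocs.push({ userid: i, country: i % 2 === 0 ? "US" : "DE" });
}

const byUserid = distribution(shardDocs, "userid");
const byCountry = distribution(shardDocs, "country");

console.log(byUserid);  // many distinct values: every shard receives data
console.log(byCountry); // two distinct values: at most two shards receive data
```

A key with few distinct values cannot spread data over more shards than it has values, no matter how the placement is done.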

8.2 Setting Up a Sharded Cluster

To set up a sharded cluster, you need to configure the following components:

Shards: These are the data-bearing servers, typically replica sets, that store the data.
Config Servers: Store metadata and configuration settings for the cluster.
Mongos: Acts as a query router, directing client requests to the appropriate shard.
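How these components cooperate can be sketched with a simplified router. In the sketch below, the chunks array stands for the kind of metadata the config servers hold, and route imitates what mongos does with it. The shard names, the range boundaries, and the use of userid as the shard key are assumptions made for the example.

```javascript
// Hypothetical chunk metadata: [min, max) ranges of the shard key mapped to shards.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Simplified router: target one shard when the query pins the shard key,
// otherwise send the query to every shard.
function route(query) {
  if (typeof query.userid === "number") {
    const chunk = chunks.find((c) => query.userid >= c.min && query.userid < c.max);
    return [chunk.shard];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(route({ userid: 1500 }));  // [ 'shardB' ]
console.log(route({ name: "Alice" })); // [ 'shardA', 'shardB', 'shardC' ]
```

Queries that include the shard key can be answered by a single shard, while queries that omit it have to be sent to all shards and merged.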

8.2.1 Example: Enabling Sharding on a Collection


// Run these commands from a shell that is connected to the mongos instance
use admin;

// Enable sharding on the database
sh.enableSharding("mydatabase");

// Shard a collection using the specified shard key
sh.shardCollection("mydatabase.mycollection", { "userid": 1 });

In this example, sharding is enabled on the mydatabase database, and the mycollection collection is sharded using the userid field as the shard key.