Best Practices for Using a Custom MongoDB Environment with Engine Yard Cloud

If you are using a custom MongoDB environment in Engine Yard Cloud, consider these best practices.

Note: If you need help with a custom MongoDB environment, you can consult Engine Yard Professional Services for assistance.

General NoSQL best practices

Consider these factors when choosing the right NoSQL database for your application:

  • Read/write throughput
  • Durability
  • Consistency of data
  • Latency

For more information about these factors, see Visual Guide to NoSQL Systems. Choosing the right NoSQL database matters, so invest time in finding the one that fits your needs. If you decide that MongoDB is right for you, continue reading this article.

Test exhaustively

Test within the context of your application and against traffic patterns that are representative of your production system. A test environment that does not resemble your production traffic prevents you from discovering performance bottlenecks and architectural design flaws. Examine your queries closely and always collect metrics.

Don’t assume that what worked for your RDBMS will translate

Whatever worked on your SQL database may not work on a NoSQL database. Make sure that your expectations are realistic and aligned with the features of the database.

  • Design your documents and queries according to what 10gen recommends.
  • Understand that your application might need to be re-architected to migrate to a non-relational data store. Read The cost of Migration for information on migrating to NoSQL.

Think about the consistency and durability needs of your data

We cannot emphasize this enough: think about your durability and consistency needs before you deploy. MongoDB offers durability through replication. Do not run a standalone MongoDB instance in production, and make sure you understand why.

Understand what to expect from EBS volumes

The performance of Amazon’s Elastic Block Store (EBS) can be inconsistent. Collect throughput metrics over time when benchmarking your application and plot your data.
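One way to collect such metrics is to time small sequential writes against the volume at intervals and keep the timestamped results for plotting. The following is a minimal sketch using only the Python standard library. The function name, block size, and sample counts are assumptions made for this example, not Engine Yard tooling, and a real benchmark should replay your application's own I/O pattern.

```python
import os
import tempfile
import time


def sample_write_throughput(directory, samples=5, block_size=1024 * 1024, blocks=8):
    """Return a list of (unix_timestamp, megabytes_per_second) pairs.

    Each sample writes `blocks` blocks of `block_size` bytes to a temporary
    file inside `directory`, forces them to disk, and then deletes the file.
    """
    payload = b"\0" * block_size
    megabytes = block_size * blocks / (1024 * 1024)
    results = []
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                for _ in range(blocks):
                    handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # make sure the data reaches the volume
            elapsed = time.perf_counter() - start
        finally:
            os.remove(path)
        # Guard against a zero interval on very coarse clocks.
        results.append((time.time(), megabytes / max(elapsed, 1e-9)))
    return results
```

Calling `sample_write_throughput("/data/mongodb")` from a scheduled job and appending the pairs to a log gives you a series you can plot to see how much the volume varies over a day.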

MongoDB Best Practices

Always use replica sets

Replica sets provide high availability through automatic failover. If your primary node fails, a secondary node is elected as primary and your cluster remains functional. We do not support a non-replicated MongoDB for production environments. Consider a hosted solution if the cost of running a replicated MongoDB cluster is too high. Engine Yard has established partnerships with MongoHQ and MongoLab. See the Partner program for more information regarding offerings for Engine Yard customers.
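As an illustration, a replica set is described by a configuration document that names the set and lists its members. The sketch below builds such a document in Python. The set name and host names are hypothetical, and the three-member minimum is an assumption of this sketch: three data-bearing members is the smallest layout that still has a voting majority after one node fails.

```python
def build_replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts.

    `hosts` is a list of "hostname:port" strings. Duplicates are dropped
    so that one node cannot be listed as two members.
    """
    unique_hosts = list(dict.fromkeys(hosts))  # de-duplicate, keep order
    if len(unique_hosts) < 3:
        # Assumption of this sketch: require three members for failover.
        raise ValueError("need at least three distinct members for failover")
    return {
        "_id": name,
        "members": [
            {"_id": index, "host": host}
            for index, host in enumerate(unique_hosts)
        ],
    }


# Hypothetical host names, for illustration only.
config = build_replica_set_config(
    "rs0", ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]
)
```

The resulting dictionary has the `_id` and `members` fields that replica set initiation expects; options such as priorities or arbiters are left out to keep the example short.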

Keep current with versions

Keep your version of MongoDB current. 10gen rolls out numerous fixes within each release that help your cluster run more smoothly. Version 2.0.x includes significant performance and concurrency improvements, index changes, bug fixes, and a compaction command, and it makes it easier to upsize your cluster. If you are still using 1.6.3, upgrade as soon as possible.

Don’t run MongoDB on 32-bit systems

MongoDB has a ~2.5GB data limit on 32-bit systems. Its storage engine uses memory-mapped files for performance, so the amount of data it can map is bounded by the address space of the process. With Engine Yard Cloud, you should use a Large instance as your base installation. We only support production MongoDB on 64-bit instances.
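The arithmetic behind this limit can be sketched in a few lines of Python. A 32-bit process can address at most 4 GiB in total, and part of that is taken by the operating system and the program itself, which is why the usable figure quoted above is lower. Note that the check below reports the pointer width of the Python interpreter running it, which is only a proxy (an assumption of this sketch) for the platform that `mongod` runs on.

```python
import struct


def pointer_width_bits():
    """Return the pointer width, in bits, of the running interpreter."""
    return struct.calcsize("P") * 8


def address_space_gib(bits):
    """Return the total addressable memory, in GiB, for a given pointer width."""
    return 2 ** bits / 2 ** 30


def is_64_bit():
    """Return True when the running interpreter uses 64-bit pointers."""
    return pointer_width_bits() == 64
```

For example, `address_space_gib(32)` evaluates to `4.0`, the ceiling that memory-mapped data files have to share with everything else in the process.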

Turn journaling on by default

MongoDB supports write-ahead journaling of operations to facilitate crash recovery and node durability. We strongly recommend that you turn on journaling by default.
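If you start `mongod` from a script or a recipe, one way to make journaling the default is to build the command line in a single place and opt out only explicitly. The helper below is a hypothetical sketch, not part of any Engine Yard recipe; it relies on the `--dbpath`, `--replSet`, and `--journal` options of `mongod`, and the path and set name shown are examples.

```python
def mongod_command(dbpath, replica_set, journal=True):
    """Return the argument list for starting mongod.

    Journaling is on unless the caller passes journal=False.
    """
    args = ["mongod", "--dbpath", dbpath, "--replSet", replica_set]
    if journal:
        args.append("--journal")
    return args


# Example values; substitute your own data directory and set name.
command = mongod_command("/data/mongodb", "rs0")
print(" ".join(command))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.Popen` without shell quoting issues.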

Mind the location of your data files

Check your recipes to make sure that your MongoDB data files reside on a persistent volume (for example, /data/mongodb). Using ephemeral drives is possible, but be extremely careful when deciding to do so because it influences your cluster architecture. We recommend using EBS for your MongoDB data.
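A quick sanity check is to find which mount point actually holds the data directory. The sketch below walks up from a path until it reaches a mount point. It is only a heuristic: the function names are hypothetical, and finding the directory on a separate mount does not prove that the volume is EBS-backed, so confirm the device behind the mount as well.

```python
import os


def find_mount_point(path):
    """Return the mount point of the filesystem that contains `path`."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def is_on_separate_volume(path):
    """Return True when `path` is not stored on the root filesystem."""
    return find_mount_point(path) != find_mount_point(os.sep)
```

If `is_on_separate_volume("/data/mongodb")` returns False, the data files are on the root filesystem and are worth a closer look.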

Your working set should fit in memory

Being able to keep the working set (and indexes) in memory is an important factor in overall cluster performance. If you notice the number of page faults increasing, there is a very high chance that your working set is larger than your available RAM. You have two options when your data exceeds your available RAM: increasing the size of your MongoDB instance or sharding. We recommend increasing instance size first.
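The check described above can be approximated from two numbers that MongoDB reports in its database statistics, `dataSize` and `indexSize`. The sketch below is a rough, conservative estimate: data plus indexes is an upper bound on the working set, not the working set itself, and the 80% headroom factor is an assumption chosen for illustration. The second function turns two readings of a page fault counter into a rate so that you can watch the trend.

```python
GIB = 1024 ** 3


def fits_in_memory(db_stats, ram_bytes, headroom=0.8):
    """Return True when data plus indexes fit within a share of RAM.

    `db_stats` is a mapping with "dataSize" and "indexSize" in bytes.
    `headroom` is the fraction of RAM assumed to be usable for MongoDB.
    """
    footprint = db_stats["dataSize"] + db_stats["indexSize"]
    return footprint <= ram_bytes * headroom


def page_fault_rate(faults_before, faults_after, seconds):
    """Return page faults per second between two counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (faults_after - faults_before) / seconds


# Illustrative numbers only.
stats = {"dataSize": 4 * GIB, "indexSize": 1 * GIB}
print(fits_in_memory(stats, 8 * GIB))  # 5 GiB footprint against 6.4 GiB usable
```

A steadily rising page fault rate together with a footprint near or above usable RAM is the signal to move to a larger instance.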

Scale up if your metrics show heavy use

If your instance shows a load over 65%, consider scaling up. Your load should be consistently below this threshold during normal operations. This also impacts recovery and vertical scaling scenarios. If you need to increase your instance size, AWS recommends the following upgrade path: Large, Extra Large, High Memory Quadruple Extra Large. We have also observed less latency on larger EBS volumes.
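The decision rule above can be sketched as follows. The article does not define how "load" is measured, so this example assumes the system load average divided by the number of CPU cores, averaged over a window of samples; both the metric and the function names are assumptions. The upgrade path is the one listed above.

```python
UPGRADE_PATH = ["Large", "Extra Large", "High Memory Quadruple Extra Large"]


def should_scale_up(load_samples, cpu_count, threshold=0.65):
    """Return True when the mean per-core load exceeds the threshold."""
    if not load_samples:
        raise ValueError("need at least one load sample")
    per_core = [sample / cpu_count for sample in load_samples]
    return sum(per_core) / len(per_core) > threshold


def next_instance_size(current):
    """Return the next size on the upgrade path, or None at the top."""
    index = UPGRADE_PATH.index(current)
    if index + 1 < len(UPGRADE_PATH):
        return UPGRADE_PATH[index + 1]
    return None
```

On a Unix host, `os.getloadavg()` and `os.cpu_count()` are one source for the inputs. Averaging over a window rather than reacting to a single reading matches the advice that load should be consistently below the threshold.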

Be careful when sharding

Sharded installations require a careful understanding of your application’s data access patterns. Take the time to understand how MongoDB sharding works and whether you really need it. Also remember that selecting a good shard key is important because it affects performance.

Configuration servers are critical to the health of your cluster. You must have three configuration servers in a sharded production environment. Never delete their data, make sure to back them up frequently, and refer to them, if you can, by name using an /etc/hosts file (this makes your cluster more resilient).

Configuration servers are lightweight processes, but they must also live on 64-bit instances. Do not put all three configuration servers on the same instance. You can schedule a consultation with Engine Yard Professional Services if you are considering a sharded installation.
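The configuration server rules above (exactly three, on different instances, referred to by name) can be enforced when you assemble the value passed to the `--configdb` option of `mongos`. The sketch below is a hypothetical helper; the host names are examples that would be defined in /etc/hosts, and 27019 is the default port for a `mongod` started with `--configsvr`.

```python
import ipaddress


def config_db_argument(hosts, port=27019):
    """Return the comma-separated host list for `mongos --configdb`.

    Raises ValueError unless there are exactly three distinct host names.
    """
    if len(hosts) != 3 or len(set(hosts)) != 3:
        raise ValueError("expected exactly three distinct configuration servers")
    for host in hosts:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so it is treated as a name
        raise ValueError("refer to %s by name, not by IP address" % host)
    return ",".join("%s:%d" % (host, port) for host in hosts)


# Hypothetical names that would be mapped in /etc/hosts.
print(config_db_argument(["mongo-cfg1", "mongo-cfg2", "mongo-cfg3"]))
```

Because the names are resolved through /etc/hosts, replacing a failed configuration server only requires updating the address behind its name, not the `mongos` configuration.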

Use Mongo MMS to graphically monitor your service

Try using Mongo MMS. 10gen is actively developing this product. It allows you to visually evaluate the health of your cluster.

Keep up with MongoDB resources

Keep informed because things change rapidly. Some MongoDB resources are:

Comments

  • Ian Bishop

    Hello,
    I found this a helpful article and a good piece of information on MongoDB. I was confused about MongoDB and RDBMS, and I found a clear explanation from MongoDB Online Training.

    Difference between MongoDB and RDBMS:

    An RDBMS is a completely structured way of storing data, while NoSQL is an unstructured way of storing data. Another main difference is that in an RDBMS the amount of data stored mainly depends on the physical memory of the system, while in NoSQL you don’t have any such limits because you can scale the system horizontally.

    “Extremely large datasets are often event based transactions that occur in chronological order. Examples are weblogs, shopping transactions, manufacturing data from assembly line devices, scientific data collections, etc. These types of data accumulate in large numbers every second and can take a RDBMS with all of its overhead to its knees. But for OLTP processing, nothing beats the combination of data quality and performance of a well designed RDBMS.”

    NoSQL is a very broad term and is typically taken to mean “Not Only SQL.” The term is dropping out of favor in the non-RDBMS community. You’ll find that NoSQL databases have few common characteristics. They can be roughly divided into a few categories:

    key/value stores
    Bigtable inspired databases (based on the Google Bigtable paper)
    Dynamo inspired databases
    distributed databases
    document databases

    This is a huge question, but it’s fairly well answered in this Survey of Distributed Databases.
