Amazon Web Services (AWS) provides Amazon Elastic Block Store (Amazon EBS) for EC2 instance storage. EBS volumes are the virtual hard drives and SSDs for your servers running in the cloud. Amazon EBS volumes are automatically replicated within their Availability Zone (AZ), and it is easy to take snapshots of volumes to back them up in a known state.
Amazon EBS has many advantages, including reliability, snapshots, and resizing.
AWS provides four volume types: two backed by hard disk drives (HDD) and two backed by SSDs. The volume types differ in price and performance. An EC2 instance can have many volumes attached to it, just like a server can have many drives. A volume can only be attached to one EC2 instance at a time. If you want to share files between EC2 instances, use Amazon Elastic File System (EFS) or Amazon S3 instead.
Each volume type has different performance characteristics. The trick is to pick the most cost-efficient type for your service's workload.
Magnetic (HDD) volumes have the lowest random-access performance, but also the lowest cost per gigabyte. They also have the highest throughput for sequential access (500 MB/s). Magnetic volumes average 100 IOPS but can burst to hundreds of IOPS.
IOPS (input/output operations per second, pronounced "eye-ops") is a common measure used to characterize storage devices.
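Why can a volume with few IOPS still win on throughput? Throughput is roughly IOPS multiplied by the average I/O size. The minimal sketch below shows that arithmetic; the I/O sizes (1 MiB streaming writes, 16 KiB random reads) are illustrative assumptions, not AWS limits.

```bash
#!/usr/bin/env bash
# Throughput (MiB/s) ~= IOPS x average I/O size.
# The I/O sizes passed in below are illustrative assumptions, not AWS limits.

# throughput_mib_s IOPS IO_SIZE_KIB -> integer MiB/s (rounded down)
throughput_mib_s() {
  local iops=$1 io_size_kib=$2
  echo $(( iops * io_size_kib / 1024 ))
}

# Few, large, sequential I/Os: 500 IOPS of 1 MiB each
echo "Streaming (HDD-style): $(throughput_mib_s 500 1024) MiB/s"
# Many, small, random I/Os: 10,000 IOPS of 16 KiB each
echo "Random (SSD-style):    $(throughput_mib_s 10000 16) MiB/s"
```

This prints 500 MiB/s for the streaming case and 156 MiB/s for the random case: twenty times fewer operations, yet roughly three times the bytes moved.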
Services like Kafka, which writes to a transaction log in long streams, and databases that use log-structured storage or an approximation of it, such as a log-structured merge tree (for example LevelDB, RocksDB, and Cassandra), might do well with HDD (magnetic) EBS volumes. Applications that stream data, or that perform fewer operations per second but larger writes, could actually benefit from the throughput performance of HDDs.
In general, magnetic volumes do best with sequential operations like:
Magnetic volumes can't be used as a boot volume.
There are two types of HDD (magnetic) volumes: Throughput Optimized HDD (st1) and Cold HDD (sc1).
General Purpose SSD (gp2) volumes are cost-effective and useful for many workloads. gp2 is the minivan of EBS: not sexy, but common, and it works for a lot of applications.
gp2 delivers three IOPS per provisioned gigabyte, capped at 10,000 IOPS. Volume sizes range from 1 GB to 16 TB. Databases that use some form of B-tree (MongoDB, MySQL, etc.) can benefit from SSDs, but gp2 is better suited to a lower-volume database, or to one with peak load times separated by long idle periods during which IOPS credits can accumulate.
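The gp2 baseline rule can be expressed as a small function. This is a sketch under two assumptions: the 10,000 IOPS cap is the figure quoted in this article (check current AWS limits), and the 100 IOPS floor for small volumes comes from the AWS gp2 documentation rather than from the text above.

```bash
#!/usr/bin/env bash
# gp2 baseline IOPS = 3 x provisioned GB, clamped to [floor, cap].
GP2_IOPS_PER_GB=3
GP2_MIN_IOPS=100     # floor for small volumes (AWS gp2 docs)
GP2_MAX_IOPS=10000   # cap as quoted in this article; verify current limit

# gp2_baseline_iops SIZE_GB -> baseline IOPS
gp2_baseline_iops() {
  local iops=$(( $1 * GP2_IOPS_PER_GB ))
  if (( iops < GP2_MIN_IOPS )); then iops=$GP2_MIN_IOPS; fi
  if (( iops > GP2_MAX_IOPS )); then iops=$GP2_MAX_IOPS; fi
  echo "$iops"
}

for size in 1 250 1000 5000; do
  echo "${size} GB -> $(gp2_baseline_iops "$size") IOPS baseline"
done
```

The loop prints 100, 750, 3000, and 10000 IOPS: past roughly 3,334 GB, adding capacity no longer adds baseline IOPS.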
Volumes under 1 TB can burst to 3,000 IOPS for extended periods of time. For example, a 250 GB volume has a baseline of 750 IOPS. When those 750 IOPS go unused, they accumulate as IOPS credits. Under heavy traffic the volume spends those credits, which is how it can burst up to 3,000 IOPS. IOPS credits work like a savings account: you draw on your savings when a user tornado hits you hard, and the balance shrinks as you spend it.
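The savings-account analogy can be put into numbers. This sketch assumes the figures from the AWS gp2 documentation: a full credit bucket holds 5.4 million I/O credits, bursts run at up to 3,000 IOPS, and credits refill at the baseline rate while the volume is idle. It ignores the partial refill that happens while the volume is only lightly used.

```bash
#!/usr/bin/env bash
# gp2 burst-credit arithmetic (bucket size and burst rate per AWS gp2 docs).
BUCKET_CREDITS=5400000
BURST_IOPS=3000

# Baseline is 3 IOPS/GB with a 100 IOPS floor.
baseline_iops() {
  local b=$(( $1 * 3 ))
  if (( b < 100 )); then b=100; fi
  echo "$b"
}

# burst_seconds SIZE_GB -> how long a full bucket sustains 3,000 IOPS
burst_seconds() {
  local b
  b=$(baseline_iops "$1")
  if (( b >= BURST_IOPS )); then
    echo 0   # baseline already meets the burst rate; nothing to drain
    return
  fi
  # While bursting, credits drain at (burst rate - baseline refill rate).
  echo $(( BUCKET_CREDITS / (BURST_IOPS - b) ))
}

# refill_seconds SIZE_GB -> time for an empty bucket to refill while idle
refill_seconds() {
  echo $(( BUCKET_CREDITS / $(baseline_iops "$1") ))
}

echo "250 GB: bursts for $(( $(burst_seconds 250) / 60 )) min," \
     "refills in $(( $(refill_seconds 250) / 60 )) min when idle"
```

For the 250 GB example this prints a 40-minute burst and a 120-minute idle refill, which is why gp2 suits workloads with peaks separated by quiet periods.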
Provisioned IOPS SSD (io1) volumes are for I/O-intensive workloads that need high random-access performance. They are the most expensive Amazon EBS volume type per gigabyte, and they provide the highest random-access performance of any Amazon EBS volume type. With this volume type you pick the number of IOPS (pay to play), up to 20,000 IOPS, and you can ramp the provisioned IOPS up later. These volumes are great for high-volume databases, or for any database that needs a constant level of performance. High-volume databases that use some form of B-tree (MongoDB, MySQL, etc.) can benefit from this SSD volume type.
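Picking an io1 IOPS figure also constrains the volume size. This minimal sketch computes the smallest volume that can carry a requested IOPS figure. It assumes the 20,000 IOPS ceiling quoted above plus a 50:1 IOPS-to-GiB ratio and a 4 GiB minimum size taken from the AWS io1 documentation; treat all three limits as assumptions to verify against current AWS docs.

```bash
#!/usr/bin/env bash
# Smallest io1 volume (GiB) for a requested IOPS figure.
IO1_MAX_IOPS=20000    # ceiling quoted in this article
IO1_IOPS_PER_GIB=50   # assumed max IOPS-to-GiB ratio (AWS io1 docs)
IO1_MIN_GIB=4         # assumed minimum io1 size (AWS io1 docs)

# io1_min_size_gib IOPS -> GiB, or an error if IOPS exceeds the ceiling
io1_min_size_gib() {
  local iops=$1
  if (( iops > IO1_MAX_IOPS )); then
    echo "error: $iops IOPS exceeds the $IO1_MAX_IOPS IOPS ceiling" >&2
    return 1
  fi
  # Integer ceiling of iops / ratio
  local size=$(( (iops + IO1_IOPS_PER_GIB - 1) / IO1_IOPS_PER_GIB ))
  if (( size < IO1_MIN_GIB )); then size=$IO1_MIN_GIB; fi
  echo "$size"
}

io1_min_size_gib 20000   # 400 GiB
io1_min_size_gib 1001    # 21 GiB
```

Rounding up matters: 1,001 IOPS needs 21 GiB, not 20, because 20 GiB would only allow 1,000 IOPS under the assumed ratio.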
Provisioned IOPS SSD volumes are more predictable (you don't have to save up IOPS credits as with gp2) and suit applications with higher performance needs, such as:
Overcoming performance problems with Provisioned IOPS is expensive.
Some companies have used a 4-way RAID 0 stripe together with EnhanceIO to increase effective throughput by over 50% at no additional expense.
RAID 0 can be used to work around EBS volume size limits and to increase throughput.
AWS allows you to configure EBS volumes into RAID. EBS volumes can be set up in a RAID 0 configuration (multiple volumes striped together) for more throughput and size. RAID 0 capacity is the sum of the capacities of the member disks. RAID 0 does not add any redundancy against disk failures. Striping distributes the data of each file across all member disks, so in general it speeds up read and write operations by a factor close to the number of disks in the RAID 0 set. Increased throughput (and, for EBS, size) is the big benefit of RAID 0.
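The RAID 0 claims above reduce to simple arithmetic. This sketch assumes identical member volumes and perfect striping. The per-volume throughput and the instance-level EBS bandwidth cap are hypothetical inputs: EC2 instances do limit total EBS bandwidth, so a stripe cannot scale past that limit. The chunk size of 512 KiB is an assumption (it is the default in recent mdadm versions).

```bash
#!/usr/bin/env bash
# Idealized RAID 0 arithmetic for N identical EBS volumes (no redundancy).

# raid0_capacity_gb N SIZE_GB -> capacity is the sum of the members
raid0_capacity_gb() { echo $(( $1 * $2 )); }

# raid0_throughput_mib_s N PER_VOLUME_MIB_S INSTANCE_CAP_MIB_S
# -> N x per-volume throughput, limited by the hypothetical instance cap
raid0_throughput_mib_s() {
  local total=$(( $1 * $2 ))
  if (( total > $3 )); then total=$3; fi
  echo "$total"
}

# raid0_member_for_offset OFFSET_KIB CHUNK_KIB N -> member disk (0-based)
# Chunks are dealt round-robin: chunk 0 -> disk 0, chunk 1 -> disk 1, ...
raid0_member_for_offset() { echo $(( ($1 / $2) % $3 )); }

echo "4 x 500 GB = $(raid0_capacity_gb 4 500) GB"
echo "4 x 160 MiB/s, 1000 MiB/s cap = $(raid0_throughput_mib_s 4 160 1000) MiB/s"
for off in 0 512 1024 1536 2048; do
  echo "offset ${off} KiB -> disk $(raid0_member_for_offset "$off" 512 4)"
done
```

The offset loop prints disks 0, 1, 2, 3, 0: a large sequential read touches every member in turn, which is where the near-linear speedup comes from, and also why losing any one member loses the whole array.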
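```bash
#!/bin/bash
# Send all user-data output to a log file, syslog, and the console
exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1
echo BEGIN

# Build a RAID 0 array if both EBS devices are mapped
if [ -e '/dev/xvdb' ] && [ -e '/dev/xvdc' ]; then
  yum -y install mdadm
  # Unmount whatever is currently mounted at /mnt
  umount /mnt/

  # Create the array ("yes" answers mdadm's confirmation prompt) and format it
  yes | mdadm --create /dev/md0 --level=0 --name=RAID0 --raid-devices=2 /dev/xvdb /dev/xvdc
  mkfs.ext4 -L RAID0 /dev/md0
  mount LABEL=RAID0 /mnt

  # Update fstab: drop old /mnt entries, then mount the array by label.
  # barrier=0 disables write barriers, trading crash safety for write speed.
  sed -i '/\/mnt/d' /etc/fstab
  echo "LABEL=RAID0 /mnt/ ext4 defaults,nofail,noatime,barrier=0 0 2" >> /etc/fstab
  mount -a
fi
```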
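The user-data script above hard-codes a 2-way stripe, while the 4-way stripe mentioned earlier needs four devices. A hypothetical helper like the one below builds (but does not run) the same mdadm invocation for any number of member devices; the function name and the dry-run design are illustrative choices, and the mdadm flags are exactly those used in the script.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the mdadm command for N member devices.
build_raid0_cmd() {
  if (( $# < 2 )); then
    echo "error: striping needs at least two devices" >&2
    return 1
  fi
  # $# is the device count; $* joins the device paths with spaces
  echo "mdadm --create /dev/md0 --level=0 --name=RAID0 --raid-devices=$# $*"
}

build_raid0_cmd /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
```

Printing the command instead of executing it lets you review the device list before running it on a real instance, where it would be prefixed with `yes |` as in the script.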
The best way to make an educated guess about the right EBS volume type is to know your tool. If you are deploying Kafka, Cassandra, or MongoDB, you must understand how to configure both the tool and EBS to get the most bang for the buck. When in doubt, test.
You can make educated guesses about which EBS volume type will fit your application or service best. However, watching IOPS and I/O throughput in Amazon CloudWatch, during load tests or on production workloads, could be the quickest way to pick the best EBS volume type and get the most bang for your buck. This data can also help you decide whether to use RAID 0, HDDs (st1 or sc1), Provisioned IOPS SSD (io1), or General Purpose SSD (gp2). There is no point in overpaying, and you do not want a laggy service or application that misses its SLAs.
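CloudWatch reports EBS activity as per-period totals, so a little arithmetic turns it into IOPS and throughput. This sketch assumes you have already fetched the `Sum` statistic of the `AWS/EBS` metrics `VolumeReadOps`, `VolumeWriteOps`, `VolumeReadBytes`, and `VolumeWriteBytes` for one period (for example with `aws cloudwatch get-metric-statistics --statistics Sum`). The input numbers below are illustrative values, not measurements.

```bash
#!/usr/bin/env bash
# Convert CloudWatch per-period Sum statistics into averages.
# Ops and bytes are totals for the period, so divide by its length in seconds.

# avg_iops READ_OPS WRITE_OPS PERIOD_S
avg_iops() { echo $(( ($1 + $2) / $3 )); }

# avg_throughput_mib_s READ_BYTES WRITE_BYTES PERIOD_S
avg_throughput_mib_s() { echo $(( ($1 + $2) / $3 / 1048576 )); }

# Illustrative totals for one 5-minute (300 s) period
echo "Average IOPS:  $(avg_iops 180000 90000 300)"
echo "Average MiB/s: $(avg_throughput_mib_s 15728640000 0 300)"
```

With these inputs the volume averaged 900 IOPS and 50 MiB/s. Compared with the gp2 baseline for its size, a figure like that tells you whether it is living off burst credits or has headroom to spare.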
Amazon's HDD volumes offer fewer IOPS but are much better at streaming data. There is a good chance HDDs would be the better fit for Cassandra and Kafka, given how Cassandra uses its version of log-structured storage and updates, and how Kafka implements distributed logs. You basically get twice the throughput when streaming data over HDDs than you get when using Provisioned IOPS. Since most NoSQL solutions use some form of log-structured storage, most would actually benefit from mounting a second, magnetic EBS volume. The exception is MongoDB, but you could still mount an HDD for MongoDB's transaction log while leaving the B-tree-indexed document storage on Provisioned IOPS.
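The guidance in this article can be condensed into a rough decision function. This is a hypothetical rule of thumb distilled from the text, not AWS guidance: the only thresholds used are the gp2 baseline (3 IOPS per GB, 10,000 IOPS cap) and the 20,000 IOPS io1 ceiling quoted above, and the access pattern is something you supply from knowing your tool.

```bash
#!/usr/bin/env bash
# Hypothetical rule of thumb based on this article (not AWS guidance):
#   sequential/streaming workloads        -> st1 (HDD)
#   random I/O within the gp2 baseline    -> gp2
#   random I/O up to the io1 ceiling      -> io1
#   anything beyond that                  -> RAID 0 across volumes

# suggest_volume_type PATTERN SUSTAINED_IOPS SIZE_GB
suggest_volume_type() {
  local pattern=$1 sustained_iops=$2 size_gb=$3
  local gp2=$(( size_gb * 3 ))
  if (( gp2 > 10000 )); then gp2=10000; fi
  if [[ $pattern == sequential ]]; then
    echo st1
  elif (( sustained_iops <= gp2 )); then
    echo gp2
  elif (( sustained_iops <= 20000 )); then
    echo io1
  else
    echo raid0
  fi
}

suggest_volume_type sequential 0 500   # Kafka/Cassandra-style log: st1
suggest_volume_type random 8000 500    # busy B-tree database: io1
```

Following the MongoDB example above, you might call it twice for one database: once for the sequential transaction log and once for the random-access document storage.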