- RDBMS (SQL / OLTP)
- RDS, Aurora – great for joins
- NoSQL databases
- DynamoDB (JSON)
- ElastiCache (key/value pairs)
- Neptune (graphs)
- DocumentDB (for MongoDB)
- Keyspaces (for Apache Cassandra)
- Object Store:
- S3 – for big objects / blob storage
- Glacier (for backups / archives)
- Data warehouse (SQL analytics / BI):
- Redshift (OLAP)
- Athena
- EMR
- Search
- OpenSearch (JSON) – free-text, unstructured searches
- Graphs:
- Amazon Neptune – stores and queries relationships between data.
- Ledger:
- Amazon Quantum Ledger Database
- Time series:
- Amazon Timestream
Some criteria to focus on:
- Read-heavy, write-heavy or balanced workload?
- Throughput needs?
- Will it change? Does it need to scale or fluctuate during the day?
- How much data to store, and for how long?
- Will it grow?
- Average object size?
- How is the data accessed?
- Data durability?
- Source of truth for the data?
- Latency requirements?
- Data model?
- How will you query the data?
- Joins?
- Structured?
- Semi-structured?
- Strong schema?
- More flexibility?
- Reporting?
- Search?
- RDBMS / NoSQL?
- License costs?
- Switch to a cloud-native DB such as Aurora?
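One way to turn the questions above into a first-pass choice is a small rule table. This is a rough heuristic sketch, not an AWS-provided tool: the `Workload` fields and the rule order are assumptions that only encode the mappings from the overview at the top of these notes.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """Hypothetical yes/no answers to the criteria questions."""
    needs_joins: bool = False
    needs_sql: bool = False
    key_value_only: bool = False
    needs_sub_ms_latency: bool = False
    large_blobs: bool = False
    analytics: bool = False
    free_text_search: bool = False
    graph_relationships: bool = False

def suggest_store(w: Workload) -> str:
    """Rough first-pass mapping from workload traits to an AWS data store (heuristic only)."""
    if w.large_blobs:
        return "S3"
    if w.free_text_search:
        return "OpenSearch"
    if w.graph_relationships:
        return "Neptune"
    if w.analytics:
        return "Redshift"
    if w.needs_joins or w.needs_sql:
        return "RDS / Aurora"
    if w.key_value_only and w.needs_sub_ms_latency:
        return "ElastiCache"
    if w.key_value_only:
        return "DynamoDB"
    return "RDS / Aurora"  # default: relational when nothing else stands out

print(suggest_store(Workload(key_value_only=True, needs_sub_ms_latency=True)))  # ElastiCache
```

Rule order matters: specialized needs (blobs, search, graphs, analytics) are checked before the general relational vs key/value split, and anything ambiguous falls back to RDS / Aurora.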
RDS
- Managed engines:
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- MariaDB
- Custom (RDS Custom)
- Provisioned RDS instance size and EBS volume type & size.
- Auto-scaling capability for storage.
- Support for read replicas.
- Multi-AZ standby, used only for failover; useful for disaster recovery scenarios.
- Security through IAM, security groups, KMS, SSL in transit.
- Automated backups with point-in-time restore (up to 35 days).
- Manual DB snapshots for longer-term recovery.
- Managed and scheduled maintenance (with downtime).
- Support for IAM authentication and integration with Secrets Manager.
- RDS Custom: access to and customization of the underlying instance.
Use case
- Store relational datasets (RDBMS / OLTP).
- Perform SQL queries and transactions.
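RDS runs standard engines, so OLTP application code is ordinary SQL with joins and transactions. So that it runs without AWS, this sketch uses Python's built-in `sqlite3` as a stand-in (it is not an RDS engine); the tables, data and `transfer` logic are hypothetical. Against RDS you would use the engine's own driver, whose parameter placeholder style may differ.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create a tiny in-memory schema: customers and their accounts (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE accounts (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            balance INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO accounts VALUES (10, 1, 100), (20, 2, 50);
    """)
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between accounts atomically: both updates commit, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # rollback undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances_by_customer(conn: sqlite3.Connection) -> dict[str, int]:
    """Join customers to accounts, the kind of query relational stores handle well."""
    rows = conn.execute("""
        SELECT c.name, a.balance
        FROM customers AS c
        JOIN accounts AS a ON a.customer_id = c.id
        ORDER BY c.name
    """).fetchall()
    return dict(rows)

conn = make_db()
transfer(conn, 10, 20, 30)
print(balances_by_customer(conn))  # {'alice': 70, 'bob': 80}
try:
    transfer(conn, 20, 10, 500)    # fails and is rolled back
except ValueError:
    pass
print(balances_by_customer(conn))  # {'alice': 70, 'bob': 80}
```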
Aurora
- PostgreSQL- and MySQL-compatible API.
- Separation of storage and compute.
- Storage:
- Data is stored as 6 copies across 3 AZs.
- Highly available, self-healing, auto-scaling.
- Compute:
- Writer (cluster) endpoint and reader endpoint, plus optional custom endpoints.
- Same security / monitoring / maintenance features as RDS.
- Know the backup & restore options for Aurora.
- Aurora Serverless:
- For unpredictable / intermittent workloads; no capacity planning.
- Aurora Multi-Master:
- For continuous write availability (immediate writer failover).
- Aurora Global:
- Up to 16 read instances in each secondary Region; storage replication typically < 1 second.
- Aurora Machine Learning:
- Perform ML using SageMaker & Comprehend on Aurora data.
- Aurora Database Cloning:
- New cluster from an existing one, faster than restoring a snapshot.
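The "6 copies across 3 AZs" storage layout is what makes Aurora self-healing: writes need a quorum of 4 copies and reads a quorum of 3, so losing a whole AZ (2 copies) does not stop writes, and losing one more copy still allows reads. A simplified model of that arithmetic, assuming those quorum sizes; the AZ names are hypothetical and this is not Aurora's actual storage protocol.

```python
# Aurora storage quorum model (simplified): 6 copies of each storage segment,
# 2 per AZ across 3 AZs; a write needs 4 acknowledgements, a read needs 3.
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3
AZS = ("az-a", "az-b", "az-c")  # hypothetical AZ names

def placement() -> list[str]:
    """Return the AZ of each copy: two copies in each of the three AZs."""
    return [az for az in AZS for _ in range(COPIES // len(AZS))]

def availability(failed_copies: int) -> dict[str, bool]:
    """Which operations still reach quorum after `failed_copies` copies are lost."""
    healthy = COPIES - failed_copies
    return {"can_write": healthy >= WRITE_QUORUM, "can_read": healthy >= READ_QUORUM}

def quorums_overlap() -> bool:
    """Read and write quorums must intersect so reads see the latest write,
    and two write quorums must intersect so conflicting writes cannot both succeed."""
    return READ_QUORUM + WRITE_QUORUM > COPIES and 2 * WRITE_QUORUM > COPIES

# Losing a whole AZ removes 2 copies: writes and reads both continue.
print(availability(placement().count("az-a")))  # {'can_write': True, 'can_read': True}
# Losing an AZ plus one more copy: reads continue, writes stop.
print(availability(3))                          # {'can_write': False, 'can_read': True}
```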
ElastiCache
- Managed Redis / Memcached
- In-memory data store, sub-millisecond latency.
- Select an ElastiCache instance type (e.g., cache.m6g.large).
- Support for:
- Clustering / sharding (Redis)
- Multi-AZ
- Read replicas
- Security through IAM, security groups, KMS, Redis AUTH.
- Backup / snapshot.
- Requires some application code changes to be leveraged.
Use case
- Key-value store.
- Frequent reads.
- Fewer writes.
- Cache results of DB queries.
- Store session data for websites.
- Cannot be queried with SQL.
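The "requires application code changes" point usually means adopting a caching pattern such as cache-aside (lazy loading). A minimal sketch: `TTLCache` is a hypothetical in-process stand-in for Redis/Memcached so the example runs without AWS, and `db_lookup` is any function that queries the database. With a real Redis client, `get`/`set` would map to its GET and SET-with-expiry commands.

```python
import time
from typing import Any, Callable, Optional

class TTLCache:
    """Hypothetical in-process stand-in for a Redis/Memcached key-value cache with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

def get_user(user_id: int, cache: TTLCache, db_lookup: Callable[[int], Any],
             ttl_seconds: float = 300) -> Any:
    """Cache-aside (lazy loading): try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                    # cache hit: no database round trip
    value = db_lookup(user_id)           # cache miss: query the source of truth
    cache.set(key, value, ttl_seconds)   # populate so the next read is a hit
    return value

calls = []
def slow_db(user_id: int) -> dict:
    calls.append(user_id)
    return {"id": user_id}

cache = TTLCache()
get_user(1, cache, slow_db)
get_user(1, cache, slow_db)
print(calls)  # [1]: the second read was served from the cache
```

The TTL bounds how stale a cached value can get; writes to the database are not reflected in the cache until the entry expires (or the application invalidates it).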
DynamoDB
- AWS proprietary technology.
- Managed, serverless NoSQL database with millisecond latency.
- Capacity modes:
- Provisioned capacity with optional auto-scaling.
- On-demand capacity.
- Can replace ElastiCache as a key/value store (e.g., storing session data using the TTL feature).
- Resilience:
- Highly available.
- Multi-AZ by default.
- Reads & writes are decoupled.
- Transaction capability.
- DAX cluster for read caching:
- Microsecond read latency.
- Security, authentication and authorization are done through IAM.
- Event processing:
- DynamoDB Streams to integrate with:
- AWS Lambda
- Kinesis Data Streams
- Global Tables feature:
- Reads & writes from any Region.
- Backups:
- Automated backups up to 35 days with PITR (restores to a new table).
- On-demand backups.
- Import / export directly to S3:
- Exports don't consume RCUs (within the PITR window).
- Imports don't consume WCUs.
Use cases
- Rapidly evolving schemas.
- Serverless application development (small documents, 100s of KB).
- Distributed serverless cache.
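Under provisioned capacity, tables are sized in RCUs and WCUs. A small calculator, assuming the rounding rules from the DynamoDB documentation: 1 RCU = one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half, transactional reads double), and 1 WCU = one write per second of an item up to 1 KB (transactional writes double). The function names are hypothetical.

```python
import math

def rcu(item_size_kb: float, reads_per_second: int, mode: str = "strong") -> int:
    """Read capacity units for provisioned mode.
    1 RCU = one strongly consistent read per second of an item up to 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)  # item size rounds up to the next 4 KB
    factor = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}[mode]
    return math.ceil(units_per_read * reads_per_second * factor)

def wcu(item_size_kb: float, writes_per_second: int, transactional: bool = False) -> int:
    """Write capacity units: 1 WCU = one write per second of an item up to 1 KB."""
    units_per_write = math.ceil(item_size_kb)  # item size rounds up to the next 1 KB
    return units_per_write * writes_per_second * (2 if transactional else 1)

print(rcu(6, 10))              # 6 KB -> 2 units per read -> 20
print(rcu(6, 10, "eventual"))  # half of that -> 10
print(wcu(1.5, 10))            # 1.5 KB -> 2 units per write -> 20
```

On-demand mode bills per request instead, so this sizing step only applies to the provisioned mode.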
S3
- Key/value store for objects.
- Great for blob storage and bigger objects.
- Architecture:
- Serverless.
- Scales infinitely.
- Max object size is 5 TB.
- Versioning capability.
- Storage tiers + lifecycle policies:
- S3 Standard.
- S3 Standard-Infrequent Access.
- S3 Intelligent-Tiering.
- S3 Glacier Flexible Retrieval.
- S3 Glacier Instant Retrieval.
- S3 Glacier Deep Archive.
- Features:
- Replication.
- MFA Delete.
- Access logs.
- Batch operations:
- S3 Batch Operations: run bulk operations on many objects.
- S3 Inventory: list objects and their metadata.
- Encryption:
- SSE-S3 (default).
- SSE-KMS.
- SSE-C (server-side encryption with customer-provided keys).
- Client-side encryption.
- TLS in transit.
- Performance:
- Multi-part upload:
- Parallel upload of chunks.
- Recommended for files > 100 MB, required for files > 5 GB.
- S3 Transfer Acceleration:
- Reduces latency for long-distance transfers of large objects.
- S3 Select:
- Retrieve only a subset of an object's data using SQL (server-side filtering).
Use Cases
- Static files.
- Key/value store for big files.
- Website hosting.
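Multi-part upload splits an object into byte ranges that are uploaded independently (possibly in parallel) and then combined. A sketch of the part-planning step, assuming the S3 limits as documented (5 MiB minimum part size except the last part, 5 GiB maximum part size, at most 10,000 parts, 5 TiB maximum object). `plan_parts` and its 8 MiB default are hypothetical choices; the actual upload calls (e.g., boto3's `create_multipart_upload`, `upload_part`, `complete_multipart_upload`) are left out.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4
MIN_PART = 5 * MIB        # minimum size for every part except the last
MAX_PART = 5 * GIB        # maximum size of a single part
MAX_PARTS = 10_000        # maximum number of parts per upload
MAX_OBJECT = 5 * TIB      # maximum object size

def plan_parts(object_size: int, preferred_part: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Split an object into (part_number, start, end_exclusive) byte ranges.
    Each range could be uploaded independently and in parallel."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size outside S3 limits")
    # Grow the part size if the preferred size would need more than MAX_PARTS parts.
    part_size = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("part size exceeds the 5 GiB part limit")
    parts = []
    start, number = 0, 1
    while start < object_size:
        end = min(start + part_size, object_size)
        parts.append((number, start, end))
        start, number = end, number + 1
    return parts

print(plan_parts(20 * MIB)[-1])    # (3, 16777216, 20971520): the last part is 4 MiB
print(len(plan_parts(100 * GIB)))  # part size grows so the count stays within 10,000
```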
DocumentDB
- MongoDB-compatible.
- Fully managed.
- Highly available, with replication across 3 AZs.
- Storage automatically grows in increments of 10 GB, up to 64 TB.
Neptune
- Fully managed graph database
- A popular graph dataset is a social network:
- Users have friends.
- Posts have comments.
- Comments have likes from users.
- Users share and like posts.
- Highly available across 3 AZs.
- Up to 15 read replicas.
- Highly connected datasets.
- Can store up to billions of relations.
- Great for knowledge graphs (e.g., Wikipedia).
- Fraud detection.
- Recommendation engines.
- Social networking.
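The social-network example is the kind of multi-hop question graph databases answer well: "who are my friends' friends that I don't know yet?". In Neptune this would be a traversal written in Gremlin, openCypher or SPARQL; the sketch below only illustrates the traversal logic in plain Python over a hypothetical in-memory adjacency map, not Neptune's API.

```python
from collections import Counter

def recommend_friends(friends: dict[str, set[str]], user: str, limit: int = 3) -> list[str]:
    """Friends-of-friends the user isn't connected to yet, ranked by mutual-friend count."""
    direct = friends.get(user, set())
    mutual = Counter()
    for friend in direct:                              # hop 1: the user's friends
        for candidate in friends.get(friend, set()):   # hop 2: their friends
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Highest mutual count first; ties broken alphabetically so output is deterministic.
    ranked = sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]

# Hypothetical undirected friendship edges, stored in both directions.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
print(recommend_friends(graph, "alice"))  # ['dave', 'erin']
```

In a relational store the same question needs a self-join per hop, which is why highly connected datasets fit a graph database better.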
Keyspaces for Apache Cassandra
- Fully managed.
- Serverless.
- Tables are replicated 3 times across multiple AZs.
- Tables scale automatically:
- Up / down based on application traffic.
- Single-digit millisecond latency.
- 1000s of requests per second.
- Uses Cassandra Query Language (CQL).
- Capacity modes:
- Provisioned capacity with optional auto-scaling.
- On-demand capacity.
- Features:
- Encryption.
- Backup.
- Point-In-Time Recovery (PITR) up to 35 days.
Use Cases
- Store IoT data.
- Time-series data.
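For IoT / time-series data, a common Cassandra modeling choice is to bucket each device's readings by day so partitions stay bounded, e.g. in CQL `PRIMARY KEY ((device_id, day), ts)` with `CLUSTERING ORDER BY (ts DESC)`. The sketch below only mimics how rows would be grouped and ordered under such a key; the key design, column names and data are assumptions, and it does not talk to Keyspaces. Timestamps must be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone

Row = tuple[datetime, float]

def partition_key(device_id: str, ts: datetime) -> tuple[str, str]:
    """Hypothetical composite partition key: one partition per device per UTC day."""
    return (device_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))

def group_readings(readings: list[tuple[str, datetime, float]]) -> dict[tuple[str, str], list[Row]]:
    """Group (device_id, timestamp, value) rows by partition key, newest first within
    each partition, like a descending clustering column on the timestamp."""
    partitions: dict[tuple[str, str], list[Row]] = defaultdict(list)
    for device_id, ts, value in readings:
        partitions[partition_key(device_id, ts)].append((ts, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0], reverse=True)
    return dict(partitions)

readings = [
    ("sensor-1", datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc), 20.0),
    ("sensor-1", datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc), 21.0),
    ("sensor-1", datetime(2024, 1, 2, 0, 15, tzinfo=timezone.utc), 19.5),
]
for key, rows in group_readings(readings).items():
    print(key, [value for _, value in rows])
```

Reading "the latest values for a device today" then touches a single partition, which is the access pattern this layout is meant to serve.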
Quantum Ledger Database
- For recording financial transactions.
- A ledger is a book recording financial transactions.
- Used to review the history of all the changes made to your application data over time.
- Fully managed.
- Serverless.
- Highly available.
- Replication across 3 AZs.
- Immutable.
- Cryptographically verifiable.
- Review the history of transactions.
- Query data with SQL (PartiQL).
- 2-3x better performance than common ledger blockchain frameworks.
- No decentralization – central database.
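"Immutable" and "cryptographically verifiable" come from the journal being a hash chain: each entry's hash covers the previous entry's hash, so changing history is detectable. A simplified illustration with SHA-256 and canonical JSON; this is not QLDB's journal or digest format (QLDB also uses Merkle-tree digests) and not its API, and all names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, document: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the document."""
    payload = prev_hash + json.dumps(document, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class Journal:
    """Append-only log: each entry's hash covers the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, document: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, document)
        self.entries.append((document, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited past document no longer matches its stored hash."""
        prev = GENESIS
        for document, stored in self.entries:
            if entry_hash(prev, document) != stored:
                return False
            prev = stored
        return True

journal = Journal()
journal.append({"account": "A-1", "amount": 100})
journal.append({"account": "A-1", "amount": -30})
print(journal.verify())  # True
journal.entries[0] = ({"account": "A-1", "amount": 1000}, journal.entries[0][1])  # tamper
print(journal.verify())  # False
```

Rewriting the stored hash of a tampered entry does not help either: the next entry's hash was computed from the original value, so the chain still fails to verify.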