DynamoDB
NoSQL key-value database
Overview
Fully managed NoSQL (schemaless) key-value database
fast performance with seamless scalability (scale up or scale down without downtime)
Search data
There are two ways to search data:
Query: must specify the partition key; a sort key condition is optional.
Scan: reads the ENTIRE table and returns all attributes by default, with an optional filter expression applied after the items are read.
Scan results are divided into pages. One page holds at most 1 MB of data.
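The paging behaviour above can be sketched as a loop that keeps calling Scan until no LastEvaluatedKey comes back. This is a minimal sketch, not a definitive implementation: `scan_all` and `FakeClient` are hypothetical names, and `FakeClient` is an in-memory stand-in (it pages by item count rather than by 1 MB) so the example runs without AWS access. The `ExclusiveStartKey` and `LastEvaluatedKey` fields are the ones the Scan API uses.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_all(client: Any, table_name: str, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item of a table by following LastEvaluatedKey across pages."""
    start_key: Optional[Dict[str, Any]] = None
    while True:
        params: Dict[str, Any] = dict(TableName=table_name, **scan_kwargs)
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key
        page = client.scan(**params)
        yield from page.get("Items", [])
        # A missing LastEvaluatedKey means the last page has been reached.
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break


class FakeClient:
    """Hypothetical stand-in for a DynamoDB client; assumes a key attribute named 'pk'."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int = 2) -> None:
        self._items = items
        self._page_size = page_size
        self.calls = 0

    def scan(self, TableName: str, ExclusiveStartKey: Optional[Dict[str, Any]] = None,
             **_: Any) -> Dict[str, Any]:
        self.calls += 1
        start = 0
        if ExclusiveStartKey is not None:
            keys = [item["pk"] for item in self._items]
            start = keys.index(ExclusiveStartKey["pk"]) + 1
        chunk = self._items[start:start + self._page_size]
        page: Dict[str, Any] = {"Items": chunk}
        if start + self._page_size < len(self._items):
            page["LastEvaluatedKey"] = {"pk": chunk[-1]["pk"]}
        return page
```

With a real client (for example one created by boto3), the same `scan_all` function should work unchanged, since it only relies on the request and response field names.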
Table classes
Standard
Standard-IA
Lower storage cost for infrequently accessed (IA) data, for example:
Application logs
E-commerce order history
Old social media posts
Past gaming achievements
Benefit
Serverless
That means no server to provision, patch, or manage. No software to install, maintain or operate.
Capacity modes
Provisioned: you set the read/write capacity yourself.
On-demand: for less predictable workloads; you pay for what you consume.
Auto-scaling
Automatically scales up or down:
Throughput
Storage
Use cases
Integrate with AWS Lambda to serve as the database for serverless applications.
Features
Push-button scaling / Auto scaling
Import/Export to S3
zero-ETL -> OpenSearch
Integration with OpenSearch Service. Use DynamoDB data as a source in Amazon OpenSearch Ingestion to automatically replicate your table data to OpenSearch Service indexes.
DynamoDB - DAX
Standard DynamoDB has low latency (in milliseconds), but DAX can provide microsecond latency.
In-memory cache, designed specifically for DynamoDB.
Can improve read performance by up to 10x. Supports millions of requests per second.
Use case:
Improve performance of read-heavy or bursty workloads. If you want to improve WRITE performance, put SQS in front of DynamoDB to buffer the writes.
DynamoDB - Streams
Captures item-level changes (PutItem, UpdateItem, or DeleteItem) in your table and pushes them to a DynamoDB stream, which keeps the change records in a log for 24 hours. In plain English, if your data is modified, DynamoDB will notify you. You can then process the stream with a Lambda function.
How does it work?
Associate the stream's ARN with a Lambda function.
Lambda polls the stream and invokes the function synchronously when it detects new stream records.
An item must actually be modified for the change to count as an event. If you send an update request that does not change anything, DynamoDB does not write a stream record.
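The flow above can be sketched as a Lambda handler that walks over the stream records in the incoming event. This is a sketch under assumptions: the handler name, the `Id` and `Title` attributes, and `SAMPLE_EVENT` are made up for illustration. The record fields used (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`, `dynamodb.OldImage`) follow the stream record format, and which images are present depends on the StreamViewType of the table.

```python
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, int]:
    """Count stream records per event type and log the affected keys."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record["eventName"]      # INSERT, MODIFY, or REMOVE
        change = record["dynamodb"]
        keys = change["Keys"]                 # always present
        new_image = change.get("NewImage")    # absent for KEYS_ONLY / OLD_IMAGE
        old_image = change.get("OldImage")    # absent for KEYS_ONLY / NEW_IMAGE
        if event_name in counts:
            counts[event_name] += 1
        print(f"{event_name} keys={keys} new={new_image} old={old_image}")
    return counts


# Hypothetical event with the shape Lambda passes for a DynamoDB stream batch.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"Id": {"N": "101"}},
                "NewImage": {"Id": {"N": "101"}, "Title": {"S": "first"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"Id": {"N": "102"}},
                "OldImage": {"Id": {"N": "102"}, "Title": {"S": "gone"}},
            },
        },
    ]
}
```

Calling `handler(SAMPLE_EVENT)` locally is a quick way to check the parsing logic before wiring the function to a real stream.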
Use case:
An app in one region modifies data in a DynamoDB table, and another app in another region reads the changes to update another table or to compute statistics on the data.
An app sends a notification to all users as soon as a new item is added to the table.
StreamViewType
When an item in the table is modified, StreamViewType determines what information is written to the stream for this table. If you do not want to expose any PII to the stream, you can use KEYS_ONLY.
KEYS_ONLY: only the key attributes (partition key + sort key) of the modified item are captured.
NEW_IMAGE: captures the item as it appears after the modification.
OLD_IMAGE: captures the item as it appeared before the modification.
NEW_AND_OLD_IMAGES: captures both the old and the new values.
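As a small illustration of the four view types, the sketch below builds the StreamSpecification structure that table create and update calls accept, and lists which image fields a stream record would carry for each view type. The helper names `stream_specification` and `images_in_record` are hypothetical.

```python
from typing import Dict, Tuple, Union

# View type -> image fields that appear in each stream record.
VIEW_TYPE_IMAGES: Dict[str, Tuple[str, ...]] = {
    "KEYS_ONLY": (),
    "NEW_IMAGE": ("NewImage",),
    "OLD_IMAGE": ("OldImage",),
    "NEW_AND_OLD_IMAGES": ("NewImage", "OldImage"),
}


def stream_specification(view_type: str) -> Dict[str, Union[bool, str]]:
    """Build a StreamSpecification value, rejecting unknown view types."""
    if view_type not in VIEW_TYPE_IMAGES:
        raise ValueError(f"unknown StreamViewType: {view_type!r}")
    return {"StreamEnabled": True, "StreamViewType": view_type}


def images_in_record(view_type: str) -> Tuple[str, ...]:
    """Return the image fields a record carries for the given view type."""
    return VIEW_TYPE_IMAGES[view_type]
```

With boto3, the returned dictionary would typically be passed as the `StreamSpecification` argument of `create_table` or `update_table`.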
DynamoDB - Global tables
Multi-region, multi-master solution.
Table: collection of data in a particular topic.
Item (row): collection of attributes.
Attribute (column):
DynamoDB Streams must be enabled first.
Point-in-time Recovery
PITR protects against accidental write or delete operations.
Restore to any point in time, with per-second granularity.
Covers the last 35 days, with no downtime.
ACID transaction
DynamoDB transactions enable reading and writing multiple items across multiple tables as an all-or-nothing operation. A transaction can check a prerequisite condition before writing to a table.
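To make the all-or-nothing idea concrete, here is a sketch of the request body for a transactional write that checks a prerequisite and then writes to two other tables. The table names (`Customers`, `Inventory`, `Orders`), the attribute names, and the function `build_order_transaction` are assumptions for illustration; the `ConditionCheck`, `Update`, and `Put` action shapes are those of the TransactWriteItems API.

```python
from typing import Any, Dict, List


def build_order_transaction(customer_id: str, order_id: str,
                            product_id: str) -> List[Dict[str, Any]]:
    """Build TransactItems: every action succeeds, or none is applied."""
    return [
        {   # Prerequisite: the customer must already exist.
            "ConditionCheck": {
                "TableName": "Customers",
                "Key": {"CustomerId": {"S": customer_id}},
                "ConditionExpression": "attribute_exists(CustomerId)",
            }
        },
        {   # Decrement stock, but only if at least one unit is left.
            "Update": {
                "TableName": "Inventory",
                "Key": {"ProductId": {"S": product_id}},
                "UpdateExpression": "SET Stock = Stock - :one",
                "ConditionExpression": "Stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
        {   # Create the order, refusing to overwrite an existing one.
            "Put": {
                "TableName": "Orders",
                "Item": {
                    "OrderId": {"S": order_id},
                    "CustomerId": {"S": customer_id},
                    "ProductId": {"S": product_id},
                },
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
    ]
```

With boto3, this list would typically be sent as `client.transact_write_items(TransactItems=...)`. If any condition fails, the whole transaction is cancelled and no table is changed.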
Data plane API
PutItem: put a single item
BatchWriteItem: write (put or delete) up to 25 items
GetItem: get a single item
BatchGetItem: get up to 100 items from one or more tables
UpdateItem: update one or more attributes in an item
DeleteItem: delete a single item
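Because BatchWriteItem accepts at most 25 items per call, larger writes have to be split into batches. The sketch below shows one way to do that; `chunked` and `batch_write_requests` are hypothetical helper names, while the `RequestItems` / `PutRequest` structure follows the BatchWriteItem API.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_WRITE = 25  # BatchWriteItem limit per call


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_requests(table_name: str,
                         items: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one RequestItems mapping per batch of up to 25 put requests."""
    for chunk in chunked(items, MAX_BATCH_WRITE):
        yield {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
```

Each yielded mapping would typically be sent with `client.batch_write_item(RequestItems=...)`. A real caller should also resend anything returned under `UnprocessedItems`, which this sketch leaves out.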
LSI vs GSI
Feature | LSI | GSI |
---|---|---|
Scope | Same partition key as the base table | Different partition key from the base table |
Querying | Can only query within a single partition | Can query across the entire table |
Creation time | Must be created at the same time as the table | Can be created or modified after the table is created |
Throughput | Shares throughput with the base table | Has its own throughput, separate from the base table |
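The difference in key scope can be seen in a table definition. The sketch below assumes a hypothetical `Orders` table with made-up attribute and index names; the parameter structure (`KeySchema`, `LocalSecondaryIndexes`, `GlobalSecondaryIndexes`, `Projection`) is that of the CreateTable API. On-demand billing is assumed so that the GSI needs no provisioned throughput block.

```python
from typing import Any, Dict, List


def partition_key(key_schema: List[Dict[str, str]]) -> str:
    """Return the HASH (partition) key attribute name of a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def orders_table_definition() -> Dict[str, Any]:
    """CreateTable parameters with one LSI and one GSI (names are illustrative)."""
    return {
        "TableName": "Orders",
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
            {"AttributeName": "OrderDate", "AttributeType": "S"},
            {"AttributeName": "OrderStatus", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderId", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "ByOrderDate",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: different partition key, so it can be queried across customers.
        "GlobalSecondaryIndexes": [{
            "IndexName": "ByStatus",
            "KeySchema": [
                {"AttributeName": "OrderStatus", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
    }
```

With boto3, the dictionary would typically be passed as `client.create_table(**orders_table_definition())`.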
Locking mechanisms
Databases employ locking mechanisms to keep data consistent under concurrent access, so that an update is always applied to the latest version of the data. There are multiple locking strategies that suit different use cases. Some of these are:
Optimistic locking: each item has an attribute that acts as a version number. When you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed.
Pessimistic locking: an entity is locked in the database for the entire time it is being worked on, so no one else can modify it.
Overly optimistic locking: used for systems that have only one user or operation performing changes at any one time.
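Optimistic locking as described above can be sketched with a tiny in-memory table. Everything here (`InMemoryTable`, `VersionConflict`, the `version` attribute name) is a hypothetical stand-in, not the DynamoDB API. On DynamoDB itself the same check is usually expressed as a condition expression on the version attribute of an UpdateItem or PutItem call, and a failed check surfaces as ConditionalCheckFailedException.

```python
from typing import Any, Dict


class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller read."""


class InMemoryTable:
    """Toy key-value table that enforces a version check on every update."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, attrs: Dict[str, Any]) -> None:
        self._items[key] = {**attrs, "version": 1}

    def get(self, key: str) -> Dict[str, Any]:
        return dict(self._items[key])  # copy, so callers cannot mutate storage

    def update_if_version(self, key: str, changes: Dict[str, Any],
                          expected_version: int) -> None:
        current = self._items[key]
        if current["version"] != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current['version']}"
            )
        current.update(changes)
        current["version"] = expected_version + 1  # bump on every successful write
```

If two clients read the same item and both try to update it, the first write bumps the version and the second one fails. The second client then has to re-read the item and retry.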
Read data
GetItem
Query
Scan
Security
ACID transaction: native, server-side support for transactions
Encryption at Rest using KMS
Pricing
DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to turn on.
Best practices
Avoid Scan operations on large tables or indexes. Prefer Query, and use a filter expression (--filter-expression) and a projection expression (--projection-expression) to return only the data you need.
Turn on ConsistentRead if you want strongly consistent reads, because a recent PutItem or UpdateItem might not yet be reflected in every replica.
Cache popular items. Use DAX for caching reads.
Concepts
Table, item, and attribute are the core components of DynamoDB.
Table: collection of items.
Item: collection of attributes.
Primary key: uniquely identifies each item in a table.
GSI (Global Secondary Index) uses a different partition key as well as a different sort key to speed up queries on non-key attributes. All reads from GSIs and streams are eventually consistent.
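Partition key: mandatory. Its value is passed to an internal hash function, and the output determines the partition where the item is stored (hence the name hash key).
Sort key (optional): orders the items that share a partition key and allows more refined queries.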
LSI (Local Secondary Index)
The same partition key as the base table.
Both tables and LSIs provide two read consistency options: eventually consistent (default) and strongly consistent reads.
WCU:
One API call that writes data to your table = 1 write request.
For an item up to 1 KB in size:
1 WCU = 1 standard write request per second
1 WCU = 0.5 transactional write requests per second. In other words, 1 transactional write requires 2 WCUs.
RCU:
One API call that reads data from your table is 1 read request (strongly consistent, eventually consistent, or transactional).
For an item up to 4 KB in size:
1 RCU = 1 strongly consistent read request per second
1 RCU = 2 eventually consistent read requests per second
1 RCU = 0.5 transactional read requests per second
1 RCU = 4 KB/sec and 1 WCU = 1 KB/sec, so in one second you can read 4 KB but write only 1 KB.
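The WCU and RCU rules above reduce to a little arithmetic: round the item size up to the next 1 KB (writes) or 4 KB (reads), then apply the multiplier for the request type. The function names and the `mode` strings below are hypothetical; the sketch only encodes the ratios listed in these notes.

```python
import math


def write_capacity_units(item_size_bytes: int, transactional: bool = False) -> int:
    """WCUs consumed by one write: 1 per 1 KB (rounded up), doubled if transactional."""
    units = max(math.ceil(item_size_bytes / 1024), 1)
    return units * 2 if transactional else units


def read_capacity_units(item_size_bytes: int, mode: str = "strong") -> float:
    """RCUs consumed by one read: 1 per 4 KB (rounded up) for a strongly consistent read."""
    blocks = max(math.ceil(item_size_bytes / 4096), 1)
    if mode == "strong":
        return float(blocks)
    if mode == "eventual":
        return blocks / 2      # half the cost of a strongly consistent read
    if mode == "transactional":
        return blocks * 2.0    # double the cost of a strongly consistent read
    raise ValueError(f"unknown read mode: {mode!r}")
```

For example, writing a 3.5 KB item rounds up to 4 WCUs (8 if transactional), and an eventually consistent read of an 8 KB item costs 1 RCU.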
PartiQL: SQL-compatible query language that makes it easier to interact with data in AWS services like Amazon DynamoDB, S3 Select, and Glacier Select.
Composite key = Partition key + Sort key.
Throttling: occurs when the configured RCU or WCU is exceeded, and surfaces as ProvisionedThroughputExceededException. Reasons for this exception are:
Request rate > provisioned throughput.
Wrong choice of partition key, leading to uneven distribution of data.
Frequent access to the same key in a partition (hot key): a single partition is limited to 3,000 RCU and 1,000 WCU, regardless of the capacity mode (provisioned or on-demand).
The AWS SDKs for DynamoDB automatically retry requests that receive this exception, so your request eventually succeeds unless the retry queue is too large to finish.
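The retry behaviour mentioned above is commonly implemented as exponential backoff with jitter. The sketch below is a generic illustration, not the SDKs' actual retry code: `ThroughputExceeded` is a hypothetical stand-in for the throttling error, and `call_with_backoff` with its default delays is an assumption.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.05, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `operation` on throttling, sleeping a random, growing delay between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the function can be exercised in tests without actually waiting. Backoff only helps with short bursts; a hot key or undersized capacity still needs a key design or capacity change.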
Trivia
DAX is an in-memory acceleration service that accelerates DynamoDB tables. DAX cannot be used with other databases.
DynamoDB can support tables of virtually any size.
DynamoDB can scale to more than 10 trillion requests per day, with peaks of more than 20 million requests per second.
A single DynamoDB Scan call can retrieve at most 1 MB of data.
The maximum size of an item in a DynamoDB table is 400 KB.
The Limit parameter of a Query is not the number of matching items. It is the maximum number of items to evaluate. 😄
Each table can have up to 20 GSIs and 5 LSIs (default quota).
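With the legacy global tables version (2017.11.29), you can add a replica only when the table is empty, so do it before inserting any data. The current version (2019.11.21) does not have this restriction.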
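The Limit note in the list above is easy to misread, so here is a toy model of it. This is a plain-Python simulation, not the DynamoDB API: `query_page`, the `orders` data, and the predicate are made up. It only illustrates that the limit caps how many items are evaluated before the filter is applied.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Item = Dict[str, Any]


def query_page(items: List[Item], limit: int,
               predicate: Callable[[Item], bool]) -> Tuple[List[Item], Optional[Item]]:
    """Evaluate at most `limit` items, then filter; return matches and the resume point."""
    evaluated = items[:limit]                     # Limit applies here, before filtering
    matched = [item for item in evaluated if predicate(item)]
    last_evaluated = evaluated[-1] if evaluated and limit < len(items) else None
    return matched, last_evaluated


orders = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
    {"id": 3, "status": "closed"},
    {"id": 4, "status": "open"},
    {"id": 5, "status": "open"},
]
matched, resume_from = query_page(orders, 3, lambda o: o["status"] == "open")
```

Here a limit of 3 yields a single matching order even though three open orders exist, because only the first three items were evaluated. The rest would come from follow-up pages starting after `resume_from`.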