DynamoDB

NoSQL key-value database

Overview

  • Fully managed NoSQL (schemaless) key-value database

  • fast performance with seamless scalability (scale up or scale down without downtime)

Search data

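There are two ways to search data:

  • Query: you must specify the partition key; a condition on the sort key is optional.

  • Scan: reads the ENTIRE table and returns all attributes by default. An optional filter expression is applied after the items are read, so it does not reduce the amount of data scanned.

    • Scan results are divided into pages. Each page holds at most 1 MB of data.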
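A minimal sketch of how a client might walk through Scan pages. The `Items` and `LastEvaluatedKey` names mirror the Scan response; `make_fake_scan` is a hypothetical in-memory stand-in that pages by item count rather than by the 1 MB limit, so the loop can run without AWS.

```python
from typing import Callable, Optional


def scan_all(scan_page: Callable[[Optional[dict]], dict]) -> list:
    """Collect every item by following LastEvaluatedKey across pages."""
    items = []
    start_key = None
    while True:
        page = scan_page(start_key)
        items.extend(page["Items"])
        # A missing LastEvaluatedKey means this was the final page.
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


def make_fake_scan(table: list, page_size: int):
    """Hypothetical stand-in for Scan (assumption: pages by count, not bytes)."""
    def scan_page(exclusive_start_key):
        start = 0 if exclusive_start_key is None else exclusive_start_key["index"]
        end = start + page_size
        page = {"Items": table[start:end]}
        if end < len(table):
            page["LastEvaluatedKey"] = {"index": end}
        return page
    return scan_page


table = [{"Id": i} for i in range(7)]
all_items = scan_all(make_fake_scan(table, page_size=3))
print(len(all_items))
```

With a real client, `scan_page` would wrap the Scan call and pass the previous `LastEvaluatedKey` back as the exclusive start key.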

Table classes

Standard

Standard-IA

Lower storage cost for infrequently accessed (IA) data, for example:

  • Application logs

  • E-commerce order history

  • Old social media post

  • Past gaming achievement

Benefit

Serverless

That means no server to provision, patch, or manage. No software to install, maintain or operate.

Capacity modes

  • Provisioned: you set the read/write capacity in advance.

  • On-demand: for less predictable workloads; you pay for what you consume.

Auto-scaling

Automatically scales up and down:

  • Throughput

  • Storage

Use cases

Integrates with AWS Lambda and serves as the database for serverless applications.

Features

Push-button scaling / auto scaling

Import/Export to S3

Zero-ETL integration with OpenSearch

Integration with OpenSearch Service. Use DynamoDB data as a source in Amazon OpenSearch Ingestion to automatically replicate your table data to OpenSearch Service indexes.

DynamoDB - DAX

  • Standard DynamoDB offers low latency (single-digit milliseconds), but DAX can deliver microsecond latency.

  • In-memory cache designed specifically for DynamoDB.

  • Can improve read performance by up to 10x and supports millions of requests per second.

  • Use case:

    • Improve the performance of READ-heavy or bursty workloads. To absorb WRITE bursts instead, put an SQS queue in front of DynamoDB.

DynamoDB - Streams

DynamoDB Streams captures item-level changes (PutItem, UpdateItem, or DeleteItem) in your table and writes them to a stream, where each record is kept for 24 hours. In plain English: when your data is modified, DynamoDB records the change so that you can react to it. You can then process the stream with a Lambda function.

How it works

  • Associate the stream's ARN with a Lambda function.

  • Lambda polls the stream and invokes the function synchronously when it detects new stream records.

An actual modification must be made to an item for it to count as an event. If you send an UPDATE request that does not change anything, DynamoDB simply ignores it.
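A minimal sketch of a Lambda handler that consumes stream records. The `Records`, `eventName`, and `dynamodb.Keys` fields follow the stream event shape; the counting logic and `sample_event` are illustrative assumptions, not part of any AWS API.

```python
def handler(event, context):
    """Count stream records by event type (INSERT, MODIFY, REMOVE)."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]
        counts[name] = counts.get(name, 0) + 1
        # Keys are always present, whatever StreamViewType is configured.
        print(f"{name}: {record['dynamodb']['Keys']}")
    return counts


# Hypothetical event, shaped like a batch delivered from a stream.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"N": "1"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"Id": {"N": "1"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
    ]
}
result = handler(sample_event, None)
```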

Use case:

  • An app in one region modifies data in a DynamoDB table, and an app in another region reads the change to update another table or to compute statistics on that data.

  • An app sends a notification to all users as soon as a new item is added to the table.

StreamViewType

When an item in the table is modified, StreamViewType determines what information is written to the stream for this table. If you do not want to expose any PII to the stream, use KEYS_ONLY.

  • KEYS_ONLY: only the key attributes (partition key + sort key) of the modified item are captured.

  • NEW_IMAGE: captures the entire item as it appears after the change.

  • OLD_IMAGE: captures the entire item as it appeared before the change.

  • NEW_AND_OLD_IMAGES: captures both the old and the new images of the item.
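To make the four view types concrete, here is a small sketch that builds the change payload for one modification. `build_stream_payload` is a hypothetical helper, and the items are plain dictionaries rather than DynamoDB-typed attribute values; only the `Keys`, `NewImage`, and `OldImage` field names come from the stream record format.

```python
def build_stream_payload(view_type, key_names, old_item, new_item):
    """Return the part of a change record that the view type exposes."""
    source = new_item if new_item is not None else old_item
    payload = {"Keys": {k: source[k] for k in key_names}}
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES") and new_item is not None:
        payload["NewImage"] = dict(new_item)
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES") and old_item is not None:
        payload["OldImage"] = dict(old_item)
    return payload


old = {"UserId": "u1", "Email": "old@example.com"}
new = {"UserId": "u1", "Email": "new@example.com"}

keys_only = build_stream_payload("KEYS_ONLY", ["UserId"], old, new)
both = build_stream_payload("NEW_AND_OLD_IMAGES", ["UserId"], old, new)
print(keys_only)
print(sorted(both))
```

With KEYS_ONLY the email address never reaches the stream, which is why it is the safe choice when items contain PII.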

DynamoDB - Global tables

  • Multi-region, multi-active (multi-master) solution.

  • Table: collection of data on a particular topic.

    • Item (row): collection of attributes.

    • Attribute (column): a single data element of an item.

  • DynamoDB Streams must be enabled first.

Point-in-time Recovery

PITR protects against accidental write or delete operations.

  • Restore to any point in time, with per-second granularity

  • Retention window of up to 35 days, with no downtime

ACID transaction

DynamoDB transactions enable reading and writing multiple items across multiple tables as an all-or-nothing operation. A transaction can check a prerequisite condition before writing to a table.
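The all-or-nothing behaviour can be illustrated with an in-memory sketch. Everything here (`transact_write`, the operation dictionaries, `TransactionCanceled`) is hypothetical and only imitates the idea; it is not the DynamoDB API, and it assumes each item appears at most once per transaction.

```python
import copy


class TransactionCanceled(Exception):
    """Raised when any condition in the transaction fails."""


def transact_write(tables, operations):
    """Apply all operations, or none of them if a condition fails."""
    staged = copy.deepcopy(tables)  # work on a copy so failures leave no trace
    for op in operations:
        table = staged[op["table"]]
        current = table.get(op["key"])
        condition = op.get("condition")
        if condition is not None and not condition(current):
            raise TransactionCanceled(f"condition failed for {op['table']}/{op['key']}")
        if op["action"] == "put":
            table[op["key"]] = op["item"]
        elif op["action"] == "delete":
            table.pop(op["key"], None)
        else:
            raise ValueError(f"unknown action: {op['action']}")
    # Commit: every condition passed, so publish the staged state.
    tables.clear()
    tables.update(staged)


tables = {"Accounts": {"alice": {"balance": 100}}, "Orders": {}}
order = [
    {"table": "Orders", "key": "o1", "action": "put", "item": {"total": 30},
     "condition": lambda item: item is None},  # order must not exist yet
    {"table": "Accounts", "key": "alice", "action": "put", "item": {"balance": 70},
     "condition": lambda item: item is not None and item["balance"] >= 30},
]
transact_write(tables, order)

canceled = False
try:
    transact_write(tables, order)  # "o1" now exists, so nothing is applied
except TransactionCanceled:
    canceled = True
```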

Data plane API

  • PutItem: put a single item

  • BatchWriteItem: put or delete up to 25 items in one call

  • GetItem: get a single item

  • BatchGetItem: get up to 100 items from one or more tables.

  • UpdateItem: update one or more attributes of an item

  • DeleteItem: delete a single item.
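Because BatchWriteItem accepts at most 25 requests per call, larger workloads have to be split on the client side. A minimal sketch, assuming a hypothetical helper named `chunk_for_batch_write`; the `PutRequest` wrapper follows the BatchWriteItem request shape.

```python
def chunk_for_batch_write(requests, batch_size=25):
    """Split write requests into lists that fit one BatchWriteItem call each."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]


put_requests = [{"PutRequest": {"Item": {"Id": {"N": str(i)}}}} for i in range(60)]
batches = chunk_for_batch_write(put_requests)
print([len(b) for b in batches])  # [25, 25, 10]
```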

LSI vs GSI

| | LSI | GSI |
| --- | --- | --- |
| Scope | Same partition key as the base table | Different partition key from the base table |
| Querying | Can only query within the same partition | Can query across the entire table |
| Creation time | Must be created at the same time as the table | Can be created or modified after the table is created |
| Throughput | Shares throughput with the base table | Separate throughput from the base table |

Locking mechanisms

Databases use locking mechanisms to keep data consistent when multiple clients update it concurrently. Different locking strategies suit different use cases. Some of these are:

  • Optimistic Locking: each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed.

  • Pessimistic Locking: an entity is locked in the database for the entire time it is held in application memory, so nobody else can modify it until the lock is released.

  • Overly Optimistic Locking: used for systems that have only one user or operation making changes at a time.
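Optimistic locking can be sketched with an in-memory table and a version attribute. `InMemoryTable` and `VersionConflict` are hypothetical names for illustration; in DynamoDB itself the version check would be expressed as a condition on the write.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller read."""


class InMemoryTable:
    def __init__(self):
        self._items = {}

    def put(self, key, item):
        self._items[key] = dict(item, version=1)

    def get(self, key):
        return dict(self._items[key])  # copy, like reading into app memory

    def update_if_version(self, key, changes, expected_version):
        current = self._items[key]
        if current["version"] != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{current['version']}")
        current.update(changes)
        current["version"] = expected_version + 1


table = InMemoryTable()
table.put("p1", {"price": 10})

reader_a = table.get("p1")  # both readers see version 1
reader_b = table.get("p1")

table.update_if_version("p1", {"price": 12}, reader_a["version"])  # succeeds

conflict = False
try:
    table.update_if_version("p1", {"price": 8}, reader_b["version"])  # stale
except VersionConflict:
    conflict = True

final = table.get("p1")
```

The second writer has to re-read the item and retry, which is the usual way to resolve a conflict under this strategy.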

Read data

  • GetItem

aws dynamodb get-item \
    --table-name ProductCatalog \
    --key '{"Id":{"N":"1"}}' \
    --projection-expression "Description, RelatedItems[0], ProductReviews.FiveStar"
  • Query

  • Scan

Security

  • ACID transaction: native, server-side support for transactions

  • Encryption at Rest using KMS

Pricing

DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to turn on.

Best practices

  • Avoid Scan operations on a large table or index; prefer Query where possible. Use a filter expression (--filter-expression) and a projection expression (--projection-expression) to return only the data you need.

  • Turn on ConsistentRead if you need strongly consistent reads. By default reads are eventually consistent, so a recent PutItem or UpdateItem may not yet be reflected in the replica you read from.

  • Cache popular items. Use DAX for caching reads.

Concepts

  • Table, item, attribute: the core components of DynamoDB.

    • Table: collection of items.

    • Item: collection of attributes.

  • Primary key: uniquely identifies each item in a table.

    • Partition key: mandatory. Its value is passed through a hash function, and the resulting hash determines the partition where the item is stored.

    • Sort key (optional): an additional key for ordering and querying items within a partition.

  • GSI (Global Secondary Index): can use a different partition key and a different sort key from the base table to speed up queries on non-key attributes. All reads from GSIs and streams are eventually consistent.

  • LSI (Local Secondary Index)

    • The same partition key as the base table.

    • Both tables and LSIs provide two read consistency options: eventually consistent (default) and strongly consistent reads.

  • WCU:

    • One API call that writes data to your table = one write request.

    • For an item up to 1 KB in size:

      • 1 WCU = 1 standard write request / sec

      • 2 WCUs = 1 transactional write request / sec

  • RCU:

    • One API call that reads data from your table = one read request (strongly consistent, eventually consistent, or transactional).

    • For an item up to 4 KB in size:

      • 1 RCU = 1 strongly consistent read request / sec

      • 1 RCU = 2 eventually consistent read requests / sec

      • 2 RCUs = 1 transactional read request / sec

    • 1 RCU = 4 KB/sec and 1 WCU = 1 KB/sec, so in one second a single unit lets you read 4 KB but write only 1 KB. Larger items consume additional units, rounded up to the next 4 KB (reads) or 1 KB (writes).
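The capacity rules above can be turned into a small calculator. This is a sketch under the stated rules (4 KB read units, 1 KB write units, sizes rounded up); the function names and the `mode` strings are assumptions made for this example.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, mode="strong"):
    """RCUs needed for a steady read rate at the given item size."""
    units_per_read = math.ceil(item_size_kb / 4)  # round up to 4 KB blocks
    factor = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}[mode]
    return math.ceil(units_per_read * factor * reads_per_second)


def write_capacity_units(item_size_kb, writes_per_second, transactional=False):
    """WCUs needed for a steady write rate at the given item size."""
    units_per_write = math.ceil(item_size_kb)  # round up to 1 KB blocks
    factor = 2 if transactional else 1
    return units_per_write * factor * writes_per_second


# 10 reads/sec of a 6 KB item: each read spans two 4 KB blocks.
print(read_capacity_units(6, 10, "strong"))         # 20
print(read_capacity_units(6, 10, "eventual"))       # 10
print(read_capacity_units(6, 10, "transactional"))  # 40

# 5 writes/sec of a 2.5 KB item: each write spans three 1 KB blocks.
print(write_capacity_units(2.5, 5))                      # 15
print(write_capacity_units(2.5, 5, transactional=True))  # 30
```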

  • PartiQL: SQL-compatible query language that makes it easier to interact with data in AWS services like Amazon DynamoDB, S3 Select, and Glacier Select.

  • Composite key = Partition key + Sort key.

  • Throttling: occurs when requests exceed the configured RCUs or WCUs; DynamoDB returns ProvisionedThroughputExceededException. Common causes:

    • Request rate > provisioned throughput.

    • Poor choice of partition key, leading to uneven distribution of data.

    • Frequent access to the same key in a partition (hot key). A single partition supports at most 3,000 RCUs and 1,000 WCUs, regardless of the capacity mode (provisioned or on-demand).

The AWS SDKs for DynamoDB automatically retry requests that receive this exception, so a request eventually succeeds unless the retry queue is too large to drain.
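The SDKs handle retries for you, but the idea behind them is worth seeing once. A minimal sketch of retry with exponential backoff and jitter; `ThroughputExceeded` is a hypothetical stand-in for ProvisionedThroughputExceededException, and the delay values are arbitrary assumptions rather than the SDK defaults.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry a throttled operation, doubling the delay ceiling each attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time up to the current ceiling.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))


calls = {"count": 0}


def flaky_write():
    """Fails twice with throttling, then succeeds."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThroughputExceeded()
    return "ok"


delays = []
outcome = call_with_backoff(flaky_write, sleep=delays.append)
print(outcome, calls["count"], len(delays))
```

Passing `sleep` as a parameter keeps the example fast to run; a real caller would leave the default `time.sleep`.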

Trivia

  • DAX is an in-memory acceleration service that accelerates DynamoDB tables. DAX cannot be used with other databases.

  • DynamoDB can support tables of virtually any size.

  • DynamoDB can scale to more than 10 trillion requests per day, with peaks above 20 million requests per second.

  • A single DynamoDB Scan call can retrieve at most 1 MB of data.

  • The maximum size of an item in a DynamoDB table is 400 KB.

  • The Limit parameter of a Query or Scan is not the number of matching items to return. It is the maximum number of items to evaluate, before any filter expression is applied. 😄

  • Each table can have up to 20 GSIs and 5 LSIs (default quota).

  • With the legacy global tables version (2017.11.29), you can add a replica only while the table is empty, so do it before inserting any data.
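The note about Limit in the list above is easy to misread, so here is a small simulation of the behaviour. `query_page` is a hypothetical in-memory helper; only the `Items`, `Count`, and `ScannedCount` field names follow the Query/Scan response.

```python
def query_page(items, limit, matches):
    """Evaluate at most `limit` items first, then apply the filter."""
    evaluated = items[:limit]
    returned = [item for item in evaluated if matches(item)]
    return {"Items": returned, "Count": len(returned), "ScannedCount": len(evaluated)}


orders = [{"Id": i, "Status": "open" if i % 2 == 0 else "closed"} for i in range(10)]
page = query_page(orders, limit=4, matches=lambda o: o["Status"] == "open")
print(page["Count"], page["ScannedCount"])  # 2 4
```

Five orders are open in total, but only two come back, because the filter only sees the four items that Limit allowed to be evaluated.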
