# DynamoDB

## Overview

* fully managed NoSQL key-value and document database (schemaless: only the primary key has to be defined up front)
* fast performance with seamless scalability (scale up or scale down without downtime)

![](https://2259236002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fuh9xZDZ53qGqmMCM44PU%2Fuploads%2Fgit-blob-cf7ecb1d73ce01feed66127cf8c626fa6adf00b4%2Ffigure_20230331073356.png?alt=media)


### Search data

There are two ways to search data:

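* **Query**: must specify the <mark style="background-color:blue;">partition key</mark> value (equality condition); a condition on the <mark style="background-color:blue;">sort key</mark> is optional.
* **Scan**: reads the <mark style="color:red;">ENTIRE</mark> table (or index) and returns all attributes by default. An optional filter expression is applied after the read, so it reduces the results returned, not the capacity consumed.
  * Scan results are divided into pages of at most 1 MB each. If more data remains, the response includes `LastEvaluatedKey`, which you pass as `ExclusiveStartKey` to fetch the next page.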
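A minimal AWS CLI sketch of the difference, assuming a hypothetical table `Music` with partition key `Artist` and sort key `SongTitle` (both strings) plus a numeric `Year` attribute. The table, attribute, and function names are illustrative only; the commands are wrapped in shell functions so they can be reused.

```bash
#!/usr/bin/env bash
# Hypothetical table "Music": partition key Artist (S), sort key SongTitle (S).

# Query: the partition key MUST be matched with "="; the sort-key condition
# (here begins_with) is optional and narrows the range within that partition.
query_songs() {
  local artist="$1" title_prefix="$2"
  aws dynamodb query \
    --table-name Music \
    --key-condition-expression "Artist = :a AND begins_with(SongTitle, :p)" \
    --expression-attribute-values \
      "{\":a\":{\"S\":\"$artist\"},\":p\":{\"S\":\"$title_prefix\"}}"
}

# Scan: reads every item in the table. The filter runs AFTER the read,
# so it shrinks the result but not the read capacity consumed.
# "Year" is on DynamoDB's reserved-word list, hence the #y placeholder.
scan_by_year() {
  local year="$1"
  aws dynamodb scan \
    --table-name Music \
    --filter-expression "#y = :y" \
    --expression-attribute-names '{"#y":"Year"}' \
    --expression-attribute-values "{\":y\":{\"N\":\"$year\"}}"
}
```

For example, `query_songs "Acme Band" "Happy"` reads a single partition, while `scan_by_year 2020` reads the whole table. By default the AWS CLI follows `LastEvaluatedKey` for you and keeps requesting pages until the result set is complete.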

### Table classes

#### Standard

The default table class, recommended for most workloads and for frequently accessed data.

#### Standard-IA

Lower storage cost for infrequently accessed (IA) data, in exchange for higher read/write costs. Typical examples:

* Application logs
* E-commerce order history
* Old social media posts
* Past gaming achievements
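The table class can be changed on an existing table with `update-table`. A sketch, assuming a hypothetical table name passed as the first argument; the helper name `set_table_class` is illustrative only.

```bash
#!/usr/bin/env bash
# Switch a table between the two table classes.
# Valid values: STANDARD | STANDARD_INFREQUENT_ACCESS
set_table_class() {
  local table="$1" class="$2"
  case "$class" in
    STANDARD|STANDARD_INFREQUENT_ACCESS) ;;
    *) echo "invalid table class: $class" >&2; return 1 ;;
  esac
  aws dynamodb update-table --table-name "$table" --table-class "$class"
}
```

For example, `set_table_class OrderHistory STANDARD_INFREQUENT_ACCESS` would move an order-history table to Standard-IA.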

### Benefits

#### Serverless

There are no servers to provision, patch, or manage, and no software to install, maintain, or operate.

#### Capacity modes

* **Provisioned**: you set the read/write capacity (RCU/WCU) in advance; suited to predictable workloads.
* **On-demand**: for less predictable workloads; pay per request for what you consume.
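Both modes are selected with the `--billing-mode` option of `update-table` (or `create-table`). A sketch, assuming hypothetical table names and capacity numbers; the function names are illustrative.

```bash
#!/usr/bin/env bash
# On-demand: no capacity planning, billed per request.
use_on_demand() {
  aws dynamodb update-table --table-name "$1" --billing-mode PAY_PER_REQUEST
}

# Provisioned: you choose the read/write capacity units up front.
use_provisioned() {
  local table="$1" rcu="$2" wcu="$3"
  aws dynamodb update-table --table-name "$table" \
    --billing-mode PROVISIONED \
    --provisioned-throughput "ReadCapacityUnits=$rcu,WriteCapacityUnits=$wcu"
}
```

For example, `use_provisioned Orders 10 5` would give a hypothetical `Orders` table 10 RCU and 5 WCU.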

#### Auto-scaling

Automatically scales up and down:

* Throughput
* Storage
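Storage grows automatically; throughput auto scaling applies to provisioned mode and is configured through Application Auto Scaling. A minimal sketch for read capacity, assuming a hypothetical table, min/max bounds, and target utilization; the policy name and helper name are illustrative. Write capacity works the same way with `dynamodb:table:WriteCapacityUnits` and `DynamoDBWriteCapacityUtilization`.

```bash
#!/usr/bin/env bash
# Target-tracking auto scaling of a table's provisioned read capacity:
# RCU is adjusted between MIN and MAX to keep utilization near TARGET percent.
enable_read_autoscaling() {
  local table="$1" min="$2" max="$3" target="$4"
  aws application-autoscaling register-scalable-target \
    --service-namespace dynamodb \
    --resource-id "table/$table" \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --min-capacity "$min" --max-capacity "$max"

  aws application-autoscaling put-scaling-policy \
    --service-namespace dynamodb \
    --resource-id "table/$table" \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --policy-name "${table}-read-scaling" \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
      "{\"PredefinedMetricSpecification\":{\"PredefinedMetricType\":\"DynamoDBReadCapacityUtilization\"},\"TargetValue\":$target}"
}
```

For example, `enable_read_autoscaling Orders 5 100 70.0` keeps read utilization around 70% with RCU between 5 and 100.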

### Use cases

Integrates with AWS Lambda, acting as the database for serverless applications.

<figure><img src="https://2259236002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fuh9xZDZ53qGqmMCM44PU%2Fuploads%2Fm0K9bPNrRZ17eZBOtlir%2Fimage.png?alt=media&#x26;token=efe5a625-8200-4c51-ad8e-1b274008f1d4" alt=""><figcaption><p>integrate with AWS Lambda</p></figcaption></figure>

## Features

### Push-button scaling / Auto scaling

### Import/Export to S3
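Exports are taken from the table's point-in-time recovery (PITR) data, so PITR must be enabled and the export does not consume the table's read capacity; `import-table` works in the other direction and loads S3 data into a new table. A sketch of an export, assuming a hypothetical table ARN and bucket; the helper name is illustrative.

```bash
#!/usr/bin/env bash
# Export a table to S3 from its PITR data (PITR must already be enabled).
# Export formats supported by the API: DYNAMODB_JSON | ION
export_table_to_s3() {
  local table_arn="$1" bucket="$2"
  aws dynamodb export-table-to-point-in-time \
    --table-arn "$table_arn" \
    --s3-bucket "$bucket" \
    --export-format DYNAMODB_JSON
}
```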

### zero-ETL -> OpenSearch

Integration with OpenSearch Service. Use DynamoDB data as a source in Amazon OpenSearch Ingestion to automatically replicate your table data to OpenSearch Service indexes.

### DynamoDB - DAX

* DynamoDB alone delivers single-digit millisecond latency; DAX can bring read latency down to microseconds.
* *<mark style="color:red;">in-memory cache</mark>*, *specially designed* for DynamoDB.
* Up to 10x read performance, even at millions of requests per second.
* ***Use case***:
  * Improve performance of <mark style="color:red;">READ-heavy</mark> or bursty workloads. DAX does not help writes: to improve <mark style="color:red;">WRITE</mark> performance, put SQS in front of DynamoDB to buffer them.

### DynamoDB - Stream

Captures item-level changes (`PutItem`, `UpdateItem`, or `DeleteItem`) in your table, keeps them in a log for 24 hours, and pushes the changes to a DynamoDB stream. In plain English: when your data is modified, DynamoDB notifies you. You can then process the stream with a Lambda function.

***How does it work?***

* Associate the stream's ARN with a Lambda function (an event source mapping).
* Lambda polls the stream and invokes the function synchronously when it detects new stream records.
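A sketch of wiring an existing stream to a Lambda function with the AWS CLI, assuming the table already has a stream enabled and that the table and function names passed in are hypothetical. The batch size of 100 is an illustrative choice.

```bash
#!/usr/bin/env bash
# Connect a table's stream to a Lambda function (event source mapping).
connect_stream_to_lambda() {
  local table="$1" function_name="$2" stream_arn
  # Look up the ARN of the table's current stream.
  stream_arn=$(aws dynamodb describe-table --table-name "$table" \
    --query 'Table.LatestStreamArn' --output text) || return 1
  # Lambda then polls this stream and invokes the function with record batches.
  aws lambda create-event-source-mapping \
    --function-name "$function_name" \
    --event-source-arn "$stream_arn" \
    --starting-position LATEST \
    --batch-size 100
}
```

`LATEST` processes only records written after the mapping is created; `TRIM_HORIZON` would start from the oldest record still in the 24-hour log.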

{% hint style="info" %}
An actual modification must be made to an item for it to be considered an event. If you send an `UpdateItem` request that does not change anything, DynamoDB simply ignores it and writes no stream record.
{% endhint %}

***Use case***:

* An app in one Region modifies data in a DynamoDB table; another app in a different Region reads the changes to update another table or compute statistics about that data.
* An app sends a notification to all users as soon as a new item is added to the table.

#### StreamViewType

When an item in the table is modified, `StreamViewType` determines what information is written to the stream for this table. If you do not want to expose any PII in the stream, use `KEYS_ONLY`.

* `KEYS_ONLY`: only the key attributes (PartitionKey + SortKey) of the modified items are captured.
* `NEW_IMAGE`: the entire item as it appears after the modification.
* `OLD_IMAGE`: the entire item as it appeared before the modification.
* `NEW_AND_OLD_IMAGES`: both the new and the old images of the item.
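Enabling a stream on an existing table is a single `update-table` call with a stream specification. A sketch with a guard for the four valid view types, assuming a hypothetical table name; the helper name is illustrative.

```bash
#!/usr/bin/env bash
# Enable a stream on an existing table with the chosen StreamViewType.
enable_stream() {
  local table="$1" view_type="$2"
  case "$view_type" in
    KEYS_ONLY|NEW_IMAGE|OLD_IMAGE|NEW_AND_OLD_IMAGES) ;;
    *) echo "invalid StreamViewType: $view_type" >&2; return 1 ;;
  esac
  aws dynamodb update-table --table-name "$table" \
    --stream-specification "StreamEnabled=true,StreamViewType=$view_type"
}
```

For example, `enable_stream Users KEYS_ONLY` keeps PII out of the stream, while a misspelling such as `OLD_IMAGES` is rejected before any API call is made.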

### DynamoDB - Global tables

* **Multi-region**, **multi-master** solution.
* **Table**: collection of data in a particular topic.
  * Item (row): collection of attributes.
  * Attribute (column): a single data element, a name paired with a value.
* <mark style="background-color:yellow;">DynamoDB Streams must be enabled first, with the `NEW_AND_OLD_IMAGES` view type.</mark>

![Global\_table](https://d1.awsstatic.com/product-marketing/DynamoDB/DynamoDB_Global-Tables-01.dad2508b80e8b7c544fe1a94a2abd3f770b789da.png)
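With the current global tables version (2019.11.21), a replica is added through `update-table`. A sketch, assuming the stream requirement above is already met and that the table name and Region are hypothetical.

```bash
#!/usr/bin/env bash
# Add a replica of an existing table in another Region.
add_replica() {
  local table="$1" region="$2"
  aws dynamodb update-table --table-name "$table" \
    --replica-updates "[{\"Create\":{\"RegionName\":\"$region\"}}]"
}
```

For example, `add_replica Orders eu-west-1`; replicas are removed the same way with a `Delete` entry instead of `Create`.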

### Point-in-time Recovery

PITR protects against accidental write or delete operations.

* Restore to any point in time, with per-second granularity
* Covers the last 35 days, with no downtime or performance impact on the table
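A sketch of enabling PITR and restoring from it with the AWS CLI, assuming hypothetical table names and timestamp. Note that a restore always creates a new table; the source table is left untouched.

```bash
#!/usr/bin/env bash
enable_pitr() {
  aws dynamodb update-continuous-backups --table-name "$1" \
    --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
}

# Restore into a NEW table at RESTORE_TIME (ISO 8601, e.g. 2024-05-01T12:30:00Z).
restore_to_time() {
  local source="$1" target="$2" restore_time="$3"
  aws dynamodb restore-table-to-point-in-time \
    --source-table-name "$source" \
    --target-table-name "$target" \
    --restore-date-time "$restore_time"
}
```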

### ACID transaction

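DynamoDB transactions enable reading or writing multiple items across multiple tables as a single all-or-nothing operation (`TransactGetItems` / `TransactWriteItems`). A write transaction can include conditions that are checked before anything is written; if any condition fails, no write is applied.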
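A sketch of an all-or-nothing write with `transact-write-items`, assuming two hypothetical tables: `Orders` (key `OrderId`) and `Inventory` (key `ProductId`, numeric `Stock`). The order is only recorded if stock is available, and the stock is only decremented if the order is new.

```bash
#!/usr/bin/env bash
# Both operations succeed together, or neither is applied.
place_order() {
  local order_id="$1" product_id="$2" items
  items=$(cat <<EOF
[
  {"Put": {
    "TableName": "Orders",
    "Item": {"OrderId": {"S": "$order_id"}, "ProductId": {"S": "$product_id"}},
    "ConditionExpression": "attribute_not_exists(OrderId)"}},
  {"Update": {
    "TableName": "Inventory",
    "Key": {"ProductId": {"S": "$product_id"}},
    "UpdateExpression": "SET Stock = Stock - :one",
    "ConditionExpression": "Stock >= :one",
    "ExpressionAttributeValues": {":one": {"N": "1"}}}}
]
EOF
)
  aws dynamodb transact-write-items --transact-items "$items"
}
```

If either condition fails, the whole call is cancelled (`TransactionCanceledException`) and the CLI exits non-zero.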

### Data plane API

* `PutItem`: put a single item
* `BatchWriteItem`: put or delete up to 25 items in one call
* `GetItem`: get a single item
* `BatchGetItem`: get up to 100 items from one or more tables
* `UpdateItem`: update one or more attributes of an item
* `DeleteItem`: delete a single item.
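Because `BatchWriteItem` accepts at most 25 put/delete requests per call, larger sets have to be split. A sketch that chunks a list of IDs into batches of 25, assuming a hypothetical table `Products` with a string partition key `Id`. It does not retry `UnprocessedItems`, which a real loader should resend.

```bash
#!/usr/bin/env bash
# batch_put_ids ID...  -> one batch-write-item call per 25 IDs
batch_put_ids() {
  local -a ids=("$@")
  local batch_size=25 i j req
  for (( i = 0; i < ${#ids[@]}; i += batch_size )); do
    req=""
    for (( j = i; j < i + batch_size && j < ${#ids[@]}; j++ )); do
      req+="{\"PutRequest\":{\"Item\":{\"Id\":{\"S\":\"${ids[j]}\"}}}},"
    done
    req="{\"Products\":[${req%,}]}"   # drop the trailing comma
    aws dynamodb batch-write-item --request-items "$req"
  done
}
```

For example, `batch_put_ids $(seq 1 60)` issues three calls: 25, 25, and 10 items.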

### LSI vs GSI

|                   | LSI                                                                      | GSI                                                                           |
| ----------------- | ------------------------------------------------------------------------ | ----------------------------------------------------------------------------- |
| **Scope**         | *<mark style="color:red;">Same</mark>* partition key from the base table | *<mark style="color:red;">Different</mark>* partition key from the base table |
| **Querying**      | Can only query within the same partition                                 | Can query across the entire table                                             |
| **Creation time** | Must be created at the same time as the table                            | Can be created or modified after the table is created                         |
| **Throughput**    | Shares throughput with the base table                                    | Has its own throughput, separate from the base table                          |
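A sketch of both index types with the AWS CLI, assuming a hypothetical `GameScores` table (partition key `UserId`, sort key `GameTitle`). The LSI keeps `UserId` as partition key and must be declared in `create-table`; the GSI switches to `GameTitle` as partition key and could also be added later.

```bash
#!/usr/bin/env bash
create_game_scores() {
  aws dynamodb create-table \
    --table-name GameScores \
    --attribute-definitions \
      AttributeName=UserId,AttributeType=S \
      AttributeName=GameTitle,AttributeType=S \
      AttributeName=TopScore,AttributeType=N \
    --key-schema AttributeName=UserId,KeyType=HASH AttributeName=GameTitle,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST \
    --local-secondary-indexes '[{"IndexName":"UserTopScoreIndex",
        "KeySchema":[{"AttributeName":"UserId","KeyType":"HASH"},
                     {"AttributeName":"TopScore","KeyType":"RANGE"}],
        "Projection":{"ProjectionType":"KEYS_ONLY"}}]' \
    --global-secondary-indexes '[{"IndexName":"GameTitleIndex",
        "KeySchema":[{"AttributeName":"GameTitle","KeyType":"HASH"},
                     {"AttributeName":"TopScore","KeyType":"RANGE"}],
        "Projection":{"ProjectionType":"ALL"}}]'
}

# Query the GSI: scores for one game across all users' partitions.
top_scores_for_game() {
  aws dynamodb query \
    --table-name GameScores \
    --index-name GameTitleIndex \
    --key-condition-expression "GameTitle = :g" \
    --expression-attribute-values "{\":g\":{\"S\":\"$1\"}}" \
    --no-scan-index-forward   # highest TopScore first
}
```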

## Locking mechanisms

Databases employ locking mechanisms so that concurrent updates do not overwrite each other and data is always updated from its latest version. Different locking strategies suit different use cases. Some of these are:

* ***Optimistic Locking***: each item has an attribute that acts as a *<mark style="color:red;">version number</mark>*. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed.
* ***Pessimistic Locking***: an entity is locked in the database for the entire time it is held in application memory, so no one else can modify it meanwhile.
* ***Overly Optimistic Locking***: used for systems that have *<mark style="background-color:yellow;">**only one user**</mark>* or operation performing changes at a time.
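In DynamoDB, optimistic locking is usually a conditional write. A sketch, assuming the `ProductCatalog` table from the example below has hypothetical numeric `Price` and `Version` attributes. If another writer bumped the version first, DynamoDB rejects the update with `ConditionalCheckFailedException` (the CLI exits non-zero); the caller should re-read the item and retry.

```bash
#!/usr/bin/env bash
# Update only if the stored version is still the one we read, then bump it.
update_price() {
  local id="$1" new_price="$2" expected_version="$3"
  aws dynamodb update-item \
    --table-name ProductCatalog \
    --key "{\"Id\":{\"N\":\"$id\"}}" \
    --update-expression "SET Price = :p, #ver = #ver + :one" \
    --condition-expression "#ver = :v" \
    --expression-attribute-names '{"#ver":"Version"}' \
    --expression-attribute-values \
      "{\":p\":{\"N\":\"$new_price\"},\":v\":{\"N\":\"$expected_version\"},\":one\":{\"N\":\"1\"}}"
}
```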

## Read data

* **GetItem**: read a single item by its full primary key

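```bash
# Read item Id=1 from ProductCatalog, returning only the listed attributes
# (list elements and nested map attributes can be addressed directly).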
aws dynamodb get-item \
    --table-name ProductCatalog \
    --key '{"Id":{"N":"1"}}' \
    --projection-expression "Description, RelatedItems[0], ProductReviews.FiveStar"
```

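* **Query**: read the items that share one partition key value, optionally narrowed by a sort key condition
* **Scan**: read every item in a table or index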
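The same kind of read can also be written in PartiQL (see Concepts) with `execute-statement`. A sketch against the `ProductCatalog` example table; the helper name is illustrative. Keeping the full primary key in the `WHERE` clause matters: a `SELECT` without a partition key condition turns into a full table scan.

```bash
#!/usr/bin/env bash
# PartiQL read of one item by its primary key (all attributes).
get_product_partiql() {
  local id="$1"
  aws dynamodb execute-statement \
    --statement 'SELECT * FROM "ProductCatalog" WHERE "Id" = ?' \
    --parameters "[{\"N\":\"$id\"}]"
}
```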

## Security

* ACID transaction: native, server-side support for transactions
* Encryption at rest using AWS KMS (enabled by default for all tables)

## Pricing

DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to turn on.

## Best practices

* Avoid the [scan](https://mamawhocode.gitbook.io/aws/services/database/dynamodb/scan) operation on large tables or indexes; prefer Query. If you must scan, a filter (`--filter-expression`) and a projection (`--projection-expression`) narrow the data returned, although the read capacity consumed stays the same.

<img src="https://2259236002-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fuh9xZDZ53qGqmMCM44PU%2Fuploads%2Fa6VA2KLRFlR70tZUC5Ej%2Fimage.png?alt=media&#x26;token=a82f458c-bdd5-4852-ad42-117d3bb2730d" alt="" data-size="original">

* Set `ConsistentRead` to `true` if you need strongly consistent reads: with the default eventually consistent reads, the result of a recent `PutItem` or `UpdateItem` might not yet be reflected in every replica.
* Cache for popular items. Use DAX for caching reads.
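On the CLI, a strongly consistent read is the `--consistent-read` flag. A sketch reusing the `ProductCatalog` example table; the helper name is illustrative. Such a read costs twice the RCU of an eventually consistent one and is not available on GSIs.

```bash
#!/usr/bin/env bash
# Strongly consistent GetItem: reflects all writes that succeeded before it.
get_product_consistent() {
  aws dynamodb get-item \
    --table-name ProductCatalog \
    --key "{\"Id\":{\"N\":\"$1\"}}" \
    --consistent-read
}
```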

## Concepts

* **Table, item, attribute:** are core components of DynamoDB.
  * *Table*: collection of *item*.
  * *Item*: collection of *attribute*.
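* **Primary key**: uniquely identifies each item in a table. It is either a partition key alone or a partition key + sort key.
  * **Partition key** (mandatory): its value goes through an internal hash function that decides which partition stores the item, hence the alternative name *hash key*.
  * **Sort key** (optional, also called *range key*): orders the items that share a partition key and enables range conditions when querying.
* GSI (***Global Secondary Index***): uses a partition key, and optionally a sort key, that can differ from the base table's, to speed up queries on non-key attributes. All reads from GSIs and streams are eventually consistent.
* LSI (***Local Secondary Index***)
  * The same partition key as the base table, with a different sort key.
  * Both tables and LSIs provide two read consistency options: *eventually consistent* (default) and *strongly consistent* reads.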
* ***WCU*** (write capacity unit):
  * Each API call that writes data to your table is one write request.
  * For an item up to 1 KB in size:
    * 1 WCU = 1 standard write per second
    * 1 transactional write requires 2 WCUs (1 WCU = 0.5 transactional writes per second)
* ***RCU*** (read capacity unit):
  * Each API call that reads data from your table is one read request (strongly consistent, eventually consistent, or transactional).
  * For an item up to 4 KB in size:
    * 1 RCU = 1 strongly consistent read request / sec
    * 1 RCU = 2 eventually consistent read requests / sec
    * 1 RCU = 0.5 transactional read requests / sec
  * 1 RCU = 4 KB/sec and 1 WCU = 1 KB/sec: in one second, one unit reads 4 KB but writes only 1 KB. Larger items consume more units, rounded up to the next 4 KB (reads) or 1 KB (writes).
* ***PartiQL***: SQL-compatible query language that makes it easier to interact with data in AWS services like Amazon DynamoDB, [S3 Select](https://mamawhocode.gitbook.io/aws/storage/s3#s3-select-and-glacier-select), and [Glacier Select](https://mamawhocode.gitbook.io/aws/storage/s3#s3-select-and-glacier-select).
* ***Composite key*** = Partition key + Sort key.
* ***Throttled***: occurs when requests exceed the configured RCU or WCU; DynamoDB returns `ProvisionedThroughputExceededException`. Common reasons:
  * request rate > provisioned throughput
  * poor choice of partition key -> uneven distribution of data across partitions
  * frequent access to the same key in a partition -> <mark style="color:red;">hot key</mark>: a single partition supports at most 3,000 RCU and 1,000 WCU, regardless of capacity mode (provisioned or on-demand)
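The RCU/WCU rules above reduce to small arithmetic. A sketch of a calculator, assuming item sizes are given in whole KB (round fractional sizes up before calling); the function names are illustrative, and it does not model per-partition limits.

```bash
#!/usr/bin/env bash
# rcu_needed ITEM_KB READS_PER_SEC MODE   (MODE: strong | eventual | transactional)
rcu_needed() {
  local item_kb="$1" rate="$2" mode="$3"
  local blocks=$(( (item_kb + 3) / 4 ))   # 4 KB per read unit, rounded up
  local strong=$(( blocks * rate ))
  case "$mode" in
    strong)        echo "$strong" ;;
    eventual)      echo $(( (strong + 1) / 2 )) ;;  # half the cost, rounded up
    transactional) echo $(( strong * 2 )) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# wcu_needed ITEM_KB WRITES_PER_SEC MODE   (MODE: standard | transactional)
wcu_needed() {
  local item_kb="$1" rate="$2" mode="$3"
  local standard=$(( item_kb * rate ))    # 1 KB per write unit
  case "$mode" in
    standard)      echo "$standard" ;;
    transactional) echo $(( standard * 2 )) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, `rcu_needed 6 10 strong` prints `20`: a 6 KB item spans two 4 KB blocks, times 10 reads/sec. The same load needs `10` RCU eventually consistent and `40` transactional, while `wcu_needed 3 10 transactional` prints `60`.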

{% hint style="info" %}
The AWS SDKs for DynamoDB <mark style="color:red;">automatically retry</mark> requests that receive this exception, so unless your retry queue is too large to finish, the request eventually succeeds.
{% endhint %}
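The SDKs (and the AWS CLI) retry on their own. If a script needs its own retry loop around a whole command, for example a batch job, exponential backoff with jitter is the usual pattern. A minimal sketch; `with_backoff` and the `BACKOFF_BASE` variable are hypothetical names, not AWS settings.

```bash
#!/usr/bin/env bash
# with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Retries COMMAND, waiting roughly BASE, 2*BASE, 4*BASE ... seconds (plus jitter).
with_backoff() {
  local max_attempts="$1" attempt=1 delay="${BACKOFF_BASE:-1}"
  shift
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$(( delay + RANDOM % (delay + 1) ))"   # delay plus 0..delay jitter
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))
  done
}
```

For example, `with_backoff 5 aws dynamodb put-item ...` retries up to five times in total.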

## Trivia

* DAX is an in-memory acceleration service that **accelerates DynamoDB tables**. DAX cannot be used with other databases.
* DynamoDB can support tables of virtually *any size*.
* DynamoDB can scale to more than 10 trillion requests per day, with peaks of more than 20 million requests per second.
* A single Scan request returns at most 1 MB of data.
* <mark style="color:red;">The maximum size of</mark> *<mark style="color:red;">an item</mark>* <mark style="color:red;">in a DynamoDB table is 400 KB.</mark>
* The `Limit` parameter of a Query or Scan is not the number of matching items to return. It is the maximum number of items to evaluate, before any filter expression is applied. :smile:
* Each table can have up to 20 GSIs (default, adjustable quota) and 5 LSIs.
* With the legacy global tables version (2017.11.29), replicas can be added only while the table is empty, so do it before inserting any data. The current version (2019.11.21) can add replicas to tables that already hold data.
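A toy, local simulation of the `Limit` behavior (no AWS call involved): one page examines at most `Limit` items and only then applies the filter, so a page can contain fewer matches than `Limit`, even zero, while matching items still remain. The function name and the numeric "items" are purely illustrative.

```bash
#!/usr/bin/env bash
# page_with_limit LIMIT MIN_VALUE ITEM...
# Mimics one Query/Scan page: evaluate at most LIMIT items, THEN keep those >= MIN_VALUE.
page_with_limit() {
  local limit="$1" min="$2" evaluated=0 matched=0 v
  shift 2
  for v in "$@"; do
    if (( evaluated >= limit )); then break; fi
    evaluated=$(( evaluated + 1 ))
    if (( v >= min )); then matched=$(( matched + 1 )); fi
  done
  echo "evaluated=$evaluated matched=$matched"
}
```

For example, `page_with_limit 3 50 10 20 30 60 70` prints `evaluated=3 matched=0`: the two matching items (60 and 70) are only reached on the next page.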
