S3
Supports LIST/GET/PUT/COPY/POST/DELETE operations.
S3 can host static websites and make them accessible on the Internet.
The website endpoint will be: http://bucket-name.s3-website-aws-region.amazonaws.com
No SSL (HTTP only).
The endpoint includes the bucket name & region name.
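A minimal boto3 sketch of enabling website hosting (bucket and document names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Enable static website hosting on an existing bucket
# ("my-bucket", index.html, and error.html are placeholder names).
s3.put_bucket_website(
    Bucket="my-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```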
Versioning helps you recover from accidental overwrites & deletes.
Can be enabled at bucket level.
Turning ON versioning is a best practice.
Once enabled, versioning can't be disabled, only Suspended.
Before versioning is enabled, the Version ID of every object is null.
After enabling Bucket Versioning, you might need to update your lifecycle rules to manage previous versions of objects -> these only apply to newly created objects.
After suspending Bucket Versioning:
Lifecycle rules set for previous object versions will still apply.
Existing objects in your bucket do not change.
Newly added objects with the same name as an existing object replace the existing object.
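A minimal boto3 sketch of enabling versioning (bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning for a bucket ("my-bucket" is a placeholder).
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},  # "Suspended" to suspend
)
```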
S3 Analytics: used to analyze storage access patterns to help you decide when to transition the right data to the right storage class.
Replication: MUST enable Versioning in both source & destination buckets.
Copying is ASYNC.
Need to enable CORS headers if a client makes a cross-origin request.
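A minimal boto3 sketch of a CORS rule (bucket and origin are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Allow GETs from one origin ("my-bucket" and the origin are placeholders).
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["https://www.example.com"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```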
Event Notifications: receive notifications when an event happens in your S3 bucket.
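A minimal boto3 sketch of wiring object-created events to a Lambda function (bucket and function ARN are placeholders; the function must already allow S3 to invoke it):

```python
import boto3

s3 = boto3.client("s3")

# Invoke a Lambda function whenever an object is created
# (bucket name and Lambda ARN are placeholders).
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-fn",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```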
Presigned URLs: provide a temporary access URL to your private bucket.
URL expiration varies by how you generate the URL:
S3 console: 1 min ~ 12 hours
AWS CLI: 3,600 sec (default) ~ 168 hours (max)
Use cases: allow only logged-in users to see your premium videos.
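A minimal boto3 sketch of generating a presigned URL (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Generate a temporary download link ("my-bucket"/key are placeholders).
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,  # seconds; SDK/CLI max is 604800 (7 days = 168 hours)
)
print(url)  # share this URL; it expires automatically
```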
S3 Object Lock: blocks an object version from being deleted for a specified amount of time.
Object versioning must be enabled.
Retention modes:
Compliance (strict mode): can't be overwritten or deleted by any user, even the root user.
Governance (softer): most users can't overwrite or delete an object.
For Glacier, refer to Vault Lock (the equivalent feature).
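A minimal boto3 sketch of setting a retention period on one object version (bucket, key, and date are placeholders; the bucket must have Object Lock enabled):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Protect an object version from deletion until a given date
# ("my-locked-bucket" and the key are placeholders).
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="audit/report.pdf",
    Retention={
        "Mode": "GOVERNANCE",  # or "COMPLIANCE" (strict)
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)
```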
A fully managed S3 storage analytics solution that provides a comprehensive view of:
object storage usage
activity trends
recommendations to optimize costs.
Storage Lens allows you to analyze object access patterns across all of your S3 buckets and generate detailed metrics and reports.
S3 Object Lambda: allows you to add your own code to S3 GET requests to modify and process data as it's being returned to an application.
Use cases: data needs to be transformed on the fly, e.g. redacting PII from the data.
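A minimal sketch of an Object Lambda handler, assuming a UTF-8 text object; the redaction logic here is a hypothetical stand-in:

```python
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # S3 Object Lambda passes a presigned URL to the original object
    # plus a route/token used to return the transformed result.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read().decode("utf-8")

    # Hypothetical transformation: redact email-like strings.
    transformed = original.replace("@", "[redacted]")

    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transformed.encode("utf-8"),
    )
    return {"statusCode": 200}
```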
Below is the pricing order, from most expensive to cheapest.
Standard
Standard Infrequent Access
lower cost than Standard
use cases: disaster recovery, backups
Intelligent-Tiering: automatically moves your data to an infrequent access tier (like S3 Standard-IA) based on access patterns.
One Zone-IA
11 9's durability, but in a single AZ; data is lost if the AZ is destroyed.
use cases: storing backups of your on-premises data, or data that can be recreated.
Glacier: low-cost storage used for archiving/backup.
Glacier Instant Retrieval
Glacier Flexible Retrieval
Glacier Deep Archive: long term storage
Lifecycle rules help optimize S3 storage cost.
Ex: transition all objects of a bucket from the Standard class -> Standard-IA 6 months after upload.
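A minimal boto3 sketch of such a lifecycle rule (bucket name and rule ID are placeholders; 180 days ≈ 6 months):

```python
import boto3

s3 = boto3.client("s3")

# Transition every object to Standard-IA 180 days after creation
# ("my-bucket" and the rule ID are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "standard-to-ia-after-6-months",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Transitions": [
                    {"Days": 180, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)
```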
S3 Analytics helps decide when to transition objects to the right class.
Gives recommendations for Standard & Standard-IA only. Does NOT work for One Zone-IA or Glacier.
S3 scales per prefix. Requests per second (per prefix):
3,500 PUT/COPY/POST/DELETE
5,500 GET/HEAD
Latency between 100-200ms
Supports a FOLDER concept to group objects.
Use as many prefixes as possible to achieve the required throughput and desired performance.
Objects may be replicated across AZs, but within a single region. A One Zone-IA object is stored in only 1 AZ.
A bucket-level feature.
Use S3 Transfer Acceleration to enable fast, easy, and secure transfer of files over LONG distances. It transfers files to an AWS Edge Location, which forwards the data to the target S3 bucket.
-> Speeds up transfers by 50-500%.
Use cases:
Need to collect data from various locations.
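A minimal boto3 sketch of enabling and using Transfer Acceleration (bucket and file names are placeholders):

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable acceleration on the bucket ("my-bucket" is a placeholder).
s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Then use the accelerate endpoint for transfers.
s3_fast = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_fast.upload_file("big-file.bin", "my-bucket", "uploads/big-file.bin")
```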
S3 Select: server-side filtering (simple SQL) for better performance & less transfer and CPU cost at the client. You can use S3 Select to query only the necessary data inside CSV files, given the bucket's name and the object's key.
For more complex queries, consider using Athena.
Supports CSV, JSON, and Parquet.
Use cases: retrieve only a subset of data; best for simple SQL (no JOINs, no functions, no arrays...).
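A minimal boto3 sketch of S3 Select over a CSV object (bucket, key, and column names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Filter rows server-side from a CSV object ("my-bucket", the key,
# and the column names are placeholders).
resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="data/users.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name FROM S3Object s WHERE s.country = 'VN'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; collect the Records payloads.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```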
S3 Batch Operations: perform batch operations on existing S3 objects.
S3 Inventory provides a report of your S3 objects and their corresponding metadata on a daily or weekly basis for a specified S3 bucket or a shared prefix.
These reports include the type of server-side encryption each object is using, along with its replication status.
Configurable to include all object versions or only current versions.
Can report on various metadata fields such as size, last modified date, storage class, and encryption status.
Supports output in CSV, ORC, and Parquet formats, enabling straightforward integration with analytics tools.
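A possible boto3 sketch of configuring a daily inventory report (bucket names and the config ID are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Daily CSV inventory of all versions, delivered to a reports bucket
# (both bucket names and "daily-inventory" are placeholders).
s3.put_bucket_inventory_configuration(
    Bucket="my-bucket",
    Id="daily-inventory",
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",  # or "Current"
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "EncryptionStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-report-bucket",
                "Format": "CSV",
            }
        },
    },
)
```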
User-based: IAM policies
Resource-based:
Bucket policy: ex: allow cross-account access
Object ACL (can be disabled)
Bucket ACL (can be disabled)
Encryption of data at rest: SSE-S3, SSE-KMS, SSE-C (customer-provided keys)
Encryption of data in transit: TLS
Strong read-after-write consistency: access to the most recent data immediately after a write (create or overwrite).
To avoid accidental deletion in an S3 bucket:
Enable versioning
Enable MFA delete
Encryption option comparison (keys managed by / encryption done by / extra benefits):
Client-side: you / you / none
SSE-C: you / S3 / none
SSE-KMS: S3 & KMS / S3 / rotation control, role separation
SSE-C: by providing the S3 object key and the encryption key, you can use the GetObject API to download an encrypted object.
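A minimal boto3 sketch of downloading an SSE-C object (bucket, key, and the 32-byte key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# You must supply the same 256-bit key that encrypted the object
# ("my-bucket"/key are placeholders; boto3 adds the key MD5 header).
customer_key = b"0" * 32  # placeholder 32-byte AES-256 key
resp = s3.get_object(
    Bucket="my-bucket",
    Key="secret/data.bin",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)
data = resp["Body"].read()
```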
Amazon S3 now applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for every bucket in Amazon S3. You only need to use the x-amz-server-side-encryption header if you want to override the default SSE-S3 encryption and use a different encryption option like SSE-KMS or SSE-C.
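A minimal boto3 sketch, assuming placeholder bucket/key names and a hypothetical KMS key alias:

```python
import boto3

s3 = boto3.client("s3")

# Default: this PUT is encrypted with SSE-S3, no header needed.
s3.put_object(Bucket="my-bucket", Key="a.txt", Body=b"hello")

# Override to SSE-KMS ("my-bucket" and "alias/my-key" are placeholders);
# this sets the x-amz-server-side-encryption header under the hood.
s3.put_object(
    Bucket="my-bucket",
    Key="b.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-key",
)
```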
Note
Server-side encryption encrypts only the object data, not the object metadata.
SSE-KMS supports symmetric keys, not asymmetric keys.
KMS has a limit of requests per second.
On download, S3 calls the Decrypt KMS API.
On upload, S3 calls the GenerateDataKey KMS API.
The KMS quota differs between regions: 5,500, 10,000, or 30,000 req/s.
You can increase the quota by making a request in the Service Quotas console.
Encrypt your data either on the client side or the server side.
Turn on versioning
Consider using multipart upload (divide into parts & upload in parallel) for objects over 100MB.
If > 5GB, multipart upload is a MUST.
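A minimal boto3 sketch of multipart upload via TransferConfig (file, bucket, and key names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# upload_file switches to multipart automatically past the threshold
# ("backup.tar" and "my-bucket" are placeholders).
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # use multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
    max_concurrency=8,                      # parallel part uploads
)
s3.upload_file("backup.tar", "my-bucket", "backups/backup.tar", Config=config)
```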
Bucket names are GLOBALLY unique, but buckets are defined at the REGIONAL level.
The maximum size of an object is 5TB.
S3 provides no API that can search for objects based on object metadata.
Naming convention
No UPPERCASE, no underscore
3-63 chars long
Not an IP
Must start with lowercase letter or number
Must NOT start with prefix xn--
Must NOT end with suffix -s3alias
Use Athena to query data in S3 with standard SQL.
To achieve more requests per second, increase prefixes in your bucket.
"aws:SecureTransport":"false"
is a condition and Deny
effect to a bucket policy to force the request using SSL or TLS -> force using HTTPS.
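A minimal boto3 sketch of attaching such a policy (bucket name is a placeholder):

```python
import json

import boto3

s3 = boto3.client("s3")

# Deny any request that is not made over HTTPS ("my-bucket" is a placeholder).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```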
When an application running on an EC2 instance calls the S3 ListObjects API, the request is allowed by IAM if the attached role has the s3:ListBucket permission for the relevant bucket.
Key: the FULL path of an object within the bucket.
ex: for s3://mybucket/myfolder/subfolder/myfile.txt, the key is myfolder/subfolder/myfile.txt
Bucket policy: JSON-based access policy that determines who can access your bucket & what operations they can perform.
Hot storage: frequently accessed data
Warm storage: less frequently accessed data
Cold storage: rarely accessed data
In the Requester Pays pattern, the owner is still in charge of the storage cost; the requester pays for requests & data transfer.
Can use S3 Inventory to get the list of objects, then use S3 Select to filter them.