Data Engineer
Variety: can store many types of data: structured, semi-structured, or unstructured (images, video, ...)
RDS
S3
Glue
Comprehend
Volume
Velocity
Kinesis
Lambda
Veracity/Validity
Value
Latest state of data
Latest state of historical data
Normalization & 3rd normal form (3NF)
Normalization can cause slowness (more joins per query)
Optimize for point queries
Query latency matters
Latency not as important
Optimize for GROUP BY
Common Table Expressions (CTEs) can add latency
Use CTEs instead of sub-queries
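A minimal sketch of the CTE vs sub-query point, using Python's built-in `sqlite3` only so it runs anywhere. The `orders` table, its columns, and the sample rows are made up for illustration, and SQLite stands in for whatever warehouse engine is actually used. Both queries return the same rows; the CTE just gives the aggregation step a name.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 30.0), ('alice', 70.0), ('bob', 20.0), ('carol', 150.0);
""")

# Sub-query version: the aggregation is nested inside FROM.
subquery_sql = """
    SELECT customer, total
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders
          GROUP BY customer) AS t
    WHERE total >= 100
    ORDER BY customer
"""

# CTE version: the same aggregation, named up front with WITH.
cte_sql = """
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM customer_totals
    WHERE total >= 100
    ORDER BY customer
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(cte_rows)
```

Whether a CTE is faster, slower, or identical to the sub-query depends on the engine's planner, so treat this as a readability comparison rather than a performance claim.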
EMR
EMR
Lambda
Batch
Glue
Step Functions
Redshift
full-featured, distributed Hadoop environment
additional frameworks & hardware
Data cleaning
Enrichment
Movement
fully managed