Glue
serverless ETL service
Overview
A fully managed ETL (Extract, Transform, Load) service that can extract data from various sources, transform it into the required format, and load it into a target data store.
Use cases
Prepare & transform data for analytics.
Features
Glue - Convert data into Parquet format
Glue - Data Crawler: Catalog a dataset
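A crawler is typically defined by a name, an IAM role, a target Data Catalog database, and the data store to scan. The sketch below only builds the request for boto3's Glue `create_crawler` call. The crawler name, role ARN, database, and S3 path are hypothetical placeholders, and the helper names `build_crawler_request` and `create_and_start_crawler` are illustrative, not part of any AWS library.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for boto3's Glue `create_crawler` call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and start it; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
print(request["Targets"])
```

Calling `create_and_start_crawler(request)` would submit the request to an AWS account; it is deliberately not invoked here.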
Glue - Job bookmark
Prevents reprocessing of old data: Glue tracks what has already been processed, so a job can resume from where it left off.
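The bookmark idea can be illustrated with a minimal sketch in plain Python. This is not how Glue implements bookmarks internally; the `Bookmark` class, the `run_job` function, and the file names are hypothetical and only model the behavior of skipping input that an earlier run already handled. In an actual Glue job, bookmarks are enabled by setting the job parameter `--job-bookmark-option` to `job-bookmark-enable`.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state persisted between job runs."""
    processed: set = field(default_factory=set)


def run_job(available_files, bookmark):
    """Process only files the bookmark has not seen, then record them."""
    new_files = [f for f in available_files if f not in bookmark.processed]
    for name in new_files:
        # A real job would extract, transform, and load the file here.
        bookmark.processed.add(name)
    return new_files


bookmark = Bookmark()
first_run = run_job(["day1.csv", "day2.csv"], bookmark)
second_run = run_job(["day1.csv", "day2.csv", "day3.csv"], bookmark)
print(first_run)   # ['day1.csv', 'day2.csv']
print(second_run)  # ['day3.csv']
```

The second run sees all three files but processes only `day3.csv`, because the first two are already recorded in the bookmark.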
Glue - Studio
GUI for creating, running, and monitoring ETL jobs.
Glue - DataBrew
Clean & normalize data using pre-built transformations.
Glue - Streaming ETL
Built on Apache Spark Structured Streaming
Compatible with Kinesis Data Streams, Apache Kafka, and Amazon MSK (Managed Streaming for Apache Kafka)
Best practices
Trivia
Data transformation = AWS Glue.
Concepts
Parquet format: a columnar storage file format optimized for use with big data processing frameworks.
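To see why a columnar layout suits analytics, the following sketch contrasts row-oriented records with a column-oriented arrangement of the same data. It uses plain Python dictionaries rather than the Parquet file format itself, and the sample records and the `to_columns` helper are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "FR", "amount": 25.5},
    {"id": 3, "country": "DE", "amount": 7.25},
]


def to_columns(records):
    """Regroup row records so that each field's values are stored together."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field reads a single contiguous list
# instead of touching every field of every record.
total = sum(columns["amount"])
print(columns["country"])  # ['DE', 'FR', 'DE']
print(total)               # 42.75
```

Storing values of the same field together is also what makes column-level compression and column pruning practical in columnar formats.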