Glue

Serverless ETL service

Overview

  • A fully managed ETL (Extract, Transform, Load) service that can extract data from various sources, transform it into the required format, and load it into a target data store.
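The three ETL stages can be illustrated with a minimal local sketch in plain Python. This is not Glue API code. The in-memory CSV, the cleaning rules, and the dict standing in for a target data store are all hypothetical. A real Glue job would typically express the same steps in a Spark script that reads from and writes to stores such as Amazon S3.

```python
import csv
import io

# Hypothetical source: a small CSV file held in memory.
RAW_CSV = """order_id,amount,currency
1, 19.99 ,usd
2,5.00,EUR
3,,usd
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Transform: drop rows with no amount, cast types, upper-case currency codes."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(amount),
            "currency": row["currency"].strip().upper(),
        })
    return cleaned


def load(rows: list[dict[str, object]], target: dict[int, dict[str, object]]) -> None:
    """Load: upsert rows into the target store, keyed by order_id."""
    for row in rows:
        target[row["order_id"]] = row


warehouse: dict[int, dict[str, object]] = {}
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse)
```

Keeping each stage a separate function mirrors the E, T, and L split, so any one stage can be swapped (for example, a different source or target) without touching the others.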

Use cases

  • Prepare and transform data for analytics.

Features

Glue - Convert data into Parquet format

Glue - Crawler

Catalog a dataset: scan a data store, infer its schema, and register table definitions in the Glue Data Catalog.
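A crawler's core task, inferring a table schema from sample data, can be sketched as a toy in plain Python. The sample records, the type names (chosen to resemble Hive-style catalog types), and the merge rule are assumptions for illustration. Glue's built-in classifiers handle many formats and edge cases that this sketch ignores.

```python
import json

# Hypothetical sample: JSON Lines records a crawler might read from a data store.
SAMPLE = [
    '{"id": 1, "name": "alice", "score": 9.5}',
    '{"id": 2, "name": "bob", "score": 7}',
    '{"id": 3, "name": "carol", "active": true}',
]


def infer_type(value: object) -> str:
    """Map a JSON value to a simple Hive-style column type name."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(old: str, new: str) -> str:
    """Reconcile two observed types for the same column."""
    if old == new:
        return old
    if {old, new} == {"bigint", "double"}:
        return "double"  # widen integers to floating point
    return "string"  # incompatible types fall back to string


def infer_schema(lines: list[str]) -> dict[str, str]:
    """Build one schema covering every column seen in the sample."""
    schema: dict[str, str] = {}
    for line in lines:
        for column, value in json.loads(line).items():
            observed = infer_type(value)
            schema[column] = merge_types(schema[column], observed) if column in schema else observed
    return schema


schema = infer_schema(SAMPLE)
print(schema)  # {'id': 'bigint', 'name': 'string', 'score': 'double', 'active': 'boolean'}
```

Note that `score` ends up as `double` because the sample contains both `9.5` and `7`, and `active` appears even though only one record has it: the schema is the union of columns seen across the sample.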

Glue - Job bookmark

Prevents reprocessing of old data: Glue persists state from previous runs of a job, so the next run resumes where the last one left off and processes only new data.
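In Glue, bookmarks are enabled per job through the `--job-bookmark-option job-bookmark-enable` job parameter. The underlying idea, persisting what was already processed and skipping it on the next run, can be sketched in plain Python. The `Bookmark` class, `run_job`, and the timestamp-per-file input are hypothetical; Glue's own state tracking is managed by the service and differs by source type.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical persisted job state: newest file timestamp already processed."""
    last_processed: float = 0.0


def run_job(files: dict[str, float], bookmark: Bookmark) -> list[str]:
    """Process only files newer than the bookmark, then advance the bookmark.

    `files` maps a (hypothetical) file key to its last-modified timestamp.
    """
    new_files = sorted(
        (key for key, mtime in files.items() if mtime > bookmark.last_processed),
        key=lambda key: files[key],
    )
    # The extract/transform/load work for each file in `new_files` would go here.
    if new_files:
        bookmark.last_processed = files[new_files[-1]]  # newest timestamp seen
    return new_files


bookmark = Bookmark()
files = {"data/day1.csv": 100.0, "data/day2.csv": 200.0}
first_run = run_job(files, bookmark)   # both files are new
files["data/day3.csv"] = 300.0         # a new file lands between runs
second_run = run_job(files, bookmark)  # only day3.csv is processed
third_run = run_job(files, bookmark)   # nothing new, nothing reprocessed
print(first_run, second_run, third_run)
```

The bookmark only advances after a run finishes its work, which is what lets a later run pick up exactly where the previous one stopped.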

Glue - Studio

GUI for creating, running, and monitoring ETL jobs.

Glue - DataBrew

Clean and normalize data visually using pre-built transformations, without writing code.

Glue - Streaming ETL

  • Built on Apache Spark Structured Streaming

  • Compatible with Amazon Kinesis Data Streams, Apache Kafka, and Amazon MSK (Managed Streaming for Apache Kafka)
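Spark Structured Streaming processes an unbounded stream as a sequence of small batch jobs (micro-batches), carrying state from one to the next. The plain-Python sketch below imitates that loop; it is not Glue or Spark API code. Batches are cut by record count to keep the example deterministic, whereas Spark forms each micro-batch from the data that arrived since the previous trigger. The event records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def micro_batches(stream: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Cut an incoming record stream into small consecutive batches."""
    batch: list[dict] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the last partial batch when the stream ends


def run_stream(stream: Iterable[dict], batch_size: int) -> list[dict[str, int]]:
    """Treat each micro-batch as a small batch job that updates running state."""
    totals: Counter = Counter()  # state carried across micro-batches
    snapshots = []
    for batch in micro_batches(stream, batch_size):
        totals.update(record["type"] for record in batch)
        snapshots.append(dict(totals))  # result after this micro-batch
    return snapshots


# Hypothetical click-stream events, e.g. as read from a Kinesis stream or Kafka topic.
events = [{"type": "click"}, {"type": "view"}, {"type": "click"},
          {"type": "click"}, {"type": "view"}]
snapshots = run_stream(iter(events), batch_size=2)
print(snapshots)
```

Each snapshot is the running result after one micro-batch, which is how a streaming ETL job can keep updating its output as new records arrive.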

Best practices

Trivia

  • Data transformation = AWS Glue.

Concepts

  • Parquet format: a columnar storage file format (data is stored column by column rather than row by row) optimized for use with big data processing frameworks.
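The difference between row-oriented and columnar layouts can be shown with a small plain-Python sketch: the same records are pivoted into one list per column, so an aggregate over one field touches only that field's values. The data is hypothetical, and real Parquet files add row groups, per-column encoding, compression, and statistics that are not modeled here.

```python
# Hypothetical records in a row-oriented layout (one record after another, as in CSV).
ROWS = [
    {"id": 1, "city": "Paris", "temp": 21.5},
    {"id": 2, "city": "Oslo", "temp": 12.0},
    {"id": 3, "city": "Rome", "temp": 25.0},
]


def to_columnar(rows: list[dict]) -> dict[str, list]:
    """Pivot records into one list per column, the core idea of a columnar format."""
    columns: dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = to_columnar(ROWS)
# An aggregate like "average temperature" reads only the temp column,
# instead of scanning every field of every record.
avg_temp = sum(columns["temp"]) / len(columns["temp"])
print(columns["temp"], avg_temp)  # [21.5, 12.0, 25.0] 19.5
```

To write actual Parquet files outside Glue, a library such as pyarrow can be used (`pyarrow.parquet.write_table`).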
