Glue

serverless ETL service

Overview

A fully managed ETL (Extract, Transform, Load) service that can extract data from various sources, transform it into the required format, and load it into a target data store.
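
The three ETL stages can be sketched in plain Python to make the terms concrete. This is not a Glue job. An in-memory CSV string stands in for a source, an in-memory SQLite database stands in for the target data store, and the `extract`/`transform`/`load` helpers and the `sales` table are hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a source such as a file in S3.
RAW_CSV = """id,name,amount
1,alice,10.5
2,bob,
3,carol,7
"""

def extract(text):
    """Extract: read raw CSV rows as dicts of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types, normalize names, and drop rows with no amount."""
    return [
        (int(r["id"]), r["name"].title(), float(r["amount"]))
        for r in rows
        if r["amount"]
    ]

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall())
# [('Alice', 10.5), ('Carol', 7.0)]
```

In Glue the same shape applies, but the service manages the compute and connects to the real sources and targets for you.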

Prevent reprocessing old data -> Glue job bookmarks persist state from previous runs, so a job can resume from where it left off and process only new data.
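
The bookmark idea can be sketched in plain Python. This is a conceptual sketch, not Glue's implementation: a single timestamp stands in for the state Glue persists between runs, and `SourceFile` and `run_incremental` are hypothetical names. Tracking only the newest modification time is a simplifying assumption (a file that arrives late with an older timestamp would be skipped).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceFile:
    name: str
    last_modified: int  # e.g. a Unix timestamp

def run_incremental(files, bookmark):
    """Return (names processed this run, updated bookmark)."""
    # Only pick up files newer than what the previous run recorded.
    new_files = sorted(
        (f for f in files if f.last_modified > bookmark),
        key=lambda f: f.last_modified,
    )
    processed = [f.name for f in new_files]
    # Advance the bookmark to the newest file seen; keep it unchanged if nothing was new.
    new_bookmark = max((f.last_modified for f in new_files), default=bookmark)
    return processed, new_bookmark

files = [SourceFile("a.csv", 100), SourceFile("b.csv", 200)]
done, mark = run_incremental(files, bookmark=0)
print(done, mark)  # ['a.csv', 'b.csv'] 200

files.append(SourceFile("c.csv", 300))
done, mark = run_incremental(files, bookmark=mark)
print(done, mark)  # ['c.csv'] 300
```

The new bookmark is returned rather than saved immediately, mirroring the idea that state should only advance once a run succeeds; in a Glue script that happens when the job calls `job.commit()`.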

GUI (AWS Glue Studio) to create, run, and monitor ETL jobs.

Glue DataBrew is a visual data preparation tool that allows you to clean, transform, and enrich data without writing code.
Features:
- Data profiling: automatically analyzes datasets to surface insights such as missing values, outliers, and data distributions.
- Pre-built transformations: offers over 250 transformations, such as filtering, normalizing, and deduplication.
- Custom rules: lets you define custom data quality rules to enforce specific standards (e.g., detecting PII or ensuring data completeness).
- Integration: works with S3, Redshift, and Glue ETL.
Use cases:
- Cleaning raw data from sources like S3 or databases.
- Detecting and handling PII (Personally Identifiable Information).
- Preparing data for machine learning or analytics in tools like Amazon Athena or Redshift.
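
DataBrew computes profiles and evaluates rules without code. To show what a profile and a completeness rule actually measure, here is a small plain-Python sketch. `profile` and `completeness_rule` are hypothetical helpers, not DataBrew APIs, and treating `None` or an empty string as "missing" is an assumption.

```python
def profile(rows):
    """Per-column missing-value and distinct-value counts for a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and "" both count as missing.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report

def completeness_rule(report, row_count, threshold=0.9):
    """Hypothetical rule: a column passes if at least `threshold` of its values are present."""
    return {
        col: (row_count - stats["missing"]) / row_count >= threshold
        for col, stats in report.items()
    }

rows = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "c@example.com", "country": None},
    {"id": 4, "email": "d@example.com", "country": "DE"},
]
report = profile(rows)
print(report["email"])                        # {'missing': 1, 'distinct': 3}
print(completeness_rule(report, len(rows)))  # {'country': False, 'email': False, 'id': True}
```

Separating the profile from the rule mirrors the feature list above: profiling describes the data, while a rule turns that description into a pass/fail check.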

Parquet format: a columnar storage file format (values are stored column by column rather than row by row) optimized for use with big data processing frameworks.
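
Why a columnar layout helps can be sketched in plain Python. This is only an illustration of the idea, not Parquet's actual file layout (which adds row groups, pages, metadata, and compression). `to_columns` and `run_length_encode` are hypothetical helpers; run-length encoding is shown because it is one of the simple encodings columnar formats benefit from.

```python
from itertools import groupby

def to_columns(rows, columns):
    """Pivot row-oriented records into one list per column (columnar layout)."""
    return {col: [row[col] for row in rows] for col in columns}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

rows = [
    {"country": "US", "amount": 10},
    {"country": "US", "amount": 25},
    {"country": "US", "amount": 5},
    {"country": "DE", "amount": 40},
]
table = to_columns(rows, ["country", "amount"])

# An aggregate over one column reads only that column's values.
print(sum(table["amount"]))                  # 80
# Similar values sit next to each other, so simple encodings shrink them.
print(run_length_encode(table["country"]))  # [('US', 3), ('DE', 1)]
```

This is why engines such as Athena or Redshift Spectrum can scan less data from Parquet when a query touches only a few columns.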
