
EMR

Elastic MapReduce: hosted Hadoop framework


Overview

Helps create Hadoop clusters for big data workloads. A cluster can be made up of hundreds of EC2 instances.
Processes vast amounts of data using the Apache Hadoop or Apache Spark framework across dynamically scalable Amazon EC2 instances.
Takes care of all:
- provisioning
- configuration
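
As a rough illustration of what "takes care of provisioning and configuration" means in practice, the sketch below builds the request that describes a cluster and leaves the rest to EMR. The dictionary keys follow the parameters of boto3's `run_job_flow` call; the helper name `build_cluster_request`, the cluster name, the instance type, and the release label are assumptions chosen for the example, not values from these notes.

```python
def build_cluster_request(name, instance_count, release_label="emr-6.15.0"):
    """Return keyword arguments shaped for boto3's EMR run_job_flow call.

    The helper itself is hypothetical; only the dictionary keys mirror the API.
    """
    if instance_count < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed release; pick one available to you
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,    # total instances, master included
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Default role names created by the EMR console or CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request("notes-demo-cluster", 3)
print(request["Instances"]["InstanceCount"])
```

With AWS credentials configured, passing this dictionary as `boto3.client("emr").run_job_flow(**request)` would ask EMR to launch and configure the instances; nothing in the sketch itself contacts AWS.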

Benefits

Auto scaling.
Can integrate with Spot Instances.
Can interact with data in AWS data stores such as:
- Amazon S3
- Amazon Redshift
- Amazon DynamoDB
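
To make the S3 integration concrete, here is a minimal sketch of a Spark step whose script, input, and output all live in S3. The step structure (`HadoopJarStep` with `command-runner.jar`) follows the shape boto3's `add_job_flow_steps` expects; the helper `build_spark_step` and the bucket and key names are hypothetical.

```python
def build_spark_step(name, script_uri, input_uri, output_uri):
    """Return one EMR step that runs spark-submit against data stored in S3."""
    for uri in (script_uri, input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,   # passed to the script as its first argument
                output_uri,  # passed to the script as its second argument
            ],
        },
    }


step = build_spark_step(
    "index-pages",
    "s3://example-bucket/jobs/index_pages.py",  # hypothetical locations
    "s3://example-bucket/raw/",
    "s3://example-bucket/indexed/",
)
print(step["HadoopJarStep"]["Args"][0])
```

A step like this could be submitted to a running cluster with `add_job_flow_steps(JobFlowId=..., Steps=[step])`; how the script interprets its two arguments is up to the script.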

Use cases

Big data framework
Web indexing
Machine learning (ML)
Data processing

Types of nodes

Master node

Core node

Task node
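
The three node types map onto instance groups in a cluster request. In EMR, the master node coordinates the cluster, core nodes run tasks and store HDFS data, and task nodes only run tasks, which makes them a common place for Spot capacity. The sketch below assumes a single master node; the helper `build_instance_groups` and the instance type are illustrative assumptions, while the `InstanceRole` and `Market` values are the ones the EMR API accepts.

```python
def build_instance_groups(core_count, task_count, instance_type="m5.xlarge"):
    """Return an InstanceGroups list with one group per EMR node type."""
    if core_count < 1:
        raise ValueError("this sketch expects at least one core node")
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",  # coordinates the cluster
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",    # runs tasks and stores HDFS data
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",  # runs tasks only, no HDFS storage
                "Market": "SPOT",        # losing a task node does not lose HDFS data
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


groups = build_instance_groups(core_count=2, task_count=4)
print([g["InstanceRole"] for g in groups])
```

The returned list is meant to be placed under `Instances["InstanceGroups"]` in a `run_job_flow` request. Keeping the master and core groups on On-Demand capacity is a design choice for this example, not a requirement.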


Last updated 10 months ago