EMR

Amazon EMR (Elastic MapReduce): a managed Hadoop framework


Overview

  • Helps you create Hadoop clusters for big data processing. A cluster can consist of hundreds of EC2 instances.

  • Processes vast amounts of data using the Apache Hadoop or Apache Spark frameworks across dynamically scalable Amazon EC2 instances.

  • Takes care of:

    • provisioning

    • configuration
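To make "provisioning and configuration" concrete, here is a minimal sketch that builds the keyword arguments for the EMR RunJobFlow API as exposed by boto3's `run_job_flow`. It is pure Python so it runs without AWS access. The cluster name, the release label `emr-6.15.0`, the `m5.xlarge` instance types and the default IAM role names are example assumptions; substitute values valid for your account and region.

```python
from typing import Any


def build_cluster_request(
    name: str,
    release_label: str = "emr-6.15.0",  # example release label (assumption); pick a current one
    master_type: str = "m5.xlarge",     # example instance type (assumption)
    core_type: str = "m5.xlarge",       # example instance type (assumption)
    core_count: int = 2,
) -> dict[str, Any]:
    """Return keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Frameworks that EMR installs and configures on the cluster.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": master_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": core_type,
                    "InstanceCount": core_count,
                },
            ],
            # Keep the cluster running after all submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by `aws emr create-default-roles` (assumed to exist).
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With credentials configured, `boto3.client("emr").run_job_flow(**build_cluster_request("demo"))` would submit the request and return a response containing the new cluster's `JobFlowId`. Note that this launches billable EC2 instances.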

Benefits

  • Auto scaling: cluster capacity can grow or shrink with the workload.

  • Integrates with EC2 Spot Instances to lower compute cost.

  • Can interact with data in AWS data stores such as:

    • Amazon S3

    • Amazon DynamoDB
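The benefits above map onto a few request fragments. The sketch below, assuming the boto3 EMR request shapes, builds a Spot-backed task instance group, a managed scaling policy and a Spark step that reads its script from S3 through `command-runner.jar`. The bucket path `s3://my-bucket/jobs/etl.py` and the helper names are hypothetical, and DynamoDB access is not shown.

```python
from typing import Any


def spot_task_group(instance_type: str, count: int) -> dict[str, Any]:
    """Task nodes on EC2 Spot capacity (hypothetical helper)."""
    return {
        "Name": "Task (Spot)",
        "InstanceRole": "TASK",
        "Market": "SPOT",
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def managed_scaling(min_units: int, max_units: int) -> dict[str, Any]:
    """Managed scaling limits measured in instances (hypothetical helper)."""
    if not 0 < min_units <= max_units:
        raise ValueError("require 0 < min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def spark_step(script_uri: str) -> dict[str, Any]:
    """A step that runs a PySpark script stored in Amazon S3 (hypothetical helper)."""
    if not script_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    return {
        "Name": "Spark job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
```

These fragments would be passed to `run_job_flow` as part of `Instances["InstanceGroups"]`, as `ManagedScalingPolicy=...` and inside `Steps=[...]`. For a running cluster, a task group can instead be added with `add_instance_groups`.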

Use cases

  • Running big data frameworks (Hadoop, Spark)

  • Web indexing

  • Machine learning (ML)

  • Data processing
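Web indexing is the classic MapReduce workload. The sketch below is a local, single-process illustration of the map, shuffle and reduce phases that Hadoop distributes across cluster nodes. It builds an inverted index (word to sorted document IDs) and uses no EMR or Hadoop API. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Emit a (word, doc_id) pair for every word in one document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Collapse one word's postings into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(w, ids) for w, ids in shuffle(mapped).items())
```

For example, `inverted_index({"a": "Spark on EMR", "b": "Hadoop on EMR"})["emr"]` yields `["a", "b"]`. On a real cluster, the map and reduce functions run in parallel on many nodes, and the shuffle moves data between them over the network.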

Node types

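  • Master node (called the primary node in newer AWS documentation): manages the cluster, coordinates the distribution of data and tasks, and tracks task status and cluster health.

  • Core node: runs tasks and stores data in the cluster's HDFS (Hadoop Distributed File System).

  • Task node: optional; runs tasks but does not store data in HDFS.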
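Because only core nodes hold HDFS blocks, adding task nodes increases compute capacity but not storage. The sketch below estimates usable HDFS capacity for a multi-node cluster. It assumes the EMR default `dfs.replication` is 1 for fewer than 4 core nodes, 2 for fewer than 10 and 3 otherwise; check the current EMR Hadoop configuration docs. The disk sizes and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    role: str      # "MASTER", "CORE" or "TASK"
    count: int     # number of instances in the group
    disk_gib: int  # HDFS-usable disk per instance (hypothetical figure)


def default_replication(core_nodes: int) -> int:
    """Assumed EMR default for dfs.replication, based on the core node count."""
    if core_nodes < 4:
        return 1
    if core_nodes < 10:
        return 2
    return 3


def usable_hdfs_gib(groups: list[NodeGroup]) -> float:
    """Only core nodes store HDFS blocks; master and task groups are ignored.

    Assumes a multi-node cluster (single-node clusters are not modelled).
    """
    core = [g for g in groups if g.role == "CORE"]
    core_nodes = sum(g.count for g in core)
    if core_nodes == 0:
        return 0.0
    raw = sum(g.count * g.disk_gib for g in core)
    return raw / default_replication(core_nodes)
```

With one master, four 500 GiB core nodes and ten task nodes, the estimate is 4 × 500 / 2 = 1000 GiB, and adding more task nodes leaves it unchanged. This is one reason task nodes are the natural fit for Spot capacity: losing one does not lose HDFS data.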
