# EMR

[Document](https://docs.aws.amazon.com/emr/?icmpid=docs_homepage_analytics) | [Note](https://gist.github.com/PhuongNHT1/caaf838f6a44fc8cfae5351970891a9f#amazon-emr)

## Overview

* Helps create **Hadoop clusters** for <mark style="color:red;">big data</mark> workloads. A cluster can be made up of hundreds of EC2 instances.
* Processes vast amounts of data using the **Apache Hadoop** or **Apache Spark** frameworks across dynamically scalable Amazon EC2 instances.
* Takes care of all cluster:
  * provisioning
  * configuration

### Benefits

* Auto scaling.
* Can integrate with Spot Instances.
* Can interact with data in AWS data stores such as:
  * Amazon S3
  * Amazon DynamoDB

  <figure><img src="/files/2sRYv9HUuib9xlGlADxb" alt="" width="375"><figcaption><p>Integrate with S3 &#x26; Redshift</p></figcaption></figure>
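
As an illustration of the S3 integration, the sketch below builds an EMR step that runs a Spark job whose script, input, and output all live in S3. It assumes the boto3 SDK and an already running cluster; the function names, the bucket `example-bucket`, and the script `wordcount.py` are hypothetical placeholders, not part of these notes.

```python
def build_spark_step(name, script_uri, input_uri, output_uri):
    """Build one EMR step definition that runs a PySpark script with spark-submit.

    All three locations are expected to be S3 URIs (hypothetical examples here).
    """
    for uri in (script_uri, input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return {
        "Name": name,
        # Keep the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run commands such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,
                output_uri,
            ],
        },
    }


def submit_step(cluster_id, step):
    """Submit the step to an existing cluster; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_spark_step works without boto3

    client = boto3.client("emr")
    response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


step = build_spark_step(
    "wordcount",
    "s3://example-bucket/jobs/wordcount.py",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
)
```

Keeping the step definition separate from the submission call makes the request easy to inspect or unit-test before anything is sent to AWS.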

### Use cases

* *<mark style="color:red;">Big data framework</mark>*
* Web indexing
* Machine learning (ML)
* Data processing

## Node types

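### Master node

* Manages the cluster: coordinates the distribution of data and tasks among the other nodes and tracks their status.
* Every cluster has a master node.

### Core node

* Runs tasks and stores data in HDFS on the cluster.

### Task node

* Runs tasks only and does not store data in HDFS.
* Optional.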
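
The mapping from node types to a cluster definition can be sketched as a boto3 `run_job_flow` request, where each node type becomes an instance group with role `MASTER`, `CORE`, or `TASK`. This is a minimal sketch under assumptions: the release label, instance types, counts, and default role names are illustrative choices rather than recommendations from these notes, and putting the task group on Spot is only one possible way to combine node types with Spot Instances.

```python
def build_cluster_request(name, core_count, task_count=0):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Instance types, release label, and role names are assumed defaults.
    """
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": "m5.xlarge",
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": "m5.xlarge",
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        # Task nodes are optional; here they are requested as Spot capacity.
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",
                "Market": "SPOT",
                "InstanceType": "m5.xlarge",
                "InstanceCount": task_count,
            }
        )
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one valid for your account
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": groups,
            # Terminate the cluster once all submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


def launch_cluster(request):
    """Create the cluster; needs boto3, AWS credentials, and the roles above."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("emr")
    return client.run_job_flow(**request)["JobFlowId"]


request = build_cluster_request("demo-cluster", core_count=2, task_count=4)
```

Calling `launch_cluster(request)` would start real, billable EC2 instances, so the example stops at building the request.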

