
Kinesis

A suite of services for working with streaming data.


Last updated 8 months ago


Overview

  • collect, process, analyze video & data streams in real-time.

  • Real-time data: Application logs, Metrics, Website clickstreams, IoT telemetry data.

  • operate in several modes

    • Data Streams

    • Firehose

    • Managed Service for Apache Flink (formerly Kinesis Data Analytics)

    • Video Streams

Use cases

  • Application monitoring

  • Fraud detection

  • Live game leaderboards

  • IoT

  • Sentiment analysis

Features

| Service | Description | Use case |
|---|---|---|
| Kinesis Data Streams | Capture, process & store data streams | Ingesting data from various sources, processing data in real-time, performing real-time (millisecond) analytics |
| Kinesis Firehose | Built-in transform, deliver streaming data directly to AWS services | Near real-time (buffer 1 min to 15 min) storing and analyzing large amounts of data over time without managing your own data pipeline, simple transformation, auto scaling |
| Kinesis Data Analytics -> Amazon Managed Service for Apache Flink (MSAF) | Process and analyze streaming data using standard SQL queries | Real-time data analytics on data streams without the need for specialized programming skills |
| Kinesis Video Streams | Securely stream video from connected devices to AWS for analysis and processing | Capturing video from security cameras, drones, and IoT sensors, and analyzing the data in real-time |

Kinesis Data Streams

  • Producer

    • Kinesis Agent

    • AWS SDK

    • Kinesis Producer Library (KPL)

  • Consumer

    • Kinesis Data Analytics: use an Amazon Kinesis Data Analytics application to process and analyze using SQL or Java.

    • Kinesis Firehose: use an Amazon Kinesis Data Firehose delivery stream to process and store records in a destination.

    • Kinesis Client Library (KCL): use Kinesis Client Library to develop consumers.

Capacity modes

  • Provisioned mode

    • Choose the number of shards

    • Scale manually using API

  • On-demand mode

    • No need to provision or manage capacity

    • Scales automatically based on the observed throughput peak during the last 30 days.
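In provisioned mode, choosing the number of shards comes down to simple arithmetic. The sketch below is a rough sizing aid, not an official formula. It assumes the commonly documented per-shard limits (1 MiB/s or 1,000 records/s for writes, 2 MiB/s for reads shared by standard consumers); these numbers are not stated on this page, so check the current AWS quotas. `estimate_shards` is a hypothetical helper name.

```python
import math

# Assumed per-shard limits (not from these notes; verify against current AWS quotas).
WRITE_MIB_PER_SHARD = 1.0        # MiB/s ingest per shard
WRITE_RECORDS_PER_SHARD = 1000   # records/s ingest per shard
READ_MIB_PER_SHARD = 2.0         # MiB/s egress per shard, shared by standard consumers


def estimate_shards(write_mib_s: float, records_s: float, consumers: int = 1) -> int:
    """Return the smallest shard count that satisfies all three assumed limits."""
    by_write_bytes = write_mib_s / WRITE_MIB_PER_SHARD
    by_write_records = records_s / WRITE_RECORDS_PER_SHARD
    # Every standard consumer reads the whole stream, so egress = ingest * consumers.
    by_read_bytes = (write_mib_s * consumers) / READ_MIB_PER_SHARD
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))
```

For example, `estimate_shards(2, 100, consumers=3)` is driven by the read side: three consumers reading 2 MiB/s each need more egress than two shards provide.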

Kinesis Firehose

  • Serverless, fully managed, automatic scaling.

  • Supports custom data transformations using Lambda

  • Producer

  • Consumer

    • AWS S3

    • Redshift

    • OpenSearch

    • 3rd party: Splunk, MongoDB, DataDog, NewRelic, HoneyCom...

    • HTTP Endpoint

  • Use cases:

    • CloudWatch metric streams to a data lake: create a metric stream and direct it to an Amazon Kinesis Data Firehose delivery stream that delivers your CloudWatch metrics to a data lake such as Amazon S3.
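The Lambda-based transformation mentioned above can be sketched as follows. Firehose hands the function a batch of base64-encoded records, and expects each one back with the same `recordId` and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The enrichment itself (adding a `source` field with the value `firehose-demo`) is purely illustrative and not taken from this page.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation sketch: enrich JSON objects, drop everything else."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"])
        try:
            payload = json.loads(raw)
        except ValueError:
            payload = None  # not valid JSON (or not valid UTF-8)
        if not isinstance(payload, dict):
            # Tell Firehose to discard this record; the original data is echoed back.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        payload["source"] = "firehose-demo"  # hypothetical enrichment field
        # Newline-delimited JSON keeps records separable once they land in S3.
        line = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(line).decode("ascii")})
    return {"records": output}
```

Every incoming `recordId` must appear exactly once in the response, which is why dropped records are still returned rather than skipped.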

Kinesis Data Analytics/ Managed service for Apache Flink (MSAF)

  • Reads and processes real-time streaming data.

Kinesis Video Streams

Best practices

  • Increase the number of shards in your Kinesis data stream to handle increased throughput/traffic (this resolves ProvisionedThroughputExceededException errors).

Trivia

  • real-time = Kinesis Data Streams; near real-time = Kinesis Firehose (it buffers before delivery).

  • Kinesis Data Streams uses the partition key associated with each data record to determine which shard the record belongs to.

  • Multiple Kinesis Data Streams applications can consume data from a stream.

  • A PutRecords request can carry up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request.

  • ProvisionedThroughputExceededException: when requests are throttled, the best practices are to

    • Implement retries with exponential backoff.

    • Increase Shard Count

    • Optimize Data Send Rate: If possible, batch records to use the PutRecords API

    • Reduce the frequency and/or size of the requests.

    • Uniformly Distribute Partition Keys
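Two of the points above, batching with PutRecords and retrying with exponential backoff, can be combined in one minimal sketch. It assumes the limits quoted in the Trivia list (500 records and 5 MiB per request) and the usual PutRecords response shape, where each failed entry carries an `ErrorCode`. `batches` and `put_with_backoff` are hypothetical helpers; `put_records` is any callable you supply, for example a thin wrapper around the SDK call.

```python
import random
import time

MAX_RECORDS = 500                     # records per PutRecords request
MAX_REQUEST_BYTES = 5 * 1024 * 1024   # total request payload


def record_size(record: dict) -> int:
    """Bytes counted for one record: data blob plus partition key."""
    return len(record["Data"]) + len(record["PartitionKey"].encode("utf-8"))


def batches(records):
    """Yield lists of records that respect the count and size limits."""
    batch, size = [], 0
    for rec in records:
        rec_size = record_size(rec)
        if batch and (len(batch) == MAX_RECORDS or size + rec_size > MAX_REQUEST_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch


def put_with_backoff(put_records, batch, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Send one batch; retry only failed records, with exponential backoff and jitter.

    Returns the records that still failed after the last attempt.
    """
    pending = list(batch)
    for attempt in range(max_attempts):
        response = put_records(pending)
        # PutRecords is not all-or-nothing: failed entries carry an ErrorCode.
        pending = [rec for rec, res in zip(pending, response["Records"]) if "ErrorCode" in res]
        if not pending:
            break
        if attempt < max_attempts - 1:
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return pending
```

Retrying only the failed entries matters because a throttled PutRecords call can succeed for some records and fail for others; resending the whole batch would duplicate the ones that already went through.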

Concepts

  • Producer (upstream): a producer puts records into Kinesis.

  • Consumer (downstream): a consumer gets records from Kinesis.

  • Shard: DB sharding is the process of breaking up large tables into multiple smaller tables, or chunks, called shards. So sharding is horizontal partitioning.

    • In Kinesis, a shard is a unit of throughput capacity.

    • With the KCL, the number of worker instances should not exceed the number of open shards. Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard. However, one worker can process any number of shards, so it's fine if the number of shards exceeds the number of instances.

  • Clickstream: clickstream data is a record of a user's activity on the internet, including every click they make while browsing a website or using an application.

Firehose does not support DynamoDB as a delivery destination.
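To make the shard concept concrete: Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer, and every shard owns a contiguous range of that key space. The sketch below mimics the mapping locally. The even split in `even_ranges` is an assumption for illustration, since real ranges come from the stream's shard list and change after resharding. All function names are hypothetical.

```python
import hashlib


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count: int):
    """Split the 128-bit key space into contiguous, evenly sized ranges (assumed layout)."""
    top = 2 ** 128
    step = top // shard_count
    return [
        (i * step, top - 1 if i == shard_count - 1 else (i + 1) * step - 1)
        for i in range(shard_count)
    ]


def shard_for(partition_key: str, ranges) -> int:
    """Return the index of the range (shard) that contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("ranges do not cover the key space")
```

Because the mapping depends only on the key, a single hot partition key always lands on the same shard. This is why distributing partition keys uniformly helps avoid throttling.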