CTO Cheat Sheet: Apache Storm

What is Storm?

Apache Storm is a free and open source distributed real-time computation system. It makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.

What Problem Does it Solve?

Storm targets workloads where data arrives continuously and has to be processed as it arrives, rather than collected and processed later in batch jobs. Typical use cases include:

Real-time processing
Machine learning
Business intelligence
Big data analytics
Log monitoring and auditing

Basic Concepts

Topology: A topology defines the workload for real-time stream processing. It is a graph of one or more spouts and one or more bolts connected by streams. It is comparable to a MapReduce job in Hadoop, except that a MapReduce job eventually finishes, whereas a topology runs until it is killed.
Stream: The stream is the core abstraction in Storm. A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. (Each tuple carries the data emitted by a spout or bolt and is passed on to other bolts, which transform it further.)
Spout: A source of streams in a topology. (For example, a spout might pull data from social media such as Twitter, Instagram, or Facebook.)
Bolts: All processing in topologies is done in bolts. Bolts can do anything: filtering, functions, aggregations, joins, talking to databases, and more. (For example, one bolt might filter tweets by some criterion, such as keeping only tweets in English or those about a trending event, and another bolt might store those tweets in a repository, or send them to an external service and wait for the result before passing the data on to the next bolt.)
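
The concepts above can be sketched as a toy model in plain Python. This is only an illustration of the data flow, not Storm's API: the names tweet_spout, english_filter_bolt, uppercase_bolt, store_bolt, and run_topology are hypothetical, tuples are modeled as dictionaries, and a short fixed list stands in for what would be an unbounded stream.

```python
from typing import Dict, Iterable, Iterator, List

# In this toy model a "tuple" is just a dict of named fields.
StormTuple = Dict[str, str]


def tweet_spout() -> Iterator[StormTuple]:
    """Spout: the source of the stream (a fixed sample instead of a live feed)."""
    sample = [
        {"lang": "en", "text": "storm is fast"},
        {"lang": "es", "text": "hola mundo"},
        {"lang": "en", "text": "streams never end"},
    ]
    yield from sample


def english_filter_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: keep only the tuples whose language field is English."""
    for tup in stream:
        if tup["lang"] == "en":
            yield tup


def uppercase_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: transform each tuple and emit a new one downstream."""
    for tup in stream:
        yield {**tup, "text": tup["text"].upper()}


def store_bolt(stream: Iterable[StormTuple], repository: List[StormTuple]) -> None:
    """Bolt: terminal step that writes tuples to a repository (a list here)."""
    for tup in stream:
        repository.append(tup)


def run_topology() -> List[StormTuple]:
    """Topology: wire the spout and the bolts together into one pipeline."""
    repository: List[StormTuple] = []
    stream = tweet_spout()
    stream = english_filter_bolt(stream)
    stream = uppercase_bolt(stream)
    store_bolt(stream, repository)
    return repository


if __name__ == "__main__":
    for stored in run_topology():
        print(stored)
```

Each tuple flows through the whole chain before the next one is read, which loosely mirrors how tuples move from bolt to bolt. A real topology differs in the ways that matter most: it runs spouts and bolts in parallel across a cluster and does not stop on its own.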

Storm vs. Kafka, Hadoop, and Spark

Storm

Distributed real-time processing
Stateless; data is streamed
Stream abstraction
Tuple-at-a-time stream processing (micro-batching is available through the Trident API)

Kafka

Distributed message broker
Transfers messages; data is stored on the filesystem
Uses the publish-subscribe paradigm
Stream processing

Hadoop

Distributed processing
State-based; data is static and stored
MapReduce cluster computing paradigm
Batch processing

Spark

Distributed processing
Stateless or stateful
Resilient distributed dataset (RDD)
Batch processing (Spark Streaming handles streams as micro-batches)
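
The comparison above refers to three processing models: stream, micro-batch, and batch. The difference can be sketched in plain Python with a running total over a small list of events. This is a toy illustration under simplifying assumptions, not the API of any of the systems compared; the function names tuple_at_a_time, micro_batches, and full_batch are hypothetical.

```python
from typing import Iterable, Iterator, List


def tuple_at_a_time(events: Iterable[int]) -> Iterator[int]:
    """Stream style: emit an updated running total after every single event."""
    total = 0
    for value in events:
        total += value
        yield total


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch style: emit an updated running total once per small batch."""
    total = 0
    pending: List[int] = []
    for value in events:
        pending.append(value)
        if len(pending) == batch_size:
            total += sum(pending)
            pending.clear()
            yield total
    if pending:  # flush a final, partially filled batch
        total += sum(pending)
        yield total


def full_batch(events: Iterable[int]) -> int:
    """Batch style: wait for the complete data set, then produce one result."""
    return sum(events)


if __name__ == "__main__":
    events = [1, 2, 3, 4, 5]
    print(list(tuple_at_a_time(events)))   # [1, 3, 6, 10, 15]
    print(list(micro_batches(events, 2)))  # [3, 10, 15]
    print(full_batch(events))              # 15
```

All three reach the same final total. They differ in how soon intermediate results become available, and the batch variant only works once the input is finite and complete.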
