Vu's Blog

Posts

Showing posts with the label apache hadoop

[Big Data] Compare Hadoop vs Storm

| Storm | Hadoop |
| --- | --- |
| Real-time stream processing | Batch processing |
| Stateless | Stateful |
| Master/slave architecture with ZooKeeper-based coordination. The master node is called Nimbus and the slaves are supervisors. | Master/slave architecture with or without ZooKeeper-based coordination. The master node is the JobTracker and the slave nodes are TaskTrackers. |
| A Storm streaming process can handle tens of thousands of messages per second on a cluster. | Hadoop stores data in the Hadoop Distributed File System (HDFS) and uses the MapReduce framework to process vast amounts of data, which takes minutes or hours. |
| A Storm topology runs until it is shut down by the user or hits an unexpected, unrecoverable failure. | MapReduce jobs are executed in sequential order and eventually complete. |
| Distributed and fault-tolerant | Distributed and fault-tolerant |
| If Nimbus or a supervisor dies, restarting it makes it continue from where it stopped, so nothing is affected. | If the JobTracker dies, all running jobs are lost. |
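The difference between the two processing models in the comparison above can be sketched with a toy word count. This is only an illustration in plain Python: it uses neither the Storm nor the Hadoop API, and the function names `batch_word_count` and `stream_word_count` are made up for this example. It shows the processing model and job lifetime only, not the cluster architecture or failure handling.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Batch style (MapReduce-like): the whole finite input is available
    up front, and the job completes once every record is processed."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def stream_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream style (Storm-like): records arrive one at a time from a
    possibly unbounded source, and an updated result is emitted after
    each record. The loop only ends if the source ends."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield counts.copy()  # snapshot of the running totals so far


if __name__ == "__main__":
    data = ["storm hadoop", "hadoop hdfs", "storm storm"]

    # Batch: a single result after all of the input has been read.
    print(batch_word_count(data))

    # Stream: one intermediate result per incoming record.
    for snapshot in stream_word_count(iter(data)):
        print(snapshot)
```

With a finite input, the last snapshot of the streaming version matches the batch result. The practical difference is that the streaming version has useful output available after every record, while the batch version produces nothing until the whole input has been consumed.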