Vu's Blog

Posts

Showing posts with the label Big data

[AI] The three V’s of Big Data

The 3 V’s of big data

Volume: The challenge will only keep getting bigger. Facebook currently has more users than China has people. 👀

Velocity: How fast data comes in. Ten years ago, Facebook was already receiving 735M comments, 421M status updates, and 195M image uploads per day. 💨💨💨

Variety: Most data is unstructured (photographs, sensor data, IoT device information, tweets, encrypted packets, voice, video...). 👻

[Big Data] Compare Hadoop vs Storm

| Storm | Hadoop |
| --- | --- |
| Real-time stream processing | Batch processing |
| Stateless | Stateful |
| Master/slave architecture with ZooKeeper-based coordination. The master node is called nimbus and the slaves are supervisors. | Master/slave architecture with or without ZooKeeper-based coordination. The master node is the JobTracker and the slave nodes are TaskTrackers. |
| A Storm streaming process can handle tens of thousands of messages per second on a cluster. | Hadoop stores data in the Hadoop Distributed File System (HDFS) and processes it with the MapReduce framework; jobs over vast amounts of data take minutes or hours. |
| A Storm topology runs until it is shut down by the user or hits an unexpected, unrecoverable failure. | MapReduce jobs are executed in sequential order and eventually complete. |
| If nimbus or a supervisor dies, restarting it makes it continue from where it stopped, so nothing is affected. | If the JobTracker dies, all running jobs are lost. |

Both are distributed and fault-tolerant.
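To make the batch vs. stream distinction concrete, here is a minimal plain-Python sketch. It does not use the Storm or Hadoop APIs; the function names `batch_word_count` and `stream_word_count` are hypothetical and only illustrate the two processing styles. The batch version needs the whole dataset up front and returns one final result, while the stream version handles each message as it arrives and keeps going for as long as the source produces data.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: read the full dataset, then produce one final result."""
    # "Map" phase: split every line into words.
    words = (word for line in lines for word in line.split())
    # "Reduce" phase: aggregate all words into a single count.
    return Counter(words)


def stream_word_count(messages: Iterable[str]) -> Iterator[Counter]:
    """Stream style: emit a result per message, keeping no state in between."""
    for message in messages:
        # Each message is processed on its own as soon as it arrives.
        yield Counter(message.split())


if __name__ == "__main__":
    data = ["storm hadoop", "storm storm"]

    # One result, available only after all input has been read.
    print(batch_word_count(data))

    # One result per incoming message, available immediately.
    for result in stream_word_count(data):
        print(result)
```

In a real deployment the stream source would be unbounded (for example, a message queue), which is why a Storm topology runs until it is shut down, while a MapReduce job ends once its input is exhausted.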