Showing posts with the label Big data

[AI] The three V’s of Big Data

The three V’s of big data:

Volume: The challenge will only keep getting bigger. Facebook currently has more users than China has people. 👀

Velocity: How fast the data comes in. Facebook was receiving 735M comments, 421M status updates, and 195M image uploads per day, and that was ten years ago. 💨💨💨

Variety: Most data is unstructured (photographs, sensor data, IoT device information, tweets, encrypted packets, voice, video...). 👻
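To get a feel for the Velocity figures, it helps to convert the per-day counts into per-second rates. The snippet below is only a back-of-the-envelope sketch: the function name `per_second` is made up for illustration, and it assumes the load is spread evenly over a 24-hour day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day: float) -> int:
    """Average events per second, assuming an even spread over one day."""
    return round(per_day / SECONDS_PER_DAY)

# Daily figures quoted in the post above.
daily = {
    "comments": 735_000_000,
    "status updates": 421_000_000,
    "image uploads": 195_000_000,
}

if __name__ == "__main__":
    for name, count in daily.items():
        print(f"{name}: about {per_second(count)} per second")
```

Under that even-spread assumption, the comment stream alone averages several thousand events every second, which is the kind of rate that makes batch-only processing uncomfortable.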

[Big Data] Compare Hadoop vs Storm

| Storm | Hadoop |
| --- | --- |
| Real-time stream processing | Batch processing |
| Stateless | Stateful |
| Master/slave architecture with ZooKeeper-based coordination. The master node is called nimbus and the slaves are supervisors. | Master/slave architecture with or without ZooKeeper-based coordination. The master node is the JobTracker and the slave nodes are TaskTrackers. |
| A Storm streaming process can handle tens of thousands of messages per second on a cluster. | Hadoop stores data in the Hadoop Distributed File System (HDFS) and uses the MapReduce framework to process vast amounts of data, with jobs taking minutes or hours. |
| A Storm topology runs until it is shut down by the user or hits an unexpected, unrecoverable failure. | MapReduce jobs are executed in sequential order and eventually complete. |
| Distributed and fault-tolerant | Distributed and fault-tolerant |
| If nimbus or a supervisor dies, restarting it makes it continue from where it stopped, so nothing is affected. | If the JobTracker dies, all running jobs are lost. |
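The batch versus stream distinction can be sketched in a few lines of plain Python. This is not the Storm or Hadoop API; `batch_word_count` and `stream_word_count` are hypothetical names chosen for illustration. The batch function needs the whole input up front and returns one result at the end, in the spirit of a MapReduce job. The stream function handles each message on its own as it arrives and never needs the input to end, in the spirit of a Storm topology.

```python
from collections import Counter
from typing import Iterable, Iterator, List

def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: the full input is available; one final result is returned."""
    # "Map" phase: emit a (word, 1) pair for every word in every line.
    mapped = [(word, 1) for line in lines for word in line.split()]
    # "Reduce" phase: sum the pairs per word.
    counts: Counter = Counter()
    for word, one in mapped:
        counts[word] += one
    return counts

def stream_word_count(messages: Iterable[str]) -> Iterator[Counter]:
    """Stream style: each message is processed independently as it arrives."""
    for message in messages:
        # Emit a result immediately; the source may be unbounded.
        yield Counter(message.split())

if __name__ == "__main__":
    data = ["storm hadoop storm", "hadoop storm"]
    print("batch:", batch_word_count(data))
    for result in stream_word_count(iter(data)):
        print("stream:", result)
```

Because `stream_word_count` is a generator, it keeps producing results for as long as messages keep coming, which mirrors the "runs until shutdown" row in the table above.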