Apache Kafka Review

As far as I know, it is used not only as a sink but also as a data source for stream-processing frameworks such as Spark and Storm.

Here I focus more on its Concurrent Design (multiple producers, consumers, and topics) than on its Storage Model.

Read More

Apache ZooKeeper Review

A zookeeper manages all kinds of animals, and ZooKeeper does so much that I struggle to give it a single category. A configuration center like Spring Cloud Config or Apollo? A service registry and discovery center like Eureka or Consul? A cluster manager? A message bus?

Having looked at HBase, Kafka, Storm, and HDFS clusters, I personally prefer to treat it as a Cluster Manager. So here I talk about Election and Node Management (distributed locks are left for another time).

Read More

Apache HBase Review

A key-value database built on HDFS that relies on ZooKeeper to manage its cluster. Oh, and it stores data by column (grouped into column families), which suits real-time inserts and random queries, especially on very large tables.

From this post you can learn the Architecture of an HBase Cluster (Region, HMaster…) and its CRUD Logic.

Read More

Apache Cassandra Review

Unlike many other NoSQL databases, Cassandra is peer-to-peer (decentralized, with no master node) and column-based.

Because of this design, it is worth discussing its Storage Model (column family) and its Read-Write Strategy (including consistent hashing).
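To make "consistent hashing" concrete, here is a minimal sketch of a generic hash ring, not Cassandra's actual code. Assumptions: MD5 from Python's `hashlib` is used for tokens purely for simplicity (Cassandra's default partitioner is Murmur3-based), and there are no virtual nodes or replicas. The class `HashRing` and its methods are hypothetical names for illustration. Each key is owned by the first node token at or after the key's token, wrapping around the ring.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for simplicity)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical minimal consistent-hash ring: a key belongs to the next node clockwise."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted node tokens
        self._owner = {}   # node token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def remove(self, node: str) -> None:
        t = token(node)
        self._tokens.remove(t)
        del self._owner[t]

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring is empty")
        # First node token >= key token; past the last token, wrap to the first.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add("node-d")
    moved = sum(before[k] != ring.node_for(k) for k in keys)
    print(f"{moved} of {len(keys)} keys moved after adding node-d")
```

The point of the design: adding a node only moves the keys that now fall into the new node's range, while every other key keeps its owner. A plain `hash(key) % n` scheme would instead reshuffle most keys whenever `n` changes.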

Read More

Apache Flink Review

Unlike Spark, Flink is designed around stateful computation over both bounded data sets (batch) and unbounded data streams, depending on the use case. Besides, I personally find its layered API hierarchy friendly to use.

This short article covers its Computation Model, Cluster Architecture, and API Categories.

Read More

Apache Spark Review

Unlike Flink, which is lean and focused, Spark is a huge ecosystem that spans RPC, data storage, computation, scheduling…

I am going to talk briefly about Spark Core, Spark SQL, and Spark Streaming rather than all of its features.

Read More
Why I Chose 'Distributed Systems' as One of My Main Subjects