Questions | Votes | Answers |
---|---|---|
| What is Kafka, exactly (which of the following is it closest to)? a. stream engine b. storage/file system c. message queue d. message broker e. CP/AP NoSQL database | @AlexTz @Anonymous @Titan gene @Ryu Wong | Ref: Martin Kleppmann \| Kafka Summit London 2019 Keynote \| Is Kafka a Database? |
| What is a Kafka Topic? | @AlexTz | |
| What is a Kafka Cluster? | @AlexTz @Ryu Wong | |
| Within a Kafka cluster: a. Is there only ever a single node? b. Is there strong consistency? c. Must the nodes/partitions in a cluster all sit in the same "network environment" (data center, AWS region, same internal network, cloud availability zones)? | @Michael Chen @wang allen | |
| What is the relationship between a Topic and a Cluster? a. (many-to-one) Can a single cluster hold more than one topic? b. (one-to-many) Can a single topic be spread across, and accessed from, different clusters/regions? b.(1) If so, is this distribution strongly consistent, or is the data split up based on a quorum? | @AlexTz @小廢廢 @wang allen @Anonymous @kuan hou Chan @Ryu Wong | |
| What is a (Topic) Partition in Kafka? | | |
| What is the difference between batch (poll-based) and real-time (push-based) processing? a. Does Kafka only support batch processing? | | |
| Difference between Kafka, Google PubSub, and SQS | @AlexTz | |
| Explain what "passing database changelog to subscribers" means | @Eric Chang | a. Used as the storage/delivery mechanism for Change-Data-Capture (CDC) (by @Eric Chang): it decouples data inserts/updates from the database itself, so other services can act on a change as it happens, e.g. invalidate a cache. Refs: https://medium.com/event-driven-utopia/8-practical-use-cases-of-change-data-capture-8f059da4c3b7 https://medium.com/dcardlab/postgresql-技術筆記-跟疾管署沒有關係的cdc-218e27eb363d |
| Does a single cluster necessarily have poor availability and fault tolerance? | | |
| Would the Metadata Server in Cluster Federation increase latency and also become a single point of failure? | | |
| How does Kafka normally scale without adding more clusters? | | |
| Explain what "new topics are seamlessly created on the newly added clusters" means | @AlexTz @kuan hou Chan | |
| Explain why "migrating topics between clusters used to be hard" a. Could a Topic only exist in a single cluster? b. Why is coordination needed to shift cluster traffic? | | |
| Does Kafka have a default retry mechanism? a. How is it different from PubSub and SQS? | | |
| DLQ a. What counts as "failed to be processed"? (NACK vs. worker failure?) b. Is there a centralized DLQ, or can each topic have its own DLQ? | @Eric Chang @AlexTz @kuan hou Chan @Anonymous | When we consume a topic, a message may fail, be retried, or be dropped. Multiple topics are used to handle failures, with a separate topic for manual intervention. Ref: https://www.uber.com/en-TW/blog/reliable-reprocessing/ |
| Cluster Federation | | |
| Consumer Proxy a. What is a consumer/consumer group in Kafka? b. Why is a proxy server needed? | | |
| Cross-cluster Replication a. What does "uses multiple clusters in different data centers" mean? b. Explain "needs global view" and "replicated for redundancy". c. What does the cross-region setup look like? | | |
| What does it mean for Kafka to be streaming storage (e.g., what type of data store Kafka is, or its data format)? Is Kafka responsible for stream processing as well? Not sure how event streaming works in general. | @AlexTz | |
| What is OLAP exactly? How do data analysts access/use it (since it seems a bit different from using SQL)? | @Anonymous | |
| How do Kafka's different message-processing strategies affect applications? | | |
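
The DLQ row describes handling failures by republishing messages to extra topics, with a separate topic for manual intervention. Below is a minimal in-memory sketch of that retry-topic/DLQ chain, loosely modeled on the Uber reliable-reprocessing post it cites. It uses no real Kafka client; the topic names (`orders`, `orders__retry_1`, `orders__dlq`), the two retry tiers, and the `run`/`next_topic`/`handler` helpers are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical topic names: one main topic, two retry tiers, one dead-letter queue.
MAIN = "orders"
RETRY_TOPICS = [f"{MAIN}__retry_{i}" for i in (1, 2)]
DLQ = f"{MAIN}__dlq"
CHAIN = [MAIN, *RETRY_TOPICS]


def next_topic(current: str) -> str:
    """Topic a message is republished to after failing on `current`."""
    idx = CHAIN.index(current)
    return CHAIN[idx + 1] if idx + 1 < len(CHAIN) else DLQ


def run(messages, handler):
    """Drain an in-memory 'broker', moving failed messages down the retry chain.

    Returns (successfully processed messages, messages left in the DLQ).
    """
    topics = defaultdict(deque)
    topics[MAIN].extend(messages)
    processed = []
    for topic in CHAIN:
        while topics[topic]:
            msg = topics[topic].popleft()
            try:
                handler(msg)
                processed.append(msg)
            except Exception:
                # Republish instead of retrying in place, so one bad message
                # does not block the rest of the topic.
                topics[next_topic(topic)].append(msg)
    return processed, list(topics[DLQ])


# Usage: "bad" always fails, "flaky" fails only on its first attempt.
attempts = defaultdict(int)


def handler(msg):
    attempts[msg] += 1
    if msg == "bad":
        raise ValueError("poison message")
    if msg == "flaky" and attempts[msg] < 2:
        raise TimeoutError("transient failure")


ok, dead = run(["a", "flaky", "bad", "b"], handler)
print(ok)    # ['a', 'b', 'flaky']
print(dead)  # ['bad']
```

Pushing a failed message to the next topic lets the consumer keep moving through the main topic instead of stalling on one poison message; the transient failure succeeds on a retry tier, while the permanent one ends up in the DLQ for a human to inspect. The retry tiers in the referenced design are also meant to be consumed after a delay, which this sketch omits.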

Knowledge Point | Votes | Explanation | Supplementary Material |
---|---|---|---|
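| Message polling/pushing | @Eric Chang | (Eric) Batching vs. streaming is about how much data is processed at a time; pull vs. push is about how data is handed on once it has been received. The two seem to be about different things. | |
| Kafka cluster/topic/partition | @小廢廢 | Cluster: a physical group of server nodes. Topic: an abstract concept, a grouping of similar streaming events. Partition: the smallest unit of writes and of consistency/redundancy; it has a watermark, a bit like a single-threaded consistency guarantee; a partition key can be used to force writes into a specific partition. | 細說 Kafka Partition 分區 (A detailed look at Kafka partitions) |
| Change-Data-Capture (CDC) | @wang allen | Used to decouple "recording changes" from the database instance itself. In OLTP/OLAP, moving data and creating replicas is very common, but batch-mode transfers are often not very real-time and are hard to trace back. The CDC approach is better at recording changes durably. | |
| Cluster Federation | @Eric Chang @wang allen | | |
| DLQ | | | |
| Consumer Proxy | @Eric Chang | | |
| Stream Engine | | | |
| Leader-based replication | | | |
| Stretch Cluster | | | |
| OLAP | | | |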
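
The partition notes say a partition key can force writes into a specific partition. Here is a minimal sketch of key-hash partitioning over a toy in-memory topic. `partition_for`, `Topic`, and `NUM_PARTITIONS` are hypothetical names. `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client uses in its default partitioner for keyed records, so the partition numbers here will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """Toy topic: each partition is an append-only list; a record's index is its offset."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key: str, value: str) -> tuple:
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


topic = Topic()
for event in ["created", "paid", "shipped"]:
    # Same key every time, so every event goes to the same partition.
    print(topic.produce("order-42", event))
# → (p, 0), (p, 1), (p, 2) for a single fixed partition p
```

Because all three `order-42` events land in the same partition, their offsets also reflect the order they were produced in. Kafka guarantees ordering only within a partition, not across a whole topic, which is why related events share a key.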