Kafka Security

Kafka is a system for continuous, high throughput messaging of event data, such as logs, to enable near real-time analytics. It is structured as a distributed message broker with incoming-event producers sending messages to topics and outgoing-event consumers.  Motivations behind its development include decoupling producers and consumers from each other for flexibility, reducing time to process events and increasing throughput. Couple analogies to think of it are a sender using sendmail to send an email to an email address (topic);  or a message “router” which decides the destination for a particular message – except Kafka persists the messages until the consumer is ready for them. It is an intermediary in the log processing pipeline – there is no processing of data itself on Kafka – there are no reads for instance. In contrast to JMS, one can send batch messages to Kafka and individual messages do not have to be acknowledged.

A design thesis of Kafka is that sequential (contiguous) disk access is very fast and can be even faster than random memory access. It uses zero copy, and uses a binary protocol over TCP, not HTTP.  A quote from design link – “This combination of pagecache and sendfile means that on a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache”.  This along with the distributed design makes it faster than competing pub-sub systems.

A proposal for adding security to it has been underway, for enterprise use, to control who can publish and subscribe to topics – https://cwiki.apache.org/confluence/display/KAFKA/Security . A talk on Kafka security by HortonWorks on integrating Kerberos authentication, SSL encryption with Kafka was given at a recent meetup. The slides are at – http://www.slideshare.net/harshach/kafka-security.

Of interest was an incident where the SSL patch caused the cluster to become unstable and increase latencies on a production cluster. The issue was debugged using profiling. Although SSL did increase latencies, this specific issue was narrowed to a bug unrelated to SSL in the same patch which had to do with zero copy.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s