Month: June 2019

Apache Druid – horizontally scalable time series database

Machine-generated data, such as clickstreams or database update streams, consists of many rows, each of which has three parts:

  • timestamp
  • text columns or attributes, called dimensions
  • numerical values, such as counts of hits, words or characters, called metrics

Druid grew out of the need to aggregate over such data rapidly and with low latency.
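To make the row model concrete, here is a minimal Python sketch of the kind of aggregation Druid is built for. Rows are bucketed by hour and grouped by their dimensions, and the metrics are summed. Druid can do a similar pre-aggregation at ingestion time, which it calls rollup. The field names, the hourly granularity and the helper functions are hypothetical and only illustrate the idea; they are not Druid's implementation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical clickstream rows: a timestamp, dimensions (page, country) and metrics (hits, bytes).
rows = [
    {"timestamp": "2019-06-01T10:02:11", "page": "/home", "country": "US", "hits": 1, "bytes": 512},
    {"timestamp": "2019-06-01T10:17:45", "page": "/home", "country": "US", "hits": 1, "bytes": 256},
    {"timestamp": "2019-06-01T10:40:03", "page": "/docs", "country": "DE", "hits": 1, "bytes": 1024},
    {"timestamp": "2019-06-01T11:05:59", "page": "/home", "country": "US", "hits": 1, "bytes": 128},
]

DIMENSIONS = ("page", "country")
METRICS = ("hits", "bytes")


def truncate_to_hour(ts: str) -> str:
    """Floor an ISO-8601 timestamp to the start of its hour."""
    t = datetime.fromisoformat(ts)
    return t.replace(minute=0, second=0, microsecond=0).isoformat()


def rollup(rows):
    """Group rows by (hour, dimension values) and sum every metric."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for row in rows:
        key = (truncate_to_hour(row["timestamp"]),) + tuple(row[d] for d in DIMENSIONS)
        for m in METRICS:
            totals[key][m] += row[m]
    return dict(totals)


for key, metrics in sorted(rollup(rows).items()):
    print(key, metrics)
```

Four raw rows collapse into three aggregated rows, because the two US visits to /home in the 10:00 hour share the same time bucket and dimension values.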

The data is append-heavy, so ingesting it is a challenge, as is querying it, especially at low latency.

Druid has two subsystems:

  • a write-optimized subsystem in the real-time nodes
  • a read-optimized subsystem in the historical nodes
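The split between the two subsystems can be sketched in Python as follows. A write-optimized node appends rows to an in-memory buffer and periodically hands them off as an immutable columnar segment, and a read-optimized node serves only such segments. A query merges partial results from both, a role Druid assigns to its broker nodes. The class names, the hand-off threshold and the single `sum` query are hypothetical simplifications, not Druid's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """An immutable block of rows stored column by column."""
    columns: Dict[str, list]

    def sum(self, metric: str) -> int:
        return sum(self.columns[metric])


@dataclass
class RealtimeNode:
    """Write-optimized: cheap appends to an in-memory row buffer."""
    max_rows: int = 3
    buffer: List[dict] = field(default_factory=list)

    def ingest(self, row: dict) -> None:
        self.buffer.append(row)

    def should_hand_off(self) -> bool:
        return len(self.buffer) >= self.max_rows

    def hand_off(self) -> Segment:
        """Pivot the buffered rows into a columnar segment and clear the buffer (assumes buffer is non-empty)."""
        names = list(self.buffer[0])
        segment = Segment({n: [r[n] for r in self.buffer] for n in names})
        self.buffer = []
        return segment

    def sum(self, metric: str) -> int:
        return sum(r[metric] for r in self.buffer)


@dataclass
class HistoricalNode:
    """Read-optimized: serves immutable segments only."""
    segments: List[Segment] = field(default_factory=list)

    def load(self, segment: Segment) -> None:
        self.segments.append(segment)

    def sum(self, metric: str) -> int:
        return sum(s.sum(metric) for s in self.segments)


def query_sum(metric: str, realtime: RealtimeNode, historical: HistoricalNode) -> int:
    """Merge partial results from fresh (buffered) and historical (segment) data."""
    return realtime.sum(metric) + historical.sum(metric)


rt, hist = RealtimeNode(), HistoricalNode()
for hits in [1, 2, 3, 4]:
    rt.ingest({"page": "/home", "hits": hits})
    if rt.should_hand_off():
        hist.load(rt.hand_off())

print(query_sum("hits", rt, hist))  # 10: 6 from the handed-off segment + 4 still buffered
```

The point of the design is that writes never touch the immutable segments, and reads over recent data still see rows that have not been handed off yet.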

The data is persisted to deep storage, such as S3 or HDFS, in a column-oriented format.
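Why a column-oriented format suits these aggregations can be shown with a small Python sketch. Rows are pivoted into one list per column, so summing a metric reads only that column. A string dimension is dictionary-encoded into small integer ids, a common column-store technique that Druid also applies to string dimensions. The data and helper names are hypothetical, and Druid's real segment format adds compression and indexes that are omitted here.

```python
from typing import Dict, List, Tuple

rows = [
    {"country": "US", "hits": 3},
    {"country": "DE", "hits": 1},
    {"country": "US", "hits": 5},
    {"country": "FR", "hits": 2},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with its integer id in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    ids = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]


columns = to_columns(rows)
dictionary, encoded = dictionary_encode(columns["country"])

# Summing a metric touches only that column, not every field of every row.
print(sum(columns["hits"]))   # 11
print(dictionary, encoded)    # ['DE', 'FR', 'US'] [2, 0, 2, 1]
```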

A good article explaining Druid's internal architecture: https://towardsdatascience.com/apache-druid-part-1-a-scalable-timeseries-olap-database-system-af8c18fc766d

Druid differs from Flink and Spark Streaming in that it is not a stream-processing system. Flink can apply real-time transformations to the data, which can then be ingested into Druid via Kafka to power real-time dashboards.
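The division of labour in that pipeline can be sketched with plain Python generators. The transform step does the per-event work a stream processor such as Flink would do (dropping crawler traffic and normalizing URLs here), and the serializer produces the (key, value) byte pairs a Kafka producer would publish for Druid to ingest as JSON. This is not Flink's or Kafka's API: a plain list stands in for the Kafka topic, and the event fields and filtering rule are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Tuple

# Hypothetical raw events as they might arrive from a web tier.
raw_events = [
    '{"ts": "2019-06-01T10:00:01Z", "url": "/home?ref=ad", "ua": "Mozilla/5.0"}',
    '{"ts": "2019-06-01T10:00:02Z", "url": "/docs", "ua": "Googlebot/2.1"}',
    '{"ts": "2019-06-01T10:00:03Z", "url": "/home", "ua": "Mozilla/5.0"}',
]


def transform(lines: Iterable[str]) -> Iterator[dict]:
    """Per-event cleanup of the kind a stream processor might apply before Druid ingestion."""
    for line in lines:
        event = json.loads(line)
        if "bot" in event["ua"].lower():
            continue  # drop crawler traffic
        yield {
            "timestamp": event["ts"],
            "page": event["url"].split("?", 1)[0],  # strip the query string
            "hits": 1,
        }


def to_messages(events: Iterable[dict]) -> List[Tuple[bytes, bytes]]:
    """Serialize events as (key, value) byte pairs, the shape a Kafka producer sends."""
    return [(e["page"].encode(), json.dumps(e).encode()) for e in events]


topic = to_messages(transform(raw_events))  # a plain list stands in for the Kafka topic
for key, value in topic:
    print(key, value)
```

Druid's role starts only after the topic: it ingests and indexes the cleaned events and answers aggregate queries over them, while the transformation logic lives upstream.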

Aviatrix Controller on AWS for secure networking

These are some notes from a talk by Aviatrix last week. Many customers start using the Aviatrix orchestration system to deploy AWS Transit Gateway (TGW) and Direct Connect. The transit gateway is the hub that connects multiple VPCs to an on-premises network, possibly over Direct Connect. The Aviatrix product can then deploy and manage multiple VPCs and the communication between them, controlling which VPC can talk to which other VPC. It enforces this simply by deleting routes.
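Controlling reachability by deleting routes can be illustrated with a toy model in Python. Every VPC starts with a route to every other VPC's CIDR via the hub, and a policy of allowed pairs decides which routes are removed. Without a route, traffic has nowhere to go. This is not the Aviatrix or AWS API; the VPC names, CIDRs and policy are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Set

# Hypothetical VPCs attached to a transit hub, with their CIDR blocks.
VPCS: Dict[str, str] = {
    "prod": "10.1.0.0/16",
    "dev": "10.2.0.0/16",
    "shared": "10.3.0.0/16",
}

# Hypothetical policy: unordered pairs of VPCs that may talk to each other.
ALLOWED: Set[FrozenSet[str]] = {frozenset({"prod", "shared"}), frozenset({"dev", "shared"})}


def full_mesh_routes(vpcs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Start with every VPC holding a route to every other VPC's CIDR via the hub."""
    return {src: {vpcs[dst] for dst in vpcs if dst != src} for src in vpcs}


def enforce(routes: Dict[str, Set[str]], vpcs: Dict[str, str],
            allowed: Set[FrozenSet[str]]) -> Dict[str, Set[str]]:
    """Delete the routes between disallowed pairs, in both directions."""
    for src, dst in permutations(vpcs, 2):
        if frozenset({src, dst}) not in allowed:
            routes[src].discard(vpcs[dst])
    return routes


routes = enforce(full_mesh_routes(VPCS), VPCS, ALLOWED)
for vpc, cidrs in routes.items():
    print(vpc, sorted(cidrs))
```

With this policy, prod and dev can both reach shared services, but the routes between prod and dev are gone, so those two VPCs are isolated from each other.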

The advanced transit controller solution is useful in multi-region deployments, where it manages communication between regions. Another aspect is that there are high-speed interconnects between the cloud providers, and Aviatrix builds an overlay that bridges the public clouds. It can also enable multi-account communication, as well as secure communication between networks using segmentation.

According to Aviatrix, AWS’s motto is "go build" and do it yourself; it is designed for builders. But when you grow from 3 VPCs to 3,000, you need a way to manage the routes automatically. This is the situation for many larger customers. The product also finds use at smaller customers that have Production, Development and Edge/On-premises network components to manage.
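Some illustrative arithmetic (not from the talk) shows why the jump from 3 to 3,000 VPCs changes the problem. With n VPCs there are n(n-1)/2 pairs whose reachability someone has to decide. A hub such as the transit gateway keeps the number of attachments at n, but the allow/deny decisions still scale with the pair count. The function names below are hypothetical.

```python
def vpc_pairs(n: int) -> int:
    """Unordered VPC pairs whose reachability must be allowed or denied: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_attachments(n: int) -> int:
    """Attachments needed when every VPC connects once to a central hub."""
    return n


for n in (3, 3000):
    print(f"{n} VPCs: {hub_attachments(n):,} hub attachments, {vpc_pairs(n):,} VPC pairs")
```

Three VPCs give three pairs, which is easy to manage by hand; 3,000 VPCs give 4,498,500 pairs, which is not.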

Remote-user VPN is another use case. Not only can a user connect over VPN and reach all the VPCs, but the administrator can also specify which CIDR ranges each user may reach, along with other restrictions.
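A per-user CIDR restriction boils down to checking whether a destination address falls inside one of the networks the user is allowed to reach. The minimal Python sketch below does that check with the standard `ipaddress` module. The user names and policy table are hypothetical, and a real VPN gateway would enforce this in its packet filter rather than in application code.

```python
import ipaddress
from typing import Dict, List

# Hypothetical per-user policy: which destination CIDRs each VPN user may reach.
USER_POLICY: Dict[str, List[str]] = {
    "alice": ["10.1.0.0/16", "10.3.0.0/16"],  # prod and shared services
    "bob": ["10.2.0.0/16"],                   # dev only
}


def may_reach(user: str, destination: str) -> bool:
    """Return True if the destination IP lies inside one of the user's allowed CIDRs."""
    ip = ipaddress.ip_address(destination)
    return any(ip in ipaddress.ip_network(cidr) for cidr in USER_POLICY.get(user, []))


print(may_reach("alice", "10.1.4.20"))  # True
print(may_reach("bob", "10.1.4.20"))    # False
print(may_reach("carol", "10.2.0.5"))   # False: unknown users get no access
```

Defaulting unknown users to an empty list makes the check deny by default, which is the safer choice for remote access.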