Author: Ruchir Tewari

Bitcoin market cap reaches $1T

Bitcoin reached a $1T market cap last month. https://www.msn.com/en-us/news/technology/bitcoin-reaches-dollar1-trillion-valuation-twice-as-fast-as-amazon/ar-BB1fF3Bl

A Bitcoin halving event takes place every 210,000 blocks and cuts the reward for securing a block in half. Three halvings have taken place so far, in 2012, 2016, and 2020; the next is predicted for 2024. The block reward went from 50 BTC in 2009 to 25 in ‘12, 12.5 in ‘16, 6.25 in ‘20, and will drop to 3.125 in ‘24. https://www.coinwarz.com/mining/bitcoin/halving

The rate of production of bitcoin declines over time; mining rewards will continue until 21 million BTC have been created.
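
As a rough sketch (my own illustration, not from the source), the issuance schedule can be computed directly from the halving rule:

# Block reward starts at 50 BTC and halves every 210,000 blocks; the total
# supply therefore converges to just under 21 million BTC.
def issuance_schedule():
    reward, supply, rewards = 50.0, 0.0, []
    while reward >= 1e-8:              # below 1 satoshi the reward rounds to zero
        rewards.append(reward)
        supply += 210_000 * reward     # BTC minted during this halving epoch
        reward /= 2
    return rewards, supply

rewards, supply = issuance_schedule()
print(rewards[:5])    # [50.0, 25.0, 12.5, 6.25, 3.125] -- matches the halvings above
print(round(supply))  # ~21000000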

VeChain is a blockchain proposal/implementation for supply chain tracking.

https://cdn.vechain.com/vechainthor_development_plan_and_whitepaper_en_v1.0.pdf

EdgeChain is an architecture for placement of applications on the edge amongst multiple resources from multiple providers. It is built on VeChain for Mobile and Edge Computing (MEC) use cases.

https://vechainofficial.medium.com/vechain-introduces-michigan-state-university-as-first-veresearch-participant-and-pioneering-the-mec-c8dec3015914

Disaster Recovery: Understanding and designing for RPO and RTO

Let’s take a disaster scenario where a system loses its data-in-transit (i.e., data not yet persisted) at a certain point in time. Some time after this point, a recovery process kicks in and restores the system to normal functioning.

Recovery Point Objective (RPO) refers to the amount of tolerable data loss, measured in time. It can be measured in time because the data at risk is in-transit data with some maximum velocity, so bounding the time bounds the amount of data that can be lost. The RPO is used to determine how frequently the data must be persisted and replicated. An RPO of 10 minutes implies the data must be backed up every 10 minutes; if there’s a crash, the system can be restored to a point no more than 10 minutes prior to the time of the crash. RPO determines the frequency of backups, snapshots, or transaction logs.
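
A back-of-the-envelope sketch (assumed numbers, not from the source) of how an RPO bounds both the backup interval and the worst-case data loss:

rpo_minutes = 10            # recovery point objective
write_rate_mb_per_s = 5     # assumed peak velocity of in-transit data

backup_interval_s = rpo_minutes * 60                        # back up at least this often
max_data_loss_mb = write_rate_mb_per_s * backup_interval_s
print(backup_interval_s, max_data_loss_mb)                  # 600 s, 3000 MB worst case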

Recovery Time Objective (RTO) refers to the amount of time required to restore a system to normal behavior after a disaster has happened. This includes restoration of all infrastructure components that provide the service, not just the restoration of data.

Lower RPO/RTO comes at higher cost.

A matrix of RPO (high/low) vs RTO (high/low) can be used to categorize applications:

Low RPO, Low RTO: a critical online application, like a storefront.

Low RPO, High RTO: a data-sensitive but not online application, like analytics.

High RPO, Low RTO: redundantly available data or no data, e.g., highly available compute clusters.

High RPO, High RTO: non-production systems, such as dev/test/QA.

The amount of acceptable data loss decreases as application (or data) criticality increases.
One can expect a pyramid of apps: a large number with low criticality, and a small number with high criticality.

Repeatability: backup and recovery procedures must be written down, tested, and automated.

HA/DR spectrum of solutions:

  • Backups, save transaction logs
  • Snapshots
  • Replication – synchronous, asynchronous
  • Storage only vs in-memory as well. Application level crash consistency of backups.
  • Multiple AZs
  • Hybrid

Tech: S3 versioning, DynamoDB streams, and Global tables.

Rules of thumb:

Related terms: RPA (Recovery Point Actual) and RTA (Recovery Time Actual) – the values actually achieved, as opposed to the objectives.

Three types of disasters:

  • Natural disaster – e.g. floods, earthquakes, fire
  • Technical failure – e.g. loss of power, cable pulled
  • Human error – e.g. delete all files as admin

Replication works for the first two. Continuous snapshots/backups/versioning are needed for the last one: replication would simply propagate the deletion to both sides, so you need the ability to go back in time and restore the data.

Cost – how to optimize the cost of the infrastructure and its maintenance.

Which region to choose? Key considerations: what types of disasters are the concern (Risk), how much proximity is needed to end customers and to the primary region (Performance), and what the region costs (Cost).

SolarWinds incident overview

SolarWinds makes software for managing networks and infrastructure. Its Orion software was the target of an advanced cyberattack in 2020. Hackers acquired superuser access to a certificate used to sign SAML tokens. This certificate was used to forge new tokens, giving the attackers highly privileged access to networks.

Attackers may have compromised internal build or distribution systems of SolarWinds, embedding backdoor code into a legitimate SolarWinds library with the file name SolarWinds.Orion.Core.BusinessLayer.dll. This backdoor could then be distributed to target networks via automatic updates.

The malicious DLL called out to remote network infrastructure using the domain avsvmcloud.com to prepare possible second-stage payloads, move laterally in the organization, and compromise or exfiltrate data.

The Cybersecurity and Infrastructure Security Agency (CISA) issued Emergency Directive 21-01 in response to the incident, directing all federal civilian agencies to disconnect or power down Orion.

SolarWinds Sunburst attack network paths (source)

Ref. https://web.archive.org/web/20201220053318/https://msrc-blog.microsoft.com/2020/12/13/customer-guidance-on-recent-nation-state-cyber-attacks/

Processors for Deep Learning: Nvidia Ampere GPU, Tesla Dojo, AWS Inferentia, Cerebras

The NVIDIA V100 GPU (Volta architecture), released in 2017, was the first microprocessor with dedicated cores purely for matrix computations, called Tensor Cores. The A100 GPU (Ampere architecture), released in May 2020, is its successor. The A100 has 108 Streaming Multiprocessors (SMs) with 4 Tensor Cores (TCs) each, for a total of 432 TCs. Tensor Cores reduce the cycle time for matrix multiplications; the first-generation cores operate on 4×4 matrices of 16-bit floating point numbers. These GPUs are aimed at Deep Learning use cases, which consist of a pipeline of matrix operations.

Here’s an article on choosing the right EC2 instance type for DL – https://towardsdatascience.com/choosing-the-right-gpu-for-deep-learning-on-aws-d69c157d8c86 (G4 for inferencing, P4 for training).

How did the need for specialized DL chips arise, and why are Tensors important in DL? In math, we have Scalars and Vectors: scalars encode magnitude, and vectors encode magnitude and direction. To transform vectors, one applies Linear Transformations in the form of matrices. Matrices for linear transformations have eigenvectors and eigenvalues, which describe the invariants of the transformation. A Tensor in math and physics is a concept that exhibits certain types of invariance under transformations. In 3 dimensions, a stress tensor has 9 components, which can be represented as a 3×3 matrix; under a change of basis the components of the tensor change, but the tensor itself does not.

In Deep Learning applications a Tensor is basically a multidimensional array, most often a matrix. The General Matrix Multiplication (GEMM) operation, D = A×B + C, is at the heart of Deep Learning, and Tensor Cores are designed to speed it up.
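
A minimal PyTorch sketch of this GEMM (my own illustration, not from the source): in half precision on a Volta/Ampere-class GPU the matrix multiply is dispatched to Tensor Cores; on a CPU the same code runs, just without them.

import torch

# The GEMM at the heart of deep learning: D = A x B + C.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

A = torch.randn(4, 4, device=device, dtype=dtype)
B = torch.randn(4, 4, device=device, dtype=dtype)
C = torch.randn(4, 4, device=device, dtype=dtype)

D = torch.matmul(A, B) + C     # the GEMM
print(D.shape, D.dtype)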

In Deep Learning, multilinear maps are interleaved with non-linear transforms to model arbitrary transformations of input to output, and a specific model is arrived at by a process of error reduction during training on actual data. This PyTorch Deep Learning page is an excellent resource to transition from traditional linear algebra to deep learning software – https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html .
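
A toy sketch of that idea (mine, not from the linked tutorial): linear layers interleaved with a non-linearity, fit by gradient-descent error reduction on made-up data.

import torch
from torch import nn

# Linear maps interleaved with a non-linear transform, trained by error reduction.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 8), torch.randn(64, 1)   # stand-in training data
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                              # gradient of the error
    opt.step()
print(loss.item())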

Tesla Dojo is a planned processor/computer dedicated to Deep Learning, built to train on vast amounts of video data. It was presented at Tesla AI Day in August 2021; a video is at https://www.youtube.com/watch?v=DSw3IwsgNnc

AWS Inferentia is a chip for deep learning inference, with four NeuronCores per chip.

AWS Trainium is an ML chip for training.

Generally speaking, the desire in the deep learning community is for simpler processing units in larger numbers.

Update: Cerebras announced a chip that can handle neural networks with 120 trillion parameters, with 850,000 AI-optimized cores per chip.

SambaNova, Anton, Cerebras and Graphcore presentations are at https://www.anandtech.com/show/16908/hot-chips-2021-live-blog-machine-learning-graphcore-cerebras-sambanova-anton

SambaNova is building 400,000 AI cores per chip.

NVIDIA GPU   AWS Instance   Azure Instance
M60          G3             –
T4           G4             NVv4
V100         P3             NCv4
A100         P4, P4d        NDv4

https://lambdalabs.com/blog/nvidia-a100-vs-v100-benchmarks

Delta Lake and Spark for threat detection and response at scale

Notes on a talk on the data platform for threat detection and response at scale, given at Spark+AI Summit 2018.

The threat detection problem, use-cases and scale.

  • It’s important to focus on and build the data platform first, else one can get siloed into a narrow set of addressable use-cases.
  • we want to detect attacks,
  • contextualize the attacks
  • determine root cause of an attack,
  • determine what the scope of the incident might be
  • determine what we need to contain it
  • Diverse threats require diverse data sets
  • the Threat signal can be concentrated or spread in time
  • Keylines visualization library is used to build a visualization of detection, contextualization, containment

Streaming is a simple pattern that takes us very far for detection

  • Streams are left-joined with context and filtered, or inner-joined with indicators (see the sketch after this list)
  • Can do a lot with this but not everything
  • Graphs are key. Graphs at scale are super hard.
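
A hedged PySpark sketch of that join pattern (the Delta paths and column names are assumptions on my part, not from the talk):

from pyspark.sql import SparkSession

# Event stream left-joined with static context, then inner-joined with
# threat-intel indicators to produce detections.
spark = SparkSession.builder.appName("detect").getOrCreate()

events     = spark.readStream.format("delta").load("/delta/staging/events")
context    = spark.read.format("delta").load("/delta/context/assets")       # static context
indicators = spark.read.format("delta").load("/delta/context/indicators")   # IOCs

detections = (events
              .join(context, on="host_id", how="left")                      # enrich with context
              .join(indicators, events.dest_ip == indicators.ioc_value))    # match indicators

(detections.writeStream
 .format("delta")
 .option("checkpointLocation", "/delta/_chk/detections")
 .start("/delta/alerts/detections"))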

Enabling triage and containment with search and query

  • to triage the detection, it comes down to search and query.
  • ETM does 3.5 million records/sec, 100 TB of data a day, 300B events a day.
  • 11 trillion rows, 0.5PB of data.

Ingestion architecture – tries to solve all these problems and balance issues.

  • data comes into s3 in a consistent json wrapper
  • there’s a single ETL job that takes all the data and writes it into a single staging table which is partitioned by date and event-type, has a long retention
  • table is optimized to stream new data in and stream data out of, but can be queried as well. you can actually go and query it using sql function
  • highest value data – we write parsers, we have discrete parsing streams and put them into a common schema and put it into separate delta tables. well parsed, well structured.
  • use optimizations from delta, z-ordering, to index over columns that are common predicates. search by IP address, domain names – those are what we order by
  • indexing and z-ordering – take advantage of data skipping
  • sometimes the parser code gets messed up.
  • the single staging table is great: we just let the fixed parser run forward and all the data gets back-corrected. we don’t have to repackage code and run it as a batch job, we literally just fix the code and run it in the same model, that’s it.
  • off of these refined tables or parsed data sets, this is where the detection comes in.
  • we have a number of detection streams in batches, that do the logic and analysis. facet-faced or statistical.
  • alerts that come out of this go to their own alert table. goes to delta again. long retention, consistent schema. another streaming job then does de-duplication and whitelisting and writes out alerts to our alert management system. we durably store all the alerts, whether or not de-duped/whitelisted
  • allows us to go back and fix things if things are not quite correct, accidentally.
  • all this gives us operational sanity, and a nice feedback loop

Thanks to z-ordering, a search can go from scanning 500 TB down to 36 TB (see the sketch below).

  • average case is searching over weeks or months. it makes it usable for ad-hoc refinements.
  • simple, unified platform.
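
A hedged sketch of that z-ordering step (Databricks Delta Lake SQL issued from PySpark; the table and column names are assumptions on my part):

from pyspark.sql import SparkSession

# Z-order the parsed connections table on common predicate columns so that
# data skipping can prune files at query time.
spark = SparkSession.builder.getOrCreate()
spark.sql("OPTIMIZE connections ZORDER BY (src_ip, dest_ip, domain)")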

Michael: Demo on interactive queries over months of data

  • first attempt is sql SELECT on raw data. takes too long, cancelled. second attempt uses HMS, still too long, cancelled. why is this so hard ?
  • observation: every data problem is actually two problems 1) data engineering and 2) data science. most projects fail on step 1.
  • doing it with delta – the following command takes 17s and fires off spark job to put the data in a common schema.

CREATE TABLE connections USING delta AS SELECT * FROM json.`/data/connections`

then

SELECT * FROM connections WHERE dest_port = 666

this is great to query the historical data quickly.. however batch alone is not going to cut it as we may have attacks going on right now. but delta plugs into streaming as well:

INSERT INTO connections SELECT * from kafkaStream

Now we’ve Indexed both batch and streaming data.

We can run a python netviz command to visualize the connections.

Here’s a paper on the Databricks Delta Lake approach – https://databricks.com/wp-content/uploads/2020/08/p975-armbrust.pdf .

Supply Chain Logistics and SAP TM

SAP Transportation Management or SAP TM is a module used for Supply Chain Optimization.

SAP TM has four different optimizer engines –

VSR Optimizer: Plan Shipments in the best possible way on available Vehicles via available routes. TVSR (Vehicle scheduling and routing), TVSS, TVRG Applications come under this.

Load Optimizer: Arrange pallets or packages on the vehicle considering rules like Stackability, etc. (TVSO Application)

Carrier Selection: Rank carriers[1] for each shipment considering costs, Business Shares, Allocations. (TSPS Application)

Strategic Freight Management: Rank bids by carriers for long-term contracts based on Cost, Capacity & Risk. (TSFM Application)

The need for Transportation Management as a service is justified by several use cases.

Many leading car manufacturers and other companies whose business models are susceptible to disruption have recently announced that they are adopting TaaS platforms (through in-house development efforts, partnerships, or acquisitions) to provide such services.

The role of APIs in modernizing supply chain systems from legacy EDI based designs – https://www.coupa.com/blog/supply-chain/tech-forward-apis-emerging-player-supply-chain

A comparison of API vs EDI systems – https://arcb.com/blog/edi-vs-api-which-is-right-for-my-business

Some definitions from Wikipedia to clarify concepts-

Logistics is generally the detailed organization and implementation of a complex operation. In a general business sense, logistics is the management of the flow of things between the point of origin and the point of consumption to meet the requirements of customers or corporations.

The resources managed in logistics may include tangible goods such as materials, equipment, and supplies, as well as food and other consumable items.

Logistics management is the part of supply chain management and supply chain engineering that plans, implements, and controls the efficient, effective forward, and reverse flow and storage of goods, services, and related information between the point of origin and point of consumption to meet customer’s requirements. The complexity of logistics can be modeled, analyzed, visualized, and optimized by dedicated simulation software.

The minimization of the use of resources is a common motivation in all logistics fields.

A supply chain is the connected network of individuals, organizations, resources, activities, and technologies involved in the manufacture and sale of a product or service.

How can we be better prepared for a future crisis relative to supply chains?

Private companies have playbooks for supply chain disruptions in their network. In supply chain management, it is crucial to diversify your source of supplies so that when one supplier is impacted, you can turn to the other.

Kubernetes security

Kubernetes is a Platform-as-a-Service (PaaS) built on top of (Docker) containers, with an additional unit of abstraction called a pod, which a) is its smallest unit of execution, b) has a single IP address, c) is a group of one or more containers, d) whose containers share a network namespace, and e) is isolated from other pods by network namespaces. Within a pod, the containers can reach each other on different ports over the loopback interface, while within an instance, different pods see each other as different IP addresses. The control plane is built on top of etcd, a consistent, distributed, highly available key-value store, which is an independent open-source CNCF project.

It is conceptually similar to Cloud Foundry, Mesos, OpenStack, Mirantis and similar abstraction layers.

A threat matrix for Kubernetes by MS – https://www.microsoft.com/security/blog/2020/04/02/attack-matrix-kubernetes/

From RSA’20, here’s a talk on ‘The future of Kubernetes attacks’ – https://youtu.be/CH7S5rE3j8w

From Coinbase, a blog on ‘Why Kubernetes is not part of our stack’ – https://blog.coinbase.com/container-technologies-at-coinbase-d4ae118dcb6c – makes these points:

  • it needs a full-time compute team to maintain
  • securing it is neither trivial nor well understood. SPIFFE, SPIRE, Envoy, Istio, OPA, service mesh are a few of the technologies.

This blog links to – https://k8s.af/

Another viewpoint – https://pythonspeed.com/articles/dont-need-kubernetes/

A counterpoint to the Coinbase blog – https://blog.kumina.nl/2020/07/in-response-to-container-technologies-at-coinbase/

Scratch notes:

K8S is based on a Controller pattern:

  • Resources capture the desired state
  • Current state is kept centralized in etcd, a distributed key-value store, similar to Consul
  • Controllers reconcile current state with desired state (a schematic sketch follows this list)
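
A schematic sketch of that reconcile loop (my own illustration; the helper functions are hypothetical stand-ins, not real Kubernetes client calls):

import time

# Schematic controller: compare desired state (spec) with observed state
# (status) and apply changes when they differ.
def get_desired_state(resource):    # the spec, as declared (kept in etcd)
    return resource["spec"]

def get_current_state(resource):    # the observed status
    return resource.get("status")

def apply_changes(resource, desired):
    resource["status"] = desired    # stand-in for creating/updating child objects

def reconcile(resource):
    desired = get_desired_state(resource)
    if get_current_state(resource) != desired:
        apply_changes(resource, desired)

def control_loop(resources, interval_s=10):
    while True:                     # real controllers react to watch events, not polling
        for r in resources:
            reconcile(r)
        time.sleep(interval_s)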

A Pod is a top-level resource, the smallest deployment unit, and a group of one or more containers described by a YAML file, similar to docker-compose.yml.

A K8S Operator is a kind of resource manager for Custom Resources.

https://blog.frankel.ch/your-own-kubernetes-controller/1/

https://pushbuildtestdeploy.com/when-do-kubernetes-operators-make-sense

Spinnaker is a Continuous Delivery platform that itself runs on k8s as a set of pods which can be scaled up

A kubectl cheat sheet:

https://kubernetes.io/docs/reference/kubectl/cheatsheet

An article on cloud security https://medium.com/xm-cyber/having-fun-with-cloud-services-e281f8a7fe60 , which I think makes the point of why things are relatively complex to begin with.

One comes across the terms helm and helm charts. Helm is a way to package a complex k8s application. This adds a layer of indirection to an app – https://stepan.wtf/to-helm-or-not/ .

A repo to list failing pods – https://github.com/edrevo/suspicious-pods

Exploring networking in k8s – https://dustinspecker.com/posts/how-do-kubernetes-and-docker-create-ip-addresses/

Plugin for Pod networking on EKS using ENIs – https://github.com/aws/amazon-vpc-cni-k8s

Hardening EKS with IAM, RBAC – https://snyk.io/blog/hardening-aws-eks-security-rbac-secure-imds-audit-logging/

EKS Authentication with IAM – how does it work ?

IAM is only used for authentication of valid IAM entities. All permissions for interacting with the EKS Kubernetes API are managed through the native Kubernetes RBAC system.

AWS IAM Authenticator for EKS is a component that enables access to EKS via IAM, for provisioning, managing, updating the cluster. It runs on the EKS Control Plane – https://github.com/kubernetes-sigs/aws-iam-authenticator#aws-iam-authenticator-for-kubernetes

A k8s ConfigMap is used to store non-confidential data in key-value pairs.

The above authenticator gets its configuration information from the aws-auth ConfigMap. This ConfigMap can be edited via eksctl (recommended) or edited directly.
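
As a small sketch (not from the source), the ConfigMap can also be inspected with the official Kubernetes Python client; editing it via eksctl remains the recommended path:

from kubernetes import client, config

# Inspect the aws-auth ConfigMap that maps IAM roles/users to Kubernetes groups.
config.load_kube_config()                      # uses the current kubeconfig context
v1 = client.CoreV1Api()
cm = v1.read_namespaced_config_map("aws-auth", "kube-system")
print(cm.data.get("mapRoles"))                 # IAM role -> Kubernetes group mappings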

A Kubernetes service account provides an identity for processes that run in a pod. For more information see Managing Service Accounts in the Kubernetes documentation. If your pod needs access to AWS services, you can map the service account to an AWS Identity and Access Management identity to grant that access.

EKS Networking – https://aws.amazon.com/blogs/containers/de-mystifying-cluster-networking-for-amazon-eks-worker-nodes/

https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html

https://docs.aws.amazon.com/vpc/latest/reachability/getting-started-cli.html

https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html

RSA World 2020 – QRNG based crypto keys

Creating my first Quantum Crypto keys. The QC hardware provides the entropy source for generation of a key, which is encapsulated using one of several available mechanisms. The key encapsulation mechanism chosen was Classic McEliece. Keys were emailed to me after generation at the Cambridge Quantum Computing booth at RSA World 2020.

Quantum-Proof Cryptography with IronBridge, TKET and Amazon Braket

Cambridge Quantum Computing has developed the first provable QRNG, known as IronBridge, which uses quantum computers to generate unbiased private entropy. IronBridge generates cryptographic keys using this entropy, resulting in quantum-proof keys (for both classical and post-quantum algorithms).

Their paper on “Practical randomness and privacy amplification” – https://arxiv.org/pdf/2009.06551.pdf

Links on McEliece, Goppa codes, FrodoKEM and PQC –

https://en.wikipedia.org/wiki/McEliece_cryptosystem

https://en.wikipedia.org/wiki/Binary_Goppa_code

FrodoKEM: https://eprint.iacr.org/2018/686.pdf : In 2016, Bos et al. proposed the key exchange scheme FrodoCCS, which was also submitted to the NIST post-quantum standardization process modified as a key encapsulation mechanism (FrodoKEM). The security of the scheme is based on standard lattices and the learning with errors problem.

PQ Crypto Catalog: https://github.com/kriskwiatkowski/pqc has implementations of quantum-safe signature and KEM schemes submitted to the NIST PQC standardization process.
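
For intuition, here is a schematic of the generic KEM interface (keygen / encapsulate / decapsulate). The stand-in functions below are mine and are NOT real cryptography; they only illustrate the shape of the API that schemes like Classic McEliece or FrodoKEM expose.

import os, hashlib

# NOT a real KEM (no security properties) -- just the interface shape:
# keygen -> (pk, sk); encapsulate(pk) -> (ciphertext, shared key);
# decapsulate(sk, ciphertext) -> shared key.
def keygen():
    sk = os.urandom(32)
    pk = hashlib.sha256(sk).digest()                 # stand-in "public key"
    return pk, sk

def encapsulate(pk):
    shared = os.urandom(32)                          # session key to transport
    ct = bytes(a ^ b for a, b in zip(shared, pk))    # stand-in "ciphertext"
    return ct, shared

def decapsulate(sk, ct):
    pk = hashlib.sha256(sk).digest()
    return bytes(a ^ b for a, b in zip(ct, pk))      # recover the session key

pk, sk = keygen()
ct, k_sender = encapsulate(pk)
k_receiver = decapsulate(sk, ct)
assert k_sender == k_receiver                        # both sides share the same key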

AWS Builders Library

A few interesting ideas from AWS Builders Library

Toyota’s Five Whys approach to root-causing a problem is good, but not enough to find all the other root causes that might also contribute to the problem.

A couple of great talks on serverless.

Smithy is an Apache-2.0 licensed, protocol-agnostic IDL for defining APIs, generating clients, servers and documentation. https://github.com/awslabs/smithy

Accuracy vs Recall vs Precision vs F1 in Machine Learning

We want to walk through some common metrics in classification problems – such as accuracy, precision and recall – to get a feel for when to use which metric. Say we are looking for a needle in a haystack. There are very few needles in a large haystack full of straws. An automated machine is sifting through the objects in the haystack and predicting for each object whether it is a straw or a needle. A reasonable predictor will predict a small number of objects as needles and a large number as straws. A prediction has two attributes – positive/negative and accurate/inaccurate.

Positive Prediction: the object at hand is predicted to be the needle. A small number.

Negative Prediction: the object at hand is predicted not to be a needle. A large number.

True_Positive: of the total number of predictions, the number of predictions that were positive and correct. Correctly predicted Positives (needles). A small number.

True_Negative: of the total number of predictions, the number of predictions that were negative and correct. Correctly predicted Negatives (straws). A large number.

False_Positive: of the total number of predictions, the number of predictions that are positive but the prediction is incorrect. Incorrectly predicted Positives (straw predicted as needle). Could be large as the number of straws is large, but assuming the total number of predicted needles is small, this is less than or equal to predicted needles, hence small.

False_Negative: of the total number of predictions, the number of predictions that are negative but the prediction is incorrect. Incorrectly predicted Negatives (needle predicted as straw). Is this a large number ? It is unknown – this class is not large just because the class of negatives is large – it depends on the predictor and a “reasonable” predictor which predicts most objects as straws, could also predict many needles as straws. This is less than or equal to the total number of needles, hence small.

Predicted_Positives = True_Positives + False_Positives = Total number of objects predicted as needles.

Actual Positives = Actual number of needles, which is independent of the number of predictions either way; however, Actual Positives = True Positives + False Negatives.

Accuracy = nCorrect_Predictions / nTotal_Predictions = (nTrue_Positives + nTrue_Negatives) / (nPredicted_Positives + nPredicted_Negatives).   # The reasonable-predictor assumption above is equivalent to a high accuracy: most predictions will be hay, and will be correct simply because of the skewed distribution. This does not shed light on FP or FN.

Precision = nTrue_Positives / nPredicted_Positives    # correctly_identified_needles/predicted_needles; this sheds light on FP; Precision = 1 => FP = 0 => all predictions of needles are in fact needles; a precision less than 1 means we got a bunch of hay with the needles – which gives hope that with further sifting the hay can be removed. Precision is also called Positive Predictive Value; it quantifies the absence of False Positives, i.e., incorrect diagnoses.

Recall = nTrue_Positives / nActual_Positives = TP/(TP+FN)    # correctly_identified_needles/all_needles; this sheds light on FN; Recall = 1 => FN = 0; a recall less than 1 is awful, as some needles are left out in the sifting process. Recall is also called Sensitivity.

Precision > Recall => FN is higher than FP

Precision < Recall => FN is lower than FP

If at least one needle is correctly identified as a needle, both precision and recall will be positive; if zero needles are correctly identified, both precision and recall are zero.

F1 Score is the harmonic mean of Precision and Recall: 1/F1 = (1/P + 1/R)/2, i.e., F1 = 2PR/(P+R). F1 = 0 if P = 0 or R = 0; F1 = 1 if P = 1 and R = 1.
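
A small worked example (made-up counts, my own illustration) tying these metrics together for the needle-in-a-haystack scenario:

# 1000 objects: 12 actual needles, 10 objects predicted as needles.
tp, fp, fn, tn = 8, 2, 4, 986

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # 0.994 -- high, due to the skew
precision = tp / (tp + fp)                                   # 0.800 -- correctly identified / predicted needles
recall    = tp / (tp + fn)                                   # 0.667 -- correctly identified / all needles
f1        = 2 * precision * recall / (precision + recall)    # 0.727

# precision > recall here, consistent with FN (4) being higher than FP (2)
print(accuracy, precision, recall, f1)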

ROC/AUC rely on Recall (= TP/(TP+FN)) and another metric, the False Positive Rate, defined as FP/(FP+TN) = hay_falsely_identified_as_needles/total_hay. As TN >> FP, this is close to zero and does not appear to be a useful metric in the context of needles in a haystack; the same goes for ROC/AUC. Note that the denominators are different in Recall and FPR: total needles and total hay, respectively.

There’s a bit of semantic confusion when saying True Positive or False Positive. These shorthands can be interpreted as: it was known that an instance was a Positive, and a label of True or False was applied to that instance. But what we mean is that it was not known whether the instance was a Positive; a determination was made that it was a Positive, and this determination was later found to be correct (True) or incorrect (False). Mentally replace True/False with ‘Correctly/Incorrectly identified as’ to remove this confusion.

Normalization: scale of 0-1, or unit norm; useful for dot products when calculating similarity.

Standardization: zero mean, divided by standard deviation; useful in neural network/classifier inputs

Regularization: used to reduce sensitivity to particular features by penalizing large weights. In regression, L1 regularization gives Lasso regression and L2 gives Ridge regression.

Confusion matrix: holds number of predicted values vs known truth. Square matrix with size n equal to number of categories.
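
A quick numerical sketch (my own toy numbers) of normalization, standardization, and a confusion matrix:

import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
normalized   = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit norm per row, for dot-product similarity
standardized = (x - x.mean(axis=0)) / x.std(axis=0)           # zero mean, unit std per feature

# 2x2 confusion matrix for binary labels: rows = known truth, columns = prediction.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1
print(confusion)   # [[1 1], [1 2]]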

Bias, Variance and their tradeoff: we want both to be low. When going from a simple model to a complex one, one often goes from a high-bias to a high-variance scenario. https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229

Firecracker MicroVM Security

I wanted to get a better understanding of firecracker microVM security, from the bottom up. A few questions –

a) how does firecracker design achieve a smaller threat surface than a typical vm/container ?

b) what mechanisms are available to secure code running in a microvm ?

c) and lastly, how can microvms change security considerations when deploying code for web services ?

The following design elements contribute to a smaller threat surface:

  • minimal design, in memory-safe, compact, readable Rust
  • minimal guest virtual device model: a network device, a block I/O device, a timer, a KVM clock, a serial console, and a partial keyboard
  • minimal networking; from docs/vsock.md : “The Firecracker vsock device aims to provide full virtio-vsock support to software running inside the guest VM, while bypassing vhost kernel code on the host. To that end, Firecracker implements the virtio-vsock device model, and mediates communication between AF_UNIX sockets (on the host end) and AF_VSOCK sockets (on the guest end).”
  • static linking of the firecracker process limits dependencies
  • seccomp BPF limits the system calls to 35 allowed calls, 30 with simple filtering, 5 with advanced filtering that limits the call based on parameters (SeccompFilter::new call in vmm/src/default_syscalls/filters.rs, seccomp/src/lib.rs)

The production security setup recommends using jailer to apply isolation based on cgroups, namespaces, seccomp. These techniques are typical of container isolation and act in addition to KVM based isolation.

The Firecracker Host Security Configuration recommends a series of checks to mitigate side-channel issues for a multi-tenant system:

  • Disable Simultaneous Multithreading (SMT)
  • Check Kernel Page-Table Isolation (KPTI) support
  • Disable Kernel Same-page Merging (KSM)
  • Check for speculative branch prediction issue mitigation
  • Apply L1 Terminal Fault (L1TF) mitigation
  • Apply Speculative Store Bypass (SSBD) mitigation
  • Use memory with Rowhammer mitigation support
  • Disable swapping to disk or enable secure swap

How is the firecracker process organized ? The docs/design.md has the following descriptions:

Internal Design: Each Firecracker process encapsulates one and only one microVM. The process runs the following threads: API, VMM and vCPU(s). The API thread is responsible for Firecracker’s API server and associated control plane. It’s never in the fast path of the virtual machine. The VMM thread exposes the machine model, minimal legacy device model, microVM metadata service (MMDS) and VirtIO device emulated Net and Block devices, complete with I/O rate limiting. In addition to them, there are one or more vCPU threads (one per guest CPU core). They are created via KVM and run the `KVM_RUN` main loop. They execute synchronous I/O and memory-mapped I/O operations on device models.

Threat Containment: From a security perspective, all vCPU threads are considered to be running malicious code as soon as they have been started; these malicious threads need to be contained. Containment is achieved by nesting several trust zones which increment from least trusted or least safe (guest vCPU threads) to most trusted or safest (host). These trusted zones are separated by barriers that enforce aspects of Firecracker security. For example, all outbound network traffic data is copied by the Firecracker I/O thread from the emulated network interface to the backing host TAP device, and I/O rate limiting is applied at this point.

What about mechanisms to secure the code running inside Firecracker? The serverless environment, AWS Lambda, and its security best practices are a place to start; AWS publishes several resources on Lambda security best practices, and AWS API Gateway supports request input validation. While serverless reduces the attack surface, web threats such as the OWASP Top 10 still apply and must be taken into account during design and testing.

For the last question – uVMs and serverless appear to offer a promising model to build a service incrementally from small secure building blocks – and this is something to explore further.

Apache Druid – horizontally scalable time series database

Machine-generated data, such as clickstreams or database update streams, consists of many rows, each of which has 3 parts:

  • timestamp
  • text columns or attributes, called dimensions
  • numerical values such as counts of hits, words, characters etc, called metrics

The desire to rapidly aggregate over such data with a low latency gave rise to Druid.

The data is append-heavy, so ingestion is a problem, as is querying the data at low latency.

Druid has two subsystems –

  • a write-optimized subsystem in the real-time nodes
  • a read-optimized subsystem in the historical nodes

The data is stored in S3 or HDFS in a column-oriented format.
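
As a hedged sketch (the datasource, column names, and host/port are assumptions on my part), an aggregation like this can be issued against Druid’s SQL HTTP endpoint:

import requests

# Aggregate a metric over time and a dimension via Druid's SQL API.
query = """
  SELECT FLOOR(__time TO HOUR) AS hour, domain, SUM(hits) AS total_hits
  FROM clickstream
  GROUP BY FLOOR(__time TO HOUR), domain
  ORDER BY total_hits DESC
  LIMIT 10
"""
resp = requests.post("http://localhost:8888/druid/v2/sql/", json={"query": query})
print(resp.json())   # one aggregated row per (hour, domain)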

A good explanatory article that goes into the Druid internal architecture – https://towardsdatascience.com/apache-druid-part-1-a-scalable-timeseries-olap-database-system-af8c18fc766d

Druid is different from Flink and Spark streaming in that it is not a streaming system. Flink can apply real-time data transformations on the data, which can then be ingested into Druid via Kafka, to power real-time dashboards.

Aviatrix Controller on AWS for secure networking

These are some notes from a talk by Aviatrix last week. Many customers get started with the Aviatrix orchestration system for deploying AWS Transit Gateway (TGW) and Direct Connect. The transit gateway is the hub gateway that connects multiple VPCs with an on-premise link, possibly over Direct Connect. The Aviatrix product can then deploy and manage multiple VPCs and the communication between them, directing which VPC can talk to which other VPC. It controls the communication by simply deleting routes.

The advanced transit controller solution is useful across multiple regions, to manage the communication between regions. Another aspect is that there are high-speed interconnects between the cloud providers, and Aviatrix builds an overlay that bridges public clouds. Multi-account communication and secure communication between networks using segmentation can be enabled.

According to Aviatrix, AWS’s motto is “go build” and do it yourself; it is designed for builders. But when you go beyond 3 VPCs to 3,000 VPCs, one needs a solution to manage the routes in an automated manner. This is the situation for many larger customers. It also finds use with smaller customers that have Production, Development, and Edge/On-premise network components to manage.

Remote user VPN is another use case: not only can a user VPN in and reach all the VPCs, but one can also specify which CIDRs they can reach, along with other restrictions.