Accuracy vs Recall vs Precision vs F1

We want to walk through some common metrics in classification problems such as accuracy, precision and recall and get a feel for when to use which metric. Say we are looking for a needle in a haystack. There are very few needles in a large haystack full of straws. An automated machine is sifting through the objects in the haystack and predicting for each object whether it is a straw or a needle. A reasonable predictor will predict a small number of objects as needles and a large number as straws.

Positive Prediction: the object at hand is predicted to be the needle. A small number.

Negative Prediction: the object at hand is predicted not to be a needle. A large number.

True_Positive: of the total number of predictions, the number of predictions that were positive and correct. Correctly predicted Positives (needles). A small number.

True_Negative: of the total number of predictions, the number of predictions that were negative and correct. Correctly predicted Negatives (straws). A large number. 

False_Positive: of the total number of predictions, the number of predictions that are positive but the prediction is incorrect. Incorrectly predicted Positives (straw predicted as needle). Could be large as the number of straws is large, but assuming the total number of predicted needles is small, this is less than or equal to predicted needles, hence small.

False_Negative: of the total number of predictions, the number of predictions that are negative but the prediction is incorrect. Incorrectly predicted Negatives (needle predicted as straw). Is this a large number ? It is unknown – this class is not large just because the class of negatives is large – it depends on the predictor and a “reasonable” predictor which predicts most objects as straws, could also predict many needles as straws. This is less than or equal to the total number of needles, hence small.

Predicted_Positives = True_Positives + False_Positives = Total number of objects predicted as needles.

Actual Positives = Actual number of needles, which is independant of the number of predictions either way, however Actual Positives = True Positives + False Negatives.

Accuracy = nCorrect _Predictons/nTotal_Predictions=(nTrue_Positives+nTrue_Negatives) / (nPredicted_Positives +nPredicted_Negatives) .   # the reasonable assumption above is equivalent to a high accuracy. Most predictions will be hay, and be correct in this simply because of the skewed distribution. This does not shed light on FP or FN.

Precision = nTrue_Positives / nPredicted_Positives    # correctly_identified_needles/predicted_needles;  this sheds light on FP; Precision = 1 => FP=0 => all predictions of needles are in fact needles; a precision less than 1 means we got a bunch of hay with the needles – gives hope that with further sifting the hay can be removed.  Precision is also called Specificity and quantifies the absence of False Positives or incorrect diagnoses.

Recall = nTrue_Positives / nActual_Positives  = TP/(TP+FN)# correctly_identified_needles/all_needles; this sheds light on FN; Recall = 1 => FN = 0; a recall less than 1 is awful as some needles are left out in the sifting process. Recall is also called Sensitivity .

Precision > Recall => FN is higher than FP

Precision < Recall => FN is lower than FP

If at least one needle is correctly identified as a needle, both precision and recall will be positive; if zero needles are correctly identified, both precision and recall are zero.

F1 Score is the harmonic mean of Precision and Recall.  1/F1 = 1/2(1/P + 1/R) . F1=2PR/(P+R) .  F1=0 if P=0 or R=0. F1=1 if P=1 and R=1.

ROC/AUC rely on Recall (=TP/TP+FN) and another metric False Positive Rate defined as FP/(FP+TN)  = hay_falsely_identified_as_needles/total_hay . As TN >> FP, this should be close to zero and does not appear to be a useful metric in the context of needles in a haystack; as are ROC/AuC . The denominators are different in Recall and FPR, total needles and total hay respectively.

There’s a bit of semantic confusion when saying True Positive or False Positive. These shorthands can be interpreted as- it was known that an instance was a Positive and a label of True or False was applied to that instance. But what we mean is that it was not known whether the instance was a Positive, and that a determination was made that it was a Positive and this determination was later found to be correct (True) or incorrect (False). Mentally replace True/False with ‘Correct/Incorrectly identified as’ to remove this confusion.

Normalization: scale of 0-1, or unit norm; useful for dot products when calculating similarity.

Standardization: zero mean, divided by standard deviation; useful in neural network/classifier inputs

Regularization: used to reduce sensitivity to certain features. Uses regression. L1: Lasso regression L2: Ridge regression

Confusion matrix: holds number of predicted values vs known truth. Square matrix with size n equal to number of categories.

Firecracker MicroVM Security

I wanted to get a better understanding of firecracker microVM security, from the bottom up. A few questions –

a) how does firecracker design achieve a smaller threat surface than a typical vm/container ?

b) what mechanisms are available to secure code running in a microvm ?

c) and lastly, how can microvms change security considerations when deploying code for web services ?

The following design elements contribute to a smaller threat surface:

  • minimal design, in a memory safe, compact, readable rust language
  • minimal guest virtual device model: a network device, a block I/O device, a timer, a  KVM clock, a serial console, and a partial keyboard
  • minimal networking; from docs/ : “The Firecracker vsock device aims to provide full virtio-vsock support to software running inside the guest VM, while bypassing vhost kernel code on the host. To that end, Firecracker implements the virtio-vsock device model, and mediates communication between AF_UNIX sockets (on the host end) and AF_VSOCK sockets (on the guest end).”
  • static linking of the firecracker process limits dependancies
  • seccomp BPF limits the system calls to 35 allowed calls, 30 with simple filtering, 5 with advanced filtering that limits the call based on parameters (SeccompFilter::new call in vmm/src/default_syscalls/, seccomp/src/

The production security setup recommends using jailer to apply isolation based on cgroups, namespaces, seccomp. These techniques are typical of container isolation and act in addition to KVM based isolation.

The Host Security Configuration recommends a series of checks to mitigate side-channel issues for a multi-tenant system:

  • Disable Simultaneous Multithreading (SMT)
  • Check Kernel Page-Table Isolation (KPTI) support
  • Disable Kernel Same-page Merging (KSM)
  • Check for speculative branch prediction issue mitigation
  • Apply L1 Terminal Fault (L1TF) mitigation
  • Apply Speculative Store Bypass (SSBD) mitigation
  • Use memory with Rowhammer mitigation support
  • Disable swapping to disk or enable secure swap

How is the firecracker process organized ? The docs/ has the following descriptions:

Internal Design: Each Firecracker process encapsulates one and only one microVM. The process runs the following threads: API, VMM and vCPU(s). The API thread is responsible for Firecracker’s API server and associated control plane. It’s never in the fast path of the virtual machine. The VMM thread exposes the machine model, minimal legacy device model, microVM metadata service (MMDS) and VirtIO device emulated Net and Block devices, complete with I/O rate limiting. In addition to them, there are one or more vCPU threads (one per guest CPU core). They are created via KVM and run the `KVM_RUN` main loop. They execute synchronous I/O and memory-mapped I/O operations on devices models. 

Threat Containment: From a security perspective, all vCPU threads are considered to be running malicious code as soon as they have been started; these malicious threads need to be contained. Containment is achieved by nesting several trust zones which increment from least trusted or least safe (guest vCPU threads) to most trusted or safest (host). These trusted zones are separated by barriers that enforce aspects of Firecracker security. For example, all outbound network traffic data is copied by the Firecracker I/O thread from the emulated network interface to the backing host TAP device, and I/O rate limiting is applied at this point.

As to question about mechanisms to secure code running in firecracker, the serverless environment, AWS Lambda, and its security best practices are a good place to start . A few resources on this are here, here, here and here. AWS API gateway supports input validation, as described here. However, while serverless reduces the attack surface, the web threats such as OWASP still apply and must be taken into account during design and testing.

For the last question – uVMs and serverless appear to offer a promising model to build a service incrementally from small secure building blocks – and this is something to explore further.

Aviatrix Controller on AWS for secure networking

These are some notes from a talk by Aviatrix last week. Many customers get started with Aviatrix orchestration system for deploying AWS Transit Gateway (TGW) and Direct Connect. The transit gateway is the hub gateway that connects multiple VPCs with an on-premise link, possibly over Direct Connect. The Aviatrix product can then deploy and manage multiple VPCs and the communication between them, directing which VPC can talk to which other VPC. It controls the communication by simply deleting the routes.

The advanced transit controller solution is useful for multiple regions, to manage the communication between regions. Another aspect is there are high speed interconnects between the cloud providers and Aviatrix builds an overlay that bridges between public clouds. Multi-account communication and secure communication between the networks using segmentation can be enabled.

According to Aviatrix, AWS’s motto is go build, and do it yourself, it is designed for the builders. But when you go beyond 3 VPCs to 3000 VPCs, one needs a solution to manage the routes in an automated manner. This is the situation for many larger customers. For smaller ones where there are Production, Development and Edge/On-premise network components to manage it also finds use.

Remote user VPN is another use case. Not only can one VPN in and get to all the VPCs, but specify which CIDR they can get to and other restrictions.

Omnisci GPU based columnar database

Omnisci is a columnar database that reads a column into GPU memory, in compressed form, allowing for interactive queries on the data. A single gpu can load 10million to 50million rows of data and allows interactive querying without indexing. A demo was shown at the GTC keynote this year, by Aaron Williams. He gave a talk on vehicle analytics that I attended last month.

In the vehicle telemetry demo, they obtain vehicle telemetry data from an F1 game that has data output as UDP, 10s of thousands of packets a second – take the binary data off of UDP, and convert it to json and use it as a proxy for real telemetry data. The webserver refreshes every 3-4 seconds. The use case is analysis of increasing amounts of vehicle sensor data as discussed in this video and described in the detailed Omnisci blog post here.

Programming of the blocks is done through the UX with StreamSets which is an open source tool to build data pipelines –  a cool demo of streamsets for creating pipelines including kafka is here.

The vehicle analytics demo pipeline consisted of UDP to Kafka, Kafka to JSON, then JSON to OmniSci via pymapd .  Kafka serves as a message broker and also for playback of data.

Based on the the GPU loaded data, the database allows queries and stats on different vehicles that are running.

The entire system runs in the cloud on a VM supporting Nvidia GPUs, and can also be run on a local GPU box.

Lacework Intrusion Detection System – Cloud IDS

Lacework Polygraph is a Host based IDS for cloud workloads. It provides a graphical view of who did what on which system, reducing the time for root cause analysis for anomalies in system behaviors. It can analyze workloads on AWS, Azure and GCP.

It installs a lightweight agent on each target system which aggregates information from processes running on the system into a centralized customer specific (MT) data warehouse (Snowflake on AWS)  and then analyzes the information using machine learning to generate behavioral profiles and then looks for anomalies from the baseline profile. The design allows automating analysis of common attack scenarios using ssh, privilege changes, unauthorized access to files.

The host based model gives detailed process information such as which process talked to which other and over what api. This info is not available to a network IDS. The behavior profiles reduce the false positive rates. The graphical view is useful to drill down into incidents.

It does not have an IPS functionality – as false positives with an IPS could negatively affect the system.

Cloud based network isolation tools like Aviatrix might make IPS scenarios feasible by limiting the effect of an IPS.

Software Integrity Tools

There are a number of tools used to detect security issues in a software application codebase. A simple and free one is flawfinder. A sophisticated commercial one is Veracode.  There’s also lint, pylint, findbugs for java, and xcode clang static analyzer.

Synopsis has bought a few tools like Coverity and Blackduck for various static checks on code and binary. Blackduck can do binary analysis and scores issues with the CVSS. A common use of Blackduck is for license checking to check for conformance to open source licenses.

A more comprehensive list of static code analysis tools is here.

Dynamic analysis tools inspect the running process and find memory and execution errors. Well known examples are valgrind and Purify. More dynamic tools are listed here.

For web application security there are protocol testing and fuzzing tools like Burp suite and Tenable Nessus.

A common issue with the tools is the issue of false positives. It helps to limit the testing to certain defect types or attack scenarios and identify the most critical issues, then expand the scope of types of defects.

Code obfuscation and anti-tamper are another line of tools, for example by Arxan, Klocwork, Irdeto and Proguard .

A great talk on Adventures in fuzzing. My takeaway has been that better ways of developing secure software are really important.



Open Compute Project 2019

OCP has the mission to “design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing”.

OCP had its global 2019 summit recently. Some interesting trends on hyperscale networks are discussed here and here with the use of F16 fabric network with its a focus on higher bandwidth but also performance at the right cost instead of at any cost. The heart of this new F16 fabric is the Minipack switch, with contribution from Arista which Facebook says will consume 50 percent less power and space than the Backpack switch it replaces in the network.  It is a 128x100Gb switch and uses a Broadcom Tomahawk-3 Asic. Quote: “a path from a rack in one building to a rack in another building over Fabric Aggregator was as many as 24 hops long before. With F16, same-fabric network paths are always the best case of six hops, and building-to-building flows always take eight hops. This results in half the number of intrafabric network hops and one-third the number of interfabric network hops between servers.”

Intel announced an industry collaboration around Platform Root of Trust at the Open Compute Project 2019 summit.

There’s a talk on Stratum and the use of P4 and Switch Abstraction Interface (SAI) for SDN, by Open Networking Foundation (ONF) and Google. Tencent has a use case for disaggregating their monolithic network into a modular switch with a network of controllers instead of a single controller.

Smaller data centers at the edge is another trend.

Safety Concepts

I have kept coming across functional safety discussions, most recently at ArmTechCon, and wanted to capture some of the terminology and concepts. To orient the discussion, think of airbags, seat-belts, and tire-pressure-monitoring-systems as safety features in a car.

Safety Function or Safety Instrumented Function: A function to take a system to a safe outcome when certain prerequisites on system inputs are not met. E.g. turn on a warning indicator when seat-belt is not used, or tire-pressure is below safe level, or deploy airbags when a collision is detected.

Safety Related Control Function: This is the control mechanism by which the safety function is achieved.

Safety Integrity Level: The reliability of a safety-related-control-function is captured with a Safety Integrity Level or SIL.

A standard is ISO26262. The V shaped functional safety process diagram is here. This process is used to achieve a Safety Integrity Level (SIL) where the SIL1, SIL2, SIL3, SIL4 reduce risk by a progressive factor of 10, i.e. by 10x, 100x, 1000x and 10000x. A HAZOP study is undertaken to understand the risks of the mechanism behaving incorrectly.

A good reference, from SIMATIC is here. Software aspects of safety function are discussed in in this whitepaper.

Safety systems for robotics are discussed here – it has a table of typical safety issues when a person enters a robot safeguarded area. Industrial robots security was briefly discussed here.

Another concept is SOTIF or Safety of the Intended Function, which comes up in functional safety discussions of AI-controlled vehicles. More links on it here.

Nvidia safe driving report here.

SIEM analytics growth

“A fortune 500 enterprise’s infrastructure can easily generate 10 terabytes of plain-text data per month. So how can enterprises effectively log, monitor, and correlate that data to obtain actionable insight? Enter the Security Information and Event Management (SIEM) solution”  – quote from Jeff Edwards, in a Solutions Review’s 2016 SIEM buyer’s guide covering AccelOps, Alert Logic, Alien Vault, Assuria, BlackStratus, CorreLog, EiQ Networks, EMC (RSA), Event Tracker, HP, IBM QRadar, Intel Security, Logentries, LogPoint, LogRhythm, Manage Engine, NetGuardians, NetIQj, Silver Sky, SolarWinds, Splunk, Sumo Logics, Tenable, and Trustwave .

SIEM and related acronyms –

SIEM – Security Information and Event Management, consists of SIM and SEM.

SIM – Security information management (SIM) is also referred to as log management, log storage, analysis and reporting.

SEM – Real-time monitoring, correlation of events, notifications and console views

Practical application of SIEM – Automating threat identification: SANS publication.

UEBA –  User and Entity Behavior Analytics. This is growing in importance, for example Exabeam focusses on behavioral analytics.

IDS – Intrusion Detection System. Detects and notifies about an intrusion.

IPS – Intrusion Prevention System. Such a device may shut off traffic based on an attack detection.

WAF – web application firewall.

Threat Modelling

Threat modelling is a set of techniques to identify the level of risk to assets from their interactions with their operating environment. Some threat modelling methodologies and tools are linked below for reference:

PASTA – Process for Attack Simulation and Threat Analysis. The link has details of an online banking use case.

DREAD – Damage [potential], Reproducibility, Exploitability, Affected users, Discoverability

STRIDE Spoofing of user identity, Tampering, Repudiation, Information disclosure (privacy breach or data leak), Denial of service , Elevation of privilege

Attack trees – similar to fault trees, it show the relatedness of cause/effect; an good example for a SCADA system is here.

VPNFilter IoT Router Malware

Over 500k routers and gateways are estimated to be infected with malware dubbed VPNFilter, first reported in .

It has 3 stages. In stage 1 it adds itself to crontab to remain after a reboot. In stage 2 it adds a plugin architecture. In stage 3 it adds modules which instruct it to do specific things.  A factory reset and router restart in protected network was recommended to remove it. Disabling remote administration and changing passwords is recommended to prevent reinfection.

The 3rd stage module modifies IPtables rules, enabling mitm attacks and javascript injection.

The first action taken by the ssler module is to configure the device’s iptables to redirect all traffic destined for port 80 to its local service listening on port 8888. It starts by using the insmod command to insert three iptables modules into the kernel (ip_tables.ko, iptable_filter.ko, iptable_nat.ko) and then executes the following shell commands:

  • iptables -I INPUT -p tcp –dport 8888 -j ACCEPT
  • iptables -t nat -I PREROUTING -p tcp –dport 80 -j REDIRECT –to-port 8888
  • Example: ./ssler logs src: dst:

-A PREROUTING -s -d -p tcp -m tcp –dport 80 -j REDIRECT –to-ports 8888

To ensure that these rules do not get removed, ssler deletes them and then adds them back approximately every four minutes.

More behaviors of the malware are described at including photobucket request, fake CA certs claiming Microsoft issued them and ipify lookups.

YARA rules for detection –

YARA (yet another recursive acronym) is a format to specify rules match malware based on string patterns, regular expressions and their frequency of occurrence. A guide to writing effective ones is here.

User-Agent rule –

Ipify self-ip address querying service, with json output.


Zero Trust Networks

Instead of  the “inside” and “outside” notion of traditional firewalls and perimeter defense technologies, the Zero Trust Network notion has its origin in the Cloud+Mobile first world where a person carrying a mobile device can be anywhere in the world (inside/outside the enterprise) and needs to be seamlessly and securely connected to online services.

The essential idea appears to be device authentication coupled with a second factor in the shape of an easy to remember password, with backend security smarts to identify the accessing device. More importantly, every service that is access externally needs to be authenticated, instead of some services being treated as internal services and being less protected.

Some properties of zero trust networks:

  • Network locality based access control is insufficient
  • Every device, user and service is authenticated
  • Policies are dynamic – they gather and utilize data inputs for making access control decisions
  • Attacks from trusted insiders are mitigated against

This is a big change from many networks which have network based defense at the core (for good reason, as it was cost effective). To create a zero-trust network, a startin point is to identify, enumerate and sequence all network flows.

I attended a talk by Centrify on this topic, which resonated with experiences in cloud, mobile and fog systems.

Related effort in Kubernetes – Progress Toward Zero Trust Kubernetes Networks, Istio Service Mesh , API Gateway to Service Mesh.  One can contrast the API gateway as being present only at the ingress point of a cloud, whereas with a Zero-trust/Service-mesh/Sidecar approach every microservice building-block has its own external proxy and ‘API’ for management added to it. The latter would add to latency concerns for real-time applications, as the new sidecar proxies are in the data path. One benefit of the service mesh is a mechanism to put in service to service security in a uniform manner.

The key original motivation behind Istio, in the second presentation by Lyft above, was greater observability and reliability across a complex cluster of microservices. This strikes me as a greater motivating use-case of this technology, than added security.  From the security point of view, there is a parallel of the Istio approach with the SDN problem statement of a horizontal and ubiquitous security layer.   Greater visibility is also a motivation behind the P4 programming language presented in disaggregated storage talk on protocol independant switch architecture or PISA here – one of the things it enables is inband telemetry.