Author: Ruchir Tewari

Lacework Intrusion Detection System – Cloud IDS

Lacework Polygraph is a Host based IDS for cloud workloads. It provides a graphical view of who did what on which system, reducing the time for root cause analysis for anomalies in system behaviors. It can analyze workloads on AWS, Azure and GCP.

It installs a lightweight agent on each target system which aggregates information from processes running on the system into a centralized customer specific (MT) data warehouse (Snowflake on AWS)  and then analyzes the information using machine learning to generate behavioral profiles and then looks for anomalies from the baseline profile. The design allows automating analysis of common attack scenarios using ssh, privilege changes, unauthorized access to files.

The host based model gives detailed process information such as which process talked to which other and over what api. This info is not available to a network IDS. The behavior profiles reduce the false positive rates. The graphical view is useful to drill down into incidents.

OSQuery is a tool for gathering data from hosts, and this is a source of data aggregated for threat detection. https://www.rapid7.com/blog/post/2016/05/09/introduction-to-osquery-for-threat-detection-dfir/

Here’s an agent for libpcap https://github.com/lacework/pcap

It does not have an intrusion prevention (IPS) functionality. False positives on an IPS could block network/host access and negatively affect the system being protected, so it’s a harder problem.

Cloud based network isolation tools like Aviatrix might make IPS scenarios feasible by limiting the effect of an IPS.

Software Integrity Tools

There are a number of tools used to detect security issues in a software application codebase. A simple and free one is flawfinder. A sophisticated commercial one is Veracode.  There’s also lint, pylint, findbugs for java, and xcode clang static analyzer.

Synopsis has bought a few tools like Coverity and Blackduck for various static checks on code and binary. Blackduck can do binary analysis and scores issues with the CVSS. A common use of Blackduck is for license checking to check for conformance to open source licenses.

A more comprehensive list of static code analysis tools is here.

Dynamic analysis tools inspect the running process and find memory and execution errors. Well known examples are valgrind and Purify. More dynamic tools are listed here.

For web application security there are protocol testing and fuzzing tools like Burp suite and Tenable Nessus.

A common issue with the tools is the issue of false positives. It helps to limit the testing to certain defect types or attack scenarios and identify the most critical issues, then expand the scope of types of defects.

Code obfuscation and anti-tamper are another line of tools, for example by Arxan, Klocwork, Irdeto and Proguard .

A great talk on Adventures in fuzzing. My takeaway has been that better ways of developing secure software are really important.

 

 

Open Compute Project 2019

OCP has the mission to “design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing”.

OCP had its global 2019 summit recently. Some interesting trends on hyperscale networks are discussed here and here with the use of F16 fabric network with its a focus on higher bandwidth but also performance at the right cost instead of at any cost. The heart of this new F16 fabric is the Minipack switch, with contribution from Arista which Facebook says will consume 50 percent less power and space than the Backpack switch it replaces in the network.  It is a 128x100Gb switch and uses a Broadcom Tomahawk-3 Asic. Quote: “a path from a rack in one building to a rack in another building over Fabric Aggregator was as many as 24 hops long before. With F16, same-fabric network paths are always the best case of six hops, and building-to-building flows always take eight hops. This results in half the number of intrafabric network hops and one-third the number of interfabric network hops between servers.”

Intel announced an industry collaboration around Platform Root of Trust at the Open Compute Project 2019 summit.

There’s a talk on Stratum and the use of P4 and Switch Abstraction Interface (SAI) for SDN, by Open Networking Foundation (ONF) and Google. Tencent has a use case for disaggregating their monolithic network into a modular switch with a network of controllers instead of a single controller.

Smaller data centers at the edge is another trend.

Facebook storage stack and its evolution- https://thenewstack.io/facebook-storage/, mentions OCP and the disaggregated server model which separates server components across different racks.

More on cold storage in a Facebook data center – https://www.datacenterknowledge.com/archives/2013/01/18/facebook-builds-new-data-centers-for-cold-storage

SGX vs TPM vs SE – security in silicon

Hardware root of trust, Hardware security primitives

PUF https://en.m.wikipedia.org/wiki/Physical_unclonable_function

Specialized crypto ops vs small open/generic TCB.

VHDL libraries

Chain of trust, up to firmware and software layer. Level of integration vs modularity.

TouchID with SE . Samsung. Yubikey, Gemalto,

Electronic logging devices. ELD tamper proofing

Secure provisioning. Intel EPID

Rust, memory safey, cargo, toml, policy engine, fortanix, azure IoT

Key management with enclaves

Trusted VMs and cloud security

Platform security 

Isolated memory on shared infra

Git error: failed to push some refs to remote

This error often occurs after another checkin has gone in before yours, and says “the tip of your current branch is behind its remote counterpart”.

It should be resolved by

a) ‘git pull –rebase’ // this may bring in conflicts that need to be resolved

followed by

b) ‘git push’ // this works the first time

After the two steps your changes are available to team members to code review and you may need to edit your changes. After making such changes, you’d need to do

c) ‘git push -f’

to force the push.

However say this codereview-edit cycle takes some time and other changes are approved in the mean time – then you have to repeat these steps.

It can be simpler to get a fresh copy of the top of tree and add in your changes directly there and submit.

Safety Concepts – Functional safety, SOTIF

I want to capture Functional Safety concepts. This is an important topic given the number of vehicles and their increasing automation. It was a major topic at the recent ArmTechCon.

Airbags, seat-belts, ABS, and tire-pressure-monitoring-systems are some of the safety features in a car.

Safety Function or Safety Instrumented Function (SIF): A function to take a system to a safe outcome when certain prerequisites on system inputs are not met. For example, turn on a warning indicator when seat-belt is not used or when the tire-pressure is below safe level, or deploy airbags when a collision is detected.

Safety Related Control Function (SRCF): This is the control mechanism by which the safety function is achieved. For example, collision above a certain impact threshold leads to airbag deployment by an airbag control module. Quote: “The airbag control module is installed inside the center console and contains a safety sensor, G sensor, ignition judgment circuit, and a backup power supply.”

Safety Integrity Level (SIL): The reliability of a safety-related-control-function is captured with a Safety Integrity Level or SIL. SIL level go from 1 to 4 (highest). SILs are derived from a risk estimation process and are used in estimating risk of a system built using pre-built components.

A safety standard is ISO26262. The V shaped functional safety process diagram describes the design steps flowing from OEM to supplier and verification steps flowing back from supplier to OEM (here). This process is used to achieve a Safety Integrity Level (SIL) where the SIL1, SIL2, SIL3, SIL4 reduce risk by a progressive factor of 10, i.e. by 10x, 100x, 1000x and 10000x. A hazard and operability study ( HAZOP ) is undertaken to understand the risks of the mechanism behaving incorrectly.

A good reference, from SIMATIC is here. Software aspects of safety function are discussed in in this whitepaper.

Safety systems for robotics are discussed here – this has a table of typical safety issues when a person enters a robot safeguarded area. Industrial robots security was briefly discussed in this blog here.

Another concept is SOTIF or Safety of the Intended Function, which comes up in functional safety discussions of AI-controlled vehicles. More links on it here.

An Nvidia safe driving report is here.

SIEM and UEBA analytics

“A fortune 500 enterprise’s infrastructure can easily generate 10 terabytes of plain-text data per month. So how can enterprises effectively log, monitor, and correlate that data to obtain actionable insight? Enter the Security Information and Event Management (SIEM) solution”  – quote from Jeff Edwards, in a Solutions Review’s 2016 SIEM buyer’s guide covering AccelOps, Alert Logic, Alien Vault, Assuria, BlackStratus, CorreLog, EiQ Networks, EMC (RSA), Event Tracker, HP, IBM QRadar, Intel Security, Logentries, LogPoint, LogRhythm, Manage Engine, NetGuardians, NetIQj, Silver Sky, SolarWinds, Splunk, Sumo Logics, Tenable, and Trustwave .

SIEM and related acronyms –

SIEM – Security Information and Event Management, consists of SIM and SEM.

SIM – Security information management (SIM) is also referred to as log management, log storage, analysis and reporting.

SEM – Real-time monitoring, correlation of events, notifications and console views

Practical application of SIEM – Automating threat identification: SANS publication.

UBA – User Behavior Analytics

UEBA –  User and Entity Behavior Analytics. This is growing in importance, for example Exabeam focusses on behavioral analytics. The key idea in UEBA is it extends analytics to cover non-human processes and machines entities.

IDS – Intrusion Detection System. Detects and notifies about an intrusion.

IPS – Intrusion Prevention System. Such a device may shut off traffic based on an attack detection.

WAF – web application firewall.

Splunk https://www.splunk.com/pdfs/technical-briefs/splunk-for-advanced-analytics-and-threat-detection-tech-brief.pdf

https://www.splunk.com/en_us/data-insider/user-behavior-analytics-ueba.html

https://www.slideshare.net/databricks/deep-learning-in-securityan-empirical-example-in-user-and-entity-behavior-analytics-with-dr-jisheng-wang

https://github.com/topics/ueba

https://siliconangle.com/2017/07/25/detecting-malicious-insiders-behavioral-analytics-sparksummit/

https://www.analyticsvidhya.com/blog/2018/02/demystifying-security-data-science/ (good overview of evolution of solutions)

https://github.com/jivoi/awesome-ml-for-cybersecurity

Threat Modelling

Threat modelling is a set of techniques to identify the level of risk to assets from their interactions with their operating environment. Some threat modelling methodologies and tools are linked below for reference:

PASTA – Process for Attack Simulation and Threat Analysis. The link has details of an online banking use case.

DREAD – Damage [potential], Reproducibility, Exploitability, Affected users, Discoverability

STRIDE Spoofing of user identity, Tampering, Repudiation, Information disclosure (privacy breach or data leak), Denial of service , Elevation of privilege

Attack trees – similar to fault trees, it show the relatedness of cause/effect; an good example for a SCADA system is here.

VPNFilter IoT Router Malware

Over 500k routers and gateways are estimated to be infected with malware dubbed VPNFilter, first reported in https://blog.talosintelligence.com/2018/05/VPNFilter.html .

It has 3 stages. In stage 1 it adds itself to crontab to remain after a reboot. In stage 2 it adds a plugin architecture. In stage 3 it adds modules which instruct it to do specific things.  A factory reset and router restart in protected network was recommended to remove it. Disabling remote administration and changing passwords is recommended to prevent reinfection.

The 3rd stage module modifies IPtables rules, enabling mitm attacks and javascript injection.

The first action taken by the ssler module is to configure the device’s iptables to redirect all traffic destined for port 80 to its local service listening on port 8888. It starts by using the insmod command to insert three iptables modules into the kernel (ip_tables.ko, iptable_filter.ko, iptable_nat.ko) and then executes the following shell commands:

  • iptables -I INPUT -p tcp –dport 8888 -j ACCEPT
  • iptables -t nat -I PREROUTING -p tcp –dport 80 -j REDIRECT –to-port 8888
  • Example: ./ssler logs src:192.168.201.0/24 dst:10.0.0.0/16

-A PREROUTING -s 192.168.201.0/24 -d 10.0.0.0/16 -p tcp -m tcp –dport 80 -j REDIRECT –to-ports 8888

To ensure that these rules do not get removed, ssler deletes them and then adds them back approximately every four minutes.

More behaviors of the malware are described at https://news.sophos.com/en-us/2018/05/27/vpnfilter-botnet-a-sophoslabs-analysis-part-2/ including photobucket request, fake CA certs claiming Microsoft issued them and ipify lookups.

YARA rules for detection –

https://github.com/Neo23x0/signature-base/blob/master/yara/apt_vpnfilter.yar

https://github.com/Neo23x0/sigma/blob/master/rules/proxy/proxy_ua_apt.yml#L33

YARA (yet another recursive acronym) is a format to specify rules match malware based on string patterns, regular expressions and their frequency of occurrence. A guide to writing effective ones is here.

User-Agent rule –

alert http any any -> any any (msg:"VPNFilter malware User-Agent"; content:"Mozilla/6.1 (compatible|3B| MSIE 9.0|3B| Windows NT 5.3|3B| Trident/5.0)"; http_user_agent; sid:2; rev:1;)

view raw
vpnfilter-ua.rule
hosted with ❤ by GitHub

Ipify self-ip address querying service, with json output. http://api.ipify.org/

 

Zero Trust Networks

Instead of  the “inside” and “outside” notion of traditional firewalls and perimeter defense technologies, the Zero Trust Network notion has its origin in the Cloud+Mobile first world where a person carrying a mobile device can be anywhere in the world (inside/outside the enterprise) and needs to be seamlessly and securely connected to online services.

The essential idea appears to be device authentication coupled with a second factor in the shape of an easy to remember password, with backend security smarts to identify the accessing device. More importantly, every service that is access externally needs to be authenticated, instead of some services being treated as internal services and being less protected.

Some properties of zero trust networks:

  • Network locality based access control is insufficient
  • Every device, user and service is authenticated
  • Policies are dynamic – they gather and utilize data inputs for making access control decisions
  • Attacks from trusted insiders are mitigated against

This is a big change from many networks which have network based defense at the core (for good reason, as it was cost effective). To create a zero-trust network, a startin point is to identify, enumerate and sequence all network flows.

I attended a talk by Centrify on this topic, which resonated with experiences in cloud, mobile and fog systems.

Related effort in Kubernetes – Progress Toward Zero Trust Kubernetes Networks, Istio Service Mesh , API Gateway to Service Mesh.  One can contrast the API gateway as being present only at the ingress point of a cloud, whereas with a Zero-trust/Service-mesh/Sidecar approach every microservice building-block has its own external proxy and ‘API’ for management added to it. The latter would add to latency concerns for real-time applications, as the new sidecar proxies are in the data path. One benefit of the service mesh is a mechanism to put in service to service security in a uniform manner.

The key original motivation behind Istio, in the second presentation by Lyft above, was greater observability and reliability across a complex cluster of microservices. This strikes me as a greater motivating use-case of this technology, than added security.  From the security point of view, there is a parallel of the Istio approach with the SDN problem statement of a horizontal and ubiquitous security layer.   Greater visibility is also a motivation behind the P4 programming language presented in disaggregated storage talk on protocol independant switch architecture or PISA here – one of the things it enables is inband telemetry.

SCRAM: Salted Challenge Response Authentication Mechanism

SCRAM is an interesting proposal (RFC-5802) that aims to remove passwords being commonly sent across the wire. It does not appear to create additional requirements for certificates or shared secrets, so let’s see how it works.

The server is required to know the username in advance, but not the password, instead a hash of the password and a (per-user) salt and an iteration count which is used to create a challenge.

The client sends the username and a nonce. The server retrieves the salt and updates the iteration count and sends these back to the client as a challenge. The client hashes the password with the agreed upon hash function, and uses the salt and the iteration count in the calculation, and send it back to the server. The server is able to validate correctness of the hashed password with the information it has.  The server then sends back a hash which the client can check to validate the server.

There are several issues with it – the initial registration flow is left out, the requirements of the client and server to issue good nonces and maintain unique salts and iterations are high, and also the requirement for the server database itself to be secure – an exfiltration could enable brute force attacks.  Then it uses SHA-1 which is weak. The password is fixed and an update method would need to be designed for a full system.

Still it is interesting as a way to remove passwords being sent over the wire.

The protocol is used in XMPP as a standard mechanism for authentication.

 

Traffic limits with HAProxy stick-tables to protect against DDoS attacks

Websites need a traffic rate limiting feature to keep their backend safe from abusive or malfunctioning clients. This requires the ability to track user sessions of a particular type and/or from a given IP address. HAProxy is a HTTP proxy which can be configured as a reverse proxy to protect a website. It uses an in-memory table to store state related to incoming HTTP connections, indexed by a key such as the client IP address. This table is called a stick-table and is enabled using the ‘stick-table’ directive in the haproxy config file.

The stick-table directive allows specifying the key, the size of the table, the duration an entry (key) is kept in seconds and various counts such as currently active connections, connection rate, http request rate, http error rate etc.

Stick tables are very useful for rate-limiting traffic and tagging traffic that meets certain criteria such as a high connection or error rate with a header which can be used by the backend to log the traffic. They can be used to provide detection of and protection against DDoS attacks.

HAProxy receives client requests in its frontend and sends those requests to servers in its backend.  The config file has corresponding frontend and backend sections.

The origin of this rate-limiting feature request along with an example is at https://blog.serverfault.com/2010/08/26/1016491873/ . Serverfault is a high traffic website so it is a good indication if the feature works for them.

Here’s an example.

frontend http
    bind *:2550

stick-table type ip size 200k expire 10m store gpc0

# check the source before tracking counters, that will allow it to
# expire the entry even if there is still activity.
acl whitelist src 192.168.1.154
acl source_is_abuser src_get_gpc0(http) gt 0
use_backend ease-up-y0 if source_is_abuser
tcp-request connection track-sc1 src if ! source_is_abuser

acl is_test1 hdr_sub(host) -i test1.com
acl is_test2 hdr_sub(host) -i test2.com

use_backend test1  if is_test1
use_backend test2  if is_test2

backend test1 
stick-table type ip size 200k expire 30s store conn_rate(100s),bytes_out_rate(60s) 
acl whitelist src 192.168.1.154

# values below are specific to the backend
tcp-request content  track-sc2 src
acl conn_rate_abuse  sc2_conn_rate gt 3
acl data_rate_abuse  sc2_bytes_out_rate  gt 20000000

# abuse is marked in the frontend so that it's shared between all sites
acl mark_as_abuser   sc1_inc_gpc0 gt 0
tcp-request content  reject if conn_rate_abuse !whitelist mark_as_abuser
tcp-request content  reject if data_rate_abuse mark_as_abuser

server local_apache localhost:80

Note that the frontend and backend sections have their own stick-table sections.

A general strategy would be to allow enough buffer for legitimate traffic to pass in, drop abnormally high traffic and flag intermediate risk traffic to the backend so it can either drop it or log the request for appropriate action, including potentially adding the IP to an abusers list for correlation, reverse lookup and other analysis. These objectives are achievable with stick-tables.

An overview of the HAProxy config file with the sections global, defaults, frontend, backend is here.

Stick tables use elastic binary trees-

https://github.com/haproxy/haproxy/blob/master/include/types/stick_table.h

https://github.com/haproxy/haproxy/blob/master/src/stick_table.c

https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-ebtree.html

Related, for analysis of packet captures in DDoS context, a useful tool is python dpkt – https://mmishou.wordpress.com/2010/04/13/passive-dns-mining-from-pcap-with-dpkt-python .

 

 

Hatman, Triton ICS Malware Analysis

A Triconex Industrial controller allows triple modular redundancy and 2/3 consensus vote based control.  The design has its origins in the 80’s industrial needs for safety for industrial controllers. The product was acquired by Schneider via Invensys in 2014. The Hatman/Triton malware framework targeting this specific controller came to light, late 2017. The Triconex is programmed with a TriStation, a Windows application which integrates with Windows directory and allows programming in FBD, LD, ST, CEM.

From the SchneiderElectric, Accenture and Mandiant analyses of the malware, more technical details appeared recently. A previous paper appeared in IEEE, Jan 2017. A brief summary is below.

Access to the controller network is necessary. The Triconex controller needs to be in Program mode. A malware program agent, TriLogger, running on Windows in the same network talks over a Tricon protocol to program the Triconex controller to install/deploy the control payload program. The malware payload program then runs like a regular program on the controller, on every scan cycle –  running in parallel in three versions.

Once on the controller, the malware looks for a way to elevate its privilege level. It starts observing the runtime, including memory inspections. There is a memory backdoor attempted, but there is a probable error handling mistake which prevents this. Now to be able to access the firmware, it takes advantage of a zero-day vulnerability in the firmware.  It is able to install itself in the firmware, overwriting a network function call. In the end it installs a Remote Access Terminal to allow remote access of the controller. This could have been a vector to download further payloads, but no evidence was found that this RAT was actually used. It attempts to remove traces of itself after installation.

Source code of the program is at  https://github.com/ICSrepo/TRISIS-TRITON-HATMAN .

Zero day attacks are a continuing challenge as by definition they are not widely known before they are used for an attack. However a secure by design approach reduces the attack surface for exploits. There were opportunities to detect the malware on the network and the windows host.

Update: A cert advisory for Triton appears in https://ics-cert.us-cert.gov/advisories/ICSA-18-107-02 and “Targeted Cyber Intrusion Detection and Mitigation Strategies” in https://ics-cert.us-cert.gov/tips/ICS-TIP-12-146-01B