Safety Concepts

I have kept coming across functional safety discussions, most recently at ArmTechCon, and wanted to capture some of the terminology and concepts. To orient the discussion, think of airbags, seat-belts, and tire-pressure-monitoring-systems as safety features in a car.

Safety Function or Safety Instrumented Function: A function to take a system to a safe outcome when certain prerequisites on system inputs are not met. E.g. turn on a warning indicator when seat-belt is not used, or tire-pressure is below safe level, or deploy airbags when a collision is detected.

Safety Related Control Function: This is the control mechanism by which the safety function is achieved.

Safety Integrity Level: The reliability of a safety-related-control-function is captured with a Safety Integrity Level or SIL.

A standard is ISO26262. The V shaped functional safety process diagram is here. This process is used to achieve a Safety Integrity Level (SIL) where the SIL1, SIL2, SIL3, SIL4 reduce risk by a progressive factor of 10, i.e. by 10x, 100x, 1000x and 10000x. A HAZOP study is undertaken to understand the risks of the mechanism behaving incorrectly.

A good reference, from SIMATIC is here. Software aspects of safety function are discussed in in this whitepaper.

Safety systems for robotics are discussed here – it has a table of typical safety issues when a person enters a robot safeguarded area. Industrial robots security was briefly discussed here.

Another concept is SOTIF or Safety of the Intended Function, which comes up in functional safety discussions of AI-controlled vehicles. More links on it here.

Nvidia safe driving report here.

VPNFilter IoT Router Malware

Over 500k routers and gateways are estimated to be infected with malware dubbed VPNFilter, first reported in https://blog.talosintelligence.com/2018/05/VPNFilter.html .

It has 3 stages. In stage 1 it adds itself to crontab to remain after a reboot. In stage 2 it adds a plugin architecture. In stage 3 it adds modules which instruct it to do specific things.  A factory reset and router restart in protected network was recommended to remove it. Disabling remote administration and changing passwords is recommended to prevent reinfection.

The 3rd stage module modifies IPtables rules, enabling mitm attacks and javascript injection.

The first action taken by the ssler module is to configure the device’s iptables to redirect all traffic destined for port 80 to its local service listening on port 8888. It starts by using the insmod command to insert three iptables modules into the kernel (ip_tables.ko, iptable_filter.ko, iptable_nat.ko) and then executes the following shell commands:

  • iptables -I INPUT -p tcp –dport 8888 -j ACCEPT
  • iptables -t nat -I PREROUTING -p tcp –dport 80 -j REDIRECT –to-port 8888
  • Example: ./ssler logs src:192.168.201.0/24 dst:10.0.0.0/16

-A PREROUTING -s 192.168.201.0/24 -d 10.0.0.0/16 -p tcp -m tcp –dport 80 -j REDIRECT –to-ports 8888

To ensure that these rules do not get removed, ssler deletes them and then adds them back approximately every four minutes.

More behaviors of the malware are described at https://news.sophos.com/en-us/2018/05/27/vpnfilter-botnet-a-sophoslabs-analysis-part-2/ including photobucket request, fake CA certs claiming Microsoft issued them and ipify lookups.

YARA rules for detection –

https://github.com/Neo23x0/signature-base/blob/master/yara/apt_vpnfilter.yar

https://github.com/Neo23x0/sigma/blob/master/rules/proxy/proxy_ua_apt.yml#L33

YARA (yet another recursive acronym) is a format to specify rules match malware based on string patterns, regular expressions and their frequency of occurrence. A guide to writing effective ones is here.

User-Agent rule –

Ipify self-ip address querying service, with json output. http://api.ipify.org/

 

Zero Trust Networks

Instead of  the “inside” and “outside” notion of traditional firewalls and perimeter defense technologies, the Zero Trust Network notion has its origins in Cloud+Mobile first world where the person carrying a mobile device can be anywhere inside/outside the enterprise and is connected securely to the services s/he needs access to.

The essential idea appears to be device authentication coupled with a second factor in the shape of an easy to remember password, with backend security smarts to identify the accessing device. More importantly, every service that is access externally needs to be authenticated, instead of some services being treated as internal services and being less protected.

Some properties of zero trust networks:

  • Network locality based access control is insufficient
  • Every device, user and service is authenticated
  • Policies are dynamic – they gather and utilize data inputs for making access control decisions
  • Attacks from trusted insiders are mitigated against

This is a big change from many networks which have network based defense at the core (for good reason, as it was very cost effective). To create such a network, from an existing network, one way to start is to identify, enumerate and sequence all network flows.

I attended a talk by Centrify on this topic, which resonated with experiences in cloud, mobile and fog systems.

Related effort in Kubernetes – Progress Toward Zero Trust Kubernetes Networks, Istio Service Mesh , API Gateway to Service Mesh.  One can contrast the API gateway as being present only at the ingress point of a cloud, whereas with a Zero-trust/Service-mesh/Sidecar approach every microservice building-block has its own external proxy and ‘API’ for management added to it. The latter would add to latency concerns for real-time applications, as the new sidecar proxies are in the data path.

The key original motivation behind Istio, in the second presentation by Lyft above, was greater observability and reliability across a complex cluster of microservices. This strikes me as a greater motivating use-case of this technology, than added security.  From the security point of view, there is a parallel of the Istio approach with SDN problem statement of a horizontal and ubiquitous security layer.

SCRAM: Salted Challenge Response Authentication Mechanism

SCRAM is an interesting proposal (RFC-5802) that aims to remove passwords being commonly sent across the wire. It does not appear to create additional requirements for certificates or shared secrets, so let’s see how it works.

The server is required to know the username in advance, but not the password, instead a hash of the password and a (per-user) salt and an iteration count which is used to create a challenge.

The client sends the username and a nonce. The server retrieves the salt and updates the iteration count and sends these back to the client as a challenge. The client hashes the password with the agreed upon hash function, and uses the salt and the iteration count in the calculation, and send it back to the server. The server is able to validate correctness of the hashed password with the information it has.  The server then sends back a hash which the client can check to validate the server.

There are several issues with it – the initial registration flow is left out, the requirements of the client and server to issue good nonces and maintain unique salts and iterations are high, and also the requirement for the server database itself to be secure – an exfiltration could enable brute force attacks.  Then it uses SHA-1 which is weak. The password is fixed and an update method would need to be designed for a full system.

Still it is interesting as a way to remove passwords being sent over the wire.

The protocol is used in XMPP as a standard mechanism for authentication.

 

Traffic limits with HAProxy stick-table

A traffic rate limiting feature is required to keep an HTTP website backend safe from abusive or malfunctioning clients.  This requires the ability to track user sessions of a particular type and/or from a given IP address. HAProxy is an HTTP proxy which (when configured as reverse proxy to protect a website), receives client requests in its frontend and sends those requests to servers in its backend.   The config file has corresponding frontend and backend sections. Haproxy also has an in-memory table to store state related to incoming HTTP connections, indexed by a key such as client IP address.  This table is called a stick-table – it is enabled using the ‘stick-table’ directive in the haproxy config file.

The stick-table directive allows  specifying the key, the size of the table, the duration an entry (key) is kept in seconds and various counts such as currently active connections, connection rate, http request rate, http error rate etc.

Stick tables are very useful for rate-limiting traffic and tagging traffic that meets certain criteria such as a high connection or error rate with a header which can be used by the backend to log the traffic.

The origin of this rate-limiting feature request along with an example is at https://blog.serverfault.com/2010/08/26/1016491873/ . Serverfault is a high traffic website so it is a good indication if the feature works for them.

frontend http
    bind *:2550

stick-table type ip size 200k expire 10m store gpc0

# check the source before tracking counters, that will allow it to
# expire the entry even if there is still activity.
acl whitelist src 192.168.1.154
acl source_is_abuser src_get_gpc0(http) gt 0
use_backend ease-up-y0 if source_is_abuser
tcp-request connection track-sc1 src if ! source_is_abuser

acl is_test1 hdr_sub(host) -i test1.com
acl is_test2 hdr_sub(host) -i test2.com

use_backend test1  if is_test1
use_backend test2  if is_test2

backend test1 
stick-table type ip size 200k expire 30s store conn_rate(100s),bytes_out_rate(60s) 
acl whitelist src 192.168.1.154

# values below are specific to the backend
tcp-request content  track-sc2 src
acl conn_rate_abuse  sc2_conn_rate gt 3
acl data_rate_abuse  sc2_bytes_out_rate  gt 20000000

# abuse is marked in the frontend so that it's shared between all sites
acl mark_as_abuser   sc1_inc_gpc0 gt 0
tcp-request content  reject if conn_rate_abuse !whitelist mark_as_abuser
tcp-request content  reject if data_rate_abuse mark_as_abuser

server local_apache localhost:80

Note that the frontend and backend sections have their own stick-table sections.

A general strategy would be to allow enough buffer for legitimate traffic to pass in, drop abnormally high traffic and flag intermediate risk traffic to the backend so it can either drop it or log the request for appropriate action, including potentially adding the IP to an abusers list for correlation, reverse lookup and other analysis. These objectives are achievable with stick-tables.

An overview of the HAProxy config file with the sections global, defaults, frontend, backend is here.

Stick tables use elastic binary trees-

https://github.com/haproxy/haproxy/blob/master/include/types/stick_table.h

https://github.com/haproxy/haproxy/blob/master/src/stick_table.c

https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-ebtree.html

Related, for analysis of packet captures in DDoS context, a useful tool is python dpkt – https://mmishou.wordpress.com/2010/04/13/passive-dns-mining-from-pcap-with-dpkt-python .

 

 

Hatman, Triton ICS Malware Analysis

A Triconex Industrial controller allows triple modular redundancy and 2/3 consensus vote based control.  The design has its origins in the 80’s industrial needs for safety for industrial controllers. The product was acquired by Schneider via Invensys in 2014. The Hatman/Triton malware framework targeting this specific controller came to light, late 2017. The Triconex is programmed with a TriStation, a Windows application which integrates with Windows directory and allows programming in FBD, LD, ST, CEM.

From the SchneiderElectric, Accenture and Mandiant analyses of the malware, more technical details appeared recently. A previous paper appeared in IEEE, Jan 2017. A brief summary is below.

Access to the controller network is necessary. The Triconex controller needs to be in Program mode. A malware program agent, TriLogger, running on Windows in the same network talks over a Tricon protocol to program the Triconex controller to install/deploy the control payload program. The malware payload program then runs like a regular program on the controller, on every scan cycle –  running in parallel in three versions.

Once on the controller, the malware looks for a way to elevate its privilege level. It starts observing the runtime, including memory inspections. There is a memory backdoor attempted, but there is a probable error handling mistake which prevents this. Now to be able to access the firmware, it takes advantage of a zero-day vulnerability in the firmware.  It is able to install itself in the firmware, overwriting a network function call. In the end it installs a Remote Access Terminal to allow remote access of the controller. This could have been a vector to download further payloads, but no evidence was found that this RAT was actually used. It attempts to remove traces of itself after installation.

Source code of the program is at  https://github.com/ICSrepo/TRISIS-TRITON-HATMAN .

Zero day attacks are a continuing challenge as by definition they are not widely known before they are used for an attack. However a secure by design approach reduces the attack surface for exploits. There were opportunities to detect the malware on the network and the windows host.

Update: A cert advisory for Triton appears in https://ics-cert.us-cert.gov/advisories/ICSA-18-107-02 and “Targeted Cyber Intrusion Detection and Mitigation Strategies” in https://ics-cert.us-cert.gov/tips/ICS-TIP-12-146-01B

Javascript Timing and Meltdown

In response to meltdown/spectre side-channel vulnerabilities, which are based on fine grained observation of the CPU to infer cache state of an adjacent process or VM, a mitigration response by browsers was the reduction of the time resolution of various time apis, especially in javascript.

The authors responded with alternative sources of finding fine grained timing, available to browsers. An interpolation method allows obtaining of a fine resolution of 15 μs, from a timer that is rounded down to multiples of 100 ms.

The javascript  high resolution time api is still widely available and described at https://www.w3.org/TR/hr-time/ with a reference to previous work on cache attacks in Practical cache attacks in JS

A meltdown PoC is at https://github.com/gkaindl/meltdown-poc, to test the timing attack in its own process. The instruction RDTSC returns the Time Stamp Counter (TSC), a 64-bit register that counts the number of cycles since reset, and so has a resolution of 0.5ns on a 2GHz CPU.

int main() {
 unsigned long i;
 i = __rdtsc();
 printf("%lld\n", i);
}