Author: Ruchir Tewari

ML Transformer and GPT-2 Meetup

AI meetup, GPT-2 demo and discussion. Attention!

“The attention mechanism allows the model to create the context vector as a weighted sum of the hidden states of the encoder RNN at each previous timestamp.”

“Transformer is a type of model based entirely on attention, and does not require recurrent or convolutional layers”

The context vector is the output of the encoder in an Encoder-Decoder network (EDN). A single fixed-size context vector struggles to retain all the information the decoder needs to decode accurately. Attention is a mechanism to solve this problem.

“Attention mechanisms let a model directly look at, and draw from, the state at any earlier point in the sentence. The attention layer can access all previous states and weighs them according to some learned measure of relevancy to the current token, providing sharper information about far-away relevant tokens.”

GPT: Generative Pre-trained Transformer. Unlike BERT, it is generative: it is geared toward writing and generation tasks rather than comprehension, translation, or summarization. It uses unsupervised (self-supervised) learning to pre-train a deep Transformer language model on a large text corpus; the pre-training uses neither reinforcement learning (feedback from an environment) nor labeled supervision. It uses "masked self-attention" so that, when predicting the next token during training, a position can only attend to earlier positions.

The term “generative” is used to emphasize GPT’s ability to generate new, original text, rather than just processing or analyzing text that already exists. A generative model is a type of machine learning model that is trained to produce data, such as text, images, or music, that is similar to the data it was trained on. GPT is a generative model because it is trained on a large corpus of text data and can then generate new text that is similar to the text in its training data. This allows GPT to produce human-like text on a wide range of topics, which can be useful for a variety of applications, such as language translation, text summarization, and question answering.
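
For instance, here is a minimal sketch of generating text with the publicly released GPT-2 weights, assuming the Hugging Face transformers library and a PyTorch backend are installed (the meetup demo may have used OpenAI's original release instead):

    # Sketch only: uses the Hugging Face "transformers" pipeline API, which
    # downloads the released 124M-parameter "gpt2" checkpoint on first use.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "The attention mechanism allows the model to"
    result = generator(prompt, max_length=40, num_return_sequences=1)
    print(result[0]["generated_text"])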

A “transformer” is a type of neural network architecture that was introduced in 2017. It is a deep learning model that is used for natural language processing tasks, such as language translation and text summarization. A transformer consists of two main components: an encoder, which processes the input text, and a decoder, which generates the output text. The encoder and decoder are connected by a series of attention mechanisms, which allow the model to focus on different parts of the input text as it generates the output. This architecture allows the model to process input text in a parallel, rather than sequential, manner, which makes it more efficient and effective than previous models. The transformer architecture has been widely adopted in natural language processing and has been shown to be highly effective for many tasks.

In a transformer, the “attention” mechanism allows the model to focus on different parts of the input text at different times as it generates the output text. This is different from previous neural network models, which processed the input text sequentially, one word at a time. The attention mechanism in a transformer works by calculating a weight for each word in the input text. This weight represents the importance of that word in the context of the current output word that the model is generating. The model then uses these weights to decide which words in the input text to focus on as it generates the output. This allows the model to selectively focus on the most relevant words in the input text.
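
As a rough illustration (a sketch of standard scaled dot-product attention, not the exact code of any particular model; the dimensions and names below are made up), the weight computation can be written in a few lines of NumPy. The optional causal mask corresponds to the "masked self-attention" GPT uses when predicting the next token:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V, causal=False):
        """Scaled dot-product attention; Q, K, V have shape (seq_len, d)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)       # relevance of every position to every query
        if causal:
            # Masked self-attention: positions may not attend to later positions,
            # which is what lets the model be trained to predict the next token.
            future = np.triu(np.ones_like(scores, dtype=bool), k=1)
            scores = np.where(future, -1e9, scores)
        weights = softmax(scores, axis=-1)  # one weight per input position, summing to 1
        return weights @ V                  # weighted sum of the value vectors

    # Toy self-attention over 4 tokens with 8-dimensional representations.
    x = np.random.randn(4, 8)
    print(attention(x, x, x, causal=True).shape)  # (4, 8)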

The Transformer (https://en.m.wikipedia.org/wiki/Transformer_(machine_learning_model)) was introduced in June 2017 by the Google Brain team.

GPT was released June 2018 by OpenAI.

BERT was released Oct 2018 by Google.

GPT-2 was announced Feb 2019 by OpenAI, trained on 40GB of text.

GPT-3 was introduced May 2020 and in beta testing in July 2020. Trained on 10x the data, or 400GB.

BERT is a response to GPT and GPT-2 is in turn a response to BERT.

This attention concept looks akin to a Fourier or Laplace transform, which encodes the entire input signal losslessly in a way that allows sections or bands of it to be referred to later. Although implemented differently, it is a way to keep track of and refer to global state.

AutoML and Transformer – http://ai.googleblog.com/2019/06/applying-automl-to-transformer.html

Transformer architecture

BERT and GPT are both based on the Transformer architecture. BERT is bidirectional and better at comprehending meaning from a whole sentence or phrase, whereas GPT is better at generating text.

https://jalammar.github.io/illustrated-transformer/

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

Bahdanau et al., 2014 introduced the concept of attention: https://arxiv.org/abs/1409.0473

“The most important distinguishing feature of this approach from the basic encoder–decoder is that it does not attempt to encode a whole input sentence into a single fixed-length vector. Instead, it encodes the input sentence into a sequence of vectors and chooses a subset of these vectors adaptively while decoding the translation. This frees a neural translation model from having to squash all the information of a source sentence, regardless of its length, into a fixed-length vector. We show this allows a model to cope better with long sentences.”

This description makes it sound more like a wavelet transform, which correlates a signal with itself at different levels of granularity to make sense of it.

Conceptual progression

  1. Input -> Encoder -> Decoder -> Output
  2. The Encoder maintains Hidden States (vectors) to parse/grok the input. Once it has gone through the input, it passes the final Hidden State, called the Context, forward to the Decoder.
  3. This Context is the bottleneck in the operation of the Decoder.
  4. The Attention concept introduced by Bahdanau and others was meant to overcome this Context bottleneck.
  5. With Attention, the entire set of intermediate Hidden States is passed on to the Decoder, not just the final Context.
  6. The Decoder then performs a couple of additional steps: a) it assigns a score to each Hidden State, and b) it multiplies each Hidden State by its score. The scored vectors are summed into a context vector that the Decoder uses to produce the Output (a rough sketch follows this list).
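
A minimal sketch of steps 5 and 6 (illustrative only; a real model learns the scoring function, e.g. Bahdanau's additive score, and the sizes here are arbitrary):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    encoder_hidden_states = np.random.randn(6, 16)  # one vector per input token (hypothetical sizes)
    decoder_state = np.random.randn(16)             # current decoder hidden state

    # (a) score each encoder hidden state against the current decoder state;
    #     a dot product stands in for the learned scoring function here.
    scores = encoder_hidden_states @ decoder_state

    # (b) normalize the scores and take the weighted sum of the hidden states,
    #     giving the context vector the decoder uses for the next output token.
    weights = softmax(scores)
    context = weights @ encoder_hidden_states       # shape (16,)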

OmniSci GPU-based columnar database

OmniSci is a columnar database that reads a column into GPU memory, in compressed form, allowing interactive queries on the data. A single GPU can load 10 million to 50 million rows of data and allows interactive querying without indexing. A demo was shown at the GTC keynote this year by Aaron Williams. He gave a talk on vehicle analytics that I attended last month.

In the vehicle telemetry demo, they obtain vehicle telemetry data from an F1 game that outputs data as UDP, tens of thousands of packets a second; they take the binary data off of UDP, convert it to JSON, and use it as a proxy for real telemetry data. The webserver refreshes every 3-4 seconds. The use case is analysis of increasing amounts of vehicle sensor data, as discussed in this video and described in the detailed OmniSci blog post here.

Programming of the blocks is done through the UX with StreamSets, an open-source tool for building data pipelines; a cool demo of StreamSets for creating pipelines, including Kafka, is here.

The vehicle analytics demo pipeline consisted of UDP to Kafka, Kafka to JSON, then JSON to OmniSci via pymapd. Kafka serves as a message broker and also allows playback of data.
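
A hedged sketch of the last hop of that pipeline, consuming JSON records from Kafka and loading them into OmniSci with pymapd; the topic name, table name, and connection details below are hypothetical, and the real demo code is in the linked blog post:

    import json
    import pandas as pd
    from kafka import KafkaConsumer   # kafka-python
    from pymapd import connect

    # Hypothetical connection details; assumes the vehicle_telemetry table already exists.
    con = connect(user="admin", password="HyperInteractive",
                  host="localhost", dbname="omnisci")

    consumer = KafkaConsumer("telemetry",                       # hypothetical topic name
                             bootstrap_servers="localhost:9092",
                             value_deserializer=lambda v: json.loads(v))

    batch = []
    for msg in consumer:
        batch.append(msg.value)                                 # one decoded telemetry record
        if len(batch) >= 1000:                                  # load in small batches
            con.load_table("vehicle_telemetry", pd.DataFrame(batch))
            batch.clear()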

Based on the GPU-loaded data, the database allows queries and statistics on the different vehicles that are running.

The entire system runs in the cloud on a VM supporting Nvidia GPUs, and can also be run on a local GPU box.

Lacework Intrusion Detection System – Cloud IDS

Lacework Polygraph is a Host based IDS for cloud workloads. It provides a graphical view of who did what on which system, reducing the time for root cause analysis for anomalies in system behaviors. It can analyze workloads on AWS, Azure and GCP.

It installs a lightweight agent on each target system which aggregates information from processes running on the system into a centralized, customer-specific (MT) data warehouse (Snowflake on AWS), analyzes the information using machine learning to generate behavioral profiles, and then looks for anomalies from the baseline profile. The design allows automating analysis of common attack scenarios involving ssh, privilege changes, and unauthorized access to files.

The host-based model gives detailed process information, such as which process talked to which other process and over what API. This information is not available to a network IDS. The behavior profiles reduce false-positive rates. The graphical view is useful for drilling down into incidents.

OSQuery is a tool for gathering data from hosts, and this is a source of data aggregated for threat detection. https://www.rapid7.com/blog/post/2016/05/09/introduction-to-osquery-for-threat-detection-dfir/

Here’s an agent for libpcap https://github.com/lacework/pcap

It does not have intrusion prevention (IPS) functionality. False positives on an IPS could block network or host access and negatively affect the system being protected, so that is a harder problem.

Cloud based network isolation tools like Aviatrix might make IPS scenarios feasible by limiting the effect of an IPS.

Software Integrity Tools

There are a number of tools used to detect security issues in a software application codebase. A simple and free one is Flawfinder. A sophisticated commercial one is Veracode. There are also lint, pylint, FindBugs for Java, and the Xcode Clang static analyzer.

Synopsys has bought a few tools like Coverity and Black Duck for various static checks on code and binaries. Black Duck can do binary analysis and scores issues with CVSS. A common use of Black Duck is license checking, to check for conformance to open-source licenses.

A more comprehensive list of static code analysis tools is here.

Dynamic analysis tools inspect the running process and find memory and execution errors. Well-known examples are Valgrind and Purify. More dynamic analysis tools are listed here.

For web application security there are protocol testing and fuzzing tools like Burp suite and Tenable Nessus.

A common issue with the tools is the issue of false positives. It helps to limit the testing to certain defect types or attack scenarios and identify the most critical issues, then expand the scope of types of defects.

Code obfuscation and anti-tamper are another line of tools, for example from Arxan, Klocwork, Irdeto, and ProGuard.

A great talk on Adventures in fuzzing. My takeaway has been that better ways of developing secure software are really important.

FaceID vs TouchID on iPhone, and FIDO on Android

As a developer, I found that Apple made the transition from Touch ID to Face ID easy with the LocalAuthentication library (more below). But as a user who had grown accustomed to Touch ID, I did not like the Touch ID button going away. What were the reasons for Touch ID to be replaced completely by Face ID? What happened on other platforms, like Android with FIDO? Let's explore in this post.

Both Touch ID and Face ID are biometric authentication technologies developed by Apple for their iPhone lineup. Each has distinct implementations and security features. Here’s a detailed contrast of the security aspects and implementations of Touch ID and Face ID:

Touch ID

Implementation:

  • Technology: Touch ID uses a capacitive touch sensor embedded in the Home button or, in newer models, in the power button.
  • Enrollment: Users register their fingerprint by repeatedly placing their finger on the sensor, which captures a high-resolution image of the fingerprint.
  • Authentication: When authenticating, the sensor captures the fingerprint image and compares it to the enrolled fingerprint data.

Security Features:

  • Secure Enclave: Fingerprint data is stored in an encrypted format in the Secure Enclave, a separate processor within the iPhone’s A-series chip. This ensures that fingerprint data is never accessible to the operating system or any apps.
  • False Acceptance Rate (FAR): Apple claims that the chance of a random person being able to unlock an iPhone with Touch ID is approximately 1 in 50,000.
  • Liveness Detection: Touch ID includes some measures to detect the presence of a live finger to prevent spoofing with artificial fingerprints.

Face ID

Implementation:

  • Technology: Face ID uses a combination of infrared and dot projection technology, known as the TrueDepth camera system, to create a detailed 3D map of the user’s face.
  • Enrollment: Users enroll by moving their head around so that the TrueDepth camera can capture their face from multiple angles.
  • Authentication: During authentication, the system projects over 30,000 infrared dots onto the user’s face and captures the pattern with an infrared camera to create a 3D facial map.

Security Features:

  • Secure Enclave: Facial data is also stored in the Secure Enclave in an encrypted format, similar to Touch ID, ensuring it is never accessible to the operating system or apps.
  • False Acceptance Rate (FAR): Apple claims that the probability of a random person unlocking an iPhone with Face ID is approximately 1 in 1,000,000.
  • Liveness Detection: Face ID employs advanced liveness detection to ensure the person presenting the face is real and not a photo or mask. This includes attention awareness, requiring the user’s eyes to be open and looking at the device.
  • Adaptability: Face ID can adapt to changes in the user’s appearance over time, such as growing facial hair, wearing glasses, or aging.

Comparison and Security Implications

  • Accuracy and False Acceptance Rate: Face ID is significantly more accurate than Touch ID, with a FAR of 1 in 1,000,000 compared to Touch ID’s 1 in 50,000. This means Face ID is generally more secure against random attempts to unlock the device.
  • Resistance to Spoofing: Both systems are designed to resist spoofing, but Face ID’s use of 3D facial mapping and infrared technology makes it more robust against attempts to bypass it using photos or masks. Touch ID’s liveness detection is less sophisticated compared to Face ID.
  • Environmental Conditions: Touch ID can be affected by wet or dirty fingers and may not work if the sensor is damaged. Face ID, on the other hand, can be impacted by certain lighting conditions or if the user is wearing accessories that obscure the face (e.g., certain sunglasses or masks).
  • User Convenience: Face ID generally offers a more seamless experience as it works even when the user’s hands are occupied or when wearing gloves. However, in situations where face coverings are required, Touch ID may be more convenient.

Both Touch ID and Face ID offer robust security features, but Face ID provides enhanced security through its more sophisticated technology and lower false acceptance rate. The choice between the two may come down to personal preference and specific use cases, such as the need for convenience in different environments.

Transition from Touch ID to Face ID

API Transition:

  1. Initial Introduction:
  • Touch ID (iOS 7, 2013): Apple introduced Touch ID with the iPhone 5s; APIs in the LocalAuthentication framework followed in iOS 8, allowing developers to integrate fingerprint authentication into their apps.
  • Face ID (iOS 11, 2017): With the introduction of Face ID in the iPhone X, Apple updated the LocalAuthentication framework to support facial recognition alongside fingerprint recognition.
  2. Unified API:
  • Apple designed the LocalAuthentication framework to abstract away the specifics of biometric authentication. This means developers typically don’t need to change their code to support both Touch ID and Face ID. Instead, they use general methods to request biometric authentication, and the system handles the specifics.
  • For example, the method LAContext.evaluatePolicy(_:localizedReason:reply:) works for both Touch ID and Face ID.
  3. Backward Compatibility:
  • Apple ensured backward compatibility, so apps built to support Touch ID automatically support Face ID without requiring significant changes. The system dynamically determines which biometric method to use based on the hardware capabilities of the device.

Developer Considerations:

  • Developers are encouraged to check for biometric type (LABiometryType) to provide appropriate messaging or handle specific cases where one type may be preferred.
  • Apple provides clear guidelines and updates during WWDC (Apple Worldwide Developers Conference) to help developers transition smoothly and take full advantage of new hardware features.

FIDO on Android

FIDO (Fast Identity Online):

  1. Overview:
  • The FIDO Alliance is an open industry association aimed at creating authentication standards to reduce reliance on passwords. FIDO protocols use standard public key cryptography techniques to provide stronger authentication.
  2. Adoption on Android:
  • Android has supported FIDO2 since Android 7.0 (Nougat). FIDO2 includes WebAuthn (web authentication API) and CTAP (Client to Authenticator Protocol), enabling passwordless authentication on the web and mobile apps.
  • Google Play Services provide FIDO2 API support, making it available on most modern Android devices.
  3. Biometric Integration:
  • Android’s BiometricPrompt API (introduced in Android 9.0 Pie) allows apps to integrate various types of biometric authentication (fingerprint, face, iris) in a standardized way.
  • BiometricPrompt is designed to support multiple biometric types, providing a unified API similar to Apple’s LocalAuthentication framework.

Face ID on Android:

  1. Face Authentication:
  • Android supports face authentication through the BiometricPrompt API. The specifics of face authentication can vary by manufacturer, as Android devices have diverse hardware capabilities.
  • Devices like Google Pixel 4 and some Samsung models have advanced face recognition similar to Apple’s Face ID, using infrared sensors and dot projection for 3D facial mapping.
  2. FIDO and Biometric Standards:
  • The FIDO Alliance promotes biometric standards, including facial recognition, but the implementation and quality can vary widely on Android devices.
  • Some Android manufacturers implement their own facial recognition solutions, while others rely on Google’s BiometricPrompt API for a more standardized approach.

Transition from Touch ID to Face ID:

  • Apple’s transition from Touch ID to Face ID was facilitated by a unified API (LocalAuthentication), allowing seamless integration for developers and users.

FIDO on Android:

  • FIDO2 standards are widely supported on Android, enabling strong, passwordless authentication.
  • Android supports multiple biometric methods, including facial recognition, through the BiometricPrompt API. However, the implementation quality and technology can vary between devices.

Open Compute Project 2019

OCP has the mission to “design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing”.

OCP had its global 2019 summit recently. Some interesting trends in hyperscale networks are discussed here and here, with the use of the F16 fabric network and its focus on higher bandwidth, but also on performance at the right cost instead of at any cost. The heart of this new F16 fabric is the Minipack switch, with contributions from Arista, which Facebook says will consume 50 percent less power and space than the Backpack switch it replaces in the network. It is a 128x100G switch and uses a Broadcom Tomahawk-3 ASIC. Quote: “a path from a rack in one building to a rack in another building over Fabric Aggregator was as many as 24 hops long before. With F16, same-fabric network paths are always the best case of six hops, and building-to-building flows always take eight hops. This results in half the number of intrafabric network hops and one-third the number of interfabric network hops between servers.”

Intel announced an industry collaboration around Platform Root of Trust at the Open Compute Project 2019 summit.

There’s a talk on Stratum and the use of P4 and Switch Abstraction Interface (SAI) for SDN, by Open Networking Foundation (ONF) and Google. Tencent has a use case for disaggregating their monolithic network into a modular switch with a network of controllers instead of a single controller.

Smaller data centers at the edge is another trend.

Facebook storage stack and its evolution- https://thenewstack.io/facebook-storage/, mentions OCP and the disaggregated server model which separates server components across different racks.

More on cold storage in a Facebook data center – https://www.datacenterknowledge.com/archives/2013/01/18/facebook-builds-new-data-centers-for-cold-storage

SGX vs TPM vs SE – security in silicon

Hardware root of trust, Hardware security primitives

PUF https://en.m.wikipedia.org/wiki/Physical_unclonable_function

Specialized crypto ops vs small open/generic TCB.

VHDL libraries

Chain of trust, up to firmware and software layer. Level of integration vs modularity.

TouchID with SE. Samsung. YubiKey, Gemalto.

Electronic logging devices. ELD tamper proofing

Secure provisioning. Intel EPID

Rust, memory safety, cargo, toml, policy engine, Fortanix, Azure IoT

Key management with enclaves

Trusted VMs and cloud security

Platform security 

Isolated memory on shared infra

Git error: failed to push some refs to remote

This error often occurs after another checkin has gone in before yours, and says “the tip of your current branch is behind its remote counterpart”.

It should be resolved by

a) 'git pull --rebase' // this may bring in conflicts that need to be resolved

followed by

b) ‘git push’ // this works the first time

After the two steps your changes are available to team members to code review and you may need to edit your changes. After making such changes, you’d need to do

c) ‘git push -f’

to force the push.

However, if this code-review-and-edit cycle takes some time and other changes are approved in the meantime, you have to repeat these steps.

It can be simpler to get a fresh copy of the top of tree and add in your changes directly there and submit.

IPTables

Iptables is a Linux-based firewall utility that allows system administrators to configure rules and filters for network traffic. It works by examining each packet that passes through the system and making decisions based on rules defined by the administrator.

In the context of iptables, tables are organizational units that contain chains and rules. There are five built-in tables in iptables:

  1. filter – This is the default table and is used for packet filtering.
  2. nat – This table is used for network address translation (NAT).
  3. mangle – This table is used for specialized packet alteration.
  4. raw – This table is used for configuring exemptions from connection tracking in combination with the NOTRACK target.
  5. security – This table is used for Mandatory Access Control (MAC) networking rules.

In iptables, a chain is a sequence of rules that are applied to each packet passing through a particular table. Each table contains several built-in chains, and these chains can also have user-defined chains.

Each table contains several built-in chains and can also have user-defined chains. The chains contain rules that are used to determine what happens to packets as they pass through the firewall.

The name “chain” comes from the way that iptables processes packets. When a packet arrives at the firewall, it is first matched against the rules in the PREROUTING chain. If the packet matches a rule in that chain, it is processed according to the action specified in the rule. If the packet is not matched by any rules in the PREROUTING chain, it is then passed to the INPUT chain, where it is again matched against each rule.

This process continues through the FORWARD, OUTPUT, and POSTROUTING chains until the packet is either accepted, rejected, or dropped. Because each chain is a sequence of rules that are processed in order, it is called a “chain.”

In iptables, there are five types of built-in chains: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. Not all five chains are present in every table: the filter table has the INPUT, FORWARD, and OUTPUT chains; the nat table has PREROUTING, OUTPUT, and POSTROUTING (plus INPUT on newer kernels); the mangle table has all five; and the raw table has only PREROUTING and OUTPUT.

The "security" table, which is used for Mandatory Access Control (MAC) networking rules, has the INPUT, FORWARD, and OUTPUT chains.

Each built-in chain has a default policy that can be set to either ACCEPT or DROP; individual rules can additionally use the REJECT target. ACCEPT means that the packet is allowed through, DROP means that the packet is silently discarded, and REJECT means that the packet is discarded and the sender is notified.

System administrators can create custom rules that match specific criteria, such as the source or destination IP address, the protocol used, or the port number.
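
For example (illustrative rules, not taken from any particular deployment), the following accepts SSH connections only from a local subnet and drops all other SSH traffic:

  • iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 22 -j ACCEPT
  • iptables -A INPUT -p tcp --dport 22 -j DROP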

“mangle” is a table that is used to alter or mark packets in various ways. The mangle table provides a way to modify packet headers in ways that other tables cannot, such as changing the Time To Live (TTL) value or the Type of Service (TOS) field.

Safety Concepts – Functional safety, SOTIF

I want to capture Functional Safety concepts. This is an important topic given the number of vehicles and their increasing automation. It was a major topic at the recent ArmTechCon.

Airbags, seat-belts, ABS, and tire-pressure-monitoring-systems are some of the safety features in a car.

Safety Function or Safety Instrumented Function (SIF): A function to take a system to a safe outcome when certain prerequisites on system inputs are not met. For example, turn on a warning indicator when seat-belt is not used or when the tire-pressure is below safe level, or deploy airbags when a collision is detected.

Safety Related Control Function (SRCF): This is the control mechanism by which the safety function is achieved. For example, collision above a certain impact threshold leads to airbag deployment by an airbag control module. Quote: “The airbag control module is installed inside the center console and contains a safety sensor, G sensor, ignition judgment circuit, and a backup power supply.”

Safety Integrity Level (SIL): The reliability of a safety-related control function is captured with a Safety Integrity Level, or SIL. SIL levels go from 1 to 4 (highest). SILs are derived from a risk-estimation process and are used in estimating the risk of a system built from pre-built components.

A key safety standard is ISO 26262. The V-shaped functional safety process diagram describes the design steps flowing from OEM to supplier and the verification steps flowing back from supplier to OEM (here). This process is used to achieve a Safety Integrity Level (SIL), where SIL1, SIL2, SIL3 and SIL4 reduce risk by a progressive factor of 10, i.e. by 10x, 100x, 1000x and 10000x. A hazard and operability study (HAZOP) is undertaken to understand the risks of the mechanism behaving incorrectly.

A good reference, from SIMATIC, is here. Software aspects of the safety function are discussed in this whitepaper.

Safety systems for robotics are discussed here – this has a table of typical safety issues when a person enters a robot safeguarded area. Industrial robots security was briefly discussed in this blog here.

Another concept is SOTIF or Safety of the Intended Function, which comes up in functional safety discussions of AI-controlled vehicles. More links on it here.

An Nvidia safe driving report is here.

SIEM and UEBA analytics

“A fortune 500 enterprise’s infrastructure can easily generate 10 terabytes of plain-text data per month. So how can enterprises effectively log, monitor, and correlate that data to obtain actionable insight? Enter the Security Information and Event Management (SIEM) solution” – quote from Jeff Edwards, in Solutions Review’s 2016 SIEM buyer’s guide covering AccelOps, Alert Logic, AlienVault, Assuria, BlackStratus, CorreLog, EiQ Networks, EMC (RSA), EventTracker, HP, IBM QRadar, Intel Security, Logentries, LogPoint, LogRhythm, ManageEngine, NetGuardians, NetIQ, SilverSky, SolarWinds, Splunk, Sumo Logic, Tenable, and Trustwave.

SIEM and related acronyms –

SIEM – Security Information and Event Management, consists of SIM and SEM.

SIM – Security information management (SIM) is also referred to as log management, log storage, analysis and reporting.

SEM – Security Event Management: real-time monitoring, correlation of events, notifications, and console views

Practical application of SIEM – Automating threat identification: SANS publication.

UBA – User Behavior Analytics

UEBA – User and Entity Behavior Analytics. This is growing in importance; for example, Exabeam focuses on behavioral analytics. The key idea in UEBA is that it extends analytics to cover non-human entities such as processes and machines.

IDS – Intrusion Detection System. Detects and notifies about an intrusion.

IPS – Intrusion Prevention System. Such a device may shut off traffic based on an attack detection.

WAF – web application firewall.

Splunk https://www.splunk.com/pdfs/technical-briefs/splunk-for-advanced-analytics-and-threat-detection-tech-brief.pdf

https://www.splunk.com/en_us/data-insider/user-behavior-analytics-ueba.html

https://www.slideshare.net/databricks/deep-learning-in-securityan-empirical-example-in-user-and-entity-behavior-analytics-with-dr-jisheng-wang

https://github.com/topics/ueba

https://siliconangle.com/2017/07/25/detecting-malicious-insiders-behavioral-analytics-sparksummit/

https://www.analyticsvidhya.com/blog/2018/02/demystifying-security-data-science/ (good overview of evolution of solutions)

https://github.com/jivoi/awesome-ml-for-cybersecurity

Threat Modelling

Threat modelling is a set of techniques to identify the level of risk to assets from their interactions with their operating environment. Some threat modelling methodologies and tools are linked below for reference:

PASTA – Process for Attack Simulation and Threat Analysis. The link has details of an online banking use case.

DREAD – Damage [potential], Reproducibility, Exploitability, Affected users, Discoverability

STRIDE – Spoofing of user identity, Tampering, Repudiation, Information disclosure (privacy breach or data leak), Denial of service, Elevation of privilege

Attack trees – similar to fault trees, they show the relatedness of cause and effect; a good example for a SCADA system is here.

VPNFilter IoT Router Malware

Over 500k routers and gateways are estimated to be infected with malware dubbed VPNFilter, first reported in https://blog.talosintelligence.com/2018/05/VPNFilter.html .

It has 3 stages. In stage 1 it adds itself to crontab to remain after a reboot. In stage 2 it adds a plugin architecture. In stage 3 it adds modules which instruct it to do specific things. A factory reset and router restart in a protected network were recommended to remove it. Disabling remote administration and changing passwords is recommended to prevent reinfection.

The third-stage ssler module modifies iptables rules, enabling man-in-the-middle (MITM) attacks and JavaScript injection.

The first action taken by the ssler module is to configure the device’s iptables to redirect all traffic destined for port 80 to its local service listening on port 8888. It starts by using the insmod command to insert three iptables modules into the kernel (ip_tables.ko, iptable_filter.ko, iptable_nat.ko) and then executes the following shell commands:

  • iptables -I INPUT -p tcp --dport 8888 -j ACCEPT
  • iptables -t nat -I PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8888
  • Example: ./ssler logs src:192.168.201.0/24 dst:10.0.0.0/16

-A PREROUTING -s 192.168.201.0/24 -d 10.0.0.0/16 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8888

To ensure that these rules do not get removed, ssler deletes them and then adds them back approximately every four minutes.

More behaviors of the malware are described at https://news.sophos.com/en-us/2018/05/27/vpnfilter-botnet-a-sophoslabs-analysis-part-2/ including photobucket request, fake CA certs claiming Microsoft issued them and ipify lookups.

YARA rules for detection –

https://github.com/Neo23x0/signature-base/blob/master/yara/apt_vpnfilter.yar

https://github.com/Neo23x0/sigma/blob/master/rules/proxy/proxy_ua_apt.yml#L33

YARA (yet another recursive acronym) is a format to specify rules that match malware based on string patterns, regular expressions, and their frequency of occurrence. A guide to writing effective ones is here.

User-Agent rule –


alert http any any -> any any (msg:"VPNFilter malware User-Agent"; content:"Mozilla/6.1 (compatible|3B| MSIE 9.0|3B| Windows NT 5.3|3B| Trident/5.0)"; http_user_agent; sid:2; rev:1;)

ipify is a self IP address querying service with JSON output: http://api.ipify.org/
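
A quick way to exercise it from Python (a sketch using the requests library):

    import requests

    # Ask ipify for this machine's public IP address as JSON.
    resp = requests.get("https://api.ipify.org", params={"format": "json"}, timeout=5)
    print(resp.json())   # e.g. {"ip": "203.0.113.7"}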