Category: security

Kubernetes security

Kubernetes is a Platform-as-a-Service (PAAS) similar to Cloud Foundry. It has more a centralized control plane compared to Cloud Foundry.

A threat matrix for Kubernetes – https://www.microsoft.com/security/blog/2020/04/02/attack-matrix-kubernetes/

From RSA’20, a talk on The future of Kubernetes attacks – https://youtu.be/CH7S5rE3j8w

Coinbase: Why Kubernetes is not part of our stackhttps://blog.coinbase.com/container-technologies-at-coinbase-d4ae118dcb6c

  • it needs a full-time compute team
  • securing it is not trivial and well understood. SPIFFE, SPIRE, Envoy, Istio, OPA, service mesh a few of the technologies.

It links to – https://k8s.af/

Another similar viewpoint – https://pythonspeed.com/articles/dont-need-kubernetes/

A counterpoint to the Coinbase blog – https://blog.kumina.nl/2020/07/in-response-to-container-technologies-at-coinbase/

Scratch notes:

K8S is based on a Controller pattern:

  • Resources capture the desired state.
  • Current state is kept centralized in etcd, a distributed key-value store (similar to Consul).
  • Controllers reconcile current state with desired state.

Pod is a top level resource, is the smallest deployment unit, and is a group of one or more containers described by a yaml file, similar to docker-compose.yml .

K8S Operator is a kind of resource manager, for Custom resources.

https://blog.frankel.ch/your-own-kubernetes-controller/1/

https://pushbuildtestdeploy.com/when-do-kubernetes-operators-make-sense

Spinnaker – Continuous Delivery platform that itself runs on k8s as a set of pods which can be scaled up

kubectl cheat sheet:

https://kubernetes.io/docs/reference/kubectl/cheatsheet

An article on cloud security https://medium.com/xm-cyber/having-fun-with-cloud-services-e281f8a7fe60

Accuracy vs Recall vs Precision vs F1

We want to walk through some common metrics in classification problems such as accuracy, precision and recall and get a feel for when to use which metric. Say we are looking for a needle in a haystack. There are very few needles in a large haystack full of straws. An automated machine is sifting through the objects in the haystack and predicting for each object whether it is a straw or a needle. A reasonable predictor will predict a small number of objects as needles and a large number as straws.

Positive Prediction: the object at hand is predicted to be the needle. A small number.

Negative Prediction: the object at hand is predicted not to be a needle. A large number.

True_Positive: of the total number of predictions, the number of predictions that were positive and correct. Correctly predicted Positives (needles). A small number.

True_Negative: of the total number of predictions, the number of predictions that were negative and correct. Correctly predicted Negatives (straws). A large number. 

False_Positive: of the total number of predictions, the number of predictions that are positive but the prediction is incorrect. Incorrectly predicted Positives (straw predicted as needle). Could be large as the number of straws is large, but assuming the total number of predicted needles is small, this is less than or equal to predicted needles, hence small.

False_Negative: of the total number of predictions, the number of predictions that are negative but the prediction is incorrect. Incorrectly predicted Negatives (needle predicted as straw). Is this a large number ? It is unknown – this class is not large just because the class of negatives is large – it depends on the predictor and a “reasonable” predictor which predicts most objects as straws, could also predict many needles as straws. This is less than or equal to the total number of needles, hence small.

Predicted_Positives = True_Positives + False_Positives = Total number of objects predicted as needles.

Actual Positives = Actual number of needles, which is independant of the number of predictions either way, however Actual Positives = True Positives + False Negatives.

Accuracy = nCorrect _Predictons/nTotal_Predictions=(nTrue_Positives+nTrue_Negatives) / (nPredicted_Positives +nPredicted_Negatives) .   # the reasonable assumption above is equivalent to a high accuracy. Most predictions will be hay, and be correct in this simply because of the skewed distribution. This does not shed light on FP or FN.

Precision = nTrue_Positives / nPredicted_Positives    # correctly_identified_needles/predicted_needles;  this sheds light on FP; Precision = 1 => FP=0 => all predictions of needles are in fact needles; a precision less than 1 means we got a bunch of hay with the needles – gives hope that with further sifting the hay can be removed.  Precision is also called Specificity and quantifies the absence of False Positives or incorrect diagnoses.

Recall = nTrue_Positives / nActual_Positives  = TP/(TP+FN)# correctly_identified_needles/all_needles; this sheds light on FN; Recall = 1 => FN = 0; a recall less than 1 is awful as some needles are left out in the sifting process. Recall is also called Sensitivity .

Precision > Recall => FN is higher than FP

Precision < Recall => FN is lower than FP

If at least one needle is correctly identified as a needle, both precision and recall will be positive; if zero needles are correctly identified, both precision and recall are zero.

F1 Score is the harmonic mean of Precision and Recall.  1/F1 = 1/2(1/P + 1/R) . F1=2PR/(P+R) .  F1=0 if P=0 or R=0. F1=1 if P=1 and R=1.

ROC/AUC rely on Recall (=TP/TP+FN) and another metric False Positive Rate defined as FP/(FP+TN)  = hay_falsely_identified_as_needles/total_hay . As TN >> FP, this should be close to zero and does not appear to be a useful metric in the context of needles in a haystack; as are ROC/AuC . The denominators are different in Recall and FPR, total needles and total hay respectively.

There’s a bit of semantic confusion when saying True Positive or False Positive. These shorthands can be interpreted as- it was known that an instance was a Positive and a label of True or False was applied to that instance. But what we mean is that it was not known whether the instance was a Positive, and that a determination was made that it was a Positive and this determination was later found to be correct (True) or incorrect (False). Mentally replace True/False with ‘Correct/Incorrectly identified as’ to remove this confusion.

Normalization: scale of 0-1, or unit norm; useful for dot products when calculating similarity.

Standardization: zero mean, divided by standard deviation; useful in neural network/classifier inputs

Regularization: used to reduce sensitivity to certain features. Uses regression. L1: Lasso regression L2: Ridge regression

Confusion matrix: holds number of predicted values vs known truth. Square matrix with size n equal to number of categories.

Lacework Intrusion Detection System – Cloud IDS

Lacework Polygraph is a Host based IDS for cloud workloads. It provides a graphical view of who did what on which system, reducing the time for root cause analysis for anomalies in system behaviors. It can analyze workloads on AWS, Azure and GCP.

It installs a lightweight agent on each target system which aggregates information from processes running on the system into a centralized customer specific (MT) data warehouse (Snowflake on AWS)  and then analyzes the information using machine learning to generate behavioral profiles and then looks for anomalies from the baseline profile. The design allows automating analysis of common attack scenarios using ssh, privilege changes, unauthorized access to files.

The host based model gives detailed process information such as which process talked to which other and over what api. This info is not available to a network IDS. The behavior profiles reduce the false positive rates. The graphical view is useful to drill down into incidents.

OSQuery is a tool for gathering data from hosts, and this is a source of data aggregated for threat detection. https://www.rapid7.com/blog/post/2016/05/09/introduction-to-osquery-for-threat-detection-dfir/

Here’s an agent for libpcap https://github.com/lacework/pcap

It does not have an intrusion prevention (IPS) functionality. False positives on an IPS could block network/host access and negatively affect the system being protected, so it’s a harder problem.

Cloud based network isolation tools like Aviatrix might make IPS scenarios feasible by limiting the effect of an IPS.

Software Integrity Tools

There are a number of tools used to detect security issues in a software application codebase. A simple and free one is flawfinder. A sophisticated commercial one is Veracode.  There’s also lint, pylint, findbugs for java, and xcode clang static analyzer.

Synopsis has bought a few tools like Coverity and Blackduck for various static checks on code and binary. Blackduck can do binary analysis and scores issues with the CVSS. A common use of Blackduck is for license checking to check for conformance to open source licenses.

A more comprehensive list of static code analysis tools is here.

Dynamic analysis tools inspect the running process and find memory and execution errors. Well known examples are valgrind and Purify. More dynamic tools are listed here.

For web application security there are protocol testing and fuzzing tools like Burp suite and Tenable Nessus.

A common issue with the tools is the issue of false positives. It helps to limit the testing to certain defect types or attack scenarios and identify the most critical issues, then expand the scope of types of defects.

Code obfuscation and anti-tamper are another line of tools, for example by Arxan, Klocwork, Irdeto and Proguard .

A great talk on Adventures in fuzzing. My takeaway has been that better ways of developing secure software are really important.

 

 

Zero Trust Networks

Instead of  the “inside” and “outside” notion of traditional firewalls and perimeter defense technologies, the Zero Trust Network notion has its origin in the Cloud+Mobile first world where a person carrying a mobile device can be anywhere in the world (inside/outside the enterprise) and needs to be seamlessly and securely connected to online services.

The essential idea appears to be device authentication coupled with a second factor in the shape of an easy to remember password, with backend security smarts to identify the accessing device. More importantly, every service that is access externally needs to be authenticated, instead of some services being treated as internal services and being less protected.

Some properties of zero trust networks:

  • Network locality based access control is insufficient
  • Every device, user and service is authenticated
  • Policies are dynamic – they gather and utilize data inputs for making access control decisions
  • Attacks from trusted insiders are mitigated against

This is a big change from many networks which have network based defense at the core (for good reason, as it was cost effective). To create a zero-trust network, a startin point is to identify, enumerate and sequence all network flows.

I attended a talk by Centrify on this topic, which resonated with experiences in cloud, mobile and fog systems.

Related effort in Kubernetes – Progress Toward Zero Trust Kubernetes Networks, Istio Service Mesh , API Gateway to Service Mesh.  One can contrast the API gateway as being present only at the ingress point of a cloud, whereas with a Zero-trust/Service-mesh/Sidecar approach every microservice building-block has its own external proxy and ‘API’ for management added to it. The latter would add to latency concerns for real-time applications, as the new sidecar proxies are in the data path. One benefit of the service mesh is a mechanism to put in service to service security in a uniform manner.

The key original motivation behind Istio, in the second presentation by Lyft above, was greater observability and reliability across a complex cluster of microservices. This strikes me as a greater motivating use-case of this technology, than added security.  From the security point of view, there is a parallel of the Istio approach with the SDN problem statement of a horizontal and ubiquitous security layer.   Greater visibility is also a motivation behind the P4 programming language presented in disaggregated storage talk on protocol independant switch architecture or PISA here – one of the things it enables is inband telemetry.

Hatman, Triton ICS Malware Analysis

A Triconex Industrial controller allows triple modular redundancy and 2/3 consensus vote based control.  The design has its origins in the 80’s industrial needs for safety for industrial controllers. The product was acquired by Schneider via Invensys in 2014. The Hatman/Triton malware framework targeting this specific controller came to light, late 2017. The Triconex is programmed with a TriStation, a Windows application which integrates with Windows directory and allows programming in FBD, LD, ST, CEM.

From the SchneiderElectric, Accenture and Mandiant analyses of the malware, more technical details appeared recently. A previous paper appeared in IEEE, Jan 2017. A brief summary is below.

Access to the controller network is necessary. The Triconex controller needs to be in Program mode. A malware program agent, TriLogger, running on Windows in the same network talks over a Tricon protocol to program the Triconex controller to install/deploy the control payload program. The malware payload program then runs like a regular program on the controller, on every scan cycle –  running in parallel in three versions.

Once on the controller, the malware looks for a way to elevate its privilege level. It starts observing the runtime, including memory inspections. There is a memory backdoor attempted, but there is a probable error handling mistake which prevents this. Now to be able to access the firmware, it takes advantage of a zero-day vulnerability in the firmware.  It is able to install itself in the firmware, overwriting a network function call. In the end it installs a Remote Access Terminal to allow remote access of the controller. This could have been a vector to download further payloads, but no evidence was found that this RAT was actually used. It attempts to remove traces of itself after installation.

Source code of the program is at  https://github.com/ICSrepo/TRISIS-TRITON-HATMAN .

Zero day attacks are a continuing challenge as by definition they are not widely known before they are used for an attack. However a secure by design approach reduces the attack surface for exploits. There were opportunities to detect the malware on the network and the windows host.

Update: A cert advisory for Triton appears in https://ics-cert.us-cert.gov/advisories/ICSA-18-107-02 and “Targeted Cyber Intrusion Detection and Mitigation Strategies” in https://ics-cert.us-cert.gov/tips/ICS-TIP-12-146-01B

Javascript Timing and Meltdown

In response to meltdown/spectre side-channel vulnerabilities, which are based on fine grained observation of the CPU to infer cache state of an adjacent process or VM, a mitigration response by browsers was the reduction of the time resolution of various time apis, especially in javascript.

The authors responded with alternative sources of finding fine grained timing, available to browsers. An interpolation method allows obtaining of a fine resolution of 15 μs, from a timer that is rounded down to multiples of 100 ms.

The javascript  high resolution time api is still widely available and described at https://www.w3.org/TR/hr-time/ with a reference to previous work on cache attacks in Practical cache attacks in JS

A meltdown PoC is at https://github.com/gkaindl/meltdown-poc, to test the timing attack in its own process. The instruction RDTSC returns the Time Stamp Counter (TSC), a 64-bit register that counts the number of cycles since reset, and so has a resolution of 0.5ns on a 2GHz CPU.

int main() {
 unsigned long i;
 i = __rdtsc();
 printf("%lld\n", i);
}

One Time Passwords for Authentication

A MAC or Message Authentication Code protects the integrity and authenticity of a message by allowing verifiers to detect changes to the message content. It requires a random key generation algo that produces a per-message random key K, a signing algorithm which takes K and message M as input and produces signature S, and a verifying algorithm with takes K,  M and S as input and produces a binary decision to accept or reject the message. Unlike a digital signature a MAC typically does not provide non-repudiation. It is also called a protected checksum. Both sender and recipient of the K and M share a secret key.

HMAC: So called Hashed-MAC because it uses a cryptographic hash function, such as MD/SHA to create a MAC. The computed value is something only someone with the secret key can compute (sign) and check (verify). HMAC uses an inner key and an outer key to protect against length extension and collision attacks on simple MAC signature implementations. RFC 2104. It is a type of Nested MAC (NMAC) where both inner and outer keys are derived from the same key, in a way that keeps the derived keys independent.

HOTP: HMAC-based One Time Password.   HOTP is based on an incrementing counter. The incrementing counter serves as the message M, and when run through the HMAC it produces a random set of bytes, which can be verified by the receiving party. The receiving party keeps a synchronized counter, so the message M=C does not need to be send on the wire. RFC 4226.

TOTP: Time-based One-time Password Algorithm . TOTP combines a secret key with the current timestamp using a cryptographic hash function to generate a one-time password. Because network latency and out-of-sync clocks can result in the password recipient having to try a range of possible times to authenticate against, the timestamp typically increases in 30-second intervals. Here the requirement to keep the counter synchronized is replaced with time synchronization. RFC 6328.

 

OCSP validation and OCSP stapling with letsencrypt

Online Certificate Status Protocol (OCSP) is a mechanism for browsers to check the validity of certificates presented by HTTPS websites. This guards against revoked certificates. This has been an issue for big websites, which had bad certs issued and had to be revoked. Google has stated its intent to begin distrusting Symantec certs in 2018. A counterpoint to Google appears in this interesting article which notes deficiencies in Chrome’s implementation of OCSP, and of privacy issue for the website visitors with OCSP checks.

Let’s look at what Mozilla is doing about this, as they have attempted to implement OCSCP correctly.

https://bugzilla.mozilla.org/show_bug.cgi?id=1366100

Telemetry indicates that fetching OCSP results is an important cause of slowness in the first TLS handshake. Firefox is, today, the only major browser still fetching OCSP by default for DV certificates.

In Bug 1361201 we tried reducing the OCSP timeout to 1 second (based on CERT_VALIDATION_HTTP_REQUEST_SUCCEEDED_TIME), but that seems to have caused only a 2% improvement in SSL_TIME_UNTIL_HANDSHAKE_FINISHED.

This bug is to disable OCSP fetching for DV certificates. OCSP fetching should remain enabled for EV certificates.

OCSP stapling will remain fully functional. We encourage everyone to use OCSP stapling.

So they are moving away from OCSP to OSCP stapling. From wikipedia, “OCSP stapling, formally known as the TLS Certificate Status Request extension, is an alternative approach to the Online Certificate Status Protocol(OCSP) for checking the revocation status of X.509 digital certificates.[1] It allows the presenter of a certificate to bear the resource cost involved in providing OCSP responses by appending (“stapling”) a time-stamped OCSP response signed by the CA to the initial TLS handshake, eliminating the need for clients to contact the CA”

How to setup OCSP stapling with letsencrypt:

The CSR request can request OCSP_MUST_STAPLE option (PARAM_OCSP_MUST_STAPLE=”yes”). The issued cert will have the parameter  with OID 1.3.6.1.5.5.7.1.24 and an OCSP URI which are shown with

openssl x509 -in cert.pem -text  |grep 1.24

1.3.6.1.5.5.7.1.24:

openssl x509 -noout -ocsp_uri -in cert.pem

http://ocsp.int-x3.letsencrypt.org

This changes the cert itself that is issued, so independent of haproxy/nginx configuration that disables OCSP, the browser checks this cert and shows the error. Also this parameter is not shown in the safari/firefox on cert inspection, but shows up on command line as param 1.3.6.1.5.5.7.1.24 being present in the downloaded cert.

https://esham.io/2016/01/ocsp-stapling

  1. Figure out which of the Let’s Encrypt certificates was used to sign your certificate.
  2. Download that certificate in PEM format.
  3. Point nginx to this file as the “trusted certificate”.
  4. In your nginx.conf file, add these directives to the same block that contains your other ssl directives:
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate ssl/chain.pem;

Weeping Angel Remote Camera Monitor

Sales of hardware camera blockers and similar devices should increase, with the Weeping Angel disclosure. Wikileaks detailed how the CIA and MI5 hacked Samsung TVs to silently monitor remote communications. Interesting to read for the level of technical detail: https://wikileaks.org/vault7/document/EXTENDING_User_Guide/EXTENDING_User_Guide.pdf . The attack was called ‘Weeping Angel’, a term borrowed from Doctor Who.

Other such schemes are described at https://wikileaks.org/vault7/releases/#Weeping%20Angel , including a iPhone implant to get data from your phone – https://wikileaks.org/vault7/document/NightSkies_v1_2_User_Guide/NightSkies_v1_2_User_Guide.pdf .

IOT security vulnerabilities and hacks

Mirror of https://github.com/nebgnahz/awesome-iot-hacks :

A curated list of hacks in IoT space so that researchers and industrial products can address the security vulnerabilities (hopefully).

Thingbots

RFID

Home Automation

Connected Doorbell

Hub

Smart Coffee

Wearable

Smart Plug

Cameras

Traffic Lights

Automobiles

Airplanes

Light Bulbs

Locks

Smart Scale

Smart Meters

Pacemaker

Thermostats

Fridge

Media Player & TV

Toilet

Toys

The Dyn DNS DDOS Attack Oct 21

DYN is a DNS provider internet infrastructure company. It’s the name behind widely used DynDNS. It supports DNS for twitter, visa, github, mongo, netflix and several other big tech sites.

Doug Madory a researcher at DYN, presented a talk  on DDoS attacks in Dallas at a meeting of the North American Network Operators Group (NANOG) 68 – his was the last talk on Oct 19, wednesday. The talk discussed the attack on Krebs on Security last month and details other such attacks.

On Friday several sites serviced by DYN were attacked in a distributed denial-of-service (DDoS) attack.

The distributed denial-of-service (DDoS) attack involved malicious DNS lookup requests from tens of millions of IP addresses including a botnet on a large number of IoT devices infected with the Mirai malware, which is designed to brute force security on any IoT device. There are cameras involved with fixed passwords that are burned into the firmware, that cannot be changed.

The implications of IoT devices that are 1) unsecure 2) impossible to secure and 3) infected by malware and 4) controlled by a botnet that is controlled by malicious intent are made clearer with this attack.

Update on DDOS mitigation:  RFC 3882,  Configuring BGP to block Denial-of-Service attacks, discusses Remote Triggered Black Hole (RTBH) method, to configure certain routers to selectively stop malicious high volume traffic which is targeting a particular IP.  The target site is made inaccessible but the rest of the network service stays active. An example configuration is at https://networkengineering.stackexchange.com/questions/10857/use-bgp-to-defend-against-a-ddos-attack-originating-from-remote-as .  The improvement is to reduce the side-effects of such a delisting, to enable faster recovery when the attack is over.