Uber Security (Keys on Github)

As information-driven physical-world services like Uber, Airbnb and Square become more common, they raise some unique security issues for the interacting parties. To make the service effective they collect and store a large amount of user data. This data can be compromised, since it needs to be shared not only with users but also with third-party apps. Then there are the threats of physical assault, physical damage and stolen card data.

At minimum, it is imperative to have a comprehensive information security program that protects the core data collection/processing pipeline and extends outwards to a) services built on top of the data and b) physical identities of the parties involved to assure them of trust in a brief interaction enabled by the information.

This article discusses how records for 50,000 drivers were compromised at Uber. The driver database keys were found on GitHub. How is that possible?

If it is possible, then it is a security incident that needs visibility not just into the information within an enterprise but also outside it. The security incident and event monitoring products that exist (e.g. ArcSight, Bit9, CrowdStrike, Tanium) barely scratch the surface of this requirement – the haystack is bigger than we think it is, and we don’t know the needle in advance.

Physical security is harder to deal with. One thing that becomes apparent is that the supply of hotels, cabs, even credit card issuers was constrained by legislation and regulations designed to set a high bar for an offering and build a high level of trust between the interacting parties.

Those lines are being redrawn with technology. The people impacted by the technology should be part of the conversation in coming up with appropriate ways to regulate the offerings to maintain security and safety.

Cassandra and the Internet of Boilers

A fascinating story appeared here about British Gas using Cassandra to analyze sensor data from boilers in UK homes and predict their failures.

The design of Cassandra is intuitively clear to me in its use of a single primary index to distribute the query load among a set of nodes that can be scaled up linearly. It uses a ring architecture based on consistent hashing, and it emphasizes Availability and Partition-Tolerance over Consistency in the CAP theorem.
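A minimal sketch of the consistent-hashing idea (class and node names are illustrative; Cassandra's actual partitioner and token assignment differ in detail):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: keys and nodes hash onto the same ring,
    and a key is owned by the first node clockwise from its token."""

    def __init__(self, nodes, vnodes=8):
        self._ring = []  # sorted list of (token, node)
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, row_key):
        token = self._hash(row_key)
        idx = bisect.bisect(self._ring, (token, "")) % len(self._ring)
        return self._ring[idx][1]
```

Adding a node only remaps the keys that fall on its new tokens, which is what lets the cluster scale out without reshuffling all data.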

The data structure is a two level hash table, with the first level key being the row key, and the second level key being the column key.
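The two-level structure can be sketched in a few lines of Python (a hypothetical class, not Cassandra's API):

```python
from collections import defaultdict

class WideRowStore:
    """Illustrative model of Cassandra's storage abstraction:
    row key -> (column key -> value)."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column_key, value):
        self._rows[row_key][column_key] = value

    def get(self, row_key, column_key):
        return self._rows[row_key].get(column_key)

    def row(self, row_key):
        # All columns of one row live together, like a Cassandra slice.
        return dict(self._rows[row_key])
```

The first-level key determines which node holds the row; the second level is an ordered map of columns within it.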

Where Cassandra differs from a SQL db is in the flexibility of the data model. In SQL one can model complex relationships, which allow complex queries using joins. Cassandra supports CQL (Cassandra Query Language), which is like SQL but does not support joins or transactions. The impact is that CQL queries cannot be as flexible (or ad hoc) as SQL queries. The kinds of queries to be run have to be planned in advance; other queries would be inefficient. However, this drawback is mitigated by using Spark along with Cassandra. In my understanding, the Spark cluster runs in parallel with the Cassandra cluster.

Why are joins important? It goes back to relationships in an E-R diagram. Can’t we just model entities? When we store Employees in one table and Departments in another in a SQL db, each row has an id which is a shorthand for the employee or the department. This simplification forces us to look up both tables again via a join in a query – say, when asking for all employees belonging to (only) the finance department. But tables like Departments may be small, so they can be replicated in memory for quickly recovering associations. And tables like Employees can be naturally partitioned by the unique employee id. This means that SQL and complex relationships may not be needed for a number of use cases. If ACID compliance is also not a requirement, then NoSQL is a good bet. Cassandra differs from MongoDB in that it scales much better.
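A toy sketch of the pattern described above, with illustrative data: the small Departments table is held as an in-memory dict, so the "join" becomes a cheap lookup plus a scan of the (partitionable) Employees data.

```python
# Small dimension table, replicated in memory on every node (illustrative data).
departments = {1: "finance", 2: "engineering"}

# Large fact table, naturally partitioned by the unique employee id.
employees = [
    {"id": 101, "name": "Asha", "dept_id": 1},
    {"id": 102, "name": "Bo",   "dept_id": 2},
    {"id": 103, "name": "Cy",   "dept_id": 1},
]

def employees_in(dept_name):
    # Equivalent of: SELECT e.name FROM employees e
    #                JOIN departments d ON e.dept_id = d.id
    #                WHERE d.name = :dept_name
    wanted = {d_id for d_id, name in departments.items() if name == dept_name}
    return [e["name"] for e in employees if e["dept_id"] in wanted]
```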

Quote from British Gas: “We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory…Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”

Here’s a blog that triggered this thought along with a talk by Rachel@datastax, who also assured me that Cassandra has been hardened for security and has Kerberos support in the free version.

British Gas operates Hive, a competitor to Nest for thermostats. Note that a couple of months back British Gas reported 2,200 of its accounts were compromised.

CERT Warns Wind Turbines Open to Compromise

CERT issued a warning that certain wind turbines are open to compromise.

“A successful attack would allow the malicious actor to lock out a legitimate administrator and take control of the device. .. the vulnerability is easy to exploit by an attacker who does not need to be authenticated to the device, or have direct physical access to it.”

A fix has been issued, but OTA updates are not supported – imagine climbing each turbine to upgrade the software.

A couple of days earlier, CERT issued an advisory about gas detectors being compromised. Incorrect gas level reports could be hazardous to equipment and human life.

DARPA asked for proposals around automatic detection and patching of security vulnerabilities. In addition, it raised an alert about power grid vulnerability and proposed a plan to recover from a massive power grid attack. The power grid has faced hundreds of attacks, partly because it relies on 1970s-era technology which cannot be upgraded, as service cannot be interrupted. The addition of smart meters, which make the grid more connected, can increase the vulnerability level.

Spark, Storm, Ayasdi, Hadoop

The huge amount of data that IOT systems will generate will call for analyses of different types. A brief review of some systems and what they are used for follows.

Apache Spark: Uses distributed memory abstractions for primarily in-memory processing. Built with Scala. Useful for finding data clusters and for detecting statistical anomalies by looking at distance from the cluster. Comes with a machine learning library on top. Does not come with its own file system (use NFS/HDFS). Useful for complex processing, where state needs to be maintained in the event stream for correlations. Described as ‘batch processing with micro-streaming analysis’, but looks headed to cover streaming analyses as well.
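A pure-Python sketch of the distance-from-cluster idea mentioned above (a real pipeline would use Spark MLlib, e.g. KMeans, over an RDD or DataFrame; the function names and threshold are illustrative):

```python
import math

def centroid(points):
    """Mean point of a list of equal-dimension tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def anomalies(points, k=2.0):
    """Flag points whose distance from the centroid exceeds
    mean + k standard deviations of all distances."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [p for p, d in zip(points, dists) if d > mean + k * std]
```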

Apache Storm: Real-time streaming data analysis. Developed at Twitter, written in Clojure. Unlike Hadoop, which has two layers (map, reduce), Storm can have N layers and a flexible topology consisting of Spouts (data source units) and Bolts (data processing units). Storm has been superseded by Heron in terms of performance. IBM Streams is a commercial offering, also for stream processing.

Ayasdi: Topological data analysis allows one to discover the interesting features of data without knowing what to look for in advance. This is in contrast to most systems, where one needs to know what one is looking for. Claims insight discovery.

Hadoop: Used for batch processing of large amounts of data, using map/reduce primitives. Comes with HDFS. Cloudera (and others) have made significant improvements to it, with an interactive SQL interface and usability improvements for BI (Impala).

InfluxDB: Time-series db for events and metrics. Optimized for writes and claims to scale to IOT workloads.

ZooKeeper: A coordination service for distributed applications.

Amazon S2N and OpenSSL

In the last few years a number of OpenSSL vulnerabilities have come to light. Heartbleed was a critical one that was exploited in the field. It allowed a client to send a malicious heartbeat to the server and get back chunks of server memory – which could contain passwords. It was estimated that two thirds of the servers in the world had the vulnerability. The fix was to upgrade OpenSSL, revoke existing server certs and request new SSL server certs.

Heartbleed previously triggered OpenBSD to fork OpenSSL to LibreSSL and Google to fork OpenSSL to BoringSSL.

Amazon S2N is a TLS/SSL implementation that is 6,000 lines of code – so it is small, compact, fast, and its correctness can be more easily verified. It uses only the crypto functions from OpenSSL and reimplements the SSL layer. This is a healthy direction for IOT and for certification of SSL implementations, for example FIPS. S2N is short for Signal to Noise.

A timing attack was recently identified against it and has since been mitigated.

Note that two-factor auth solutions would mitigate the password-theft problem presented by Heartbleed. There are several solutions in this area – Authy, Clef, Google Authenticator, Duo, Okta, Oracle Authenticator, ..
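Several of these (Google Authenticator, for instance) implement the standard TOTP scheme of RFC 6238, which layers a time window over RFC 4226's HOTP; a minimal sketch:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    """RFC 4226 HMAC-based one-time password."""
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

def totp(secret, for_time=None, step=30):
    """RFC 6238 time-based OTP: HOTP over the current 30-second window."""
    t = time.time() if for_time is None else for_time
    return hotp(secret, int(t // step))
```

The server and the user's device share the secret; both compute the same 6-digit code for the current time window, so the code is the "thing possessed" factor.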

Docker Container Security

A block diagram of Docker is below and a description of the Docker daemon is here. The Docker client commands talk to the Docker daemon to start one of the containers in the Docker registry, or to start a process described on the command line as a new container. Docker provides a simple interface to Linux container technology, which behaves like a lightweight VM.

[Figure: Docker containers vs. VMs]

There are a few problems with this. Who has access to the Docker daemon to control the containers? How is the integrity of the containers ensured? How is the host protected from the code running in the containers?

Docker recently announced a few security features at the November DockerCon:

  • locking down a container in a registry by signing the container image with a key from a YubiKey hardware device; see here for a description of the original issue, where image checksums were not verified by the Docker daemon
  • scanning the official container images for vulnerabilities
  • running containers in a user-level namespace instead of one that allows root access to the host. This protects the host OS, as explained here. The user-level namespace feature has been available in LXC for over a year, but not in Docker.

For access control to the Docker daemon, there is ongoing activity, with a design doc here.

Twistlock is a container security and monitoring tool that attempts a comprehensive approach – access control to the containers, runtime scanning of files for malware signatures, vulnerability scanning, looking at network packets, and so on. A recent meetup on Dec 1 discussed this product. It features integration with Kerberos and LDAP.

At the kernel level, processes from all containers share the same kernel and the same networking layer. So what level of isolation is provided to container processes? It depends on vulnerabilities in the processes themselves – how many ports are open, whether injection attacks are possible, etc. If a process from one container attacks a process from another – for example via memory scraping – then Twistlock can detect it only if it can identify the offending process as malware using signature matching.

A Dockerfile is used to specify a container image, using commands that spec the base OS, RPMs, utilities and scripts. USER specifies the userid under which the following RUN, CMD or ENTRYPOINT instructions run. EXPOSE specs a port to be opened for external access. A Docker image is built from the Dockerfile and contains the actual bits needed for the container to run. The image can be loaded directly or pushed to a Docker registry, from which it can be pulled to clients.

Commands:

docker build -t <imgnametag> .            # build image from Dockerfile in current directory

docker run -i -t <imgnametag> /bin/bash   # run the image with an interactive shell

docker login                              # log in to a registry

docker push                               # push an image to the registry

docker pull                               # pull an image from the registry

docker-compose [down|up]                  # per docker-compose.yaml

docker images                             # list local images

docker export <container>                 # export a container's filesystem as a tar archive

docker save <image> -o imgtag.tar         # save an image with its layers to a tar archive

“Computer Detective in the Cloud”

Although light on details, this is an application of AI for securing against credit card fraud in real time using cloud computing.

AI has been in the news a few times this month – Google (TensorFlow), Facebook (new milestones in AI), Microsoft releasing Cortana (Nadella welcomes our AI overlords) and mention of an AI spring from IBM and Salesforce.

Machine learning has also been applied to spam detection, intrusion detection, malicious file detection, malicious url detection, insurance claims leakage detection, activity/behaviour based authentication, threat detection and data loss prevention.

Worth noting that these successes are typically in narrow domains with narrow variations of what is being detected. Intrusion detection is a fairly hard problem for machine learning because the number of variations of attacks is high. As someone said, we’ll be using signatures for a long time.

The previous burst of activity around neural networks in the late 80s and early 90s had subsided around the same time as the rise of the internet in the mid-to-late 90s. Around 2009, as GPUs made parallel processing more mainstream, there was a resurgence in activity – deeper, multilayer networks looking at overlapping regions of images (similar to wavelets) led to the development of convolutional neural networks. These have had successes in image and voice recognition. A few resources – GPU Gems for general-purpose computing, visualizing convolutional nets, the Caffe deep learning framework.

Kafka Security

Kafka is a system for continuous, high-throughput messaging of event data, such as logs, to enable near real-time analytics. It is structured as a distributed message broker, with incoming-event producers sending messages to topics and outgoing-event consumers reading from them. Motivations behind its development include decoupling producers and consumers from each other for flexibility, reducing the time to process events and increasing throughput. A couple of analogies: a sender using sendmail to send an email to an address (topic); or a message “router” that decides the destination for a particular message – except Kafka persists the messages until the consumer is ready for them. It is an intermediary in the log processing pipeline – Kafka does no processing of the data itself. In contrast to JMS, one can send messages to Kafka in batches, and individual messages do not have to be acknowledged.
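An in-memory sketch of the core abstraction (class and method names are hypothetical, not Kafka's API): an append-only log per topic, with each consumer group tracking its own offset, so messages persist until the consumer is ready for them.

```python
from collections import defaultdict

class MiniBroker:
    """Toy model of Kafka's log semantics."""

    def __init__(self):
        self._logs = defaultdict(list)     # topic -> append-only message log
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def produce(self, topic, *messages):
        # Batch append; individual messages are not acknowledged.
        self._logs[topic].extend(messages)

    def consume(self, group, topic, max_messages=10):
        # Messages stay in the log; only the group's offset advances.
        log = self._logs[topic]
        start = self._offsets[(group, topic)]
        batch = log[start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch
```

Because the log is retained, independent consumer groups can each read the full stream at their own pace, which is the decoupling the design is after.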

A design thesis of Kafka is that sequential (contiguous) disk access is very fast and can even be faster than random memory access. It uses zero copy, and a binary protocol over TCP, not HTTP. A quote from the design doc – “This combination of pagecache and sendfile means that on a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache”. This, along with the distributed design, makes it faster than competing pub-sub systems.

A proposal to add security to Kafka for enterprise use – controlling who can publish and subscribe to topics – has been underway: https://cwiki.apache.org/confluence/display/KAFKA/Security . A talk on Kafka security by Hortonworks, on integrating Kerberos authentication and SSL encryption with Kafka, was given at a recent meetup. The slides are at http://www.slideshare.net/harshach/kafka-security.

Of interest was an incident where the SSL patch caused a production cluster to become unstable and increased latencies. The issue was debugged using profiling. Although SSL did increase latencies, this specific issue was narrowed down to a bug in the same patch, unrelated to SSL, that had to do with zero copy.

How does IOT affect Identity and Access Management?

For the purposes of IOT, an individual device can be abstracted as a specialized service that produces and consumes data. In addition, the device has certain capabilities to act on, or transform, data on a discrete or continuous basis.

Who should have access to these services and capabilities? It could be:

  • other devices in proximity to the device
  • external services
  • certain users

Who gets access is a function of the identity of the devices, the identities of the entities accessing the service and policies governing access (which can include parameters such as location, time, role or more complex rules).

To determine access, a device should be capable of

  • identifying itself, its services and capabilities
  • obtaining authorization for the services and capabilities (before exercising them), and presenting it when requested. This authorization includes a signed access policy
  • updating or invalidating the access policy as time goes on

The access policies need to be applied to the data flows based on the identities and be rich enough to capture use cases of interest.
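A sketch of what a signed access policy might look like, assuming a shared-secret HMAC scheme (the field names and key handling are illustrative; a real deployment would more likely use asymmetric keys, along the lines of JWT/JWS):

```python
import hashlib
import hmac
import json

def issue_policy(secret, device_id, allowed, expires):
    """Issuer signs a policy document the device can later present."""
    policy = {"device": device_id, "allow": allowed, "exp": expires}
    payload = json.dumps(policy, sort_keys=True).encode()  # canonical form
    policy["sig"] = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return policy

def verify_policy(secret, policy):
    """Recompute the signature over the claims and compare in constant time."""
    claims = {k: v for k, v in policy.items() if k != "sig"}
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, policy.get("sig", ""))
```

Invalidation then amounts to letting the `exp` claim lapse or rotating the signing key, so stale policies fail verification.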

Identity of ‘Things’ in IOT

What is the identity of a device? There can be multiple identities, based on whether the device is identifying itself to a user, to another device of the same type, or to other devices in the ecosystem it is part of (say, a component of a car).

Having a unique device id and leveraging it for the services built on the device is a design choice. Consider the choices for iPhone and Android. In the iPhone, the device id permeates the application layer; the application developer can target his application at specific devices and must register the device to develop on it. This design choice allows the device to check that the applications run on it are valid and that their associated developer is registered with Apple. It strengthens the associations in the ecosystem of devices, developers, applications and users.

In Android, the security certificates were at the JVM layer, which allows self-signed certificates. Here the device id is not used as a strong identifier known to applications and developers. This is one reason the open system is more prone to malware.

A unique hardware identity is something to look for in IOT designs. Here’s an article from Intel/McAfee discussing EPID, an immutable device ID that can be used for identification and also anonymization: https://blogs.mcafee.com/business/intels-iot-gateway-enhancements/

Update: On Nov 25, news came of a number of IOT devices using the same HTTPS certificate and SSH keys. See here. Large clusters of devices are exposed on the internet this way.

Biometric User Identification for IOT

Two-factor authentication solutions are based on the premise that the combined verification of (i) a thing possessed (a card) and (ii) a piece of information known to the user (a PIN or password) provides a high degree of assurance in authenticating users. For financial and enterprise transactions it gives a high level of security. But 2FA is not a seamless solution: as the number and variety of services and devices grows, it requires the user to carry a number of cards/tokens/devices and to remember several passwords (that are unique, complex, and regularly updated). It is also not foolproof, as identity theft continues to be a problem.

With the large number of IOT applications and devices appearing, the problem will become worse. Consider a health monitoring device that needs to periodically share information of a patient with her family members and doctor, while keeping the information safe from cloud attacks. Or consider keyless entry to vehicles or homes. For such common use cases entering complex passwords would be cumbersome.

With biometric authentication methods, such as the fingerprint-based authentication on Apple and Samsung phones, there is more direct identification of the user. But the way this is commonly used is not to eliminate passwords completely – it is typically used to:

  1. store existing passwords securely,
  2. reduce repeated password entry by extending a session created with an existing password,
  3. be combined with a user identifier such as a phone number or email address,
  4. be combined with a password (e.g. for BYOD deployments where multiple users can register fingerprints)

One can imagine a two factor auth where both factors are biometric, such as multiple fingerprints, or fingerprint and iris authentication. Such a two factor biometric approach could eliminate the need to remember passwords and reduce friction in accessing services securely. An example is the combination of facial recognition and fingerprint recognition.

Biometric authentication methods being worked on include gait recognition and voice biometrics. These can be included in a continuous authentication method.

SecureAuth and BehavioSec Auth Presentation, Palo Alto

IDC gave a good security landscape overview at the SecureAuth executive luncheon today in Palo Alto.

SecureAuth provides a flexible adaptive authentication system that balances security with user experience.

BehavioSec does biometric authentication based on user behavior, such as the pattern of keystrokes when entering a password. It builds a statistical profile and then determines whether the password is entered anomalously. It provides collector SDKs to gather this information from mobile apps and websites.
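The general idea can be sketched as a per-keystroke timing profile with a z-score threshold (the function names, data and threshold are illustrative, not BehavioSec's actual algorithm):

```python
import math

def profile(samples):
    """Build mean/stddev per inter-key interval from enrollment samples.
    samples: list of timing vectors (seconds between successive keys)."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(dims)]
    stds = [max(1e-6, math.sqrt(sum((s[i] - means[i]) ** 2 for s in samples) / n))
            for i in range(dims)]
    return means, stds

def is_anomalous(attempt, means, stds, threshold=3.0):
    """Flag an attempt if any interval deviates beyond `threshold` z-scores."""
    z = max(abs(attempt[i] - means[i]) / stds[i] for i in range(len(attempt)))
    return z > threshold
```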

In case of a large difference between the expected pattern and the current pattern, the SecureAuth integration forces a step-up auth to a second factor.

There is adoption of this kind of technology in banking, retail and other verticals.

Security Acquisitions Oct 2015

Lancope, Viewfinity, Vormetric, LogEntries, Boxer, Secure Islands, Silanis

http://www.infoworld.com/article/3000479/security/security-acquisitions-reach-a-fever-pitch.html

Lancope – StealthWatch provides a visual representation of the network to detect anomalies that could signify an attack. In the event of an infection, StealthWatch analyzes traffic between servers to determine which hosts were affected. Acquired by Cisco, $453m.

Viewfinity – Endpoint security for windows. App control features and administrative privilege capabilities to protect against zero-day attacks, malware and threats.

Vormetric – Filesystem encryption, keeping metadata in clear and enterprise key-management for third party encryption keys. Acquired by Thales Security for $400m

LogEntries – machine data search technology to help security teams  investigate security incidents deeply. Spun out of University College Dublin (UCD). Acquired by Rapid7, $68m. 3k customers.

Boxer – Android email app, acquired by VMWare

Secure Islands –  IQProtector looks at content and wraps/protects it based with policy based DRM automatically. “Secure Islands’ Data Immunization uniquely embeds protection within information itself at the moment of creation or initial organizational access. This process is automatic and accompanies sensitive information throughout its lifecycle from creation, through usage and collaboration to storage.” Acquired by Microsoft.

Silanis – e-signatures with strong cryptographic algorithms and keys

ThingWorx IOT Platform and Marketplace

The premise behind ThingWorx is that manufactured products are transforming into services. A product can be remotely monitored, maintained, and its data analyzed as part of the extended service wrapper. It is an interesting point of view on the evolution of products.

GE provides engines not as products but as a service; it continues to maintain them after the sale. Boeing does the same with planes.

ThingWorx claims to make it easier for any product to be converted into such a service. It’s not clear how this works with legacy systems – whether it is an agent or a wrapper, and how easy it is to add. Its security whitepaper discusses authentication, authorization, encryption, security models, audit, etc.

Imagine a hyperconnected supply chain consisting of components tracked back to their suppliers. Security and access control would be a challenge in such a dynamic environment.

An example of a product/application on ThingWorx is Velio OBD device and Velio Webhook application.  The Webhook application displays basic data coming from OBD modules: GPS, accelerometer and OBD-II. It enables users to create customized views depicting the data that is important to them while also enabling access to both live and historical data. The application will be available in the ThingWorx Marketplace.

Some competitors include Spark Devices, Ayla Networks, Carriots, Xively, Axeda, Arrayent and Berg Cloud.