Author: Ruchir Tewari

Brief history of cloud time

Amazon Web Services (AWS) started with elastic and scalable compute and storage offerings – an elastic compute cloud (EC2) and a simple storage service (S3). EC2 was based on virtual machines. S3 was a simple key value store with an object store backend.

Heroku introduced a model for deploying an app with a single command (a git push to a Heroku remote). It automated the deployment of Rails apps and was an early example of a platform as a service (PaaS). The app ran in a virtualized Linux container called a dyno.

NASA engineers wanted to store and process big data, and tasked an internal team and Rackspace with building a web application framework for it. When they tried the existing AWS offerings, they found that uploading the data to AWS would take too long. They came up with their own implementation to manage virtual machines and handle storage: OpenStack. It provided EC2-like and S3-like APIs and worked with certain hypervisors.

Greenplum, a big data company acquired by EMC, wanted a framework that made big data easy to use. EMC had bought VMware, which in turn had bought Cloud Foundry. Greenplum worked with VMware and a development company, Pivotal Labs, to build up Cloud Foundry as a mechanism for deploying apps so that they could be scaled up easily. Pivotal Labs was acquired by EMC, and the combined effort was later spun out as Pivotal, whose Pivotal Cloud Foundry product provides enterprise-level support for the open source Cloud Foundry offering. The spin-out had participation from EMC and VMware (which were already one company) and a third participant, GE, which was interested in building an industrial cloud. Cloud Foundry forms the basis of the GE Predix cloud.

Cloud Foundry is now a third-generation PaaS that runs on top of an IaaS layer such as OpenStack or VMware vSphere, or even Docker containers running locally, through a Cloud Provider Interface (CPI).

Another interesting project is Apache Geode, an in-memory database along the lines of Hana.

Meanwhile, Amazon and Azure are rapidly increasing the number of web services they offer while reducing their costs.

There was a meetup on Ranger recently at Pivotal labs which discussed authorization for various components in the Hadoop ecosystem, including Kafka.

OAuth Threat Model, Mobile and Oracle Mobile Security Suite

Researchers in Germany recently found two previously unknown vulnerabilities in OAuth2.0, in the publication A Comprehensive Formal Security Analysis of OAuth 2.0.

An extensive OAuth threat model exists as RFC 6819. A SAML threat model is discussed by OWASP in On Breaking SAML: Be Whoever You Want to Be. The OAuth and SAML protocols serve as prototypes for the OpenID Connect protocol, which implements an authentication layer as an extension of the authorization layer of OAuth2 and has a similar threat model. The two vulnerabilities above occur in both OAuth2 and OpenID Connect.

In the Microsoft paper OAuth Demystified for Mobile Application Developers, the authors discuss why OAuth is so easy to get wrong in mobile apps: it was originally designed for websites, not mobile. The browser redirection step that OAuth relies on heavily cannot be performed securely on iOS in the general case. Moreover, the protocol is complex enough that, of the 149 apps tested, 60% were faulty and vulnerable.

This is interesting because in the Oracle Mobile Security Suite, which we developed at BitzerMobile, these problems were addressed for enterprise applications by wrapping web access protocols in a simpler challenge-response protocol that was injected securely into mobile apps. The app and the device are authenticated independently of the user, which considerably reduces the attack surface for this common use case.

A useful summary of attacks and countermeasures from OAuth and OpenID Connect Risks and Vulnerabilities follows:

Attack category | Countermeasure
Extracting credentials or tokens in captured traffic | TLS encryption
Impersonating authorization server or resource server | TLS server authentication
Manufacturing or modifying tokens | Issue tokens as signed JWTs
Redirect manipulation | Require clients to declare redirect URIs during registration
Guessing or interception of client credentials | Use signed JWTs for client authentication
Client session hijacking or fixation | Use the State parameter to ensure continuity of client session throughout the OAuth flow
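
Two of the countermeasures in the table, the State parameter and registered redirect URIs, can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the object and method names are hypothetical and do not come from any OAuth library, and a real client would keep the stored state in the user's session.

```scala
import java.security.{MessageDigest, SecureRandom}
import java.util.Base64

// Hypothetical helper names, for illustration only.
object OAuthClientChecks {
  private val rng = new SecureRandom()

  // Unguessable value the client stores in the user's session and sends as `state`.
  def newState(): String = {
    val bytes = new Array[Byte](32)
    rng.nextBytes(bytes)
    Base64.getUrlEncoder().withoutPadding().encodeToString(bytes)
  }

  // On the callback, the returned state must equal the stored one (constant-time compare).
  def stateMatches(stored: String, returned: String): Boolean =
    MessageDigest.isEqual(stored.getBytes("UTF-8"), returned.getBytes("UTF-8"))

  // Authorization-server side: accept only redirect URIs registered for the client, by exact match.
  def redirectAllowed(registered: Set[String], requested: String): Boolean =
    registered.contains(requested)
}
```

A callback whose state does not match is dropped before any code or token is exchanged, which is what ties the response to the session that started the flow.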

The Relying Party (RP), aka Client (in unfortunately overloaded OAuth parlance), aka ‘Relying’ Service Provider (in Kerberos parlance), is a third party service which needs access. It may not have an incentive to follow the protocol intent (e.g. enforce adequate controls). Or it may be willing to accept more data than the protocol intended.

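The latter is the case in the first attack, where the IDP sends a 307 Temporary Redirect. A 307 makes the browser repeat the POST, so the password form data is sent on to the RP, and the RP receives the user creds. The fix recommended in the paper is a 303 See Other, after which the browser follows up with a GET and no body; browsers generally treat a 302 Found the same way, but the standard does not require it.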

The second attack arises when HTTPS is not used on the path between the user and the RP, which allows a MITM to manipulate the request and send it to an Attacker IDP instead of the Honest IDP. The fix is for the RP to use HTTPS, which protects the request from tampering (but who enforces this ? the redirect is too quick for the user to inspect the site certs).

Note: OAuth terminology becomes clearer to an enterprise mindset by expanding the terms auth server, resource owner, resource server and client. The “client” is a service that is untrusted with respect to user creds but needs temporary access to HTTP resources. Instead of using the resource owner’s username/password, it must acquire an access_token for this access. In correspondence with Kerberos/SAML, the functional roles are

1. Identity Provider (=IDP =auth server = KDC. e.g. twitter, fb as a pure login service),

2. Trusted Server (=resource server, that is in same trust zone as IDP. e.g. twitter tweets/profile, fb photos/profile),

3. Relying Service (a delegate service that wants to provide additional functionality using data from the Trusted Service and user identity from the IDP, e.g. filter tweets, mash photos, partner HR app; called client in OAuth). The client_id/client_secret are created for this entity, a bit like an independent KDC principal name. The client is a Relying Service (RS) which is in a different trust domain from the IDP/Trusted Service and needs an access token. This Relying Service may need to authenticate itself to the Trusted Service, which is done using a client_secret in cases where the RS can keep a secret (i.e. in web apps, not in mobile and browser based apps). One would need as many client_ids as there are clients that lie in independently revocable auth domains.

4. the User (=resource owner, a person or organization who has credentials/data stored in the IDP/TS and desires use of the Relying Service with existing IDP credentials). The “authorization grant” is a credential that identifies the user and corresponds to the Kerberos Ticket Granting Ticket, aka TGT.

There are four OAuth authorization grant types: authorization_code, implicit, resource owner password credentials and client_credentials. Authorization_code authenticates both user and client. Implicit authenticates only the user, not the client (no client_secret, no authorization grant). With resource owner credentials the client does authenticate itself, but gets the access token directly. Client credentials (RS-TS) is the simplest one and is used for M2M communications; here the client is acting on its own behalf, not the user’s. The type used depends on the method the client uses to request authorization and on the types supported by the authorization server.
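
The four grant types and who gets authenticated in each can be written down as a small data model. This is a sketch of the description above, not of any library; the names and the usableBy helper are hypothetical, and the resource owner grant is marked as authenticating the user on the assumption that the user's password is what gets presented.

```scala
object OAuthGrants {
  // Who is authenticated in each grant type, as described in the text.
  sealed abstract class Grant(val authenticatesUser: Boolean, val authenticatesClient: Boolean)
  case object AuthorizationCode     extends Grant(true, true)
  case object Implicit              extends Grant(true, false)
  case object ResourceOwnerPassword extends Grant(true, true)   // assumption: user password is presented
  case object ClientCredentials     extends Grant(false, true)  // M2M, no user involved

  val all: List[Grant] =
    List(AuthorizationCode, Implicit, ResourceOwnerPassword, ClientCredentials)

  // Grants a client could use, given whether it can keep a client_secret
  // and whether it acts on behalf of a user.
  def usableBy(canKeepSecret: Boolean, actingForUser: Boolean): List[Grant] =
    all.filter(g => (canKeepSecret || !g.authenticatesClient) && g.authenticatesUser == actingForUser)
}
```

Under this model a mobile or browser based app (no secret, acting for a user) is left with the implicit grant only, which matches the remark about where a client_secret can be kept.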

The difference from the NFS/KDC case is that in the NFS case the additional (server) principal is created for the NFS server, not the NFS client. The focus in NFS is on not sending the password over the wire. In OAuth the focus is on not giving the password to the client.

Let’s Encrypt. Less Green ?

Let’s Encrypt (letsencrypt.org) is a service conceived to reduce the friction in enabling HTTPS on a website, by automating SSL certificate creation, validation, signing, installation and renewal. Server certificate setup, which used to take hours, can be done in a minute. Encryption will reduce the incidence of man-in-the-middle (MITM) attacks, which can easily insert or modify the javascript in transit.

Some of this is driven by Mozilla and its large public backers, with perhaps an interest in showing the green bar and lock for more websites. A self-signed cert would also provide free encryption and be easy to set up, but it would throw an untrusted connection alert to the user, and without some way of pinning the cert it stops only passive eavesdropping, not an active MITM.

So is LetsEncrypt encryption enough to show a green bar for a website ? Because regular certification schemes require a purchase, one has to go through a credit card verification step before being issued a cert. Certs with Extended Validation have more steps to go through. There are three types of certs based on the level of validation: DV, OV and EV. Domain Validation (DV) does not try to check the identity of the requester, only control of the domain, and is what LetsEncrypt automates using a challenge-response scheme. Visiting websites that use LetsEncrypt DV certs confirms that they display a green lock/bar (in Firefox).
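
A minimal sketch of such a domain validation challenge-response follows, loosely modelled on the HTTP challenge used by LetsEncrypt. It is simplified on purpose: the account identifier is a plain string standing in for the account key thumbprint, and fetch is a hypothetical stand-in for an HTTP GET against the domain.

```scala
import java.security.SecureRandom
import java.util.Base64

object DomainValidationSketch {
  private val rng = new SecureRandom()

  final case class Challenge(domain: String, token: String)

  // CA side: issue an unguessable token for the domain being validated.
  def issue(domain: String): Challenge = {
    val bytes = new Array[Byte](16)
    rng.nextBytes(bytes)
    Challenge(domain, Base64.getUrlEncoder().withoutPadding().encodeToString(bytes))
  }

  // Applicant side: where the response must be served, and what it must contain.
  def path(c: Challenge): String = "/.well-known/acme-challenge/" + c.token
  def expectedBody(c: Challenge, accountId: String): String = c.token + "." + accountId

  // CA side: `fetch` stands in for an HTTP GET of the given path on the domain.
  def validate(c: Challenge, accountId: String, fetch: String => Option[String]): Boolean =
    fetch(path(c)).contains(expectedBody(c, accountId))
}
```

Passing the check shows only that the requester controls what the domain serves. Nothing in it says who the requester is, which is the gap between DV and the OV/EV levels.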

The problem with a widely accepted CA which has a zero cost barrier for setting up HTTPS is similar to that with the free precursor to OpenDNS. A number of less than trustworthy websites can set themselves up as mirror images of trustworthy websites and send phishing links by email or sms, and an end-user has no way of telling the difference. Here’s a link on how to do just such a phishing attack with LetsEncrypt. So is LetsEncrypt making the web less secure ?

It’s true that the large number of CAs, with their diverse validation mechanisms, makes the existing scheme not so great, especially when CAs are compromised and/or issue bad certs (e.g. Superfish, Comodo, NIC). However, one could inspect the issuing CA and, if there was reason to believe it was not trustworthy (e.g. see this pic by Chris Palmer), avoid clicking the link.

I think the average user should receive a better visual indication on the level of trust provided by a LetsEncrypt cert that has undergone a lower level of validation by design. Use a less green color ?

End users should be more aware of the certification process and get into the habit of explicitly checking Cert chains for HTTPS by clicking on the green lock displayed next to the URL.

Update: The owner field is not defined in a Domain Validated cert like ones issued by LetsEncrypt. https://superuser.com/questions/1042383/how-to-set-the-owner-of-certificate-generated-by-lets-encrypt

Cloud Access Security Brokers

Skyhigh Networks, which was mentioned in my previous post, is one example of a Cloud Access Security Broker. There are several other companies in this field: NetSkope, Bitglass, Palerra, CipherCloud, Elastica, Adallom, Zscaler, some more familiar than others.

Gartner says the penetration will be 25% in the enterprise in 2016. Here’s a comparison table.

There is considerable scope for innovation in this space based on different user requirements – for instance the amount and type of sharing of the data affects the architecture. Homomorphic encryption while theoretically possible is not yet practical. I expect to see more differentiation before consolidation of different technologies in the domain.

Real World Crypto 2016

SSL/TLS dominated the conference with talks on its use at FB, Google, Amazon and modifications in the direction of TLS 1.3, with presentations on  QUIC, OPTLS, S2N and more, covering things like lower latency, forward secrecy, better ciphers. Some of the optimizations can have an impact on data center operations efficiency.

The timing attack on the S2N implementation was interesting. The KL divergence measures the difference between two probability distributions, and here it can be applied to timing distributions to leak information from an SSL stream.
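
For reference, the KL divergence of two discrete distributions is D(P||Q) = sum over i of p_i * ln(p_i / q_i). The sketch below computes it for two histograms; it illustrates the measure only and is not the analysis used in the attack. The object and method names are made up for the example.

```scala
object KL {
  // D(P || Q) = sum_i p_i * ln(p_i / q_i), in nats; terms with p_i = 0 contribute 0.
  // The result is infinite if some q_i is 0 where p_i is positive.
  def divergence(p: Seq[Double], q: Seq[Double]): Double = {
    require(p.length == q.length, "distributions must have the same support")
    p.zip(q).collect { case (pi, qi) if pi > 0 => pi * math.log(pi / qi) }.sum
  }
}
```

If p and q are normalized histograms of response times for two classes of input, a divergence of zero means the timings are indistinguishable, and a larger value means an observer needs fewer samples to tell the classes apart.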

The talk on privacy-preserving operations on encrypted data (Skyhigh) was also interesting. Paul Grubbs discussed searchable symmetric encryption (SSE) tradeoffs and open questions around a stateless SSE. In the case of an encryption proxy, who maintains it ? The client would find it cumbersome, so an encrypted index is maintained by Skyhigh. This is not easy to manage. Also, if one wants *both* security and privacy, the search times and/or the number of roundtrips go up.

Cryptol is a tool from Galois for simulating ciphers, useful for modeling, verifying and even implementing them. It is written in Haskell and is open source. It comes with several examples, including the Enigma cipher. I tried it and will blog about it later.

I expected some presentations on DNS security – e.g. DNSSEC and DANE; talked to attendees from Verisign on their offerings (ddos monitoring, threat intelligence graph). DNS operates over IP (vs an out-of-band method for updates/insertions); with DNSSEC the DNS server needs to trust the same CA as the origin server which feeds it the DNS record. The general feeling I think is the trust problem has to be solved at the application layer and attacks like the Kaminsky attack have been mitigated against.

Here’s a diagram of the QUIC protocol. The claim is zero RTT for a repeat (secure) connection to the server (75% of the time), by combining TCP and SSL handshakes into one and caching state on the client. A practical attack on QUIC is discussed in this paper, a type of adaptive chosen ciphertext attack, which references this paper by Zhang, Reiter et al, discussing more general PAAS attacks including an SAML SSO attack.

Perfect forward secrecy definition: A public-key system has the property of forward secrecy if it generates one random secret key per session to complete a key agreement, without using a deterministic algorithm.

Without forward secrecy, an attacker who records SSL sessions and later obtains the server’s private key can decrypt the recorded sessions. With TLS 1.2 forward secrecy is optional, and with the session resumption optimization it is effectively disabled. TLS 1.3 mandates forward secrecy by using ephemeral DH key exchanges.
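
The idea of one fresh random key per session can be sketched with the JDK's standard key agreement API. This is an illustration of an ephemeral (EC)DH exchange under simplifying assumptions, not of the TLS handshake: there is no authentication of the peers and no key derivation step, and the object name is made up.

```scala
import java.security.{KeyPair, KeyPairGenerator, PublicKey}
import javax.crypto.KeyAgreement

object EphemeralDh {
  // A fresh key pair per session; discarding it afterwards is what gives forward secrecy.
  def freshKeyPair(): KeyPair = {
    val gen = KeyPairGenerator.getInstance("EC")
    gen.initialize(256)
    gen.generateKeyPair()
  }

  // Each side combines its own ephemeral private key with the peer's ephemeral public key.
  def sharedSecret(mine: KeyPair, theirs: PublicKey): Array[Byte] = {
    val agreement = KeyAgreement.getInstance("ECDH")
    agreement.init(mine.getPrivate())
    agreement.doPhase(theirs, true)
    agreement.generateSecret()
  }
}
```

Both sides arrive at the same secret without it ever crossing the wire. Because the server's long-term private key is not an input, stealing that key later does not help decrypt a recorded session.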

Spark and Scala

Spark is a general-purpose distributed data processing engine that is used for a variety of big data use cases, e.g. analysis of logs and event data for security, fraud detection and intrusion detection. It has the notion of Resilient Distributed Datasets (RDDs). The “resilience” has to do with the lineage of a data structure, not with replication. Lineage means the set of operators applied to the original data structure. Lineage and metadata are used to recover lost data after node failures, by recomputation.

Spark word count example discussed in today’s meetup.

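// sc is the SparkContext provided by spark-shell
val textFile = sc.textFile("obama.txt")
// split into words, keep words longer than 4 characters, count occurrences
val counts = textFile.flatMap(line => line.split(" ")).filter(_.length > 4).map(word => (word, 1)).reduceByKey(_ + _)
// swap to (count, word) and sort by count, descending
val sortedCounts = counts.map(_.swap).sortByKey(false)
sortedCounts.take(10)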
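
For comparison, the same pipeline can be written against plain Scala collections, which is handy for trying the logic without a cluster. This is a local sketch, not Spark code: groupBy stands in for reduceByKey, and the object and method names are made up.

```scala
object WordCountLocal {
  // Returns the n most frequent words longer than minLength, as (count, word) pairs.
  def topWords(lines: Seq[String], minLength: Int, n: Int): Seq[(Int, String)] =
    lines
      .flatMap(_.split(" "))
      .filter(_.length > minLength)
      .groupBy(identity)
      .toSeq                                                // leave Map before mapping to (count, word)
      .map { case (word, occurrences) => (occurrences.size, word) }
      .sortBy { case (count, word) => (-count, word) }      // count descending, then alphabetical
      .take(n)
}
```

The shape is the same as the Spark version: a chain of transformations on an immutable collection, with nothing computed in place.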

Scala, the language Spark is written in, supports functional programming and prefers immutable data structures. Sounds great! How are state changes done then ? Through function calls. Recursion has a bigger role to play because it is a way for state changes to happen via function calls. The stack is used for the writes, rather than the heap.

The takeaway is that functional programs are structured differently, and one can do some things more naturally. It feels closer to how the mind works. As long as one gets the base cases right, one can build up a large amount of complexity with little effort. On the other hand, if one has to start top down and debug a large call stack, it could be challenging. I recalled seeing a spiral Scala program earlier and found one here on the web. I modified it to draw the reverse spiral. Here’s the resulting code.

// rt annotated spiral program.
// source http://www.kaiyin.co.vu/2015/10/draw-plain-text-spiral-in-scala.html
// reference: http://www.cis.upenn.edu/~matuszek/Concise%20Guides/Concise%20Scala.html
// syntax highlight: http://bsnyderblog.blogspot.com/2012/12/vim-syntax-highlighting-for-scala-bash.html
import java.io.{File, PrintWriter}
import scala.io.Source

object SpiralObj {   // the object keyword defines a singleton; here it just acts as a namespace
  object Element {   // companion object of the abstract class Element below; it holds the factory methods, not the elements themselves
    private class ArrayElement(  // concrete Element subclass (an ordinary class, not a singleton)
                                val contents: Array[String]  // "primary constructor" is defined in class declaration, must be called
                                ) extends Element
    private class LineElement(s: String) extends Element {
      val contents = Array(s)
    }
    private class UniformElement(  // height and width of a line segment. what if we raise width to 2. works.
                                  ch: Char,
                                  override val width: Int,   // override keyword is required to override an inherited method
                                  override val height: Int
                                  ) extends Element {
      private val line = ch.toString * width  // fills the characters in a line
      def contents = Array.fill(height)(line) // duplicates line n(=height) times, to create a width*height rectangle
    }
    // three constructor like methods
    def elem(contents: Array[String]): Element = {
      new ArrayElement(contents)
    }
    def elem(s: String): Element = {
      new ArrayElement(Array(s))
    }
    def elem(chr: Char, width: Int, height: Int): Element = {
      new UniformElement(chr, width, height)
    }
  }


  abstract class Element {
    import Element.elem
    // contents to be implemented
    def contents: Array[String]

    def width: Int = contents(0).length

    def height: Int = contents.length

    // prepend this to that, so it appears above
    def above(that: Element): Element = {      // above uses widen
      val this1 = this widen that.width
      val that1 = that widen this.width
      elem(this1.contents ++ that1.contents)
    }

    // prefix new bar line by line
    def beside(that: Element): Element = {     // beside uses heighten
      val this1 = this heighten that.height
      val that1 = that heighten this.height
      elem(
        for ((line1, line2) <- this1.contents zip that1.contents)
          yield line1 + line2
      )
    }

    // add padding above and below
    def heighten(h: Int): Element = {          // heighten uses above
      if (h <= height) this
      else {
        val top = elem(' ', width, (h - height) / 2)
        val bottom = elem(' ', width, h - height - top.height)
        top above this above bottom
      }
    }

    // add padding left and right
    def widen(w: Int): Element = {             // widen uses beside
      if (w <= width) this
      else {
        val left = elem(' ', (w - width) / 2, height)
        val right = elem(' ', w - width - left.width, height)
        left beside this beside right
      }
    }

    override def toString = contents mkString "\n"
  }


  object Spiral {
    import Element._
    val space = elem("*")
    val corner1 = elem("/")
    val corner2 = elem("\\")
    def spiral(nEdges: Int, direction: Int): Element = { // clockwise spiral
      if(nEdges == 0) elem("+")
      else {
        //val sp = spiral(nEdges - 1, (direction + 1) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        val sp = spiral(nEdges - 1, (direction + 3) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        var verticalBar = elem('|', 1, sp.height)        // vertBar and horizBar have last two params order switched
        var horizontalBar = elem('-', sp.width, 1)
        val thick = 1
        // at this stage, assume the n-1th spiral exists and you are adding another "line" to it (not a whole round)
        // use "above" and "beside" operators to attach the line to the spiral
        if(direction == 0) {
          horizontalBar = elem('r', sp.width, thick)
          (corner1 beside horizontalBar) above (sp beside space) //  order is left to right
        }else if(direction == 1) {
          verticalBar = elem('d',thick, sp.height)
          (sp above space) beside (corner2 above verticalBar)
        } else if(direction == 2) {
          horizontalBar = elem('l', sp.width, thick)
          (space beside sp) above (horizontalBar beside corner1)
        } else {
          verticalBar = elem('u',thick, sp.height)
          (verticalBar above corner2) beside (space above sp)
        }
      }
    }

    def revspiral(nEdges: Int, direction: Int): Element = { // try counterclockwise
      if(nEdges == 0) elem("+")
      else {
        //val sp = spiral(nEdges - 1, (direction + 1) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        val sp = revspiral(nEdges - 1, (direction + 3) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        var verticalBar = elem('|', 1, sp.height)        // vertBar and horizBar have last two params order switched
        var horizontalBar = elem('-', sp.width, 1)
        val thick = 1
        // at this stage, assume the n-1th spiral exists and you are adding another "line" to it (not a whole round)
        if(direction == 0) { // right
          horizontalBar = elem('r', sp.width, thick)
          (sp beside space) above (corner2 beside horizontalBar)
        }else if(direction == 1) { // up
          verticalBar = elem('u',thick, sp.height)
          (space above sp) beside (verticalBar above corner1)
        } else if(direction == 2) { // left
          horizontalBar = elem('l', sp.width, thick)
          (horizontalBar beside corner2 ) above (space beside sp) 
        } else { // down
          verticalBar = elem('d',thick, sp.height)
          (corner1 above verticalBar) beside (sp above space)
        }
      }
    }
    def draw(n: Int): Unit = {
      println()
      println(spiral(n, n % 4))  // %4 returns 0,1,2,3 .    right, down, left, up
      println()
      println(revspiral(n, n % 4))  // %4 returns 0,1,2,3   
    }
  }
}

object Main {
  // Prints a usage hint when no spiral size is given.
  def usage(): Unit = {
    println("usage: scala Main szInt")
  }

  def main(args: Array[String]): Unit = {
    import SpiralObj._
    if (args.length > 0) {
      val spsize = args(0)
      Spiral.draw(spsize.toInt)
    } else {
      usage()
    }
  }
}


A note on tail-call recursion. If the last action of a function is a call to another function, the called function returns to the same place the caller would have returned to, so the caller’s stack frame can be reused for the callee. When that last call is to the function itself, the function is tail recursive and the effect is that of a loop: a series of calls can be made without consuming stack space. The Scala compiler performs this optimization only for direct self-recursive tail calls.
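
A small example of the difference, using a made-up summing function. The @tailrec annotation asks the Scala compiler to reject the method if it cannot turn the recursion into a loop.

```scala
import scala.annotation.tailrec

object TailCalls {
  // Not tail recursive: the addition happens after the recursive call returns,
  // so every call needs its own stack frame.
  def sumTo(n: Long): Long =
    if (n == 0) 0 else n + sumTo(n - 1)

  // Tail recursive: the recursive call is the last action, and the running
  // total travels in the accumulator instead of on the stack.
  @tailrec
  def sumToAcc(n: Long, acc: Long = 0): Long =
    if (n == 0) acc else sumToAcc(n - 1, acc + n)
}
```

sumToAcc handles a million iterations in constant stack space, while sumTo at that depth would typically overflow the default JVM stack.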

Uber Security (Keys on Github)

As information-driven physical-world services like Uber, AirBnB and Square become more common they bring up some unique security issues for the interacting parties. To make the service effective they collect and store a large amount of user data. This data can be compromised as data needs to be shared not only with users but also with third party apps.  Then there is a threat of physical assault, physical damage and stolen card data.

At minimum, it is imperative to have a comprehensive information security program that protects the core data collection/processing pipeline and extends outwards to a) services built on top of the data and b) physical identities of the parties involved to assure them of trust in a brief interaction enabled by the information.

This article discusses how information on 50,000 drivers was compromised at Uber. The driver database keys were found on github ? How is that possible ?

If it is possible, then it is a security incident that needs visibility, not just into the information within an enterprise but also outside it. The security incident and event monitoring products that exist (e.g. ArcSight, Bit9, CrowdStrike, Tanium) barely scratch the surface of this requirement: the haystack is bigger than we think it is, and we don’t know in advance what the needle looks like.

Physical security is harder to deal with. One thing that becomes apparent is that the supply of hotels, cabs and even credit card issuers was constrained by legislation and regulations designed to create a high bar for an offering and build a high level of trust between the interacting parties.

Those lines are being redrawn with technology. The people impacted by the technology should be part of the conversation in coming up with appropriate ways to regulate the offerings to maintain security and safety.

Cassandra and the Internet of Boilers

A fascinating story appeared here about British Gas using Cassandra to analyze sensor data from boilers in UK homes and predict their failures.

The design of Cassandra is intuitively clear to me in its use of a single primary index to distribute the query load among a set of nodes that can be scaled up linearly. It uses a ring architecture based on consistent hashing. It emphasizes Availability and Partition-Tolerance over Consistency in the CAP theorem.
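
A minimal sketch of a consistent hashing ring, to make the idea concrete. It is an illustration under stated assumptions, not Cassandra's implementation: MD5 is used as a convenient stand-in hash, each node gets a fixed number of virtual nodes, and the class name and parameters are made up. At least one node is assumed.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

final class HashRing(nodes: Seq[String], vnodes: Int = 64) {
  // Position on the ring: the first 8 bytes of an MD5 digest, read as a Long.
  private def hash(s: String): Long = {
    val digest = MessageDigest.getInstance("MD5").digest(s.getBytes(UTF_8))
    ByteBuffer.wrap(digest).getLong()
  }

  // Each node owns several points ("virtual nodes") on the ring.
  private val ring: TreeMap[Long, String] =
    TreeMap.empty[Long, String] ++
      nodes.flatMap(n => (0 until vnodes).map(i => (hash(n + "#" + i), n)))

  // A row key belongs to the first node point at or after its hash, wrapping around.
  def nodeFor(rowKey: String): String = {
    val it = ring.iteratorFrom(hash(rowKey))
    if (it.hasNext) it.next()._2 else ring.head._2
  }
}
```

The useful property is that adding a node only takes keys away from existing nodes. No key moves between two old nodes, which is what lets the cluster grow without reshuffling everything.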

The data structure is a two level hash table, with the first level key being the row key, and the second level key being the column key.
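
The two-level structure can be modelled with nested maps. This is a sketch of the data model only, with made-up names and sample keys; keeping the inner map sorted by column key is an assumption added here so that range scans within a row can be shown.

```scala
import scala.collection.immutable.SortedMap

object WideRowSketch {
  type Row = SortedMap[String, String]   // second level: column key -> value, sorted by column key
  type Table = Map[String, Row]          // first level: row key -> row

  def put(t: Table, rowKey: String, col: String, value: String): Table =
    t.updated(rowKey, t.getOrElse(rowKey, SortedMap.empty[String, String]).updated(col, value))

  def get(t: Table, rowKey: String, col: String): Option[String] =
    t.get(rowKey).flatMap(_.get(col))

  // Range scan inside one row: columns from `from` (inclusive) to `until` (exclusive).
  def slice(t: Table, rowKey: String, from: String, until: String): Seq[(String, String)] =
    t.get(rowKey).map(_.range(from, until).toSeq).getOrElse(Seq.empty)
}
```

With a boiler id as the row key and a timestamp as the column key, all readings for one boiler live in one row, and a time window is a slice of that row.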

Where Cassandra differs from a SQL db is in the flexibility of the data model. In SQL one can model complex relationships, which allow complex queries using joins. Cassandra has support for CQL (Cassandra Query Language), which is like SQL but does not support joins or transactions. The impact is that queries with CQL cannot be as flexible (or ad hoc) as those with SQL. The kinds of queries that can be done have to be planned in advance, and other queries would be inefficient. However, this drawback is mitigated by using Spark along with Cassandra. As I understand it, the Spark cluster runs in parallel with the Cassandra cluster.

Why are joins important ? It goes back to relationships in an E-R diagram. Can’t we just model entities ? When we store Employees in one table and Departments in another in a SQL db, each row has an id which is a shorthand for the employee or the department. This simplification forces us to look up both tables again via a join in a query, say when asking for all employees belonging to (only) the finance department. But tables like departments may be small, so they could be replicated in memory for quickly recovering associations. And tables like employees can be naturally partitioned by the employee id, which is unique. This means that SQL and complex relationships may not be needed for a number of use cases. If ACID compliance is also not a requirement, then NoSQL is a good bet. Cassandra differs from MongoDB in its masterless design, which lets it scale writes by adding nodes.
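
The point about replicating a small table in memory can be sketched as follows. The case classes and sample data are made up; the department table is turned into an in-memory lookup, so the "join" becomes a filter over the large, partitionable employee table.

```scala
object JoinSketch {
  final case class Employee(id: Int, name: String, deptId: Int)
  final case class Department(id: Int, name: String)

  // Names of the employees in the given department, without a join operator.
  def employeesIn(deptName: String,
                  employees: Seq[Employee],
                  departments: Seq[Department]): Seq[String] = {
    // Small table, cheap to copy to every node and keep in memory.
    val deptNameById: Map[Int, String] = departments.map(d => d.id -> d.name).toMap
    employees.filter(e => deptNameById.get(e.deptId).contains(deptName)).map(_.name)
  }
}
```

Each partition of employees can run this filter independently, which is why the lack of joins hurts less than it first seems for this kind of data.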

Quote from British Gas: “We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory…Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”

Here’s a blog that triggered this thought along with a talk by Rachel@datastax, who also assured me that Cassandra has been hardened for security and has Kerberos support in the free version.

British Gas operates Hive, a competitor to Nest for thermostats. Note that a couple of months back British Gas reported that 2200 of its accounts were compromised.

CERT Warns Wind Turbines Open to Compromise

CERT issued a warning that certain wind turbines are open to compromise.

“A successful attack would allow the malicious actor to lock out a legitimate administrator and take control of the device. .. the vulnerability is easy to exploit by an attacker who does not need to be authenticated to the device, or have direct physical access to it.”

A fix has been issued, but OTA updates are not supported .. imagine climbing each turbine to upgrade the software.

A couple of days earlier CERT issued an advisory about gas detectors being compromised. Incorrect gas level reports could be hazardous to equipment and human life.

DARPA asked for proposals around automatic detection and patching of security vulnerabilities. In addition, it raised an alert about power grid vulnerability and proposed a plan to recover from a massive power grid attack. The power grid has faced hundreds of attacks, partly because it relies on 1970s era technology which cannot be upgraded, as service cannot be interrupted. The addition of SmartMeters, which make it more connected, can increase the vulnerability level.

Spark, Storm, Ayasdi, Hadoop

The huge amount of data that IOT systems will generate will call for analyses of different types. A brief review of some systems and what they are used for.

Apache Spark: Uses distributed memory abstractions for primarily in-memory processing. Built with Scala. Useful for finding data clusters and for detecting statistical anomalies by looking for distance from the cluster. Comes with a machine learning system on top. Does not come with its own file system (use nfs/hdfs). Useful for complex processing, where state needs to be maintained in the event stream for correlations. Described as ‘batch processing with micro-streaming analysis’, but looks headed to cover streaming analyses as well.

Apache Storm: Real-time streaming data analysis. Developed at Twitter, written in Clojure. Unlike Hadoop, which has two layers (map, reduce), Storm can have N layers and a flexible topology consisting of Spouts (data source units) and Bolts (data processing units). Storm has been superseded by Heron in terms of performance. IBM Streams is a commercial offering, also for stream processing.

Ayasdi: Topological data processing allows one to discover what interesting features of the data are, without knowing what to look for in advance. This is in contrast to most systems where one needs to know what one is looking for. Claims insight discovery.

Hadoop: Used for batch processing of large amounts of data, using map/reduce primitives. Comes with HDFS. Cloudera (and others) have made significant improvements to it with an interactive SQL interface and usability improvements for BI (Impala).

InfluxDB: Time-series db for events and metrics. Optimized for writes and claims to scale to IOT workloads.

ZooKeeper: A coordination service for distributed applications.

Amazon S2N and OpenSSL

In the last few years a number of OpenSSL vulnerabilities have come to light. Heartbleed was a critical one which was exploited in the field. It basically allowed a client to send a malicious heartbeat to the server and get back chunks of server memory, which can contain passwords. OpenSSL was estimated to sit behind about two thirds of the web servers in the world at the time, so the exposure was very wide. The fix was to upgrade OpenSSL, revoke existing server certs and request new SSL server certs.
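
The shape of the bug can be shown with a toy model. This is not OpenSSL code: the "memory" is a made-up byte array in which the request payload happens to sit next to an unrelated secret, and the function names are hypothetical. In C the over-read would run into whatever follows the payload in the heap; here the array bound limits it.

```scala
object HeartbeatSketch {
  // Toy process memory: the heartbeat payload followed by unrelated sensitive data.
  def memoryWith(payload: String): Array[Byte] =
    (payload + "password=hunter2;").getBytes("US-ASCII")

  // Vulnerable: echoes back as many bytes as the client claims to have sent.
  def vulnerableEcho(payload: String, claimedLen: Int): String =
    new String(memoryWith(payload).take(claimedLen), "US-ASCII")

  // Fixed: a claimed length larger than the actual payload is rejected.
  def checkedEcho(payload: String, claimedLen: Int): Option[String] =
    if (claimedLen > payload.length) None else Some(payload.take(claimedLen))
}
```

A request that sends "bird" but claims 21 bytes gets the neighbouring secret back from vulnerableEcho and nothing from checkedEcho. The actual fix was of this kind, a missing length check.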

Heartbleed had earlier prompted OpenBSD to fork OpenSSL as LibreSSL and Google to fork it as BoringSSL.

Amazon S2N is a TLS/SSL implementation in about 6000 lines of code, so it is small and fast and its correctness can be verified more easily. It uses only the crypto functions from OpenSSL and reimplements the SSL layer. This is a healthy direction for IoT and for certification of SSL implementations, for example under FIPS. S2N is short for Signal to Noise.

A timing attack was recently identified against it and has since been mitigated.

Note that two factor auth solutions would mitigate the stolen password part of the problem presented by Heartbleed, since a leaked password alone would no longer be enough to log in. There are several solutions in this area: Authy, Clef, Google Authenticator, Duo, Okta, Oracle Authenticator, ..
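
Many authenticator apps of this kind generate time-based one-time passwords. A sketch of the underlying algorithms, HOTP (RFC 4226) and TOTP (RFC 6238) with HMAC-SHA1, is below; it is not taken from any of the products named above, and the object name is made up.

```scala
import java.nio.ByteBuffer
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

object Otp {
  // HOTP (RFC 4226): HMAC-SHA1 over an 8-byte counter, dynamically truncated to `digits` digits.
  def hotp(secret: Array[Byte], counter: Long, digits: Int = 6): String = {
    val mac = Mac.getInstance("HmacSHA1")
    mac.init(new SecretKeySpec(secret, "HmacSHA1"))
    val h = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array())
    val offset = h(h.length - 1) & 0x0f
    val binary = ((h(offset) & 0x7f) << 24) | ((h(offset + 1) & 0xff) << 16) |
      ((h(offset + 2) & 0xff) << 8) | (h(offset + 3) & 0xff)
    val code = binary % math.pow(10, digits).toInt
    ("0" * digits + code).takeRight(digits)   // left-pad with zeros
  }

  // TOTP (RFC 6238): HOTP with the counter derived from the current time step.
  def totp(secret: Array[Byte], epochSeconds: Long, stepSeconds: Long = 30): String =
    hotp(secret, epochSeconds / stepSeconds)
}
```

The code changes every 30 seconds and is derived from a secret that never travels with the login request, so a password scraped from server memory is not sufficient by itself. Session cookies or private keys leaked the same way are a separate problem.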

Docker Container Security

A block diagram of docker is below and a description of the docker daemon is here. The docker client commands talk to the docker daemon to start a container from one of the images in the docker registry, or to start a process described on the command line as a new docker container. Docker provides a simple interface to linux container technology, which is a lightweight alternative to a VM.

[Figure: Docker containers compared with VMs]

A few problems with this: Who has access to the docker daemon to control the containers? How is the integrity of the containers ensured? How is the host protected from the code running in the containers?

Docker recently announced a few security features at the November DockerCon:

  • to lock down container images in a registry by signing them with a key from a hardware device (Yubikey); see here for a description of the original issue where image checksums were not verified by the docker daemon
  • to scan the official container images for vulnerabilities
  • to run containers in a user namespace instead of one that allows root access to the host. This protects the host OS as explained here. The user namespace feature has been available in LXC for over a year, but not in docker.
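The checksum issue mentioned in the first item comes down to comparing the digest of the downloaded bytes with the digest that was promised. A minimal sketch of that check follows. The function name and the "sha256:<hex>" string format are assumptions for illustration; this is not Docker's actual verification code, and it does not cover signature checking.

```python
import hashlib
import hmac


def verify_digest(blob: bytes, expected: str) -> bool:
    """Return True if blob hashes to the expected 'sha256:<hex>' digest."""
    algo, _, hex_digest = expected.partition(":")
    if algo != "sha256":
        return False                       # only sha256 handled in this sketch
    actual = hashlib.sha256(blob).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(actual, hex_digest)


layer = b"pretend these are image layer bytes"
promised = "sha256:" + hashlib.sha256(layer).hexdigest()

ok = verify_digest(layer, promised)
tampered = verify_digest(layer + b"!", promised)
```

A digest check only proves the bytes match what the registry advertised; signing is what ties the advertised digest to a trusted publisher.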

For access control to the docker daemon there is activity with a design doc here.

Twistlock is a container security and monitoring tool that attempts a comprehensive approach: access control to the containers, runtime scanning of files for malware signatures, vulnerability scanning, inspecting network packets, and so on. A recent meetup on Dec 1 discussed this product. It features integration with Kerberos and LDAP.

In terms of the kernel, processes from all containers share the same kernel and the same networking layer. So what level of isolation is provided to container processes? This depends on vulnerabilities in the processes themselves: how many ports are open, whether injection attacks are possible, etc. If a process in one container attacks a process in another, for example by memory scraping, then Twistlock can detect it only if it can identify the offending process as malware using signature matching.

A Dockerfile is used to specify a container image using instructions that spec the base OS, rpms, utilities and scripts. USER specifies the userid under which the following RUN, CMD or ENTRYPOINT instructions run. EXPOSE declares a port that the container listens on; the port is made reachable from outside when it is published at run time with the -p or -P option of docker run. A docker image is built from the Dockerfile and contains the actual bits needed for the container to run. The image can be loaded directly or pushed to a docker registry from which it can be pulled to clients.
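To make the USER and EXPOSE instructions concrete, here is a small sketch that scans a Dockerfile and reports which user the container would end up running as and which ports it declares. The sample Dockerfile, the summarize function and its output keys are all made up for illustration. The parser is deliberately naive: it ignores line continuations and multi-stage builds, and it assumes the base image runs as root unless USER says otherwise.

```python
# Hypothetical sample Dockerfile, embedded as a string for the sketch.
SAMPLE_DOCKERFILE = """\
# base OS, packages, then drop privileges
FROM centos:7
RUN useradd app
USER app
EXPOSE 8080
CMD ["/bin/bash"]
"""


def summarize(dockerfile: str) -> dict:
    """Report the final USER and the EXPOSEd ports of a Dockerfile."""
    user = "root"          # assumed default when no USER instruction is given
    ports = []
    for line in dockerfile.splitlines():
        parts = line.strip().split()
        if not parts or parts[0].startswith("#"):
            continue       # skip blank lines and comments
        instruction, args = parts[0].upper(), parts[1:]
        if instruction == "USER" and args:
            user = args[0]             # later USER lines override earlier ones
        elif instruction == "EXPOSE":
            ports.extend(args)
    return {
        "final_user": user,
        "exposed_ports": ports,
        "runs_as_root": user in ("root", "0"),
    }


info = summarize(SAMPLE_DOCKERFILE)
```

A check like this is one cheap way to flag images that never drop root, which is the host-protection concern raised earlier.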

Commands:

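docker build -t <imgnametag> .   # build image from Dockerfile in current directory

docker run -i -t <imgnametag> /bin/bash   # start a container with an interactive shell

docker login   # authenticate to a registry

docker push <imgnametag>   # upload the image to the registry

docker pull <imgnametag>   # download the image from the registry

docker-compose [down|up]   # stop or start the services defined in docker-compose.yaml

docker images   # list local images

docker export <container>   # write a container's filesystem as a tar stream to stdout

docker save -o imgtag.tar <image>   # save an image, with its layers, to a tar file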

“Computer Detective in the Cloud”

Although light on details, this is an application of AI for securing against credit card fraud in real time using cloud computing.

AI has been in the news a few times this month – Google (TensorFlow), Facebook (new milestones in AI), Microsoft releasing Cortana (Nadella welcomes our AI overlords) and mention of an AI spring from IBM and Salesforce.

Machine learning has also been applied to spam detection, intrusion detection, malicious file detection, malicious url detection, insurance claims leakage detection, activity/behaviour based authentication, threat detection and data loss prevention.

Worth noting that these successes are typically in narrow domains with narrow variations of what is being detected. Intrusion detection is a fairly hard problem for machine learning because the number of variations of attacks is high. As someone said, we’ll be using signatures for a long time.

The previous burst of activity around neural networks in the late 80s and early 90s subsided around the same time as the rise of the internet in the mid to late 90s. Around 2009, as GPUs made parallel processing more mainstream, there was a resurgence in activity: deeper, multilayer networks looking at overlapping regions of images (similar to wavelets) brought convolutional neural networks to prominence. These have had successes in image and voice recognition. A few resources: GPU Gems for general purpose computing, visualizing convolutional nets, and the Caffe deep learning framework.

Kafka Security

Kafka is a system for continuous, high throughput messaging of event data, such as logs, to enable near real-time analytics. It is structured as a distributed message broker, with incoming-event producers sending messages to topics and outgoing-event consumers reading from those topics. Motivations behind its development include decoupling producers and consumers from each other for flexibility, reducing time to process events and increasing throughput. A couple of analogies to think of it are a sender using sendmail to send an email to an email address (topic), or a message “router” which decides the destination for a particular message, except that Kafka persists the messages until the consumer is ready for them. It is an intermediary in the log processing pipeline: Kafka itself does not process the data, and the brokers do not inspect or query message contents. In contrast to JMS, one can send messages to Kafka in batches, and individual messages do not have to be acknowledged.
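The persistence and decoupling described above can be sketched as an append-only log per topic, with each consumer group keeping its own read position. This is a toy in-memory model, not the Kafka client API: TinyBroker, send_batch and poll are invented names, and real Kafka adds partitions, replication and on-disk storage.

```python
from collections import defaultdict


class TinyBroker:
    """Toy broker: one append-only log per topic, one offset per group."""

    def __init__(self):
        self.logs = defaultdict(list)     # topic -> list of messages
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def send_batch(self, topic, messages):
        """Append a whole batch; return the offset of the last message."""
        log = self.logs[topic]
        log.extend(messages)
        return len(log) - 1

    def poll(self, group, topic, max_messages=10):
        """Return the next messages for this group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = TinyBroker()
broker.send_batch("logs", ["a", "b", "c"])      # producer does not wait for readers

first = broker.poll("analytics", "logs", 2)     # one consumer reads at its own pace
second = broker.poll("analytics", "logs")
other = broker.poll("audit", "logs")            # a second group sees the full log
```

Reading does not remove messages from the log, which is why a slow or late consumer can still catch up independently of the producer.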

A design thesis of Kafka is that sequential (contiguous) disk access is very fast and can be even faster than random memory access. It uses zero copy (the sendfile system call) to move data from the page cache to the network socket, and uses a binary protocol over TCP, not HTTP. A quote from the design document: “This combination of pagecache and sendfile means that on a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache”. This, along with the distributed design, makes it faster than competing pub-sub systems.
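The zero copy idea can be tried out with Python's socket.sendfile, which uses the operating system's sendfile call where it is available and falls back to ordinary send otherwise. This is only a sketch of the technique, not how Kafka (which runs on the JVM) is written; the helper name, the temporary file and the socket pair stand in for a log segment and a consumer connection.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a socket, letting the kernel move the bytes."""
    with open(path, "rb") as f:
        return sock.sendfile(f)   # returns the number of bytes sent


# Stand-in for a log segment on disk.
payload = b"event-log-line\n" * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    log_path = tmp.name

# Stand-in for the broker-to-consumer TCP connection.
sender, receiver = socket.socketpair()
try:
    sent = send_file_zero_copy(log_path, sender)
    sender.close()                # signals end of stream to the reader
    received = b""
    while True:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        received += chunk
finally:
    sender.close()
    receiver.close()
    os.unlink(log_path)
```

With sendfile the file contents never pass through a user-space buffer in the sending process, which is the copy that the design aims to avoid.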

A proposal for adding security to Kafka for enterprise use, to control who can publish and subscribe to topics, has been underway – https://cwiki.apache.org/confluence/display/KAFKA/Security . A talk by Hortonworks on integrating Kerberos authentication and SSL encryption with Kafka was given at a recent meetup. The slides are at – http://www.slideshare.net/harshach/kafka-security.

Of interest was an incident where the SSL patch caused a production cluster to become unstable, with increased latencies. The issue was debugged using profiling. Although SSL did increase latencies, this specific issue was narrowed down to a bug in the same patch, unrelated to SSL, which had to do with zero copy.