Interesting charts here with 2017 predictions of industrial Ethernet market share – http://www.profibus.com/uploads/media/pxddamkey%5B17892%5D_IE-Book.pdf
A fascinating story about British Gas using Cassandra to analyze sensor data from boilers in UK homes, in order to predict their failures, appeared here.
The design of Cassandra is intuitively clear to me in its use of a single primary index to distribute the query load among a set of nodes that can be scaled up linearly. It uses a ring architecture based on consistent hashing. It emphasizes Availability and Partition-Tolerance over Consistency in the CAP theorem.
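To make the ring idea concrete, here is a minimal sketch of consistent hashing in Python. It is not Cassandra's implementation: the `Ring` class, the `token` helper, the node names and the use of SHA-256 as the hash are all assumptions made for illustration. Each node is placed at several points on the ring, and a row key belongs to the node that owns the first point after the key's own position.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string to a position on the ring (SHA-256 is an illustrative stand-in)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes, vnodes=16):
        self.vnodes = vnodes
        self._points = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at `vnodes` pseudo-random positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._points, (token(f"{node}#{i}"), node))

    def owner(self, row_key):
        """Return the node owning the first point after the key's token, wrapping around."""
        tokens = [t for t, _ in self._points]
        i = bisect.bisect_right(tokens, token(row_key)) % len(self._points)
        return self._points[i][1]


# Hypothetical node names and row keys.
ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"boiler-{n}" for n in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the design is visible in `moved`: adding a node only takes keys away from existing nodes and hands them to the new one, so the rest of the data stays where it was.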
The data structure is a two level hash table, with the first level key being the row key, and the second level key being the column key.
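As a rough illustration of the two-level structure, the sketch below models it with nested Python dictionaries. The function names (`put`, `get`, `column_slice`) and the boiler readings are made up for the example, and the sketch ignores everything a real store adds on top (persistence, replication, on-disk ordering).

```python
from collections import defaultdict

# First level: row key -> row. Second level: column key -> value.
table = defaultdict(dict)


def put(row_key, column_key, value):
    table[row_key][column_key] = value


def get(row_key, column_key):
    # Use .get on both levels so a lookup never creates an empty row.
    return table.get(row_key, {}).get(column_key)


def column_slice(row_key, start, end):
    """Return the columns of one row whose keys fall in [start, end], in key order."""
    row = table.get(row_key, {})
    return [(c, row[c]) for c in sorted(row) if start <= c <= end]


# Hypothetical time series: row key = device, column key = timestamp.
put("boiler-42", "2015-11-01T10:00", 61.5)
put("boiler-42", "2015-11-01T10:05", 63.0)
put("boiler-42", "2015-11-01T10:10", 70.2)
put("boiler-77", "2015-11-01T10:00", 58.9)

print(get("boiler-42", "2015-11-01T10:05"))  # 63.0
print(column_slice("boiler-42", "2015-11-01T10:05", "2015-11-01T10:10"))
```

With a device id as the row key and a timestamp as the column key, a time-range read touches a single row, which is why this layout suits sensor data.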
Where Cassandra differs from a SQL db is in the flexibility of the data model. In SQL one can model complex relationships, which allow complex queries to be done using joins. Cassandra supports CQL (Cassandra Query Language), which is like SQL but does not support joins or transactions. The impact is that CQL queries cannot be as flexible (or ad hoc) as SQL queries. The kinds of queries that will be run have to be planned in advance; other queries would be inefficient. However, this drawback is mitigated by using Spark along with Cassandra. In my understanding the Spark cluster is run in parallel with the Cassandra cluster.
Why are joins important? It goes back to relationships in an E-R diagram. Can’t we just model entities? When we store Employees in one table and Departments in another in a SQL db, each row has an id which is a shorthand for the employee or the department. This simplification forces us to look up both tables again via a join in a query, say when asking for all employees belonging to (only) the finance department. But tables like departments may be small, so they could be replicated in memory for quickly recovering associations. And tables like employees can be naturally partitioned by the employee id, which is unique. This means that SQL and complex relationships may not be needed for a number of use cases. If ACID compliance is also not a requirement, then NoSQL is a good bet. Cassandra differs from MongoDB in that it can scale much better.
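The employees and departments argument can be sketched in a few lines of Python. Everything here is hypothetical (the data, the three partitions, the modulo partitioning rule); the sketch only shows that when the small table is replicated everywhere, the finance query can be answered partition by partition without a join.

```python
# Small table, replicated in full on every node (hypothetical data).
DEPARTMENTS = {1: "finance", 2: "engineering"}

# Large table, partitioned by the unique employee id.
NUM_PARTITIONS = 3
EMPLOYEES = [
    (101, "Asha", 1),
    (102, "Ben", 2),
    (103, "Chen", 1),
    (104, "Dana", 2),
    (105, "Eli", 1),
]


def partition_for(employee_id):
    # Illustrative partitioning rule; a real store would hash the key.
    return employee_id % NUM_PARTITIONS


partitions = [dict() for _ in range(NUM_PARTITIONS)]
for emp_id, name, dept_id in EMPLOYEES:
    partitions[partition_for(emp_id)][emp_id] = (name, dept_id)


def employees_in(department_name):
    """Scan each partition independently, resolving dept ids from the local replica."""
    result = []
    for part in partitions:  # each partition could do this work in parallel
        for name, dept_id in part.values():
            if DEPARTMENTS[dept_id] == department_name:
                result.append(name)
    return sorted(result)


print(employees_in("finance"))  # ['Asha', 'Chen', 'Eli']
```

The trade-off is that this query scans every partition. If it were a frequent query, one would instead store employees keyed by department, which is the "plan the queries in advance" point made earlier.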
Quote from British Gas: “We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory…Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”
Here’s a blog that triggered this thought along with a talk by Rachel@datastax, who also assured me that Cassandra has been hardened for security and has Kerberos support in the free version.
British Gas operates Hive, a competitor to Nest for thermostats. Note that a couple of months back British Gas reported that 2200 of its accounts were compromised.
CERT issued a warning that certain wind turbines are open to compromise.
“A successful attack would allow the malicious actor to lock out a legitimate administrator and take control of the device. .. the vulnerability is easy to exploit by an attacker who does not need to be authenticated to the device, or have direct physical access to it.”
A fix has been issued, but OTA updates are not supported... imagine climbing each turbine to upgrade the software.
A couple of days earlier CERT issued an advisory about gas detectors being compromised. Incorrect gas level reports could be hazardous to equipment and human life.
DARPA asked for proposals around automatic detection and patching of security vulnerabilities. In addition it raised an alert about power grid vulnerability and proposed a plan to recover from a massive power grid attack. The power grid has faced hundreds of attacks, partly because it relies on 1970s-era technology which cannot be upgraded because service cannot be interrupted. The addition of SmartMeters, which make the grid more connected, can increase the vulnerability level.
For the purpose of the IOT, an individual device can be abstracted as a specialized service which produces and consumes data. In addition, the device has certain capabilities to act on, or transform data on a discrete or continuous basis.
Who should have access to these services and capabilities? It could be
- other devices in proximity to the device
- external services
- certain users
Who gets access is a function of the identity of the devices, the identities of the entities accessing the service and policies governing access (which can include parameters such as location, time, role or more complex rules).
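One way to read "access is a function of identities and policies" is literally, as a function. The sketch below is a minimal, default-deny policy check; the `Request` and `Policy` classes, their field names and the thermostat example are assumptions made for illustration, not a description of any particular IOT platform.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    requester_id: str  # identity of the entity asking for access
    role: str
    location: str
    at: time           # time of day of the request
    capability: str    # what the requester wants the device to do


@dataclass(frozen=True)
class Policy:
    capability: str
    allowed_roles: frozenset
    allowed_locations: frozenset
    start: time
    end: time

    def permits(self, req: Request) -> bool:
        return (
            req.capability == self.capability
            and req.role in self.allowed_roles
            and req.location in self.allowed_locations
            and self.start <= req.at <= self.end
        )


def is_allowed(policies, req: Request) -> bool:
    """Default deny: access is granted only if some policy explicitly permits it."""
    return any(p.permits(req) for p in policies)


# Hypothetical policy for a thermostat.
policies = [
    Policy("set_temperature", frozenset({"owner", "installer"}),
           frozenset({"home"}), time(6, 0), time(23, 0)),
]

print(is_allowed(policies, Request("alice", "owner", "home", time(8, 30), "set_temperature")))
print(is_allowed(policies, Request("bob", "guest", "home", time(8, 30), "set_temperature")))
```

More complex rules would replace `permits` with something richer, but the shape stays the same: identities and context go in, a yes or no comes out.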
To determine access, a device should be capable of
- identifying itself, its services and capabilities
- obtaining authorization for the services and capabilities (before exercising them), and presenting these when requested. This authorization includes a signed access policy
- updating or invalidating the access policy as time goes on
The access policies need to be applied to the data flows based on the identities and be rich enough to capture use cases of interest.
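The idea of a signed access policy that can later be updated or invalidated can be sketched as follows. This is only an illustration under stated assumptions: it uses an HMAC with a key shared between the issuer and the verifier, whereas a real deployment would more likely use asymmetric signatures, and the field names (`id`, `expires_at`), the key and the revocation set are all hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(policy: dict) -> bytes:
    # Serialize deterministically so signer and verifier hash the same bytes.
    return json.dumps(policy, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_policy(policy: dict, key: bytes) -> dict:
    tag = hmac.new(key, _canonical(policy), hashlib.sha256).hexdigest()
    return {"policy": policy, "signature": tag}


def verify_policy(signed: dict, key: bytes, now: int, revoked_ids=frozenset()) -> bool:
    """Accept a policy only if it is authentic, not revoked and not expired."""
    policy = signed["policy"]
    expected = hmac.new(key, _canonical(policy), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["signature"]):
        return False  # tampered with, or signed with a different key
    if policy["id"] in revoked_ids:
        return False  # invalidated by the issuer
    return now < policy["expires_at"]


# Hypothetical key and policy; timestamps are seconds since the epoch.
KEY = b"demo-key-not-for-production"
policy = {
    "id": "p-1",
    "device": "thermostat-1",
    "capability": "set_temperature",
    "expires_at": 1_700_000_000,
}
signed = sign_policy(policy, KEY)

print(verify_policy(signed, KEY, now=1_600_000_000))
print(verify_policy(signed, KEY, now=1_600_000_000, revoked_ids={"p-1"}))
```

Expiry handles the "as time goes on" part cheaply, while the revocation set covers the case where a policy must be invalidated before it expires.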
What’s the identity of the device? There can be multiple identities based on whether the device is identifying itself to a user, to another device of the same type, or to other devices in the ecosystem that it is a part of (say a component of a car).
Having a unique device id and leveraging it for the services that are built on the device is a design choice. Consider the choices for iPhone and Android. In the iPhone the device id permeates the application layer; the application developer can target an application at specific devices and must register a device in order to develop on it. This design choice allows the device to check that the applications run on it are valid and that their associated developer is registered with Apple. It strengthens the associations in the ecosystem of devices, developers, applications and users.
In Android the security certificates were at the JVM layer which allows self-signed certificates. Here the device id is not used as a strong identifier that is known to applications and developers. This is one reason the open system is more prone to malware.
A unique hardware identity is something to look for in IOT designs. Here’s an article from Intel/McAfee discussing EPID, an immutable device ID that can be used for identifying and also anonymizing. https://blogs.mcafee.com/business/intels-iot-gateway-enhancements/
Update: On Nov 25, news came of a number of IOT devices using the same HTTPS certificate and SSH keys. See here. Large clusters of devices are exposed on the internet this way.
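Spotting such clusters is conceptually simple: fingerprint the credential each device presents and group devices by fingerprint. The sketch below assumes the certificate bytes have already been collected somehow; the device names and the placeholder byte strings are invented for the example and stand in for real certificates.

```python
import hashlib
from collections import defaultdict


def shared_credentials(device_certs):
    """Group devices by SHA-256 fingerprint and keep fingerprints seen on more than one device."""
    by_fingerprint = defaultdict(list)
    for device_id, cert_bytes in device_certs.items():
        fingerprint = hashlib.sha256(cert_bytes).hexdigest()
        by_fingerprint[fingerprint].append(device_id)
    return {fp: sorted(ids) for fp, ids in by_fingerprint.items() if len(ids) > 1}


# Placeholder bytes standing in for the certificates presented by each device.
observed = {
    "camera-1": b"CERT-A",
    "camera-2": b"CERT-A",
    "router-9": b"CERT-B",
    "dvr-3": b"CERT-A",
}

clusters = shared_credentials(observed)
for fingerprint, device_ids in clusters.items():
    print(fingerprint[:12], device_ids)
```

Any fingerprint that shows up on more than one device means the private key behind it is effectively shared, so compromising one device compromises the whole cluster.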
The premise behind ThingWorx is that manufactured products are transforming into services. A product can be remotely monitored, maintained, and its data analyzed as part of the extended service wrapper. It is an interesting point of view on the evolution of products.
GE provides the engine not as a product but as a service; it continues to maintain it after the sale. Boeing provides the plane as a service; it continues to maintain it after the sale.
ThingWorx claims to make it easier for any product to be converted to such a service. It’s not clear how this works with legacy systems: whether it is an agent or a wrapper, and how easy it is to add. Its security whitepaper discusses authentication, authorization, encryption, security models, audit etc.
Imagine a hyperconnected supply chain consisting of components that are tracked back by their supplier. Security and access controls would be a challenge in such a dynamic environment.
An example of a product/application on ThingWorx is the Velio OBD device and the Velio Webhook application. The Webhook application displays basic data coming from OBD modules: GPS, accelerometer and OBD-II. It enables users to create customized views depicting the data that is important to them while also enabling access to both live and historical data. The application will be available in the ThingWorx Marketplace.
Several interesting companies I talked to in the mobile and enterprise space:
- BlueBox Security – automatic containerization of mobile apps
- SkyCure – identified some DoS attacks that can occur against iOS devices
- Okta – allows integrations with some 4000 different applications from a single identity console.
- BlueCoat Systems – network traffic analysis for malware detection
- Microsoft – integration of admin and user policies for Office365 with Email.
- Shape Security – changes the shape of the traffic by detecting the large fraction of traffic that is not coming from real users and blocking it from hitting the webservers
- German pavilion with several technologies including database encryption and controls
On the IOT side of things there were hacking demos of Nest thermostats, Vera home automation systems and remotely connected storage devices. Read more about the “Internet of Crappy Things” at the Kaspersky blog – https://blog.kaspersky.com/internet-of-crappy-things-2/8518/