NVidia Volta GPU vs Google TPU

A Graphics Processing Unit (GPU) allows multiple hardware processors to act in parallel on a single array of data, enabling a divide-and-conquer approach to large computational tasks such as video frame rendering, image recognition, and various types of mathematical analysis including convolutional neural networks (CNNs). The GPU is typically placed alongside CPU(s) that direct data to it. This trend is making supercomputing tasks much cheaper than before.

The Tesla V100 is a System on Chip (SoC) that contains the Volta GPU, which in turn contains Tensor Cores designed specifically to accelerate deep learning via the matrix operation D = A*B + C, where each input is a 4×4 matrix. More on Volta at https://devblogs.nvidia.com/parallelforall/inside-volta/ . It is helpful to first read about the architecture of the previous Pascal P100 chip, which contains the GP100 GPU, described here – http://wccftech.com/nvidia-pascal-specs/ . Background on why NVidia builds chips this way (SIMD < SIMT < SMT) is here – http://yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html .

Volta GV100 GPU = 6 GraphicsProcessingClusters × 7 TextureProcessingClusters/GraphicsProcessingCluster × 2 StreamingMultiprocessors/TextureProcessingCluster × (64 FP32 units + 64 INT32 units + 32 FP64 units + 8 TensorCore units + 4 Texture units)

The FP32 cores are referred to as CUDA cores, which means 6×7×2 = 84 SMs and 84×64 = 5376 CUDA cores per Volta GPU. The Tesla V100, the first product (SoC) to use the Volta GPU, enables only 80 of the 84 SMs, or 80×64 = 5120 cores. The boost clock of the chip is 1.455 GHz. The Fused Multiply-Add (FMA) instruction does a multiplication and an addition in a single instruction (a*b+c), i.e. 2 FP operations per instruction, giving 1.455×2×5120 = 14.9 TFLOPS from the CUDA cores alone. Each Tensor Core does a 4×4×4 matrix multiply-accumulate per cycle – 64 FMAs, or 2×4×4×4 = 128 FP ops/cycle – for a total of 1.455×80×8×128 ≈ 120 TFLOPS for deep learning apps.
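As a sanity check on these numbers, the peak-throughput arithmetic can be reproduced in a few lines of Go (a sketch; the 1.455 GHz clock and the unit counts are the ones quoted above):

```go
package main

import "fmt"

// peakTFLOPS computes peak teraflops: clock (GHz) × units × ops per unit per cycle.
func peakTFLOPS(clockGHz float64, units, opsPerCycle int) float64 {
	return clockGHz * float64(units) * float64(opsPerCycle) / 1000
}

func main() {
	// 80 SMs × 64 FP32 (CUDA) cores, with FMA counting as 2 ops/cycle.
	cuda := peakTFLOPS(1.455, 80*64, 2)
	// 80 SMs × 8 Tensor Cores, each doing 128 FP ops/cycle.
	tensor := peakTFLOPS(1.455, 80*8, 128)
	fmt.Printf("CUDA cores: %.1f TFLOPS, Tensor Cores: %.1f TFLOPS\n", cuda, tensor)
	// prints: CUDA cores: 14.9 TFLOPS, Tensor Cores: 119.2 TFLOPS
}
```

The Tensor Core figure is marketed as 120 TFLOPS after rounding.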

[Figure: 3D matrix multiplication]

The Volta GPU uses a 12 nm manufacturing process, down from 16 nm for Pascal. For comparison, the Jetson TX1 claims 1 TFLOPS and the TX2 twice that (or the same performance at half the power of the TX1). Volta will be available on Azure, AWS and platforms such as Facebook; several applications on Amazon and the Microsoft Cognitive Toolkit will use it.

For comparison, the Google TPU runs at 700 MHz and is manufactured with a 28 nm process. Instead of FP operations, it uses quantization to 8-bit integers and a systolic-array design to minimize the watts per matrix multiplication, optimizing for neural network calculations rather than for more general GPU operations. The TPU is built around an array of 256×256 multiply-accumulate (MAC) units, resulting in 92 tera integer ops/second.
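The TPU figure follows from the same style of arithmetic (a sketch; counting each MAC as 2 operations, a multiply plus an add, per cycle):

```go
package main

import "fmt"

// tpuTOPS returns peak integer tera-ops/second for a square systolic MAC array.
func tpuTOPS(arrayDim int, clockMHz float64) float64 {
	macs := float64(arrayDim * arrayDim) // 256×256 = 65536 MAC units
	return macs * 2 * clockMHz * 1e6 / 1e12
}

func main() {
	fmt.Printf("TPU peak: %.1f integer TOPS\n", tpuTOPS(256, 700))
	// prints: TPU peak: 91.8 integer TOPS  (quoted as 92 after rounding)
}
```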

Given that NVidia is targeting additional use cases such as computer vision and graphics rendering along with neural network workloads, such a specialized integer-only approach would not make sense for it.

Miscellaneous conference notes:

Nvidia DGX-1, a “Personal Supercomputer” for $69,000, was announced. It contains eight Tesla V100 accelerators connected over NVLink.

Tesla. FHHL: Full Height, Half Length. Inferencing: Volta is ideal for inferencing, not just training, and also for data centers. Power and cooling use 40% of the datacenter.

As AI data floods the data centers, Volta can replace 500 CPUs with 33 GPUs.
Nvidia GPU Cloud: download the container of your choice. First hybrid deep learning cloud network. Nvidia.com/cloud . Private beta extended to GTC attendees.

Containerization with GPU support: the host has the right NVidia driver, and Docker from the GPU cloud adapts to the host version. A single docker image; the nvidia-docker tool initializes the drivers.

Moore's law comes to an end. Need AI at the edge, far from the data center; need it to be compact and cheap.

The Jetson board has a Tegra SoC, which contains 6 CPUs and a Pascal GPU.

AWS Device Shadows vs GE Digital Twins: different focus – availability+connectivity vs operational efficiency, a manufacturing perspective vs an operational perspective. A locomotive may be simulated when disconnected.

DeepInstinct analysed malware data using convolutional neural networks on GPUs to better detect malware and its variations.

Omni.ai – deep learning on time series data to detect anomalous conditions in field sensors, such as pressure in a gas pipeline.

GAN applications to various problems – these will be refined in the next few years.

GeForce 960 video card: an older but popular card for gamers, it used the Maxwell GPU, which predates the Pascal GPU.

Cooperative Groups in CUDA 9. More on CUDA 9.

Weeping Angel Remote Camera Monitor

Sales of hardware camera blockers and similar devices should increase, with the Weeping Angel disclosure. Wikileaks detailed how the CIA and MI5 hacked Samsung TVs to silently monitor remote communications. Interesting to read for the level of technical detail: https://wikileaks.org/vault7/document/EXTENDING_User_Guide/EXTENDING_User_Guide.pdf . The attack was called ‘Weeping Angel’, a term borrowed from Doctor Who.

Other such schemes are described at https://wikileaks.org/vault7/releases/#Weeping%20Angel , including an iPhone implant to get data from your phone – https://wikileaks.org/vault7/document/NightSkies_v1_2_User_Guide/NightSkies_v1_2_User_Guide.pdf .

Golang interface duck typing and type assertion

The interface{} type in Golang is like the duck type in Python: if it walks like a duck, it’s a duck – the type is determined by an attribute of the variable. This duck typing support in Python often leaves one searching for the actual type of the object that a function takes or returns, though with richer names, or a naming convention, one gets past this drawback. Golang implements a more limited and stricter duck typing: the programmer can declare a variable to be of type interface{}, but when it comes time to determine the type of the duck, one must assert it explicitly. This is called type assertion. During type assertion the program can receive an error indicating that there was a type mismatch.

Explicit duck-type creation example:

var myVariableOfDuckType interface{} = "thisStringSetsMyVarToTypeString"

var mySecondVarOfDuckType interface{} = 123 // sets type to int

Duck-type assertion:

getMystring, isOk := myVariableOfDuckType.(string) // isOk is true, assertion to string passed

getMystring, isOk = mySecondVarOfDuckType.(string) // note '=': both variables already declared; isOk is false, assertion to string failed

In python, int(123) and int(“123”) both return 123.

In Golang, int(mySecondVarOfDuckType) will not return 123, even though the value actually happens to be an int. Instead, the compiler rejects the conversion with a type assertion error:

cannot convert val (type interface {}) to type int: need type assertion

This is very odd at first sight. The type here is interface{}, not int – the “int subtype” is held within the “interface type”. The concrete type can be printed with %T and can be used in a type assertion switch.
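A minimal runnable sketch of %T and a type switch (variable and function names here are hypothetical):

```go
package main

import "fmt"

// describe reports the concrete type held inside an empty interface
// using a type assertion switch.
func describe(v interface{}) string {
	switch x := v.(type) {
	case string:
		return fmt.Sprintf("string of length %d", len(x))
	case int:
		return fmt.Sprintf("int with value %d", x)
	default:
		return fmt.Sprintf("unhandled type %T", x)
	}
}

func main() {
	var duck interface{} = 123
	fmt.Printf("concrete type: %T\n", duck) // prints: concrete type: int
	fmt.Println(describe(duck))             // prints: int with value 123
	fmt.Println(describe("quack"))          // prints: string of length 5
}
```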

GO Hello world – 3 types of declarations of an int

package main
import "fmt"
func main() {
 i:=1
 var j int
 j = 2
 var k int = 5
 fmt.Println("Hello, world.", i, j, k)
}

GO Function call syntax

package main
import "fmt"

func main() {
 first := 2
 second := 3
 firstDoubled, secondDoubled := doubleTwo(first, second)
 fmt.Println("Numbers: ", first, second)
 fmt.Println("Doubling: ", firstDoubled, secondDoubled)
 fmt.Println("Tripling: ", triple(first), triple(second))
}

//function with named return variables
func doubleTwo(a, b int) (aDoubled int, bDoubled int) {
 aDoubled = a * 2
 bDoubled = b * 2
 return
}

//function without named return variables
func triple(a int) int {
 return (a * 3)
}

ICS Threat Landscape

Kaspersky Labs released a report on Industrial Control System (ICS) security trends. The data was reported to be gathered using the Kaspersky Security Network (KSN), a distributed antivirus network.

https://ics-cert.kaspersky.com/reports/2017/03/28/threat-landscape-for-industrial-automation-systems-in-the-second-half-of-2016/

The report is here – https://ics-cert.kaspersky.com/wp-content/uploads/sites/6/2017/03/KL-ICS-CERT_H2-2016_report_FINAL_EN.pdf


The rising trend is partly attributed to the isolation strategy currently followed for ICS network security no longer being effective in protecting industrial networks.

Git Merge. You are in the middle of a merge. Cannot Amend.

Let’s say I made changes to branch “abc”, committed and pushed them. This fired a build and a code review, after which the code is ready to be merged. But before these changes are merged, another party merged changes from a branch “xyz” into “abc”.

How do I replay my changes on top of the changes from “xyz” in “abc”? Here’s a seemingly obvious command to try:

$ git pull origin abc  # tl;dr: don't do this; try git pull --rebase
Auto-merging file1
CONFLICT (content): Merge conflict in file1
Automatic merge failed; fix conflicts and then commit the result.


At this point, I resolve the conflicts and attempt a git commit --amend. But this gives an error.

$ git commit --amend
fatal: You are in the middle of a merge -- cannot amend.
$ git rebase 
First, rewinding head to replay your work on top of it...
Applying: change name
Using index info to reconstruct a base tree...
M file1
Falling back to patching base and 3-way merge...
Auto-merging file1
CONFLICT (content):



The problem is the pull step, which implicitly does a fetch+merge, and it is the merge that fails (see git pull --help).

To recover from this situation, first squash the unnecessary merge and then do a rebase.

$ git reset --merge 
$ git rebase
First, rewinding head to replay your work on top of it..



This shows a list of conflicting files. Find the conflicting files and edit them to resolve conflicts.

$ git rebase --continue
file: needs merge
You must edit all merge conflicts and then mark them as resolved using git add

$ git add file1 file2
$ git rebase --continue
Applying: change name

$ git commit --amend
$ git push origin HEAD:refs/for/abc

Here’s the git rebase doc for reference. The rebase command is composed of multiple cherry-pick commands. Cherry-pick is a change-parent operation, and commit-id = hash(parent-commit-id + changes); here it re-chains my commit on top of a different commit than the one I started with. A commit --amend would not change the parent, and so could not produce the re-chained commit, which is why it fails.
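This parent-in-the-hash behavior can be illustrated with a toy model (not git's real object format – just the commit-id = hash(parent-commit-id + changes) relation stated above):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// commitID models commit-id = hash(parent-commit-id + changes).
func commitID(parent, changes string) string {
	sum := sha1.Sum([]byte(parent + changes))
	return fmt.Sprintf("%x", sum)
}

func main() {
	base := commitID("", "initial")
	mine := commitID(base, "my change")
	// Someone merges branch xyz first, so my change gets replayed on a new parent.
	theirs := commitID(base, "xyz change")
	replayed := commitID(theirs, "my change")
	fmt.Println(mine == replayed) // prints: false – same change, new parent, new commit-id
}
```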

Git merge is an operation to merge two different branches. It is the opposite of git branch, which creates a branch. In the above example we are applying our local changes to the same branch, which has undergone some changes since our last fetch.

Another error sometimes seen during a cherry-pick is “fatal: You are in the middle of a cherry-pick -- cannot amend”. This happens on a git commit --amend; here git expects you to do a plain git commit to first complete the cherry-pick operation.

A git commit references the current set of changes and the parent commit. A git merge references two parent commits. A git merge may involve a new commit if the two parents have diverged and conflicts need to be resolved.

Some idempotent (safe) and useful git commands.

$ git reflog [--date=iso]  # show history of local commits to index
$ git show  # or git show commit-hash, git show HEAD^^
$ git diff [ --word-diff | --color-diff ]
$ git status # shows files staged for commit (added via git add to the index). different from git diff
$ git branch [-a]
$ git tag -l   # list all tags 
$ git describe # most recent tag reachable from current commit
$ git config --list
$ git remote -v # git remote is a repo that contains the same proj 


$ git fetch --dry-run

$ git log --decorate --graph --abbrev-commit --date=relative --author=username

$ git log --decorate --pretty="format:%C(yellow)%h%C(green)%d%Creset %s -> %C(green)%an%C(blue), %C(red)%ar%Creset"

$ git log -g --abbrev-commit --pretty=oneline 'HEAD@{now}'

$ git grep login -- "*yml"

$ tig    # use d, t, g to change views; use / to search

$ git fsck [--unreachable]

$ git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head  # list of largest files

$ git cat-file -p <hash> # view object - tree/commit/blob

$ git clean -xdn

The difference between the ‘origin’ and ‘upstream’ terms is explained here. An interactive git cheat sheet is here. There is only one origin – the repo you cloned from – but there can be many remotes, which are copies of your project.

Update sequence: working_directory -> index -> commit -> remote

Some terms defined:

origin = main remote repository

master = default (master/main) branch on the remote repository

HEAD = current revision

git uses the SHA-1 cryptographic hash function for content-addressing individual files (blobs) and directories (trees) as well as commits – which track changes to the entire tree. Since each tree is a SHA-1 hash over the SHA-1 hashes of the files it contains, a SHA-1 change in a child that is n levels deep propagates to the root of the tree.

The index used in ‘git add’ is under .git/index . It can be viewed with

$ git ls-files --stage --abbrev --full-name --debug

This does not store the file contents, just their paths. The content for a new commit is stored under .git/objects/<xy>/<z> where xy is the first two hex digits of the hash and z is the remaining 38 characters of the 40-character hash. If you do a fresh checkout, .git/objects is empty except for the .git/objects/pack directory, which stores the entire commit history in compressed form in a .pack file and a .idx index file. An mmap is done on these files for fast searching.
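The object-id scheme for blobs is easy to reproduce: a blob's id is the SHA-1 of a "blob <size>\0" header followed by the content. A small sketch in Go:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blobID computes a git blob object id: sha1("blob <size>\x00" + content).
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content)) // git's object header
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Same id that `echo "hello" | git hash-object --stdin` produces.
	fmt.Println(blobID([]byte("hello\n")))
	// prints: ce013625030ba8dba906f756967f9e9ca394464a
}
```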


RSA World 2017

I was struck by the large number of vendors offering visibility – into various types of network traffic – as a key selling point: network monitors, industrial network gateways, SSL inspection, traffic in VM farms, between containers, and on mobile.

More expected are the vendors offering reduced visibility – via encryption, TPMs, Secure Enclaves, mobile remote desktop, one way links, encrypted storage in the cloud etc. The variety of these solutions is also remarkable.

Neil deGrasse Tyson’s closing talk was so different yet strangely related to these topics: the universe slowly making itself visible to us through light from distant places, with insights from physicists and mathematicians building up to the experiments that recently confirmed gravitational waves – making the invisible, visible. This tenuous connection between security and physics left me misty eyed. What an amazing time to be living in.

Industrial Robot Safety Systems

Robot safety systems must be concerned with graceful degradation in the face of component failures, bad inputs and extreme operating conditions. With the increasing complexity and prevalence of robots, one can expect these requirements to grow.

This document, “Robot System Safety”, describes the following safety features of Kuka robots:
— Restricted envelope
— EMERGENCY STOP
— Enabling switches
— Guard interlock

An interlock system, described here for general control systems, is a mechanism for guaranteeing that an undesired combination of states does not occur – e.g. a robot moving when the cell door is open, or an elevator moving when its door is open. The combination of the controlled stop with the guard interlock is described as follows:

“The robot controller features a two-channel safety input, to which the guard interlock can be connected. In the automatic modes, the opening of the guard connected to this input causes a controlled stop, with power to the drives being maintained in order to ensure this controlled stop. The power is only disconnected once the robot has come to a standstill. Motion in Automatic mode is prevented until the guard connected to this input is closed. This input has no effect in Test mode. The guard must be designed in such a way that it is only possible to acknowledge the stop from outside the safeguarded space.”
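As an illustration only, the automatic-mode rule quoted above can be sketched as a state check (hypothetical types and names; real safety interlocks live in certified hardware and PLCs, not application code):

```go
package main

import "fmt"

// CellState is a toy model of the robot cell's interlock inputs.
type CellState struct {
	GuardClosed bool // guard interlock input
	AutoMode    bool // automatic mode vs test mode
}

// motionAllowed encodes the interlock: in automatic mode, motion is
// permitted only while the guard is closed; the guard input has no
// effect in test mode.
func motionAllowed(s CellState) bool {
	if s.AutoMode && !s.GuardClosed {
		return false // guard open in auto mode: controlled stop
	}
	return true
}

func main() {
	s := CellState{GuardClosed: false, AutoMode: true}
	if !motionAllowed(s) {
		fmt.Println("controlled stop: hold drive power until standstill, then disconnect")
	}
}
```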

This Stanford Linear Accelerator (SLAC) paper describes the advantages of PLCs for interlock design as:

— flexible system configuration due to modular hardware and software
— regularly scheduled background tests of PLC system and sensitive I/O
— comprehensive system self-tests
— intelligent fault diagnostics simplify trouble-shooting
— easy reconfiguration of the interlock logic
— no mechanical wear and tear
— improved security due to logic encapsulation in firmware

The SLAC use case is to lock the doors unless (a) the power has been shut off, or (b) the power is on but there is an explicit bypass for hot maintenance. They implement it on a Siemens PLC using statement lists (vs ladder logic or control-system flowchart programming).
GE has a number of industrial interlocks for various safety functions – http://www.clrwtr.com/PDF/GE-Security/Sentrol-Catalog.pdf

A very useful article on interlocking devices – http://machinerysafety101.com/2012/06/01/interlocking-devices-the-good-the-bad-and-the-ugly/

Neural Network learning

Notes from Autonomous Driving Udacity course.

Weights

IOT security vulnerabilities and hacks

Mirror of https://github.com/nebgnahz/awesome-iot-hacks :

A curated list of hacks in IoT space so that researchers and industrial products can address the security vulnerabilities (hopefully).

Thingbots

RFID

Home Automation

Connected Doorbell

Hub

Smart Coffee

Wearable

Smart Plug

Cameras

Traffic Lights

Automobiles

Airplanes

Light Bulbs

Locks

Smart Scale

Smart Meters

Pacemaker

Thermostats

Fridge

Media Player & TV

Toilet

Toys

Software Defined Networking Security

Software Defined Networking seeks to centralize control of a large network. The abstractions around computer networking evolved from connecting nodes via switches, to applications running on top via the OSI model, to the controllers that manage the network. The controller abstraction was relatively weak – this had been the domain of telcos and ISPs – and as the networks of software-intensive companies like Google approached the size of telco networks, they moved to reinvent the controller stack. Traffic engineering and security, previously done in disparate regions, were centralized in order to better achieve economies of scale. Google adopted OpenFlow for this, developed by Nicira, which was soon after acquired by VMWare. Cisco internal discussions concluded that such a centralization wave would cut Cisco revenues in half, so they spun out Insieme Networks for SDN capabilities and quickly acquired it back; this has morphed into the APIC offering.

The centralization wave is a bit at odds with the security and resilience of networks because of their inherently distributed and heterogeneous nature. Distributed systems provide availability, part of the security CIA triad, and for many systems availability trumps security. The centralized controllers would become attractive targets for compromise. This is despite the intention of SDN, as envisioned by Nicira founder M. Casado, to have security as its cornerstone, as described here. Casado’s problem statement is interesting: “We don’t have a ubiquitous and horizontal security layer that provides both context and isolation. Where do we normally put security controls? We put it in one of two places. We might put it in the physical infrastructure, which is great because you have isolation. If I have ACLs [access control lists] or a firewall or an IDS [intrusion detection system], I put it in a separate box and I put it away from the applications so that the attack surface is pretty small and it’s totally isolated… you have isolation, but you have no context… Another place we put security is in the end host, an application or operating system. This suffers from the opposite problem. You have all the context that you need — applications, users, the data being accessed — but you have absolutely no isolation.” The centralization imperative comes from the need to isolate and minimize the trusted computing base.

In the short term, there may be some advantage gained by complexity reduction through centralized administration, but the recommendation of dumb switches that respond to a tightly controlled central brain goes against the tenets of compartmentalization of risk; if such networks are put into practice widely, they can result in failures that are catastrophic instead of isolated.

The goal should instead be a distributed system which is also responsive.

Lessons from SF Muni Ransomware

On Nov 25, a hacker going by “andy saolis” infected the San Francisco Municipal Transportation Agency’s (SFMTA) network with ransomware that encrypted data on 900 office computers, spreading through the system’s Windows operating system. Saolis threatened to publish 30 gigabytes of data, including contracts, employee data, and customer information. SFMTA’s ticketing system was shut down to prevent the malware from spreading. The attacker demanded a 100 Bitcoin ransom, around $73,000, to unlock the affected files. Salted Hash reported the malware is likely a variant of HDDCryptor, which uses commercial tools to encrypt hard drives and network shares.

The service was restored from backups. However, consider if these systems had been in an ICS scenario: an unexpected downtime would result, which would be unacceptable.

Containers and Privileges

Cgroups limit how much you can do. Namespaces limit how much you can see.

Linux containers are based on cgroups and namespaces and can be privileged or unprivileged. A privileged container is one without the User Namespace, implying it has direct visibility into all users of the underlying host. The User Namespace allows remapping the user identities in the container, so even if the process thinks it is running as root, it is not.

Using cgroups to limit fork bombs from Jessie’s talk:

$ sudo su
# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs  # put current shell's pid in the cgroup
# cat /sys/fs/cgroup/pids/parent/pids.current
2
# (echo "foobar" | cat )
bash: fork: retry: No child processes

Link to the 122 page paper on linux containers security here.  Includes this quote on linux kernel attacks.

“Kernel vulnerabilities can take various forms, from information leaks and Denial of Service (DoS) risks to privilege escalation and arbitrary code execution. Of the roughly 400 Linux system calls, a number have contained privilege escalation vulnerabilities, as recently as of 2016 with keyctl(2). Over the years this included, but is not limited to: futex(2), vmsplice(2), mremap(2), unmap(2), do_brk(2), splice(2), and modify_ldt(2). In addition to system calls, old or obscure networking code including but not limited to SCTP, IPX, ATM, AppleTalk, X.25, DECNet, CANBUS, Econet and NETLINK has contributed to a great number of privilege escalation vulnerabilities through various use cases or socket options. Finally, the “perf” subsystem, used for performance monitoring, has historically contained a number of issues, such as perf_swevent_init (CVE-2013-2094).”

Which makes the case for seccomp, as containers both privileged and unprivileged can lead to bad things –

“Containers will always (by design) share the same kernel as the host. Therefore, any vulnerabilities in the kernel interface, unless the container is forbidden the use of that interface (i.e. using seccomp)” – LXC Security Documentation by Serge Hallyn, Canonical

The paper has several links on restricting access, including grsecurity, SELinux, AppArmor and Firejail. A brief comparison of the first three is here. SELinux has a powerful access control mechanism – it attaches labels to all files, processes and objects; however it is complex, and people often end up making things too permissive instead of taking advantage of the available controls. AppArmor works by labeling files by pathname and applying policies to the pathname – it is recommended with SUSE/OpenSUSE, not CentOS. Grsecurity policies are described here; its ACLs support process-based resource restrictions, including memory, CPU, open files, etc.

Blockchain ideas

I think of bitcoin as a self-securing system. Value is created by solving the security problem of verifying the last block of bitcoin transactions. This verification serves as a decentralized stamp of approval; the verification step consists of hashing a nonce and a block of transactions through a one-way hash function and arriving at a checksum with a certain structure (which is hard because the hash output is effectively random, so meeting the structure requirement is a low-probability event).

What happens if parties collude to get greater hashing power and increase their share of mining? This is what happened with GPU mining farms on bitcoin. It was one of the motivations behind Ethereum, which enables code to run as part of transactions, and whose hashing algorithm is designed to not be easily parallelized on a GPU. But it is not economical to mine on desktops, as the motivation seems to suggest.

The important aspect, I think, is the self-securing idea – how can a set of computational systems be designed so that they are incentivized to cooperate and become more secure as a result of that cooperation?

At a recent blockchain conference, some interesting topics of discussion were zero-knowledge proofs, consensus algorithms, greater network-member ownership in a network with network effects instead of centralized rent collection, game-theoretic system designs and various Ethereum blockchain applications.

Update: this self-securing, self-sustaining view is explored further here: https://medium.com/@jordanmmck/cryptocurrencies-as-abstract-lifeforms-9e35138d63ed