Category: software development

DevSecOps – Securing the Software Supply Chain

A position paper from the CNCF on securing the software supply chain discusses hardening the software construction process by hardening each of the links in the software production chain –

Quote – “To operationalize these principles in a secure software factory several stages are needed. The software factory must ensure that internal, first party source code repositories and the entities associated with them are protected and secured through commit signing, vulnerability scanning, contribution rules, and policy enforcement. Then it must critically examine all ingested second and third party materials, verify their contents, scan them for security issues, evaluate material trustworthiness, and material immutability. The validated materials should then be stored in a secure, internal repository from which all dependencies in the build process will be drawn. To further harden these materials for high assurance systems it is suggested they should be built directly from source.

Additionally, the build pipeline itself must be secured, requiring the “separation of concerns” between individual build steps and workers, each of which are concerned with a separate stage in the build process. Build Workers should consider hardened inputs, validation, and reproducibility at each build. Finally, the artifacts produced by the supply chain must be accompanied by signed metadata which attests to their contents and can be verified independently, as well as revalidated at consumption and deployment.”

The issue is that software development is a highly collaborative process. Walking down the chain and ensuring that each ingested software package is free of vulnerabilities is where it gets challenging.

The Department of Defense Enterprise DevSecOps Reference Design speaks to the aspect of securing the build pipeline –

The DoD Container Hardening Guide referenced in the CNCF doc is at –

which has a visual Iron Bank flow diagram on p.20

Software Integrity Tools

There are a number of tools used to detect security issues in a software application codebase. A simple and free one is Flawfinder; a sophisticated commercial one is Veracode. There are also lint, Pylint, FindBugs for Java, and the Xcode Clang static analyzer.

Synopsys has acquired tools such as Coverity and Black Duck for various static checks on code and binaries. Black Duck can do binary analysis and scores issues with CVSS. A common use of Black Duck is checking conformance to open-source licenses.

A more comprehensive list of static code analysis tools is here.

Dynamic analysis tools inspect the running process and find memory and execution errors. Well-known examples are Valgrind and Purify. More dynamic analysis tools are listed here.

For web application security there are protocol testing and fuzzing tools such as Burp Suite and Tenable Nessus.

A common issue with these tools is false positives. It helps to first limit the testing to certain defect types or attack scenarios and identify the most critical issues, then expand the scope to other types of defects.

Code obfuscation and anti-tamper tools are another category, for example from Arxan, Klocwork, Irdeto and ProGuard.

A great talk on Adventures in fuzzing. My takeaway has been that better ways of developing secure software are really important.



Git error: failed to push some refs to remote

This error often occurs when another checkin has gone in before yours; git reports that “the tip of your current branch is behind its remote counterpart”.

It should be resolved by

a) ‘git pull --rebase’ // this may bring in conflicts that need to be resolved

followed by

b) ‘git push’ // this should now succeed

After the two steps your changes are available to team members for code review, and you may need to edit your changes in response. After amending your commit with such changes (‘git commit --amend’), you’d need to do

c) ‘git push -f’

to force the push, since the amended commit rewrites the history on the remote branch.

However, if this code-review-and-edit cycle takes some time and other changes are approved in the meantime, you have to repeat these steps.

It can be simpler to get a fresh copy of the top of tree and add in your changes directly there and submit.

Javascript Timing and Meltdown

In response to the Meltdown/Spectre side-channel vulnerabilities, which are based on fine-grained observation of the CPU to infer the cache state of an adjacent process or VM, one mitigation by browsers was to reduce the time resolution of various timing APIs, especially in JavaScript.

The authors responded with alternative sources of fine-grained timing available to browsers. An interpolation method recovers a fine resolution of 15 μs from a timer that is rounded down to multiples of 100 ms.
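The edge-counting idea behind such interpolation can be sketched outside the browser. The snippet below is a simplified illustration (not the authors' code): it simulates a timer rounded down to 100 ms and counts busy-loop increments between consecutive edges of that coarse clock, calibrating the loop itself as a much finer-grained clock.

```go
package main

import (
	"fmt"
	"time"
)

const coarseRes = 100 * time.Millisecond

var start = time.Now()

// coarseNow simulates a degraded timer: elapsed time rounded down
// to multiples of 100 ms, as a browser might expose it.
func coarseNow() time.Duration {
	return time.Since(start).Truncate(coarseRes)
}

// ticksPerEdge counts busy-loop increments between two consecutive
// edges of the coarse clock. The edges are exactly coarseRes apart
// in real time, so the count converts loop iterations into time
// units far finer than the 100 ms timer itself provides.
func ticksPerEdge() int64 {
	t0 := coarseNow()
	for coarseNow() == t0 {
		// spin until the first edge
	}
	t1 := coarseNow()
	var count int64
	for coarseNow() == t1 {
		count++ // count increments until the next edge
	}
	return count
}

func main() {
	n := ticksPerEdge()
	fmt.Printf("%d increments per 100 ms => ~%.1f ns per increment\n",
		n, float64(coarseRes.Nanoseconds())/float64(n))
}
```

The achievable resolution depends on CPU speed and scheduling noise, which is exactly why coarsening the timer alone does not remove the side channel.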

The JavaScript high-resolution time API is still widely available and is described at, with a reference to previous work on cache attacks in Practical cache attacks in JS.

A Meltdown PoC is at, to test the timing attack in its own process. The RDTSC instruction returns the Time Stamp Counter (TSC), a 64-bit register that counts the number of cycles since reset, and so has a resolution of 0.5 ns on a 2 GHz CPU.

#include <stdio.h>
#include <x86intrin.h>

int main() {
 unsigned long long i;
 i = __rdtsc();
 printf("%llu\n", i);
 return 0;
}

Ethereum Security and the DAO Solidity Attack

The basics of Ethereum are described in the Gavin Wood paper. A list of keywords in Solidity is described in this file from its source, which includes “address”, “contract”, “event”, “mapping” and “wei” (1 ETH = 10^18 wei). This list does not include “gas”, which is a mechanism described in Wood’s paper to combat abuse. Interestingly the paper says “The first example of utilising the proof-of-work as a strong economic signal to secure a currency was by Vishnumurthy et al [2003]”, aka the Karma paper.

The Karma paper describes solving a cryptographic puzzle as what enables a node to join the network and be assigned a bank set: “the node certifies that it completed this computation by encrypting challenges provided by its bank-set nodes with its private key. Thus each node is assigned an id beyond its immediate control, and acquires a public-private key pair that can be used in later stages of the protocol without having to rely on a public-key infrastructure”. Crypto puzzles for diverse problems have been proposed before; a survey and comparison is at

The DAO attack had three components: a hacker, a malicious contract, and a vulnerable contract. The malicious contract was used to repeatedly withdraw funds from the vulnerable contract so that execution never reached the code that decremented the balance. Oddly enough, the gas mechanism, which is supposed to limit runaway computation, did not kick in to stop this repeated remittance.
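The reentrancy at the heart of the attack can be simulated outside the EVM. The sketch below is illustrative (Go, not Solidity): a withdraw routine pays out before zeroing the caller's balance, and the recipient's fallback callback re-enters it until the pool is drained.

```go
package main

import "fmt"

// Vulnerable mimics the DAO contract: it sends funds *before*
// decrementing the caller's balance, so a malicious payee that
// calls back into Withdraw can drain far more than its balance.
type Vulnerable struct {
	balances map[string]int
	pool     int
}

// Withdraw pays out, invoking the recipient's fallback (receive),
// and only afterwards zeroes the balance -- the bug.
func (v *Vulnerable) Withdraw(who string, receive func(int)) {
	amount := v.balances[who]
	if amount <= 0 || v.pool < amount {
		return
	}
	v.pool -= amount
	receive(amount)     // attacker's fallback re-enters here
	v.balances[who] = 0 // too late: already re-entered above
}

func main() {
	v := &Vulnerable{balances: map[string]int{"mallory": 10}, pool: 100}
	stolen := 0
	var fallback func(int)
	fallback = func(amount int) {
		stolen += amount
		if v.pool >= amount { // re-enter until the pool is drained
			v.Withdraw("mallory", fallback)
		}
	}
	v.Withdraw("mallory", fallback)
	fmt.Println("stolen:", stolen, "pool left:", v.pool) // stolen: 100 pool left: 0
}
```

The fix, in any language, is the checks-effects-interactions order: update the balance before making the external call.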

A few weeks before the DAO attack, someone had pointed out to me that the security of Solidity was a bit of an open problem. My feeling was that contracts should be layered above the value exchange mechanism, not built into it; Bitcoin-based protocols with the simpler OP_RETURN semantics appeared more solid. Later, around October ’16 at an Ethereum meetup, Fred Ehrsam made the comment that most new projects would be using Ethereum instead of Bitcoin, yet Bitcoin meetups had more real-world use cases being discussed. The technical limitations exist, and are being addressed by forks such as SegWit2x this November. Today saw a number of interesting proposals with Ethereum, including Dharma, DataWallet and BloomIDs. Security will be a continuing concern with the expanding scope of such projects.

Sequence diagram of the attack, showing a double-withdraw –

More on Ethereum –

It uses the Keccak-256 hash function.

It supports the Ethereum Virtual Machine. The EVM is formally specified in section 9 of the Gavin Wood paper. Some properties –

  • computation on the EVM is intrinsically tied to a parameter ‘gas’, which limits the amount of computation
  • stack-based machine, with a max stack depth of 1024
  • 256-bit word size and 256-bit stack item size; 256 bits chosen to facilitate the Keccak-256 hash and elliptic-curve computations
  • the machine halts on various exceptions, including OOG (out of gas)
  • the memory model is a word-addressed byte array
  • program code is stored in a ‘virtual ROM’ accessed via a specialized instruction
  • runs the state transition function

The state transition function takes as input the state σ, the remaining gas g, and a tuple I. The execution agent must provide in the execution environment certain information, contained in the tuple I:

  • Ia, the address of the account which owns the code that is executing
  • Io, the sender address of the transaction that originated this execution
  • Ip, the price of gas in the transaction that originated this execution
  • Id, the byte array that is the input data to this execution; if the execution agent is a transaction, this would be the transaction data
  • Is, the address of the account which caused the code to be executing; if the execution agent is a transaction, this would be the transaction sender
  • Iv, the value, in Wei, passed to this account as part of the same procedure as execution; if the execution agent is a transaction, this would be the transaction value
  • Ib, the byte array that is the machine code to be executed
  • IH, the block header of the present block
  • Ie, the depth of the present message-call or contract-creation (i.e. the number of CALLs or CREATEs being executed at present)

The execution model defines the function Ξ, which can compute the resultant state σ′, the remaining gas g′, the suicide list s, the log series l, the refunds r and the resultant output o, given these definitions:

(σ′, g′, s, l, r, o) ≡ Ξ(σ, g, I)
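A toy analogue of Ξ can make the gas accounting concrete. The sketch below is hypothetical (two made-up opcodes, not the real EVM instruction set): every instruction costs one unit of gas, and execution halts with an out-of-gas exception when the budget runs out.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfGas mirrors the EVM's OOG exception: execution halts
// when the gas budget is exhausted.
var ErrOutOfGas = errors.New("out of gas")

// Run executes a toy stack machine where every instruction costs one
// unit of gas -- a much-simplified analogue of (σ′, g′, ...) ≡ Ξ(σ, g, I).
func Run(code []string, gas int) (stack []int, gasLeft int, err error) {
	for _, op := range code {
		if gas == 0 {
			return stack, 0, ErrOutOfGas
		}
		gas--
		switch op {
		case "PUSH1": // push the constant 1
			stack = append(stack, 1)
		case "ADD": // pop two items, push their sum
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		default:
			return stack, gas, fmt.Errorf("bad opcode %q", op)
		}
	}
	return stack, gas, nil
}

func main() {
	s, g, err := Run([]string{"PUSH1", "PUSH1", "ADD"}, 10)
	fmt.Println(s, g, err) // [2] 7 <nil>
	_, _, err = Run([]string{"PUSH1", "PUSH1", "ADD"}, 2)
	fmt.Println(err) // out of gas
}
```

The same program succeeds or halts depending only on the gas supplied, which is the economic signal the EVM uses to bound computation.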

The official code is written in Golang (~640,000 lines). A code walkthrough is at

The table of Opcodes is at

But what is a formal notion of a contract (specifically on the EVM, which aims to generalize contracts)? Is it any function? Is it a function of a specific template, expecting certain inputs and outputs? Can these functions be composed?

The notions of contracts are delved into in

“The contractual phases of search, negotiation, commitment, performance, and adjudication constitute the realm of smart contracts.”

Example – a vending machine. The machine ‘takes in coins and dispenses change and product according to the displayed price’, it is ‘a contract with bearer: anybody with coins can participate in an exchange with the vendor’.

So one can think of a smart contract as a digital vending machine – actually a useful mental model. With a digital vending machine, the buyer can choose the product they want (of possibly very many, not limited to a physical machine) and be served that product upon due payment, without interaction with another person or agency.

Golang interface ducktype and type assertion

The interface{} type in Golang is like the duck type in Python. If it walks like a duck, it’s a duck – the type is determined by an attribute of the variable. This duck-typing support in Python often leaves one searching for the actual type of the object that a function takes or returns; with richer names, or a naming convention, one gets past this drawback. Golang implements a more limited and stricter duck typing: the programmer can declare the type of a variable as interface{}, but when it comes time to determine the type of the duck, one must assert it explicitly. This is called a type assertion. During a type assertion the program can receive an error indicating that there was a type mismatch.

explicit ducktype creation example:

var myVariableOfDuckType interface{} = "thisStringSetsMyVarToTypeString"

var mySecondVarOfDuckType interface{} = 123 // sets type to int

ducktype assertion

getMystring, isOk := myVariableOfDuckType.(string) // isOk is true, assertion to string passed

getMystring, isOk = mySecondVarOfDuckType.(string) // isOk is false, assertion to string failed (plain = to avoid redeclaring)

In python, int(123) and int(“123”) both return 123.

In Golang, int(mySecondVarOfDuckType) will not return 123, even though the value actually happens to be an int. It instead fails to compile with a “need type assertion” error:

cannot convert val (type interface {}) to type int: need type assertion

This seems odd at first, but the declared type here is interface{}, not int – the concrete int is held within the interface value. The concrete type can be printed with %T and can be used in a type switch.
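A type switch is the multi-way form of a type assertion, handling several concrete types at once. A minimal sketch:

```go
package main

import "fmt"

// describe recovers the concrete type held in an interface{} value
// via a type switch; the default case shows the %T verb.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int doubled: %d", x*2)
	case string:
		return fmt.Sprintf("string of length %d", len(x))
	default:
		return fmt.Sprintf("unhandled type %T", x)
	}
}

func main() {
	fmt.Println(describe(123))   // int doubled: 246
	fmt.Println(describe("abc")) // string of length 3
	fmt.Println(describe(1.5))   // unhandled type float64
}
```

Unlike a single assertion, a failed match falls through to the next case rather than setting an ok flag.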

GO Hello world – 3 types of declarations of an int

package main

import "fmt"

func main() {
	i := 1        // short variable declaration
	var j int     // declaration, then assignment
	j = 2
	var k int = 5 // declaration with initializer
	fmt.Println("Hello, world.", i, j, k)
}

GO Function call syntax

package main

import "fmt"

func main() {
	first := 2
	second := 3
	firstDoubled, secondDoubled := doubleTwo(first, second)
	fmt.Println("Numbers: ", first, second)
	fmt.Println("Doubling: ", firstDoubled, secondDoubled)
	fmt.Println("Tripling: ", triple(first), triple(second))
}

// function with named return variables
func doubleTwo(a, b int) (aDoubled int, bDoubled int) {
	aDoubled = a * 2
	bDoubled = b * 2
	return // named return values are returned implicitly
}

// function without named return variables
func triple(a int) int {
	return a * 3
}
Git Merge. You are in the middle of a merge. Cannot Amend.

Let’s say I made changes to branch “abc”, committed and pushed them. This fired a build and a code review, after which the code is ready to be merged. But before these changes are merged, another party merges changes from a branch “xyz” into “abc”.

How do I replay my changes on top of changes from “xyz” in “abc” ? Here’s a seemingly obvious command to try:

$ git pull origin abc   # tl;dr: don't do this. try git pull --rebase
Auto-merging file1
CONFLICT (content): Merge conflict in file1
Automatic merge failed; fix conflicts and then commit the result.

At this point, I resolve the conflicts and attempt a git commit --amend. But this gives an error.

$ git commit --amend
fatal: You are in the middle of a merge -- cannot amend.
$ git rebase 
First, rewinding head to replay your work on top of it...
Applying: change name
Using index info to reconstruct a base tree...
M file1
Falling back to patching base and 3-way merge...
Auto-merging file1
CONFLICT (content):

The problem is the pull step, which implicitly does a fetch + merge, and it is the merge that fails (see git help pull).

To recover from this situation, first squash the unnecessary merge and then do a rebase.

$ git reset --merge 
$ git rebase
First, rewinding head to replay your work on top of it..

This shows a list of conflicting files. Find the conflicting files and edit them to resolve conflicts.

$ git rebase --continue

file: needs merge

You must edit all merge conflicts and then mark them as resolved using git add

$ git add file1 file2

$ git rebase --continue

Applying: change name

$ git commit --amend

$ git push origin HEAD:refs/for/abc

Here’s the git rebase doc for reference. The rebase command is composed of multiple cherry-pick commands. Cherry-pick is a change-parent operation: since commit-id = hash(parent-commit-id + changes), it re-chains my commit on top of a different commit than the one I started with, producing a new commit-id. A commit --amend, by contrast, replaces the tip commit but keeps its parent, so it cannot re-chain the commit onto the new tip – which is why it fails here.
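The dependence of a commit id on its parent can be illustrated with a simplified stand-in for git's hashing (real commit objects also cover the tree, author, and timestamps – this sketch keeps only the two fields that matter to the argument):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// commitID is a simplified stand-in for a git commit hash: it covers
// the parent id and the change content, so re-parenting the same
// change (as cherry-pick and rebase do) yields a different id.
func commitID(parent, changes string) string {
	sum := sha1.Sum([]byte(parent + "\x00" + changes))
	return fmt.Sprintf("%x", sum)
}

func main() {
	onOldTip := commitID("aaa111", "fix: typo")
	onNewTip := commitID("bbb222", "fix: typo") // same change, new parent
	fmt.Println(onOldTip == onNewTip)           // false
}
```

The same change under a new parent is, by construction, a new commit – which is why rebased history must be force-pushed.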

Git merge is an operation to merge two different branches – the converse of git branch, which creates one. In the above example we are applying our local changes to the same branch, which has undergone some changes since our last fetch.

Another error sometimes seen during a cherry-pick is “fatal: You are in the middle of a cherry-pick -- cannot amend”. This happens on a “git commit --amend”; here git expects a plain “git commit” to first complete the cherry-pick operation.

A git commit references the current set of changes and the parent commit. A git merge references two parent commits. A git merge may involve a new commit if the two parents have diverged and conflicts need to be resolved.

Some idempotent (safe) and useful git commands.

$ git reflog [--date=iso]  # show history of local commits to index
$ git show  # or git show commit-hash, git show HEAD^^
$ git diff [ --word-diff | --color-words ]
$ git status # shows files staged for commit (added via git add to the index). different from git diff
$ git branch [-a]
$ git tag -l   # list all tags 
$ git describe # most recent tag reachable from current commit
$ git config --list
$ git remote -v # git remote is a repo that contains the same proj 

$ git fetch --dry-run

$ git log --decorate --graph --abbrev-commit --date=relative --author=username

$ git log --decorate --pretty="format:%C(yellow)%h%C(green)%d%Creset %s -> %C(green)%an%C(blue), %C(red)%ar%Creset"

$ git log -g --abbrev-commit --pretty=oneline 'HEAD@{now}'

$ git grep login -- "*yml"

$ tig    # use d, t, g to change views; use / to search

$ git fsck [--unreachable]

$ git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head  # list of largest files

$ git cat-file -p <hash> # view object - tree/commit/blob

$ git clean -xdn

The difference between the ‘origin’ and ‘upstream’ terms is explained here. An interactive git cheat sheet is here. There is only one origin – the repo you cloned your repo from – but there can be many remotes, which are copies of your project.

Update sequence: working_directory -> index -> commit -> remote

Some terms defined:

origin = main remote repository

master = default (master/main) branch on the remote repository

HEAD = current revision

git uses the SHA-1 cryptographic hash function for content-addressing individual files (blobs) and directories (trees), as well as commits – which track changes to the entire tree. Since each tree’s id is a SHA-1 hash over the SHA-1 hashes of the files and subtrees it contains, a SHA-1 change in a child that is n levels deep propagates to the root of the tree.
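Blob hashing can be reproduced outside git: a blob id is the SHA-1 of the header "blob <size>\0" followed by the file contents. A minimal sketch:

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blobSHA1 computes a git blob id: SHA-1 over "blob <size>\x00" + content.
func blobSHA1(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// matches: echo "hello" | git hash-object --stdin
	fmt.Println(blobSHA1([]byte("hello\n")))
}
```

Trees and commits follow the same header-plus-payload scheme with "tree" and "commit" type tags.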

The index used in ‘git add’ is under .git/index . It can be viewed with

$ git ls-files --stage --abbrev --full-name --debug

This does not store the file contents, just their paths and hashes. The content for a new commit is stored under .git/objects/<xy>/<z>, where xy is the first two hex digits of the hash and z is the remaining 38 characters of the 40-character hash. On a fresh checkout, .git/objects is empty except for the .git/objects/pack directory, which stores the entire commit history in compressed form in a .pack file and a .idx index file. These files are mmap’d for fast searching.


WebServices Composition with AWS

Some interesting diagrams on composition of a device data processing pipeline with AWS are at –

The services listed are:
Amazon Cognito: Identity and Security. Gets token with role for API access by a certain user.
Amazon Kinesis: Massive data ingestion. Uses token auth; token signing can be the easiest option.
AWS Lambda: Serverless data compute. Supposed to save on EC2 instance costs (at the expense of lock-in).
Amazon S3: Virtually unlimited storage. This is what really makes AWS tick.
Amazon Redshift: Petabyte-scale data analysis
It does not say what data goes to S3 and what goes to the database.
On Redshift, here’s a comment from Nokia, whose volume of data “literally broke the database”, prompting them to look for more scalable solutions.
There is a tension between “servers” and “services”, which goes back to the IAAS vs PAAS distinction. PAAS can be faster to develop on, with reduced focus on server maintenance. However, the number of PAAS concepts to deal with is neither small nor particularly inviting: instead of a single server, one now has to deal with multiple services, each of which has to be authenticated, priced, and guarded against possible misuse, and each has the potential for surprises. A key to simplicity is how composable the services are.

Spark and Scala

Spark is a general-purpose distributed data processing engine used for a variety of big data use cases – e.g. analysis of logs and event data for security, fraud detection and intrusion detection. It has the notion of Resilient Distributed Datasets (RDDs). The “resilience” refers to the lineage of a data structure, not to replication: lineage is the set of operators applied to the original data. Lineage and metadata are used to recover lost data after node failures, by recomputation.

Spark word count example discussed in today’s meetup.

val textFile = sc.textFile("obama.txt")
val counts = textFile.flatMap(line => line.split(" ")).filter(_.length > 4).map(word => (word, 1)).reduceByKey(_ + _)
val sortedCounts = counts.sortBy(_._2, ascending = false)

Scala is a functional programming language used with Spark. It prefers immutable data structures. Sounds great – but how are state changes done then? Through function calls. Recursion plays a bigger role because it is the way state changes happen via function calls: the stack is utilized for the writes, rather than the heap. I recalled seeing a spiral Scala program earlier and found one here on the web; I modified it to produce the reverse spiral. Here’s the resulting code. The takeaway is that functional programs are structured differently – one can do some things more naturally, and it feels closer to how the mind works. As long as one gets the base cases right, one can build a large amount of complexity trivially. On the other hand, if one has to start top-down and debug a large call stack, it can be challenging.

// rt annotated spiral program.
// source
// reference:
// syntax highlight:
import java.io.{File, PrintWriter}

object SpiralObj {   // object keyword => a singleton object of a class defined implicitly by the same name
  object Element {   // singleton holding the factory methods; its 3 private subclasses are ordinary (non-singleton) classes
    private class ArrayElement(  // subclass, not a singleton
                                val contents: Array[String]  // "primary constructor" is defined in class declaration, must be called
                                ) extends Element

    private class LineElement(s: String) extends Element {
      val contents = Array(s)
    }

    private class UniformElement(  // height and width of a line segment. what if we raise width to 2. works.
                                  ch: Char,
                                  override val width: Int,   // override keyword is required to override an inherited member
                                  override val height: Int
                                  ) extends Element {
      private val line = ch.toString * width  // fills the characters in a line
      def contents = Array.fill(height)(line) // duplicates line n(=height) times, to create a width*height rectangle
    }

    // three constructor-like methods
    def elem(contents: Array[String]): Element = {
      new ArrayElement(contents)
    }
    def elem(s: String): Element = {
      new ArrayElement(Array(s))
    }
    def elem(chr: Char, width: Int, height: Int): Element = {
      new UniformElement(chr, width, height)
    }
  }

  abstract class Element {
    import Element.elem
    // contents to be implemented
    def contents: Array[String]

    def width: Int = contents(0).length

    def height: Int = contents.length

    // prepend this to that, so it appears above
    def above(that: Element): Element = {      // above uses widen
      val this1 = this widen that.width
      val that1 = that widen this.width
      elem(this1.contents ++ that1.contents)
    }

    // prefix new bar line by line
    def beside(that: Element): Element = {     // beside uses heighten
      val this1 = this heighten that.height
      val that1 = that heighten this.height
      elem(
        for ((line1, line2) <- this1.contents zip that1.contents)
          yield line1 + line2
      )
    }

    // add padding above and below
    def heighten(h: Int): Element = {          // heighten uses above
      if (h <= height) this
      else {
        val top = elem(' ', width, (h - height) / 2)
        val bottom = elem(' ', width, h - height - top.height)
        top above this above bottom
      }
    }

    // add padding left and right
    def widen(w: Int): Element = {             // widen uses beside
      if (w <= width) this
      else {
        val left = elem(' ', (w - width) / 2, height)
        val right = elem(' ', w - width - left.width, height)
        left beside this beside right
      }
    }

    override def toString = contents mkString "\n"
  }

  object Spiral {
    import Element._
    val space = elem("*")
    val corner1 = elem("/")
    val corner2 = elem("\\")
    val thick = 1

    def spiral(nEdges: Int, direction: Int): Element = { // clockwise spiral
      if (nEdges == 0) elem("+")
      else {
        val sp = spiral(nEdges - 1, (direction + 3) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        var verticalBar = elem('|', thick, sp.height)    // vertBar and horizBar have last two params order switched
        var horizontalBar = elem('-', sp.width, thick)
        // at this stage, assume the n-1th spiral exists and you are adding another "line" to it (not a whole round)
        // use "above" and "beside" operators to attach the line to the spiral
        if (direction == 0) {
          horizontalBar = elem('r', sp.width, thick)
          (corner1 beside horizontalBar) above (sp beside space) //  order is left to right
        } else if (direction == 1) {
          verticalBar = elem('d', thick, sp.height)
          (sp above space) beside (corner2 above verticalBar)
        } else if (direction == 2) {
          horizontalBar = elem('l', sp.width, thick)
          (space beside sp) above (horizontalBar beside corner1)
        } else {
          verticalBar = elem('u', thick, sp.height)
          (verticalBar above corner2) beside (space above sp)
        }
      }
    }

    def revspiral(nEdges: Int, direction: Int): Element = { // counterclockwise
      if (nEdges == 0) elem("+")
      else {
        val sp = revspiral(nEdges - 1, (direction + 3) % 4) // or (direction - 1) % 4, but we don't want negative numbers
        var verticalBar = elem('|', thick, sp.height)       // vertBar and horizBar have last two params order switched
        var horizontalBar = elem('-', sp.width, thick)
        // at this stage, assume the n-1th spiral exists and you are adding another "line" to it (not a whole round)
        if (direction == 0) { // right
          horizontalBar = elem('r', sp.width, thick)
          (sp beside space) above (corner2 beside horizontalBar)
        } else if (direction == 1) { // up
          verticalBar = elem('u', thick, sp.height)
          (space above sp) beside (verticalBar above corner1)
        } else if (direction == 2) { // left
          horizontalBar = elem('l', sp.width, thick)
          (horizontalBar beside corner2) above (space beside sp)
        } else { // down
          verticalBar = elem('d', thick, sp.height)
          (corner1 above verticalBar) beside (sp above space)
        }
      }
    }

    def draw(n: Int): Unit = {
      println(spiral(n, n % 4))    // %4 returns 0,1,2,3: right, down, left, up
      println(revspiral(n, n % 4))
    }
  }
}

object Main {
  def usage(): Unit = {
    print("usage: scala Main szInt")
  }

  def main(args: Array[String]): Unit = {
    import SpiralObj._
    if (args.length > 0) {
      val spsize = args(0).toInt
      Spiral.draw(spsize)
    } else {
      usage()
    }
  }
}
A note on tail-call recursion. If the last statement of a function is a call to another function, then the return position of the called function is the same as that of the calling function, so the current stack frame can be reused for the called function. Such a call is a tail call, and the effect is that of a loop – a series of function calls can be made without consuming stack space.

On Software Requirements

There are a couple of high-level tradeoffs in the requirements specification process. Each tradeoff can be thought of as an axis: specificity (detailed vs vague), audacity (visionary vs trivial/checkmark), and customer focus (needs vs wants, with timelines).

It is possible for requirements to be too detailed – the more detailed and specific they are, the less understandable they become and the less flexible they are in a rapidly changing context. But if the requirements are too vague, they are likely to be misunderstood or ignored by the development team. Here, talking directly to the end users, and clear communication between team members to flesh out use cases, will help.

Likewise, if the requirements are too visionary, they may appear infeasible to the team. Showing they are achievable by looking at related products is one solution; decomposing the target into achievable modules is another. If they are too near-term, they may appear trivial and fail to excite the team.

Finally the requirements should be well grounded in customer use cases and narrowly stated, rather than inherited as a long list from past successful technical products. This is probably the most important and hardest thing in practice.

Specifying the right amount of detail for development targets that are grounded, challenging and achievable is an important skill.

Another take on this topic is Joel Spolsky’s series on writing painless functional specifications.