Month: February 2016

Bytecode VMs

I've been looking at different language VMs recently: the Erlang BEAM and the Smalltalk VM. The Smalltalk VM was a bytecode-based stack VM (much like the JVM), and likely influenced Erlang's VM.

Java bytecode verification checks (and the vulnerabilities in bytecode loaders) are well known, but the security characteristics of other VMs, and even of languages running on the JVM, are less well known (Clojure/Scala?).

BEAM is the Erlang VM (Bogdan/Björn's Erlang Abstract Machine). It is the successor of JAM (Joe's Abstract Machine), which was inspired by the Prolog WAM. Details can be found in the Hitchhiker's Tour of BEAM, including ways to crash the BEAM VM, such as creating too many atoms (which never get deleted). BEAM can also run on bare metal on Xen. The file format is described as based on EA IFF 85 (the Standard for Interchange Format Files), with a "FOR1" magic in the first 4 bytes. Here's the full 6k lines of beam_load.c.
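The outer container is simple enough to walk by hand. Here's a minimal Python sketch that checks the "FOR1" magic and splits a .beam file into its named chunks, assuming the standard IFF-derived layout (a 32-bit big-endian form length, a "BEAM" form type, then 4-byte-aligned chunks):

```python
import struct

def parse_beam_chunks(data: bytes) -> dict:
    """Parse the IFF-style container of a .beam file and return a dict
    mapping chunk names (e.g. 'Atom', 'Code') to their raw payloads.

    Layout: b'FOR1', a 32-bit big-endian form length, the form type
    b'BEAM', then a series of chunks, each a 4-byte name, a 32-bit
    big-endian length, and the payload padded to a 4-byte boundary."""
    if data[:4] != b"FOR1":
        raise ValueError("not a BEAM file: missing FOR1 magic")
    (form_len,) = struct.unpack(">I", data[4:8])
    if data[8:12] != b"BEAM":
        raise ValueError("unexpected form type: %r" % data[8:12])
    chunks, pos, end = {}, 12, 8 + form_len
    while pos + 8 <= end:
        name = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        chunks[name] = data[pos + 8:pos + 8 + size]
        pos += 8 + size + (-size % 4)  # chunk payloads are 4-byte aligned
    return chunks
```

Pointing this at any compiled module (`open("module.beam", "rb").read()`) should list the chunk names that beam_load.c knows how to load; parsing the chunk contents themselves is where the other 6k lines come in.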

The code-to-BEAM-bytecode processing pipeline is described here as including a preprocessing step (see the -E, -S, and -P options to erlc). An interesting problem of peeking at and pattern-matching in-flight messages is discussed here. I find it interesting to think about what would happen if one froze an Erlang VM to see all in-flight messages, like putting a breakpoint in the kernel. The way to get a stacktrace is erlang:get_stacktrace().
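The selective-receive behavior behind that pattern-matching problem can be modeled in a few lines of Python. This is a toy mailbox, not how BEAM actually implements it: a receive scans the queue in arrival order, takes the first matching message, and leaves everything else queued, which is exactly the set of "in-flight" messages you'd see if you froze the VM:

```python
from collections import deque

class Mailbox:
    """Toy model of Erlang's selective receive."""

    def __init__(self):
        self.queue = deque()

    def send(self, msg):
        # Messages accumulate in arrival order.
        self.queue.append(msg)

    def receive(self, match):
        # Scan the mailbox front-to-back, remove and return the first
        # message the predicate matches; non-matching messages stay put.
        for i, msg in enumerate(self.queue):
            if match(msg):
                del self.queue[i]
                return msg
        return None  # a real Erlang receive would block or time out here
```

For example, if a process sends `("other", 1)` and then `("reply", 42)`, a receive matching on `"reply"` returns the second message while the first stays in the mailbox, unseen until some later receive matches it.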

The hot-swapping capability is worrying, coming from the Objective-C world, which had code signing to lock down changes. Erlang does, however, have a strong process isolation model.

This article on Dart makes the case for non-bytecode VMs with the following line: "When your VM's input is program source code, it can rely on the grammar of the language itself to enforce important invariants". Sounds relevant to bytecode VMs as well.

Brief history of cloud time

Amazon Web Services (AWS) started with elastic and scalable compute and storage offerings: an elastic compute cloud (EC2) and a simple storage service (S3). EC2 was based on virtual machines. S3 was a simple key-value store with an object store backend.

Heroku came up with a deploy-an-app-with-a-single-command model (heroku deploy). It automated the deployment of Rails apps, and was an early example of a platform as a service (PaaS).

NASA engineers wanted to store and process big data, and tasked an internal team, along with Rackspace, with building a web application framework for it. When they tried the existing AWS offerings, they found that uploading the data to AWS would take too long. So they came up with their own implementation to manage virtual machines and handle storage: OpenStack. It provided EC2- and S3-like APIs and worked with certain hypervisors.

Greenplum, a big data company acquired by EMC, wanted a framework to make use of big data easily. EMC had bought VMware, which built Cloudfoundry. Greenplum worked with VMware and a dev company, Pivotal Labs, to build up Cloudfoundry as a mechanism to deploy apps so they could be scaled up easily. Pivotal got acquired by VMware and was later spun out as Pivotal Cloudfoundry to provide enterprise-level support for the open source Cloudfoundry offering. This had participation from EMC and VMware (the two were already one) and a third participant, GE, with an interest in building an industrial cloud. Cloudfoundry forms the basis of the GE Predix cloud.

Cloudfoundry is now a third-generation PaaS that works on top of an IaaS layer, such as OpenStack or VMware vSphere, or even Docker containers running locally, via a Cloud Provider Interface (CPI).

Another interesting project is Apache Geode, an in-memory database along the lines of Hana.

Meanwhile, Amazon and Azure are rapidly increasing the number of web services available, as well as reducing their costs.

There was a meetup on Apache Ranger recently at Pivotal Labs, which discussed authorization for various components in the Hadoop ecosystem, including Kafka.