Month: October 2021

Reinforcement learning in chip design

Deep learning is being applied to combinatorial optimization problems. A very intriguing talk by Anna Goldie discussed an application of RL to chip design that cuts down the time needed for layout optimization, which in turn makes it possible to optimize the chip design for a target software stack in simulation before the chip goes to production. Here’s the paper – “A graph placement methodology for fast chip design”.

A snippet on how the research direction evolved into a learning problem:

Chip floorplanning as a learning problem

“The underlying problem is a high-dimensional contextual bandits problem but, as in prior work, we have chosen to reformulate it as a sequential Markov decision process (MDP), because this allows us to more easily incorporate the problem constraints as described below. Our MDP consists of four key elements:
(1) States encode information about the partial placement, including the netlist (adjacency matrix), node features (width, height, type), edge features (number of connections), current node (macro) to be placed, and metadata of the netlist graph (routing allocations, total number of wires, macros and standard cell clusters).
(2) Actions are all possible locations (grid cells of the chip canvas) onto which the current macro can be placed without violating any hard constraints on density or blockages.
(3) State transitions define the probability distribution over next states, given a state and an action.
(4) Rewards are 0 for all actions except the last action, where the reward is a negative weighted sum of proxy wirelength, congestion and density, as described below.

We train a policy (an RL agent) modelled by a neural network that, through repeated episodes (sequences of states, actions and rewards), learns to take actions that will maximize cumulative reward (see Fig. 1).
We use proximal policy optimization (PPO) to update the parameters of the policy network, given the cumulative reward for each placement.”
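
To make the episode structure above concrete, here’s a minimal sketch of the reward scheme – zero for every intermediate placement and a negative weighted sum of proxy costs at the end. The class and the stand-in proxy functions are hypothetical placeholders, not the paper’s implementation.

def proxy_wirelength(canvas): return float(len(canvas))   # trivial stand-in
def proxy_congestion(canvas): return 0.0                   # trivial stand-in
def proxy_density(canvas):    return 0.0                   # trivial stand-in

class FloorplanEpisode:
    """Places macros one at a time; the reward is zero until the last macro is placed."""
    def __init__(self, macros, w_wl=1.0, w_cong=0.5, w_dens=0.5):
        self.macros = list(macros)   # macros still to be placed
        self.canvas = {}             # macro -> grid cell (the partial placement)
        self.w_wl, self.w_cong, self.w_dens = w_wl, w_cong, w_dens

    def step(self, action):
        """Place the next macro at grid cell `action`; return (state, reward, done)."""
        macro = self.macros.pop(0)
        self.canvas[macro] = action
        if self.macros:              # intermediate rewards are zero
            return self.canvas, 0.0, False
        # terminal reward: negative weighted sum of the proxy costs
        r = -(self.w_wl * proxy_wirelength(self.canvas)
              + self.w_cong * proxy_congestion(self.canvas)
              + self.w_dens * proxy_density(self.canvas))
        return self.canvas, r, True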

Their diagram:

“An embedding layer encodes information about the netlist adjacency, node features and the current macro to be placed. The policy and value networks then output a probability distribution over available grid cells and an estimate of the expected reward for the current placement, respectively. id: identification number; fc: fully connected layer; de-conv: deconvolution layer”

A graph placement methodology for fast chip design | Nature

“Fig. 1 | Overview of our method and training regimen. In each training iteration, the RL agent places macros one at a time (actions, states and rewards are denoted by a_i, s_i and r_i, respectively). Once all macros are placed, the standard cells are placed using a force-directed method. The intermediate rewards are zero. The reward at the end of each iteration is calculated as a linear combination of the approximate wirelength, congestion and density, and is provided as feedback to the agent to optimize its parameters for the next iteration.”
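
A rough sketch of how such a policy/value network could be wired up in PyTorch is below. It is not the paper’s architecture (the real agent encodes the netlist with a graph neural network, and the layer sizes here are arbitrary), but it shows the two heads: a deconvolution stack producing a probability distribution over grid cells, and a fully connected value head.

import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Hypothetical sketch: fc embedding, de-conv policy head over grid cells, fc value head."""
    def __init__(self, state_dim=128, grid=32):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())   # fc embedding
        # policy head: de-conv layers upsample to a grid x grid map of logits
        self.policy = nn.Sequential(
            nn.Linear(256, 8 * (grid // 4) * (grid // 4)), nn.ReLU(),
            nn.Unflatten(1, (8, grid // 4, grid // 4)),
            nn.ConvTranspose2d(8, 4, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(4, 1, kernel_size=4, stride=2, padding=1),
        )
        self.value = nn.Linear(256, 1)                                      # fc value head

    def forward(self, state, mask):
        h = self.embed(state)
        logits = self.policy(h).flatten(1)                   # one logit per grid cell
        logits = logits.masked_fill(~mask, float('-inf'))    # mask infeasible cells
        return torch.distributions.Categorical(logits=logits), self.value(h)

net = PolicyValueNet()
state = torch.randn(1, 128)
mask = torch.ones(1, 32 * 32, dtype=torch.bool)   # all grid cells feasible here
dist, value = net(state, mask)
action = dist.sample()                            # grid cell for the current macro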

The references mention a number of applications of ML to chip design. A project exploring these is OpenROAD, at https://github.com/The-OpenROAD-Project, with an overview deck at https://theopenroadproject.org/wp-content/uploads/2021/11/demo-lounge-slides.pdf

PyTorch

PyTorch is an open source machine learning framework that is primarily used for building deep learning models. The framework is built on top of the Torch library; its core is implemented in C++, with a Python front end and CUDA support.

The main classes in PyTorch’s Python API are (a toy example tying them together follows this list):

  1. Tensor: This is the core object in PyTorch and represents a multi-dimensional array. Tensors are the basic building blocks of a PyTorch model and are used to store and manipulate data.
  2. Autograd: This is PyTorch’s automatic differentiation engine, which computes gradients of a loss with respect to the tensors it depends on. The autograd module also provides a set of functions for computing gradients of complex functions.
  3. nn.Module: This is a base class for all neural network modules in PyTorch. It provides a convenient way to define and organize layers of a neural network, as well as a set of useful methods for training and evaluating the model.
  4. Optimizer: This is a class that implements various optimization algorithms, such as stochastic gradient descent (SGD), Adam, and Adagrad. The optimizer is used to update the parameters of a model during training.
  5. DataLoader: This is a utility class that provides an efficient way to load and preprocess large datasets for training a model. The DataLoader class can be used to batch and shuffle data, as well as to apply various transformations to the data.
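
A toy end-to-end example touching each of these pieces (hypothetical sizes, a small regression task):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 100 samples, 3 features, a noisy linear target
X = torch.randn(100, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)  # DataLoader batches and shuffles

class TinyNet(nn.Module):                    # nn.Module organizes layers and parameters
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)
    def forward(self, x):
        return self.fc(x).squeeze(-1)

model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)   # the optimizer updates the parameters
loss_fn = nn.MSELoss()

for epoch in range(5):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()                      # autograd computes the gradients
        opt.step()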

PyTorch’s autograd engine implements a variant of reverse-mode automatic differentiation, which is also known as backpropagation. This algorithm efficiently calculates the gradients of the output with respect to each input variable by traversing the computational graph in reverse order, propagating the gradients backwards through each operation using the chain rule.
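
A small example of that reverse pass in action:

import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
y = w * x + torch.sin(x)   # the forward pass records the computational graph
y.backward()               # the reverse pass propagates gradients via the chain rule
print(x.grad)              # dy/dx = w + cos(x) = 3 + cos(2)
print(w.grad)              # dy/dw = x = 2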

Chainer’s Variable implements the same chain-rule-based reverse-mode differentiation, and their docs explain it well – https://docs.chainer.org/en/latest/guides/variables.html

Step-by-step example of reverse-mode automatic differentiation – https://stats.stackexchange.com/questions/224140/step-by-step-example-of-reverse-mode-automatic-differentiation
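
In the same spirit, a hand-rolled reverse sweep over the classic y = x1*x2 + sin(x1) example (plain Python; the values x1 = 2, x2 = 3 are just illustrative):

import math

# forward pass: evaluate and record the intermediates
x1, x2 = 2.0, 3.0
v1 = x1 * x2          # product term
v2 = math.sin(x1)     # sine term
y  = v1 + v2

# reverse pass: seed dy/dy = 1 and apply the chain rule backwards
dy_dv1 = 1.0                                   # y = v1 + v2
dy_dv2 = 1.0
dy_dx1 = dy_dv1 * x2 + dy_dv2 * math.cos(x1)   # x1 feeds both v1 and v2
dy_dx2 = dy_dv1 * x1

print(dy_dx1, dy_dx2)   # x2 + cos(x1) and x1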

PyTorch deep learning tutorial – https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html

Getting started with CUDA via PyTorch –


>>> import torch
>>> torch.cuda
<module 'torch.cuda' from '/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/cuda/__init__.py'>
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name()
'Tesla V100-SXM2-16GB'
>>> torch.cuda.memory_allocated()
0
>>> torch.cuda.get_device_properties(0).total_memory
16935419904
>>> import pynvml
>>> from pynvml import *
>>> nvmlInit()
>>> h = nvmlDeviceGetHandleByIndex(0)
>>> torch.cuda.mem_get_info()
(16112549888, 16935419904)
>>> var1=torch.FloatTensor([1.0,2.0,3.0]).cuda()
>>> var1
tensor([1., 2., 3.], device='cuda:0')
>>> var1.device
device(type='cuda', index=0)
>>> import torch.nn as nn
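
A natural next step (not part of the session captured above, but standard usage) is moving a small model onto the GPU:

>>> device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
>>> model = nn.Linear(3, 1).to(device)
>>> batch = torch.randn(8, 3, device=device)
>>> model(batch).device
device(type='cuda', index=0)
>>> torch.cuda.memory_allocated() > 0
True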