Quick Start Tutorial for Compiling Deep Learning Models

Author: Yao Wang

This example shows how to build a neural network with the NNVM Python frontend and generate a runtime library for an Nvidia GPU with TVM. Note that you need to build TVM with CUDA and LLVM enabled.

Overview of Supported Hardware Backends of TVM

The image below shows the hardware backends currently supported by TVM:

https://github.com/dmlc/web-data/raw/master/tvm/tutorial/tvm_support_list.png

In this tutorial, we’ll choose cuda and llvm as target backends. To begin with, let’s import NNVM and TVM.

import numpy as np

import nnvm.compiler
import nnvm.testing
import tvm
from tvm.contrib import graph_runtime

Define Neural Network in NNVM

First, let’s define a neural network with the NNVM Python frontend. For simplicity, we’ll use the pre-defined ResNet-18 network in NNVM. Parameters are initialized with the Xavier initializer. NNVM also supports importing other model formats such as MXNet, CoreML, ONNX and TensorFlow.
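The Xavier (Glorot) scheme mentioned above can be sketched in plain NumPy. This is only an illustration of the idea, not NNVM's internal implementation:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random):
    # Glorot/Xavier: bound the uniform range by fan-in + fan-out so that
    # activation variance stays roughly constant across layers.
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out)).astype("float32")

# e.g. a weight of the same shape as the final fc1 layer (512 -> 1000)
w = xavier_uniform(512, 1000)
```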

In this tutorial, we assume we will do inference on our device, with the batch size set to 1. Input images are RGB color images of size 224x224. We can call net.debug_str() to show the network structure.

batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

net, params = nnvm.testing.resnet.get_workload(
    layers=18, batch_size=batch_size, image_shape=image_shape)
print(net.debug_str())

Out:

Symbol Outputs:
        output[0]=softmax(0)
Variable:data
Variable:bn_data_gamma
Variable:bn_data_beta
Variable:bn_data_moving_mean
Variable:bn_data_moving_var
--------------------
Op:batch_norm, Name=bn_data
Inputs:
        arg[0]=data(0) version=0
        arg[1]=bn_data_gamma(0) version=0
        arg[2]=bn_data_beta(0) version=0
        arg[3]=bn_data_moving_mean(0) version=1
        arg[4]=bn_data_moving_var(0) version=1
Attrs:
        epsilon=2e-05
        scale=False
Variable:conv0_weight
--------------------
Op:conv2d, Name=conv0
Inputs:
        arg[0]=bn_data(0)
        arg[1]=conv0_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(7, 7)
        padding=(3, 3)
        strides=(2, 2)
        use_bias=False
Variable:bn0_gamma
Variable:bn0_beta
Variable:bn0_moving_mean
Variable:bn0_moving_var
--------------------
Op:batch_norm, Name=bn0
Inputs:
        arg[0]=conv0(0)
        arg[1]=bn0_gamma(0) version=0
        arg[2]=bn0_beta(0) version=0
        arg[3]=bn0_moving_mean(0) version=1
        arg[4]=bn0_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=relu0
Inputs:
        arg[0]=bn0(0)
--------------------
Op:max_pool2d, Name=max_pool2d0
Inputs:
        arg[0]=relu0(0)
Attrs:
        padding=(1, 1)
        pool_size=(3, 3)
        strides=(2, 2)
Variable:stage1_unit1_bn1_gamma
Variable:stage1_unit1_bn1_beta
Variable:stage1_unit1_bn1_moving_mean
Variable:stage1_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage1_unit1_bn1
Inputs:
        arg[0]=max_pool2d0(0)
        arg[1]=stage1_unit1_bn1_gamma(0) version=0
        arg[2]=stage1_unit1_bn1_beta(0) version=0
        arg[3]=stage1_unit1_bn1_moving_mean(0) version=1
        arg[4]=stage1_unit1_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit1_relu1
Inputs:
        arg[0]=stage1_unit1_bn1(0)
Variable:stage1_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage1_unit1_conv1
Inputs:
        arg[0]=stage1_unit1_relu1(0)
        arg[1]=stage1_unit1_conv1_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit1_bn2_gamma
Variable:stage1_unit1_bn2_beta
Variable:stage1_unit1_bn2_moving_mean
Variable:stage1_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage1_unit1_bn2
Inputs:
        arg[0]=stage1_unit1_conv1(0)
        arg[1]=stage1_unit1_bn2_gamma(0) version=0
        arg[2]=stage1_unit1_bn2_beta(0) version=0
        arg[3]=stage1_unit1_bn2_moving_mean(0) version=1
        arg[4]=stage1_unit1_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit1_relu2
Inputs:
        arg[0]=stage1_unit1_bn2(0)
Variable:stage1_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage1_unit1_conv2
Inputs:
        arg[0]=stage1_unit1_relu2(0)
        arg[1]=stage1_unit1_conv2_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit1_sc_weight
--------------------
Op:conv2d, Name=stage1_unit1_sc
Inputs:
        arg[0]=stage1_unit1_relu1(0)
        arg[1]=stage1_unit1_sc_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add0
Inputs:
        arg[0]=stage1_unit1_conv2(0)
        arg[1]=stage1_unit1_sc(0)
Variable:stage1_unit2_bn1_gamma
Variable:stage1_unit2_bn1_beta
Variable:stage1_unit2_bn1_moving_mean
Variable:stage1_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage1_unit2_bn1
Inputs:
        arg[0]=elemwise_add0(0)
        arg[1]=stage1_unit2_bn1_gamma(0) version=0
        arg[2]=stage1_unit2_bn1_beta(0) version=0
        arg[3]=stage1_unit2_bn1_moving_mean(0) version=1
        arg[4]=stage1_unit2_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit2_relu1
Inputs:
        arg[0]=stage1_unit2_bn1(0)
Variable:stage1_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage1_unit2_conv1
Inputs:
        arg[0]=stage1_unit2_relu1(0)
        arg[1]=stage1_unit2_conv1_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage1_unit2_bn2_gamma
Variable:stage1_unit2_bn2_beta
Variable:stage1_unit2_bn2_moving_mean
Variable:stage1_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage1_unit2_bn2
Inputs:
        arg[0]=stage1_unit2_conv1(0)
        arg[1]=stage1_unit2_bn2_gamma(0) version=0
        arg[2]=stage1_unit2_bn2_beta(0) version=0
        arg[3]=stage1_unit2_bn2_moving_mean(0) version=1
        arg[4]=stage1_unit2_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage1_unit2_relu2
Inputs:
        arg[0]=stage1_unit2_bn2(0)
Variable:stage1_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage1_unit2_conv2
Inputs:
        arg[0]=stage1_unit2_relu2(0)
        arg[1]=stage1_unit2_conv2_weight(0) version=0
Attrs:
        channels=64
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add1
Inputs:
        arg[0]=stage1_unit2_conv2(0)
        arg[1]=elemwise_add0(0)
Variable:stage2_unit1_bn1_gamma
Variable:stage2_unit1_bn1_beta
Variable:stage2_unit1_bn1_moving_mean
Variable:stage2_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage2_unit1_bn1
Inputs:
        arg[0]=elemwise_add1(0)
        arg[1]=stage2_unit1_bn1_gamma(0) version=0
        arg[2]=stage2_unit1_bn1_beta(0) version=0
        arg[3]=stage2_unit1_bn1_moving_mean(0) version=1
        arg[4]=stage2_unit1_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit1_relu1
Inputs:
        arg[0]=stage2_unit1_bn1(0)
Variable:stage2_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage2_unit1_conv1
Inputs:
        arg[0]=stage2_unit1_relu1(0)
        arg[1]=stage2_unit1_conv1_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage2_unit1_bn2_gamma
Variable:stage2_unit1_bn2_beta
Variable:stage2_unit1_bn2_moving_mean
Variable:stage2_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage2_unit1_bn2
Inputs:
        arg[0]=stage2_unit1_conv1(0)
        arg[1]=stage2_unit1_bn2_gamma(0) version=0
        arg[2]=stage2_unit1_bn2_beta(0) version=0
        arg[3]=stage2_unit1_bn2_moving_mean(0) version=1
        arg[4]=stage2_unit1_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit1_relu2
Inputs:
        arg[0]=stage2_unit1_bn2(0)
Variable:stage2_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage2_unit1_conv2
Inputs:
        arg[0]=stage2_unit1_relu2(0)
        arg[1]=stage2_unit1_conv2_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage2_unit1_sc_weight
--------------------
Op:conv2d, Name=stage2_unit1_sc
Inputs:
        arg[0]=stage2_unit1_relu1(0)
        arg[1]=stage2_unit1_sc_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add2
Inputs:
        arg[0]=stage2_unit1_conv2(0)
        arg[1]=stage2_unit1_sc(0)
Variable:stage2_unit2_bn1_gamma
Variable:stage2_unit2_bn1_beta
Variable:stage2_unit2_bn1_moving_mean
Variable:stage2_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage2_unit2_bn1
Inputs:
        arg[0]=elemwise_add2(0)
        arg[1]=stage2_unit2_bn1_gamma(0) version=0
        arg[2]=stage2_unit2_bn1_beta(0) version=0
        arg[3]=stage2_unit2_bn1_moving_mean(0) version=1
        arg[4]=stage2_unit2_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit2_relu1
Inputs:
        arg[0]=stage2_unit2_bn1(0)
Variable:stage2_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage2_unit2_conv1
Inputs:
        arg[0]=stage2_unit2_relu1(0)
        arg[1]=stage2_unit2_conv1_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage2_unit2_bn2_gamma
Variable:stage2_unit2_bn2_beta
Variable:stage2_unit2_bn2_moving_mean
Variable:stage2_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage2_unit2_bn2
Inputs:
        arg[0]=stage2_unit2_conv1(0)
        arg[1]=stage2_unit2_bn2_gamma(0) version=0
        arg[2]=stage2_unit2_bn2_beta(0) version=0
        arg[3]=stage2_unit2_bn2_moving_mean(0) version=1
        arg[4]=stage2_unit2_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage2_unit2_relu2
Inputs:
        arg[0]=stage2_unit2_bn2(0)
Variable:stage2_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage2_unit2_conv2
Inputs:
        arg[0]=stage2_unit2_relu2(0)
        arg[1]=stage2_unit2_conv2_weight(0) version=0
Attrs:
        channels=128
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add3
Inputs:
        arg[0]=stage2_unit2_conv2(0)
        arg[1]=elemwise_add2(0)
Variable:stage3_unit1_bn1_gamma
Variable:stage3_unit1_bn1_beta
Variable:stage3_unit1_bn1_moving_mean
Variable:stage3_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage3_unit1_bn1
Inputs:
        arg[0]=elemwise_add3(0)
        arg[1]=stage3_unit1_bn1_gamma(0) version=0
        arg[2]=stage3_unit1_bn1_beta(0) version=0
        arg[3]=stage3_unit1_bn1_moving_mean(0) version=1
        arg[4]=stage3_unit1_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit1_relu1
Inputs:
        arg[0]=stage3_unit1_bn1(0)
Variable:stage3_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage3_unit1_conv1
Inputs:
        arg[0]=stage3_unit1_relu1(0)
        arg[1]=stage3_unit1_conv1_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage3_unit1_bn2_gamma
Variable:stage3_unit1_bn2_beta
Variable:stage3_unit1_bn2_moving_mean
Variable:stage3_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage3_unit1_bn2
Inputs:
        arg[0]=stage3_unit1_conv1(0)
        arg[1]=stage3_unit1_bn2_gamma(0) version=0
        arg[2]=stage3_unit1_bn2_beta(0) version=0
        arg[3]=stage3_unit1_bn2_moving_mean(0) version=1
        arg[4]=stage3_unit1_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit1_relu2
Inputs:
        arg[0]=stage3_unit1_bn2(0)
Variable:stage3_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage3_unit1_conv2
Inputs:
        arg[0]=stage3_unit1_relu2(0)
        arg[1]=stage3_unit1_conv2_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage3_unit1_sc_weight
--------------------
Op:conv2d, Name=stage3_unit1_sc
Inputs:
        arg[0]=stage3_unit1_relu1(0)
        arg[1]=stage3_unit1_sc_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add4
Inputs:
        arg[0]=stage3_unit1_conv2(0)
        arg[1]=stage3_unit1_sc(0)
Variable:stage3_unit2_bn1_gamma
Variable:stage3_unit2_bn1_beta
Variable:stage3_unit2_bn1_moving_mean
Variable:stage3_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage3_unit2_bn1
Inputs:
        arg[0]=elemwise_add4(0)
        arg[1]=stage3_unit2_bn1_gamma(0) version=0
        arg[2]=stage3_unit2_bn1_beta(0) version=0
        arg[3]=stage3_unit2_bn1_moving_mean(0) version=1
        arg[4]=stage3_unit2_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit2_relu1
Inputs:
        arg[0]=stage3_unit2_bn1(0)
Variable:stage3_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage3_unit2_conv1
Inputs:
        arg[0]=stage3_unit2_relu1(0)
        arg[1]=stage3_unit2_conv1_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage3_unit2_bn2_gamma
Variable:stage3_unit2_bn2_beta
Variable:stage3_unit2_bn2_moving_mean
Variable:stage3_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage3_unit2_bn2
Inputs:
        arg[0]=stage3_unit2_conv1(0)
        arg[1]=stage3_unit2_bn2_gamma(0) version=0
        arg[2]=stage3_unit2_bn2_beta(0) version=0
        arg[3]=stage3_unit2_bn2_moving_mean(0) version=1
        arg[4]=stage3_unit2_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage3_unit2_relu2
Inputs:
        arg[0]=stage3_unit2_bn2(0)
Variable:stage3_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage3_unit2_conv2
Inputs:
        arg[0]=stage3_unit2_relu2(0)
        arg[1]=stage3_unit2_conv2_weight(0) version=0
Attrs:
        channels=256
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add5
Inputs:
        arg[0]=stage3_unit2_conv2(0)
        arg[1]=elemwise_add4(0)
Variable:stage4_unit1_bn1_gamma
Variable:stage4_unit1_bn1_beta
Variable:stage4_unit1_bn1_moving_mean
Variable:stage4_unit1_bn1_moving_var
--------------------
Op:batch_norm, Name=stage4_unit1_bn1
Inputs:
        arg[0]=elemwise_add5(0)
        arg[1]=stage4_unit1_bn1_gamma(0) version=0
        arg[2]=stage4_unit1_bn1_beta(0) version=0
        arg[3]=stage4_unit1_bn1_moving_mean(0) version=1
        arg[4]=stage4_unit1_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit1_relu1
Inputs:
        arg[0]=stage4_unit1_bn1(0)
Variable:stage4_unit1_conv1_weight
--------------------
Op:conv2d, Name=stage4_unit1_conv1
Inputs:
        arg[0]=stage4_unit1_relu1(0)
        arg[1]=stage4_unit1_conv1_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(2, 2)
        use_bias=False
Variable:stage4_unit1_bn2_gamma
Variable:stage4_unit1_bn2_beta
Variable:stage4_unit1_bn2_moving_mean
Variable:stage4_unit1_bn2_moving_var
--------------------
Op:batch_norm, Name=stage4_unit1_bn2
Inputs:
        arg[0]=stage4_unit1_conv1(0)
        arg[1]=stage4_unit1_bn2_gamma(0) version=0
        arg[2]=stage4_unit1_bn2_beta(0) version=0
        arg[3]=stage4_unit1_bn2_moving_mean(0) version=1
        arg[4]=stage4_unit1_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit1_relu2
Inputs:
        arg[0]=stage4_unit1_bn2(0)
Variable:stage4_unit1_conv2_weight
--------------------
Op:conv2d, Name=stage4_unit1_conv2
Inputs:
        arg[0]=stage4_unit1_relu2(0)
        arg[1]=stage4_unit1_conv2_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage4_unit1_sc_weight
--------------------
Op:conv2d, Name=stage4_unit1_sc
Inputs:
        arg[0]=stage4_unit1_relu1(0)
        arg[1]=stage4_unit1_sc_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(1, 1)
        strides=(2, 2)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add6
Inputs:
        arg[0]=stage4_unit1_conv2(0)
        arg[1]=stage4_unit1_sc(0)
Variable:stage4_unit2_bn1_gamma
Variable:stage4_unit2_bn1_beta
Variable:stage4_unit2_bn1_moving_mean
Variable:stage4_unit2_bn1_moving_var
--------------------
Op:batch_norm, Name=stage4_unit2_bn1
Inputs:
        arg[0]=elemwise_add6(0)
        arg[1]=stage4_unit2_bn1_gamma(0) version=0
        arg[2]=stage4_unit2_bn1_beta(0) version=0
        arg[3]=stage4_unit2_bn1_moving_mean(0) version=1
        arg[4]=stage4_unit2_bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit2_relu1
Inputs:
        arg[0]=stage4_unit2_bn1(0)
Variable:stage4_unit2_conv1_weight
--------------------
Op:conv2d, Name=stage4_unit2_conv1
Inputs:
        arg[0]=stage4_unit2_relu1(0)
        arg[1]=stage4_unit2_conv1_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
Variable:stage4_unit2_bn2_gamma
Variable:stage4_unit2_bn2_beta
Variable:stage4_unit2_bn2_moving_mean
Variable:stage4_unit2_bn2_moving_var
--------------------
Op:batch_norm, Name=stage4_unit2_bn2
Inputs:
        arg[0]=stage4_unit2_conv1(0)
        arg[1]=stage4_unit2_bn2_gamma(0) version=0
        arg[2]=stage4_unit2_bn2_beta(0) version=0
        arg[3]=stage4_unit2_bn2_moving_mean(0) version=1
        arg[4]=stage4_unit2_bn2_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=stage4_unit2_relu2
Inputs:
        arg[0]=stage4_unit2_bn2(0)
Variable:stage4_unit2_conv2_weight
--------------------
Op:conv2d, Name=stage4_unit2_conv2
Inputs:
        arg[0]=stage4_unit2_relu2(0)
        arg[1]=stage4_unit2_conv2_weight(0) version=0
Attrs:
        channels=512
        kernel_size=(3, 3)
        padding=(1, 1)
        strides=(1, 1)
        use_bias=False
--------------------
Op:elemwise_add, Name=elemwise_add7
Inputs:
        arg[0]=stage4_unit2_conv2(0)
        arg[1]=elemwise_add6(0)
Variable:bn1_gamma
Variable:bn1_beta
Variable:bn1_moving_mean
Variable:bn1_moving_var
--------------------
Op:batch_norm, Name=bn1
Inputs:
        arg[0]=elemwise_add7(0)
        arg[1]=bn1_gamma(0) version=0
        arg[2]=bn1_beta(0) version=0
        arg[3]=bn1_moving_mean(0) version=1
        arg[4]=bn1_moving_var(0) version=1
Attrs:
        epsilon=2e-05
--------------------
Op:relu, Name=relu1
Inputs:
        arg[0]=bn1(0)
--------------------
Op:global_avg_pool2d, Name=pool1
Inputs:
        arg[0]=relu1(0)
--------------------
Op:flatten, Name=flatten0
Inputs:
        arg[0]=pool1(0)
Variable:fc1_weight
Variable:fc1_bias
--------------------
Op:dense, Name=fc1
Inputs:
        arg[0]=flatten0(0)
        arg[1]=fc1_weight(0) version=0
        arg[2]=fc1_bias(0) version=0
Attrs:
        units=1000
--------------------
Op:softmax, Name=softmax
Inputs:
        arg[0]=fc1(0)
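
The spatial sizes implied by the dump above can be checked with the standard convolution/pooling output-size formula (plain Python, no TVM required):

```python
def out_size(in_size, kernel, stride, pad):
    # floor((in + 2*pad - kernel) / stride) + 1
    return (in_size + 2 * pad - kernel) // stride + 1

s = out_size(224, 7, 2, 3)          # conv0: 224 -> 112
s = out_size(s, 3, 2, 1)            # max_pool2d0: 112 -> 56
for stride in (1, 2, 2, 2):         # first 3x3 conv of each of the 4 stages
    s = out_size(s, 3, stride, 1)   # 56 -> 56 -> 28 -> 14 -> 7
print(s)  # 7: the feature-map size entering global_avg_pool2d
```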

Compilation

The next step is to compile the model using the NNVM/TVM pipeline. Users can specify the optimization level of the compilation; currently this value can range from 0 to 3. The optimization passes include operator fusion, pre-computation, layout transformation, and so on.

nnvm.compiler.build returns three components: the execution graph in JSON format, the TVM module library of functions compiled specifically for this graph on the target hardware, and the parameter blobs of the model. During compilation, NNVM performs the graph-level optimization while TVM performs the tensor-level optimization, resulting in an optimized runtime module for model serving.

We’ll first compile for the Nvidia GPU. Behind the scenes, nnvm.compiler.build first performs a number of graph-level optimizations, e.g. pruning and fusing, then registers the operators (i.e. the nodes of the optimized graph) with TVM implementations to generate a tvm.module. To generate the module library, TVM first lowers the high-level IR into the intrinsic IR of the specified target backend, which is CUDA in this example. The machine code is then generated as the module library.

opt_level = 3
target = tvm.target.cuda()
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(
        net, target, shape={"data": data_shape}, params=params)

Run the Generated Library

Now we can create the graph runtime and run the module on the Nvidia GPU.

# create random input
ctx = tvm.gpu()
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
# create module
module = graph_runtime.create(graph, lib, ctx)
# set input and parameters
module.set_input("data", data)
module.set_input(**params)
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape))
# convert to numpy
out.asnumpy()

# Print first 10 elements of output
print(out.asnumpy().flatten()[0:10])

Out:

[0.00089283 0.00103331 0.0009094  0.00102275 0.00108751 0.00106737
 0.00106262 0.00095838 0.00110792 0.00113151]
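
Since the input is random, the probabilities are near-uniform over the 1000 classes (each close to 1/1000). As a quick NumPy check of what the final softmax layer computes (an illustration, not TVM code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.random.uniform(-1, 1, size=1000).astype("float32"))
# probs sums to 1; np.argmax(probs) would give the predicted class index
```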

Save and Load Compiled Module

We can also save the graph, lib and parameters into files and load them back in the deployment environment.

# save the graph, lib and params into separate files
from tvm.contrib import util

temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.tar")
lib.export_library(path_lib)
with open(temp.relpath("deploy_graph.json"), "w") as fo:
    fo.write(graph.json())
with open(temp.relpath("deploy_param.params"), "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))
print(temp.listdir())

Out:

['deploy_param.params', 'deploy_lib.tar', 'deploy_graph.json']

# load the module back.
loaded_json = open(temp.relpath("deploy_graph.json")).read()
loaded_lib = tvm.module.load(path_lib)
loaded_params = bytearray(open(temp.relpath("deploy_param.params"), "rb").read())
input_data = tvm.nd.array(np.random.uniform(size=data_shape).astype("float32"))

module = graph_runtime.create(loaded_json, loaded_lib, ctx)
module.load_params(loaded_params)
module.run(data=input_data)
out = module.get_output(0).asnumpy()

Total running time of the script: (0 minutes 15.187 seconds)