HLS Backend Example

TVM supports Xilinx FPGA board with SDAccel. Here is a tutorial for how to deploy TVM to AWS F1 FPGA instance.

Note

This feature is still experimental. We cannot use SDAccel to deploy an end to end neural networks for now.

We use two python scripts for this tutorial.

  • build.py - a script to synthesize FPGA bitstream.

    import tvm
    from tvm import te
    
    tgt_host="llvm"
    tgt="sdaccel"
    
    n = te.var("n")
    A = te.placeholder((n,), name='A')
    B = te.placeholder((n,), name='B')
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")
    
    s = te.create_schedule(C.op)
    px, x = s[C].split(C.op.axis[0], nparts=1)
    
    s[C].bind(px, tvm.thread_axis("pipeline"))
    
    fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd")
    
    fadd.save("myadd.o")
    fadd.imported_modules[0].save("myadd.xclbin")
    
    tvm.contrib.cc.create_shared("myadd.so", ["myadd.o"])
    
  • run.py - a script to use FPGA as an accelerator.

    import tvm
    import numpy as np
    import os
    
    tgt="sdaccel"
    
    fadd = tvm.runtime.load("myadd.so")
    if os.environ.get("XCL_EMULATION_MODE"):
        fadd_dev = tvm.runtime.load("myadd.xclbin")
    else:
        fadd_dev = tvm.runtime.load("myadd.awsxclbin")
    fadd.import_module(fadd_dev)
    
    ctx = tvm.context(tgt, 0)
    
    n = 1024
    a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
    b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), ctx)
    c = tvm.nd.array(np.zeros(n, dtype="float32"), ctx)
    
    fadd(a, b, c)
    tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())
    

Setup

  • Launch an instance using the FPGA Developer AMI. We don’t need an F1 instance for emulation and synthesis, so it is recommended to use a lower cost instance for them.

  • Setup AWS FPGA development kit.

    git clone https://github.com/aws/aws-fpga.git
    cd aws-fpga
    source sdaccel_setup.sh
    source ${XILINX_SDX}/settings64.sh
    
  • Setup TVM with OpenCL enabled.

Emulation

  • Create emconfig.json for emulation.

    emconfigutil --platform ${AWS_PLATFORM} --nd 1
    
  • Copy emconfig.json to the python binary directory. It is because the current Xilinx toolkit assumes that both host binary and the emconfig.json file are in the same path.

    cp emconfig.json $(dirname $(which python))
    
  • Run software emulation

    export XCL_EMULATION_MODE=1
    export XCL_TARGET=sw_emu
    
    python build.py
    python run.py
    
  • Run hardware emulation

    export XCL_EMULATION_MODE=1
    export XCL_TARGET=hw_emu
    
    python build.py
    python run.py
    

Synthesis

  • Run synthesis with the following script.

    unset XCL_EMULATION_MODE
    export XCL_TARGET=hw
    
    python build.py
    
  • Create AWS FPGA image and upload it to AWS S3.

    ${SDACCEL_DIR}/tools/create_sdaccel_afi.sh \
        -xclbin=myadd.xclbin -o=myadd \
        -s3_bucket=<bucket-name> -s3_dcp_key=<dcp-folder-name> \
        -s3_logs_key=<logs-folder-name>
    

    This also generates an awsxclbin file, which is necessary to use the AWS FPGA image on F1 instances.

Run

  • Launch Amazon EC2 F1 instance.

  • Copy myadd.so, myadd.awsxclbin, and run.py to the F1 instance.

  • Setup AWS FPGA development kit.

    git clone https://github.com/aws/aws-fpga.git
    cd aws-fpga
    source sdaccel_setup.sh
    
  • Setup TVM with OpenCL enabled.

  • Become root and setup environment variables.

    sudo sh
    source ${INSTALL_ROOT}/setup.sh
    
  • Run

    python run.py