Introduction to Module Serialization

When deploying a TVM runtime module, whether it targets CPU or GPU, TVM needs only a single dynamic shared library. The key is our unified module serialization mechanism. This document introduces the TVM module serialization format and its implementation details.

Module Export Example

As an example, let us first build a ResNet-18 workload for GPU.

from tvm import relay
from tvm.relay import testing
from tvm.contrib import util
import tvm

# Resnet18 workload
resnet18_mod, resnet18_params = relay.testing.resnet.get_workload(num_layers=18)

# build
with relay.build_config(opt_level=3):
    _, resnet18_lib, _ = relay.build_module.build(resnet18_mod, "cuda", params=resnet18_params)

# create a temporary directory
temp = util.tempdir()

# path to the exported library
file_name = "deploy.so"
path_lib = temp.relpath(file_name)

# export library
resnet18_lib.export_library(path_lib)

# load it back
loaded_lib = tvm.runtime.load(path_lib)
assert loaded_lib.type_key == "library"
assert loaded_lib.imported_modules[0].type_key == "cuda"

Serialization

The entry point is export_library of tvm.module.Module. Inside this function, we perform the following steps:

  1. Collect all DSO modules (LLVM modules and C modules).

  2. Once we have the DSO modules, we call their save function to save them into files.

  3. Next, we check whether there are imported modules, such as CUDA, OpenCL, or anything else. We do not restrict the module type here. If there are imported modules, we create a file named devc.o / dev.cc (so that we can embed the binary blob data of the imported modules into one dynamic shared library), then call _PackImportsToLLVM or _PackImportsToC to perform module serialization.

  4. Finally, we call fcompile, which invokes _cc.create_shared to produce the dynamic shared library.

Note

  1. C source modules are compiled and then linked together with the DSO module.

  2. Whether _PackImportsToLLVM or _PackImportsToC is used depends on whether LLVM is enabled in TVM. Both achieve the same goal.
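To make steps 1 and 3 more concrete, here is a toy Python model of how an export routine might split a module tree into DSO modules to save and imported modules to pack. ToyModule, plan_export, pack_file_name, and the choice of "llvm" / "c" as the DSO type keys are hypothetical stand-ins for illustration, not TVM APIs; the real export_library also compiles and links the resulting files through fcompile.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyModule:
    """Hypothetical stand-in for a runtime module: a type key plus its imports."""
    type_key: str
    imported_modules: List["ToyModule"] = field(default_factory=list)


# Assumption: modules with these type keys are saved as DSO files (step 1).
DSO_TYPE_KEYS = {"llvm", "c"}


def plan_export(root: ToyModule) -> Tuple[List[ToyModule], bool]:
    """Collect DSO modules (step 1) and report whether any imported
    non-DSO module must be packed into the library (step 3)."""
    dso_modules: List[ToyModule] = []
    needs_packing = False
    stack, visited = [root], {id(root)}
    while stack:
        mod = stack.pop()
        if mod.type_key in DSO_TYPE_KEYS:
            dso_modules.append(mod)
        else:
            needs_packing = True  # e.g. a CUDA or OpenCL module
        for child in mod.imported_modules:
            if id(child) not in visited:
                visited.add(id(child))
                stack.append(child)
    return dso_modules, needs_packing


def pack_file_name(needs_packing: bool, llvm_enabled: bool) -> Optional[str]:
    """Pick the file that carries the packed blob, mirroring the note above."""
    if not needs_packing:
        return None
    return "devc.o" if llvm_enabled else "dev.cc"
```

For the ResNet-18 example, plan_export(ToyModule("llvm", [ToyModule("cuda")])) yields the LLVM module as the only DSO module and reports that the CUDA module needs packing.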

Under the Hood of Serialization and Format Standard

As mentioned above, the serialization work is done in _PackImportsToLLVM or _PackImportsToC. Both call SerializeModule to serialize the runtime module. Inside SerializeModule, we first construct a helper ModuleSerializer object. Its constructor takes the module and performs some initialization work, such as assigning module indices. We then call its SerializeModule method to serialize the module.

For better understanding, let us dig a little deeper into the implementation of this class.

The following code is used to construct ModuleSerializer:

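class ModuleSerializer {
 public:
  explicit ModuleSerializer(runtime::Module mod) : mod_(mod) {
    Init();
  }

 private:
  void Init() {
    CreateModuleIndex();  // assign an index to every module; the root gets 0
    CreateImportTree();   // record each module's imports in CSR form
  }

  runtime::Module mod_;
  // ... other members elided
};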

In CreateModuleIndex(), we traverse the module import relationships using DFS and assign an index to each module. Note that the root module is always at index 0. In our example, the module relationship looks like this:

llvm_mod:imported_modules
  - cuda_mod

So the LLVM module has index 0 and the CUDA module has index 1.
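A minimal sketch of this indexing, assuming a toy import graph given as a dict from module name to the names it imports. IMPORTS and create_module_index are hypothetical names used only for illustration; the visited set ensures that a module imported by several parents receives a single index.

```python
from typing import Dict, List

# Toy import graph for the example: module name -> names of the modules it imports.
IMPORTS: Dict[str, List[str]] = {"llvm_mod": ["cuda_mod"], "cuda_mod": []}


def create_module_index(root: str, imports: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign indices in iterative-DFS visiting order; the root always gets 0."""
    index: Dict[str, int] = {}
    visited = {root}
    stack = [root]
    while stack:
        mod = stack.pop()
        index[mod] = len(index)  # next free index
        for child in imports[mod]:
            if child not in visited:  # a shared import gets exactly one index
                visited.add(child)
                stack.append(child)
    return index


print(create_module_index("llvm_mod", IMPORTS))  # {'llvm_mod': 0, 'cuda_mod': 1}
```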

After building the module index, we construct the import tree (CreateImportTree()), which is used to restore the module import relationships when the exported library is loaded back. In our design, the import tree is stored in CSR format: each row corresponds to a parent module index, and the entries in that row are the indices of its children. In code, import_tree_row_ptr_ and import_tree_child_indices_ represent them.
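The CSR layout can be sketched in Python as follows, assuming the module index from the previous step is already available. create_import_tree and children_of are hypothetical helpers, not TVM code. Row i spans child_indices[row_ptr[i]:row_ptr[i + 1]], so in our example the LLVM module (row 0) has child 1 and the CUDA module (row 1) has none.

```python
from typing import Dict, List, Tuple

# Toy inputs matching the example: module indices and each module's imports.
INDEX: Dict[str, int] = {"llvm_mod": 0, "cuda_mod": 1}
IMPORTS: Dict[str, List[str]] = {"llvm_mod": ["cuda_mod"], "cuda_mod": []}


def create_import_tree(index: Dict[str, int],
                       imports: Dict[str, List[str]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays: one row per module, ordered by module index."""
    row_ptr = [0]
    child_indices: List[int] = []
    for mod in sorted(index, key=index.get):
        child_indices.extend(index[child] for child in imports[mod])
        row_ptr.append(len(child_indices))  # this row ends where the next begins
    return row_ptr, child_indices


def children_of(i: int, row_ptr: List[int], child_indices: List[int]) -> List[int]:
    """Read back the children of module i from the CSR arrays."""
    return child_indices[row_ptr[i]:row_ptr[i + 1]]


row_ptr, child_indices = create_import_tree(INDEX, IMPORTS)
print(row_ptr, child_indices)  # [0, 1, 1] [1]
```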

After initialization, we can serialize the module using the SerializeModule function, which assumes the following serialization format:

binary_blob_size
binary_blob_type_key
binary_blob_logic
binary_blob_type_key
binary_blob_logic
...
_import_tree
_import_tree_logic

binary_blob_size is the number of blobs written in this serialization step. In our example there are three blobs, created for the LLVM module, the CUDA module, and _import_tree, respectively.

binary_blob_type_key is the blob type key of the module. For LLVM / C modules, the blob type key is _lib. For a CUDA module, it is cuda, which can be obtained via module->type_key().

binary_blob_logic is how the blob itself is handled. For most blobs (such as CUDA and OpenCL), we call the SaveToBinary function to serialize the blob into binary. For LLVM / C modules, however, we only write _lib to indicate that this is a DSO module.

Note

Whether a module needs to implement the SaveToBinary virtual function depends on how it is used. If the module holds information we need when loading the dynamic shared library back, it should implement SaveToBinary. A CUDA module, for example, must pass its binary data to the GPU driver when the dynamic shared library is loaded, so it implements SaveToBinary to serialize that binary data. A host module (such as a DSO module), on the other hand, needs no extra information at load time, so it does not need to implement SaveToBinary. However, if we want to record some meta information about DSO modules in the future, we could implement SaveToBinary for them as well.
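The SaveToBinary / loadbinary pairing can be sketched as a round trip: a device module writes what it needs into bytes, and a loader registered under module.loadbinary_<type_key> rebuilds it. ToyDeviceModule, LOADERS, and register_loader are hypothetical Python stand-ins for TVM's C++ virtual function and global function registry, shown only to illustrate the contract.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the global "module.loadbinary_<type_key>" functions.
LOADERS: Dict[str, Callable[[bytes], object]] = {}


def register_loader(type_key: str):
    """Register a loader under the name module.loadbinary_<type_key>."""
    def decorator(fn):
        LOADERS["module.loadbinary_" + type_key] = fn
        return fn
    return decorator


class ToyDeviceModule:
    """Hypothetical device module whose kernel code must survive the round trip."""
    type_key = "toy_gpu"

    def __init__(self, kernel_code: str):
        self.kernel_code = kernel_code

    def save_to_binary(self) -> bytes:
        # Write only what the loader needs to rebuild the module.
        return self.kernel_code.encode("utf-8")


@register_loader(ToyDeviceModule.type_key)
def load_toy_gpu(blob: bytes) -> ToyDeviceModule:
    # Inverse of save_to_binary: decode the bytes back into a module.
    return ToyDeviceModule(blob.decode("utf-8"))


original = ToyDeviceModule("__global__ void add() {}")
restored = LOADERS["module.loadbinary_" + original.type_key](original.save_to_binary())
assert restored.kernel_code == original.kernel_code
```

A module without any state to restore, like the DSO module described above, would simply have no such loader and no save_to_binary.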

Finally, we write an _import_tree key, unless the module consists of a single DSO module at the root. As mentioned above, it is used to reconstruct the module import relationships when the exported library is loaded back. The import_tree_logic simply writes import_tree_row_ptr_ and import_tree_child_indices_ into the stream.
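Below is a minimal Python sketch of a writer for this layout. It assumes every integer is a little-endian uint64 and every string, payload, and vector is length-prefixed; these encoding details, the function names, and treating each device module's payload as opaque bytes are assumptions of the sketch, not TVM's exact byte format (in TVM, each module's SaveToBinary decides how its own data is written).

```python
import struct
from typing import List, Optional, Tuple


def write_u64(buf: bytearray, value: int) -> None:
    buf.extend(struct.pack("<Q", value))  # assumed: little-endian uint64


def write_bytes(buf: bytearray, data: bytes) -> None:
    write_u64(buf, len(data))  # length prefix, then raw bytes
    buf.extend(data)


def write_u64_vec(buf: bytearray, values: List[int]) -> None:
    write_u64(buf, len(values))
    for v in values:
        write_u64(buf, v)


def serialize_modules(modules: List[Tuple[str, Optional[bytes]]],
                      row_ptr: List[int], child_indices: List[int]) -> bytes:
    """modules are (type_key, payload) in index order; payload None marks a DSO module."""
    single_dso_root = len(modules) == 1 and modules[0][1] is None
    buf = bytearray()
    write_u64(buf, len(modules) + (0 if single_dso_root else 1))  # binary_blob_size
    for type_key, payload in modules:
        if payload is None:
            write_bytes(buf, b"_lib")  # DSO module: the marker only
        else:
            write_bytes(buf, type_key.encode("utf-8"))  # binary_blob_type_key
            write_bytes(buf, payload)  # stands in for SaveToBinary output
    if not single_dso_root:
        write_bytes(buf, b"_import_tree")
        write_u64_vec(buf, row_ptr)
        write_u64_vec(buf, child_indices)
    return bytes(buf)
```

For the ResNet-18 example, serialize_modules([("llvm", None), ("cuda", cuda_payload)], [0, 1, 1], [1]) would write three blobs: _lib, cuda with its payload, and _import_tree.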

After this step, we pack the result into the symbol runtime::symbol::tvm_dev_mblob, which can be recovered from the dynamic library.
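One way to picture this packing step: the serialized blob becomes a constant byte array in generated C source, defined under the blob symbol's name, so the loader can find it with an ordinary symbol lookup. The sketch assumes an 8-byte little-endian size header in front of the blob so a reader knows where it ends; pack_blob_to_c, unpack_blob, and the symbol name passed in are hypothetical, and the real name is whatever runtime::symbol::tvm_dev_mblob expands to.

```python
import struct


def pack_blob_to_c(blob: bytes, symbol: str) -> str:
    """Emit C source holding blob as a byte array named symbol (hypothetical helper)."""
    data = struct.pack("<Q", len(blob)) + blob  # assumed 8-byte little-endian size header
    body = ", ".join(f"0x{b:02x}" for b in data)
    return f"const unsigned char {symbol}[{len(data)}] = {{{body}}};\n"


def unpack_blob(mblob: bytes) -> bytes:
    """Recover the blob from the symbol's bytes by reading the size header first."""
    (size,) = struct.unpack_from("<Q", mblob, 0)
    return mblob[8:8 + size]
```

Reading the size first means the loader never needs a terminator, so the blob may contain arbitrary bytes, including zeros.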

This completes the serialization part. As you have seen, this design can in principle support importing arbitrary modules.

Deserialization

The entry point is tvm.runtime.load. This function actually calls _LoadFromFile, which, digging a little deeper, is Module::LoadFromFile. In our example the file is deploy.so, so according to the function logic, module.loadfile_so in dso_library.cc is called. The key part is here:

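// Load the imported modules.
// `lib` is the loaded shared library; `n` is the library module node created from it.
const char* dev_mblob =
    reinterpret_cast<const char*>(lib->GetSymbol(runtime::symbol::tvm_dev_mblob));
Module root_mod;
if (dev_mblob != nullptr) {
  // A packed blob exists: restore all modules and their import tree.
  root_mod = ProcessModuleBlob(dev_mblob, lib);
} else {
  // Only have one single DSO Module
  root_mod = Module(n);
}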

As mentioned above, the blob is packed into the symbol runtime::symbol::tvm_dev_mblob, and deserialization checks for it. If runtime::symbol::tvm_dev_mblob exists, we call ProcessModuleBlob, whose logic looks like this:

READ(blob_size)
for (size_t i = 0; i < blob_size; i++) {
    // Each blob starts with its own type key.
    READ(blob_type_key)
    if (blob_type_key == "_lib") {
      // construct dso module using lib
    } else if (blob_type_key == "_import_tree") {
      // READ(_import_tree_row_ptr)
      // READ(_import_tree_child_indices)
    } else {
      // call module.loadbinary_blob_type_key, such as module.loadbinary_cuda,
      // to restore the module.
    }
}
// Use _import_tree_row_ptr and _import_tree_child_indices to
// restore the module import relationships. The first module is the
// root module, per the invariant described above.
return root_module;
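For a runnable picture of this loop, here is a Python sketch that parses a blob laid out as described above and rebuilds the import relationships from the CSR arrays. It assumes little-endian uint64 integers and length-prefixed strings, payloads, and vectors, which are assumptions of the sketch rather than TVM's exact encoding; Reader and process_module_blob are hypothetical names, and restoring a device module is reduced to keeping its raw payload.

```python
import struct
from typing import Dict, List, Tuple


class Reader:
    """Sequential reader over the blob (assumed little-endian uint64 encoding)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def u64(self) -> int:
        (value,) = struct.unpack_from("<Q", self.data, self.pos)
        self.pos += 8
        return value

    def raw(self) -> bytes:
        size = self.u64()  # length prefix
        chunk = self.data[self.pos:self.pos + size]
        self.pos += size
        return chunk

    def u64_vec(self) -> List[int]:
        count = self.u64()
        return [self.u64() for _ in range(count)]


def process_module_blob(data: bytes) -> Tuple[List[Tuple[str, bytes]], Dict[int, List[int]]]:
    reader = Reader(data)
    modules: List[Tuple[str, bytes]] = []
    row_ptr: List[int] = []
    child_indices: List[int] = []
    for _ in range(reader.u64()):  # blob_size
        type_key = reader.raw().decode("utf-8")  # read once per blob
        if type_key == "_lib":
            modules.append(("_lib", b""))  # placeholder for the DSO module built from lib
        elif type_key == "_import_tree":
            row_ptr = reader.u64_vec()
            child_indices = reader.u64_vec()
        else:
            # Placeholder for calling module.loadbinary_<type_key> on the payload.
            modules.append((type_key, reader.raw()))
    # Row i of the CSR arrays lists the indices of module i's imports.
    imports = {i: child_indices[row_ptr[i]:row_ptr[i + 1]] for i in range(len(row_ptr) - 1)}
    return modules, imports  # modules[0] is the root module
```

For a blob holding _lib, a cuda payload, and an import tree with row_ptr [0, 1, 1] and child_indices [1], this returns the two modules and the mapping {0: [1], 1: []}, i.e. the root imports the CUDA module.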

After this, we set ctx_address to point to root_module, which allows symbols to be looked up from the root (so all symbols are visible).

This completes the deserialization part.