Relay Operator Strategy

In order to lower Relay operators to the implementations defined in TOPI library, a compute and schedule function need to be registered to each Relay operator. However, compute and schedule functions are usually specialized for each target, and further, even for the same target, we may have multiple algorithms and implementations available. To deal with the complexity, we introduce operator strategy to allow developers to define a flexible lowering strategy for each operator and target.

Operator Strategy Design

The basic element in operator strategy is an OpImplementation. It includes the a pair of compute and schedule function, the name of the implementation, and a priority level (the use of priority level is explained in Select Implementation from Op Strategy).

The OpStrategy includes a list of OpSpecialization. Each OpSpecialization contains a list of OpImplementation associated with a SpecializedCondition (see definition in include/tvm/te/schedule.h). The SpecializedCondition can be null, indicating the implementations are generally applicable; otherwise, the implementations are only considered when the specialized condition is satisfied. SpecializedCondition consists of a list of clauses defined in Tensor Expression in conjunctive normal form (CNF) and only supports conditions on tensor shapes.

Last, a strategy function, or FTVMStrategy, determines which pair(s) of compute and schedule functions should be used given a workload, and needs to be registered to each Relay operator. FTVMStrategy is a generic function (see include/tvm/target/generic_func.h), that can be overwritten for each target. The function signature is

OpStrategy(const Attrs& attrs, const Array<Tensor>& inputs, const Type& out_type, const Target& target)

that the function returns an OpStrategy given the op attributes, input tensors, output types, and target to compile to.

Write A Strategy Function

We recommend developers to write strategy function in Python as most TOPI compute and schedule functions are written in Python. In python, we provide OpStrategy class in pyton/tvm/relay/op/op.py. It only has one API, which is to add an implementation to the strategy:

def add_implementation(self, compute, schedule, name="default", plevel=10)

We now take topk as an example to explain how to write the FTVMStrategy function:

# add to python/tvm/relay/op/strategy/generic.py
@override_native_generic_func("topk_strategy")
def topk_strategy(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_topk(topi.topk),
        wrap_topi_schedule(topi.generic.schedule_topk),
        name="topk.generic")
    return strategy

# add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
@topk_strategy.register(["cuda", "gpu"])
def topk_strategy_cuda(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_my_new_op(topi.cuda.topk),
        wrap_topi_schedule(topi.cuda.schedule_topk),
        name="topk.cuda")
    return strategy

In this example, we use topi.cuda.topk and topi.cuda.schedule_topk as the compute and schedule function for CUDA or GPU target, while use TOPI generic compute and schedule for the rest of targets. Note that we use two wrapper functions that wrap the topi compute and schedule to conform with the required function signature ( see FTVMCompute and FTVMSchedule in include/tvm/relay/op_attr_types.h). Usually we need to write a customized compute wrapper function for each operator to get different fields from op attributes.

The example above shows a very basic strategy function that only adds one implementation in the strategy. But for many complicated operators, we may need to add multiple implementations that use different algorithms. For example, we can use both direct and winograd algorithm to compute a conv2d op. In order to achieve this, we can write the strategy function as follows:

strategy.add_implementation(
    wrap_compute_conv2d(topi.cuda.conv2d_nchw),
    wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
    name="conv2d_nchw.cuda",
    plevel=10)

if winograd_condition:
    strategy.add_implementation(
        wrap_compute_conv2d(topi.cuda.conv2d_nchw_winograd),
        wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw_winograd),
        name="conv2d_nchw_winograd.cuda",
        plevel=15)

In this example, we add two implementations to the conv2d strategy where winograd algorithm is only added when winograd_condition is true. The implementation "conv2d_nchw_winograd.cuda" will be used to compile conv2d when winograd_condition is true as it has higher priority level (this could be changed if certain implementation is an AutoTVM template. See Select Implementation from Op Strategy for more details). Otherwise, "conv2d_nchw.cuda" is used.

We can extend the example above to third party library implementation. For example, we can add the implementation that invokes kernel in the cblas library when cblas is included in the target.

if "cblas" in target.libs:
    strategy.add_implementation(
        wrap_compute_dense(topi.x86.dense_cblas),
        wrap_topi_schedule(topi.x86.schedule_dense_cblas),
        name="dense_cblas.x86",
        plevel=15)

Further, we can add implementation specialized for a certain range of shapes. The code below shows an example of dense strategy that adds an implementation that is specialized for m greater than 16. The main difference between hardcode python condition like examples above and specialized condition is that it allows TVM to generate multiple kernels when the input tensors have symbolic shapes. The compile engine will generate a dispatch function that invokes the specialized kernel when the corresponding condition is met; otherwise, invoke the kernel that has no associated specialized condition (dense_common in this example). This part is still work in progress. More details will be provided after it is done.

def dense_strategy(attrs, inputs, out_type, target):
    m = inputs[0].shape[0]
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_dense(dense_compute1),
        wrap_topi_schedule(dense_schedule1),
        name="dense_common")

    with tvm.te.SpecializedCondition(m > 16):
        strategy.add_implementation(
            wrap_compute_dense(dense_compute2),
            wrap_topi_schedule(dense_schedule2),
            name="dense_for_large_m",
            plevel=15)

    return strategy

Register Strategy Function to An Operator

After we define the strategy function for an operator, we can now register the strategy function to this operator with

register_strategy("topk", strategy.topk_strategy)

However, it takes much effort to write a strategy function for an operator. Therefore, we provide two other methods for simpler operators.

First, for operators that have injective, broadcast, or reduction pattern, we can call register_injective_schedule, register_broadcast_schedule, and register_reduce_schedule repsectively. The schedule function for these patterns are already registered by each target and can be applied to these operators. We assume the compute function should be the same across all targets, and FTVMCompute needs to be registered to the op before invoking register schedule.

register_broadcast_schedule("add")

Second, for operators that doesn’t have these common patterns mentioned before, but also have the same compute function for all targets, we can use register_schedule API. It is easier to write FTVMSchedule function as we only need to provide which schedule function to use. The following code snippet shows FTVMSchedule function for pooling.

# add to python/tvm/relay/op/strategy/generic.py
@generic_func
def schedule_pool(attrs, outs, target):
    with target:
        return topi.generic.schedule_pool(outs, attrs.layout)

# add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
@schedule_pool.register("cpu")
def schedule_pool_cpu(attrs, outs, target):
    ...

After we created the FTVMSchedule for an operator, we can register the strategy using register_schedule:

register_schedule("nn.max_pool2d", strategy.schedule_pool)

Register Strategies for A New Target

There are two ways to register strategies for a new target. The more straightforward one is adding a new target file in the directory python/tvm/relay/op/strategy. You only need to customize the strategy for ops that have been implemented for this new target and reuse the generic strategies for the rest.

Alternatively, you can also register the strategy for the new target outside the TVM python library. The following code snippet shows an example how to do so. You can find more examples in vta/python/vta/top/op.py.

@relay.op.strategy.conv2d_strategy.register("mytarget")
def conv2d_strategy_mytarget(attrs, inputs, out_type, target):
    ...

Select Implementation from Op Strategy

During the compilation, Relay compile engine needs to determine which implementation to use for an operator when there are multiple. The selection policy works as follows.

When the input tensors to an operator or a fused op all have constant shapes, the compile engine first finds the best implementation based on AutoTVM tuning logs. If there is no implementation that is an AutoTVM template or all AutoTVM templates have fallback configs, the implementation with highest priority level will then be chosen. Implementations with same priority level in this case leads to an undefined behavior, and any of them might be selected.

The selection policy for ops with symbolic input shapes is still work in progess. Currently, if any input tensor has a symbolic shape, only the implementation with highest priority level will be used for this operator. This will be updated after the implemention finishes.

For debug purpose, you can add the following lines before you compile the Relay model to learn which implementation is used for each operator.

logging.getLogger("compile_engine").setLevel(logging.INFO)
logging.getLogger("compile_engine").addHandler(logging.StreamHandler(sys.stdout))