tvm.schedule

The computation schedule API of TVM.

class tvm.schedule.IterVar

Represent iteration variable.

IterVar is normally created by an Operation to represent axis iterations in the computation. It can also be created by schedule primitives like tvm.schedule.Stage.split.

See also

tvm.thread_axis
Create thread axis IterVar.
tvm.reduce_axis
Create reduce axis IterVar.
class tvm.schedule.Buffer

Symbolic data buffer in TVM.

Buffer provides a way to represent the data layout specialization of a data structure in TVM.

Do not construct directly, use decl_buffer instead. See the documentation of decl_buffer for more details.

See also

decl_buffer
Declare a buffer
access_ptr(access_mask, ptr_type='handle', content_lanes=1, offset=0)

Get an access pointer to the head of buffer.

This is the recommended method to get the buffer data address when interacting with external functions.

Parameters:
  • access_mask (int) – The access pattern MASK. Indicates whether the access will read or write the data content.
  • ptr_type (str, optional) – The data type of the result pointer. Do not specify unless you want to cast the pointer to a specific type.
  • content_lanes (int, optional) – The number of lanes for the data type. This value is greater than one for vector types.
  • offset (Expr, optional) – The offset of the pointer, in number of elements, from the base address.

Examples

from tvm.schedule import Buffer
# buffer is assumed to be an existing tvm.schedule.Buffer, e.g. created by tvm.decl_buffer
# Get access ptr for read
buffer.access_ptr("r")
# Get access ptr for read/write with bitmask
buffer.access_ptr(Buffer.READ | Buffer.WRITE)
# Get access ptr for read/write with str flag
buffer.access_ptr("rw")
# Get access ptr for read with offset
buffer.access_ptr("r", offset = 100)
vload(begin, dtype=None)

Generate an Expr that loads dtype from begin index.

Parameters:
  • begin (Array of Expr) – The beginning index, in units of Buffer.dtype.
  • dtype (str) – The data type to be loaded; it can be a vector type whose number of lanes is a multiple of the lanes of Buffer.dtype.
Returns:

load – The corresponding load expression.

Return type:

Expr

vstore(begin, value)

Generate a Stmt that stores value at the begin index.

Parameters:
  • begin (Array of Expr) – The beginning index, in units of Buffer.dtype.
  • value (Expr) – The value to be stored.
Returns:

store – The corresponding store stmt.

Return type:

Stmt
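
As an illustrative sketch (assuming a buffer declared with decl_buffer, mentioned above; names are illustrative), vload and vstore build vectorized load/store expressions, e.g. for tensor intrinsic definitions:

import tvm
buf = tvm.decl_buffer((16,), "float32", name="buf")
# Load a 4-lane vector starting at element 0
vec = buf.vload([0], "float32x4")
# Store that vector back starting at element 4
st = buf.vstore([4], vec)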

tvm.create_schedule(ops)

Create a schedule for a list of ops.

Parameters:ops (list of Operations) – The source expression.
Returns:sch – The created schedule.
Return type:schedule.Schedule
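
A minimal usage sketch (the tvm.var/tvm.placeholder/tvm.compute construction is standard TVM tensor-expression API, shown here only for context):

import tvm
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
# Create a schedule rooted at the output operation
s = tvm.create_schedule(B.op)
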
class tvm.schedule.Schedule

Schedule for all the stages.

cache_read(tensor, scope, readers)

Create a cache read of original tensor for readers.

This will mutate the body of the readers. A new cache stage will be created for the tensor. Call this before doing any split/fuse schedule.

Parameters:
  • tensor (Tensor) – The tensor to be cached.
  • scope (str) – The scope of the cached tensor.
  • readers (list of Tensor or Operation) – The readers to read the cache.
Returns:

cache – The created cache tensor.

Return type:

Tensor
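
A short sketch of cache_read on a small elementwise pipeline (tensor names are illustrative):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] * 2.0, name="B")
s = tvm.create_schedule(B.op)
# Create a cached copy of A in shared memory for its reader B
AA = s.cache_read(A, "shared", [B])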

cache_write(tensor, scope)

Create a cache write of the original tensor, before storing into the tensor.

This will mutate the body of the tensor. A new cache stage will be created before feeding into the tensor.

This function can be used to support data layout transformation. If there is a split/fuse/reorder on the data-parallel axis of the tensor before cache_write is called, the intermediate cache stores the data in the layout of the iteration order of the leaf axes. The data will be transformed back to the original layout in the original tensor. The user can further call compute_inline to inline the original layout and keep the data stored in the transformed layout.

Parameters:
  • tensor (Tensor, list or tuple) – The tensors to be fed. All the tensors must be produced by one ComputeOp.
  • scope (str) – The scope of the cached tensor.
Returns:

cache – The created cache tensor.

Return type:

Tensor
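
A corresponding cache_write sketch (illustrative names; the cache stage computes into "local" scope before the result is written to B):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
# Compute B into a local cache stage first, then copy into B
BL = s.cache_write(B, "local")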

create_group(outputs, inputs, include_inputs=False)

Create stage group by giving output and input boundary.

The operators between outputs and inputs are placed as members of the group. The outputs are included in the group, while the inputs are not.

Parameters:
  • outputs (list of Tensors) – The outputs of the group.
  • inputs (list of Tensors) – The inputs of the group.
  • include_inputs (boolean, optional) – Whether to include the input operations in the group if they are used by the outputs.
Returns:

group – A virtual stage that represents the group; the user can use compute_at to move the attachment point of the group.

Return type:

Stage
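
A hedged sketch of grouping a chain of stages and moving the whole group with compute_at (pipeline names are illustrative):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
C = tvm.compute((n,), lambda i: B[i] * 2.0, name="C")
D = tvm.compute((n,), lambda i: C[i] + 3.0, name="D")
s = tvm.create_schedule(D.op)
# Group the stages bounded by output C and input B (B included)
g = s.create_group(outputs=C, inputs=B, include_inputs=True)
# Attach the whole group under D's loop
g.compute_at(s[D], D.op.axis[0])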

normalize()

Build a normalized schedule from the current schedule.

Insert the necessary rebases to make certain iteration variables start from 0. This is needed before bound inference and follow-up steps.

Returns:sch – The normalized schedule.
Return type:Schedule
rfactor(tensor, axis, factor_axis=0)

Factor a reduction axis in tensor’s schedule to be an explicit axis.

This will create a new stage that generates the new tensor with axis as the first dimension. The tensor's body will be rewritten as a reduction over the factored tensor.

Parameters:
  • tensor (Tensor) – The tensor to be factored.
  • axis (IterVar) – The reduction axis in the schedule to be factored.
  • factor_axis (int) – The position where the new axis is placed.
Returns:

tfactor – The created factored tensor.

Return type:

Tensor or Array of Tensor
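
A typical rfactor sketch for a parallelizable reduction (illustrative sizes and names):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
k = tvm.reduce_axis((0, n), name="k")
B = tvm.compute((1,), lambda i: tvm.sum(A[k], axis=k), name="B")
s = tvm.create_schedule(B.op)
# Split the reduction axis, then factor the inner part into its own stage
ko, ki = s[B].split(B.op.reduce_axis[0], factor=16)
BF = s.rfactor(B, ki)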

same_as(other)

Check object identity equality.

class tvm.schedule.Stage

A Stage represents schedule for one operation.

bind(ivar, thread_ivar)

Bind ivar to thread index thread_ivar

Parameters:
  • ivar (IterVar) – The iteration to be bound to the thread.
  • thread_ivar (IterVar) – The thread to bind to.
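
A minimal GPU binding sketch using tvm.thread_axis (see the IterVar notes above; names are illustrative):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
# Bind the outer loop to blocks and the inner loop to threads
s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
s[B].bind(tx, tvm.thread_axis("threadIdx.x"))
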
compute_at(parent, scope)

Attach the stage at parent’s scope

Parameters:
  • parent (Stage) – The parent stage
  • scope (IterVar) – The loop scope to be attached to.
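
A small compute_at sketch (illustrative names):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
C = tvm.compute((n,), lambda i: B[i] * 2.0, name="C")
s = tvm.create_schedule(C.op)
# Compute B inside C's loop rather than in a separate root loop nest
s[B].compute_at(s[C], C.op.axis[0])
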
compute_inline()

Mark stage as inline

compute_root()

Attach the stage at the root of the schedule and mark it as a root stage.

double_buffer()

Compute the current stage via double buffering.

This can only be applied to an intermediate stage. It will double the storage cost of the current stage, which can be useful to hide load latency.

env_threads(threads)

Mark threads to be launched at the outer scope of the composed op.

Parameters:threads (list of threads) – The threads to be launched.
fuse(*args)

Fuse multiple consecutive iteration variables into a single iteration variable.

fused = fuse(…fuse(fuse(args[0], args[1]), args[2]),…, args[-1]) The order is from outer to inner.

Parameters:args (list of IterVars) – Consecutive IterVars, ordered from outer to inner.
Returns:fused – The fused variable of iteration.
Return type:IterVar
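
A short fuse sketch on a 2-D stage (illustrative names):

import tvm
A = tvm.placeholder((32, 32), name="A")
C = tvm.compute((32, 32), lambda i, j: A[i, j] + 1.0, name="C")
s = tvm.create_schedule(C.op)
# Fuse the two loops (outer to inner) into one iteration variable
fused = s[C].fuse(C.op.axis[0], C.op.axis[1])
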
opengl()

The special OpenGL schedule

Maps each output element to a pixel.

parallel(var)

Parallelize the iteration.

Parameters:var (IterVar) – The iteration to be parallelized.
pragma(var, pragma_type, pragma_value=None)

Annotate the iteration with pragma

This will translate to a pragma_scope surrounding the corresponding loop generated. Useful to support experimental features and extensions.

Parameters:
  • var (IterVar) – The iteration to be annotated
  • pragma_type (str) – The pragma string to be annotated
  • pragma_value (Expr, optional) – The pragma value to pass along the pragma

Note

Most pragmas are advanced/experimental features and may be subject to change. List of supported pragmas:

  • debug_skip_region

    Force skip the region marked by the axis and turn it into no-op. This is useful for debug purposes.

  • parallel_launch_point

    Specify to launch parallel threads outside the specified iteration loop. By default the threads launch at the point of the parallel construct. This pragma moves the launching point to an even outer scope. The threads are launched once and reused across multiple parallel constructs, as in a BSP-style program.

  • parallel_barrier_when_finish

    Insert a synchronization barrier between working threads after the specified loop iteration finishes.

  • parallel_stride_pattern

    Hint the parallel loop to execute in a strided pattern: for (int i = task_id; i < end; i += num_task).
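
As an illustrative sketch, a pragma from the list above can be attached to a loop like any other loop annotation (names are illustrative):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=4)
# Turn the region under xo into a no-op, e.g. while debugging
s[B].pragma(xo, "debug_skip_region")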

prefetch(tensor, var, offset)

Prefetch the specified tensor at the given loop point.

Parameters:
  • tensor (Tensor) – The tensor to be prefetched
  • var (IterVar) – The loop point at which the prefetching is applied
  • offset (Expr) – The number of iterations to be prefetched before actual execution
reorder(*args)

Reorder the iteration variables in the specified order.

Parameters:args (list of IterVar) – The desired iteration order.
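
A minimal reorder sketch (illustrative names):

import tvm
A = tvm.placeholder((32, 32), name="A")
C = tvm.compute((32, 32), lambda i, j: A[i, j] + 1.0, name="C")
s = tvm.create_schedule(C.op)
i, j = C.op.axis
# Make j the outer loop and i the inner loop
s[C].reorder(j, i)
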
same_as(other)

Check object identity equality.

set_scope(scope)

Set the thread scope of this stage

Parameters:scope (str) – The thread scope of this stage
set_store_predicate(predicate)

Set predicate under which store to the array can be performed.

Use this when there are duplicated threads doing the same store and we only need one of them to do the store.

Parameters:predicate (Expr) – The guard condition for the store.
split(parent, factor=None, nparts=None)

Split the parent iteration variable, either by factor (specifying the extent of the inner loop) or by nparts (specifying the number of outer parts).

Parameters:
  • parent (IterVar) – The parent iter var.
  • factor (Expr, optional) – The splitting factor
  • nparts (Expr, optional) – The number of outer parts.
Returns:

  • outer (IterVar) – The outer variable of iteration.
  • inner (IterVar) – The inner variable of iteration.
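
A minimal split sketch showing both forms (illustrative names; only one split should be applied to a given axis):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
# Split by a fixed inner extent of 32 ...
xo, xi = s[B].split(B.op.axis[0], factor=32)
# ... or, equivalently, by the number of outer parts:
# xo, xi = s[B].split(B.op.axis[0], nparts=32)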

storage_align(axis, factor, offset)

Set an alignment requirement for a specific axis.

This ensures that stride[axis] == k * factor + offset for some k. It is useful for setting a memory layout with a more friendly memory access pattern. For example, we can set the alignment to factor=2, offset=1 to avoid bank conflicts for thread access on the higher dimension in GPU shared memory.

Parameters:
  • axis (IterVar) – The axis dimension to be aligned.
  • factor (int) – The factor in alignment specification.
  • offset (int) – The offset in the alignment specification.
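
A hedged storage_align sketch, mirroring the factor=2, offset=1 example above on a shared-memory cache stage (illustrative names):

import tvm
A = tvm.placeholder((64, 64), name="A")
C = tvm.compute((64, 64), lambda i, j: A[i, j] + 1.0, name="C")
s = tvm.create_schedule(C.op)
AA = s.cache_read(A, "shared", [C])
# Constrain the stride of AA's first axis to 2*k + 1 elements
s[AA].storage_align(AA.op.axis[0], 2, 1)
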
tensorize(var, tensor_intrin)

Tensorize the computation enclosed by var with tensor_intrin

Parameters:
  • var (IterVar) – The iteration boundary of tensorization.
  • tensor_intrin (TensorIntrin) – The tensor intrinsic used for computation.
tile(x_parent, y_parent, x_factor, y_factor)

Perform tiling on two dimensions

The final loop order, from outermost to innermost, is [x_outer, y_outer, x_inner, y_inner].

Parameters:
  • x_parent (IterVar) – The original x dimension
  • y_parent (IterVar) – The original y dimension
  • x_factor (Expr) – The stride factor on x axis
  • y_factor (Expr) – The stride factor on y axis
Returns:

  • x_outer (IterVar) – Outer axis of x dimension
  • y_outer (IterVar) – Outer axis of y dimension
  • x_inner (IterVar) – Inner axis of x dimension
  • y_inner (IterVar) – Inner axis of y dimension
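
A minimal tile sketch (illustrative names):

import tvm
A = tvm.placeholder((64, 64), name="A")
C = tvm.compute((64, 64), lambda i, j: A[i, j] + 1.0, name="C")
s = tvm.create_schedule(C.op)
# Tile the two loops into 8x8 blocks: order is x_outer, y_outer, x_inner, y_inner
xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], x_factor=8, y_factor=8)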

unroll(var)

Unroll the iteration.

Parameters:var (IterVar) – The iteration to be unrolled.
vectorize(var)

Vectorize the iteration.

Parameters:var (IterVar) – The iteration to be vectorized.
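
Putting a few of the loop primitives together, a minimal CPU-style sketch (illustrative names):

import tvm
n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=8)
# Run the outer loop across CPU threads and vectorize the inner loop
s[B].parallel(xo)
s[B].vectorize(xi)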