Level 1 API

Tensors

enum [anonymous]

Values:

enumerator CCV_NNC_TENSOR_MEMORY_MAP_EAGER

Load the tensor by mapping it directly.

enumerator CCV_NNC_TENSOR_MEMORY_MAP_ON_DEMAND

Defer mapping the tensor until it is read, on supported devices.

enum [anonymous]

Values:

enumerator CCV_NNC_TENSOR_READ_METADATA_ONLY

Read a tensor whose data is nil, with only the metadata.

typedef int (*ccv_nnc_tensor_io_option_decode_f)(const void *const data, const size_t data_size, const int datatype, const int *const dimensions, const int dimension_count, const unsigned int identifier, void *const context, const ccv_nnc_tensor_param_t tensor_params, ccv_nnc_tensor_t **const tensor_out, void *const decoded, size_t *const decoded_size)

Method to decode a tensor into a given buffer.

Param data:

The encoded data that needs to be decoded.

Param data_size:

The size of the encoded data.

Param datatype:

The expected data type of the encoded data.

Param dimensions:

The expected dimensions for the data.

Param dimension_count:

The number of dimensions for the data.

Param identifier:

The identifier saved alongside the encoded data (non-zero) that is used to identify this decoder.

Param context:

The context associated with this decoder.

Param tensor_params:

The tensor parameters for the final container. This can be different from the expected values above.

Param tensor_out:

The final container for the tensor. It can be nil, in which case you need to initialize it yourself.

Param decoded:

The buffer to write the decoded data into.

Param decoded_size:

The size of the buffer for the decoded data.

Return:

1 if it is processed, 0 otherwise.

typedef int (*ccv_nnc_tensor_io_option_encode_f)(const void *const data, const size_t data_size, const int datatype, const int *const dimensions, const int dimension_count, void *const context, void *const encoded, size_t *const encoded_size, ccv_nnc_tensor_param_t *const tensor_params, unsigned int *const identifier)

Method to encode a tensor into a given buffer.

Param data:

The data that needs to be encoded.

Param data_size:

The size of the data to be encoded.

Param datatype:

The expected data type of the data to be encoded.

Param dimensions:

The expected dimensions for the data.

Param dimension_count:

The number of dimensions for the data.

Param context:

The context associated with this encoder.

Param encoded:

The buffer for encoded data.

Param encoded_size:

The size of the buffer.

Param tensor_params:

The tensor parameters that can be modified.

Param identifier:

The identifier that identifies this encoder (non-zero).

Return:

1 if it is processed, 0 otherwise.

static inline int ccv_nnc_tensor_nd(const int dim[CCV_NNC_MAX_DIM_ALLOC])

Count the dimensionality of a tensor.

ccv_nnc_tensor_new(const void *const ptr, const ccv_nnc_tensor_param_t params, const int flags)

Create a new tensor.

Parameters:
  • ptr – If 0, NNC will allocate the tensor itself. Otherwise, it will use the memory region referenced by ‘ptr’.

  • params – Tensor parameters.

  • flags – Reserved flags for the allocation.

Returns:

The newly created tensor.

ccv_nnc_tensor_new_from_file(const ccv_nnc_tensor_param_t params, const char *const filename, const off_t offset, const int flags)

Create a new tensor with data from a file. This will create a mmap-backed tensor if that is preferred.

Parameters:
  • params – Tensor parameters.

  • filename – The file to load tensor content from.

  • offset – The offset to the tensor content from the file.

  • flags – Reserved flags for this loading.

Returns:

The newly created tensor.

ccv_nnc_tensor(const void *const ptr, const ccv_nnc_tensor_param_t params, const int flags)

Create a new tensor on stack.

Parameters:
  • ptr – If 0, NNC will allocate the tensor itself. Otherwise, it will use the memory region referenced by ‘ptr’.

  • params – Tensor parameters.

  • flags – Reserved flags for the allocation.

Returns:

The tensor struct.

ccv_nnc_tensor_resize(ccv_nnc_tensor_t *const tensor, const ccv_nnc_tensor_param_t params)

Resize an existing tensor to a new dimension.

Parameters:
  • tensor – The old tensor to be resized.

  • params – Tensor parameters.

Returns:

Potentially a new tensor, but if the existing allocation is sufficient, it will be an in-place operation.

int ccv_nnc_tensor_pin_memory(ccv_nnc_tensor_t *const tensor)

Pin the tensor memory for faster access on GPU.

Parameters:

tensor – The tensor whose memory we want to pin.

Returns:

0 for success.

void ccv_nnc_tensor_free(ccv_nnc_tensor_t *const tensor)

Free a tensor object.

Parameters:

tensor – The tensor to be freed.

ccv_nnc_tensor_view_new(const ccv_nnc_tensor_t *const tensor, const ccv_nnc_tensor_param_t params, const int ofs[CCV_NNC_MAX_DIM_ALLOC], const int stride[CCV_NNC_MAX_DIM_ALLOC])

Create a tensor view. A tensor view can be non-contiguous. Essentially, it provides a view into a tensor.

Parameters:
  • tensor – The tensor that we want to view into.

  • params – The tensor parameters for the tensor view.

  • ofs – The offset on each dimension.

  • stride – The stride of each dimension.

Returns:

The newly created tensor view.

ccv_nnc_tensor_view(const ccv_nnc_tensor_t *const tensor, const ccv_nnc_tensor_param_t params, const int ofs[CCV_NNC_MAX_DIM_ALLOC], const int stride[CCV_NNC_MAX_DIM_ALLOC])

Create a tensor view on stack.

Parameters:
  • tensor – The tensor that we want to view into.

  • params – The tensor parameters for the tensor view.

  • ofs – The offset on each dimension.

  • stride – The stride of each dimension.

Returns:

The tensor view struct.

void ccv_nnc_tensor_view_free(ccv_nnc_tensor_view_t *const tensor_view)

Free a tensor view object.

Parameters:

tensor_view – The tensor view to be freed.

void ccv_nnc_tensor_zero(void *const tensor)

Zero out a given tensor.

Parameters:

tensor – The tensor to be zeroed out.

ccv_nnc_tensor_eq(const ccv_nnc_tensor_t *const a, const ccv_nnc_tensor_t *const b)

Compare whether two tensors are equal. This tolerates some floating point error, following http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

Parameters:
  • a – Tensor a.

  • b – Tensor b.

Returns:

0 if equal, -1 otherwise.

ccv_nnc_tensor_format_new(const ccv_nnc_tensor_t *const a)

Format a tensor to a string so that it can be used as debug output in other languages. The output will look like: [ 0.13, 0.44, 0.24, 0.24 ] and is formatted closely to what numpy output looks like.

Parameters:

a – The input tensor; it can be a tensor or a tensor view. It has to be accessible on the CPU.

Returns:

An allocated string that you can free by calling ccfree.

int ccv_nnc_tensor_write(const ccv_nnc_tensor_t *const tensor, void *const handle, const char *const name, const ccv_nnc_tensor_io_option_t *const options)

Write tensor to a SQLite database with a given name.

Parameters:
  • tensor – The tensor.

  • handle – The SQLite handle.

  • name – The name to find the tensor in the database.

  • options – If provided, we will use this to encode tensor data.

Returns:

CCV_IO_FINAL for success, otherwise error.

int ccv_nnc_tensor_read(void *const handle, const char *const name, const ccv_nnc_tensor_io_option_t *const options, const int flags, const ccv_nnc_tensor_param_t *const tensor_params, ccv_nnc_tensor_t **const tensor_out)

Read a tensor from a SQLite database with a given name.

Parameters:
  • handle – The SQLite handle.

  • name – The name to find the tensor in the database.

  • options – If provided, we will use this to decode any data whose identifier != 0.

  • flags – Additional flags to configure how we read the tensor.

  • tensor_params – If provided, we will use this to create the tensor if tensor_out is not provided.

  • tensor_out – The pointer to hold the tensor. If you supply the tensor yourself, we will read the data into the existing tensor.

Returns:

CCV_IO_FINAL for success, otherwise error.

struct ccv_nnc_tensor_io_option_t
#include <ccv_nnc.h>

Additional options to regulate tensor write / read behavior. For example, you can pass an encryptor / compressor to encrypt / compress the data prior to writing to disk. You can also store only a reference and use external storage for the tensors.

Commands

enum [anonymous]

Values:

enumerator CCV_NNC_CMD_ATTR_PASSTHROUGH

This doesn’t compute anything, but passes the first n tensors to the output (useful for backprop passes that are the identity).

enumerator CCV_NNC_CMD_ATTR_OUTPUT_ONES

All the output tensors are 1s (unit).

enumerator CCV_NNC_CMD_ATTR_NULL_IS_ONES

Accept nullptr inputs as if they were tensors of 1s (unit).

enum [anonymous]

Values:

enumerator CCV_NNC_ACCUMULATE_OUTPUT

Enable accumulate outputs (unsupported).

enumerator CCV_NNC_ZERO_MEMORY_ALLOC

Don’t allocate any extra memory for this operation.

enum [anonymous]

Values:

enumerator CCV_NNC_EXEC_SUCCESS

Successfully executed the command.

enumerator CCV_NNC_EXEC_INVALID

Invalid inputs.

enumerator CCV_NNC_EXEC_NO_KERNEL

No kernel available for a given command / backend.

enumerator CCV_NNC_EXEC_OOM

Out of memory error.

enum [anonymous]

Values:

enumerator CCV_NNC_MSE_REDUCE_MEAN

Reduce with mean when computing MSE loss.

enumerator CCV_NNC_MSE_REDUCE_SUM

Reduce with sum when computing MSE loss.

enum [anonymous]

Values:

enumerator CCV_NNC_HISTOGRAM_EVEN

The bins are evenly distributed from min to max.

enumerator CCV_NNC_HISTOGRAM_LOGARITHMIC

The bins follow an exponential curve, growing from min to max by the given ratio.

enumerator CCV_NNC_HISTOGRAM_BINS

The bin boundaries are supplied explicitly, such as [0, 2, 3, 10]. For the result, the ranges [-inf, 0, 2, 3, 10, inf] are implied.

enum [anonymous]

Values:

enumerator CCV_NNC_UPSAMPLE_NEAREST

Using the nearest value.

enumerator CCV_NNC_UPSAMPLE_BILINEAR

Using bilinear interpolation.

enum [anonymous]

Values:

enumerator CCV_NNC_PAD_ZERO

Pad 0s.

enumerator CCV_NNC_PAD_REPLICATE

Pad by replicating the edge.

typedef struct ccv_nnc_stream_context_s ccv_nnc_stream_context_t

Opaque pointer to a stream object.

typedef struct ccv_nnc_cmd_vtab_s ccv_nnc_cmd_vtab_t

The function prototype for automatically deducing tensor shapes.

typedef struct ccv_nnc_cmd_s ccv_nnc_cmd_t
typedef int (*ccv_nnc_cmd_exec_f)(const ccv_nnc_cmd_t cmd, const ccv_nnc_hint_t hint, const int flags, ccv_nnc_tensor_t *const *const inputs, const int input_size, ccv_nnc_tensor_t *const *const outputs, const int output_size, ccv_nnc_stream_context_t *const stream_context)

For forward functions, the input tensors and output tensors can be arbitrary. However, for backward functions (backpropagation, or gradient functions in other libs), given n forward inputs and m forward outputs, the input is: 0~m-1: gradients for the forward output tensors, m~m+n-1: the input tensors of the forward function, m+n~2m+n-1: the output tensors of the forward function; the output is: 0~n-1: output gradients w.r.t. the forward input tensors. Which input / output tensors can be ignored can be specified in the cmd config structs.

typedef int (*ccv_nnc_cmd_autotune_f)(const ccv_nnc_cmd_t cmd, const size_t max_workspace_size, const ccv_nnc_hint_t hint, const int flags, ccv_nnc_tensor_t *const *const inputs, const int input_size, ccv_nnc_tensor_t *const *const outputs, const int output_size, ccv_nnc_stream_context_t *const stream_context)

The function prototype for autotune. The only difference is the max_workspace_size. Implementing this function prototype hands the autotune task over to the command itself; you are responsible for selecting the best algorithm.

Return:

The selected algorithm.

uint64_t ccv_nnc_cmd_mono_time(void)

Return a high precision time measurement. The time unit is platform specific.

Returns:

A monotonically increasing 64-bit integer w.r.t. the passage of time.

ccv_nnc_cmd_name(const uint32_t cmd)

Return UTF-8 encoded name of a given command.

Returns:

A UTF-8 string (pointing to a static constant).

ccv_nnc_cmd_backend_name(const uint32_t backend)

Return UTF-8 encoded name of a given backend.

Returns:

A UTF-8 string (pointing to a static constant).

ccv_nnc_cmd_ok(const uint32_t cmd, const uint32_t backend)

Check whether a given backend is available for a given command.

Returns:

1 if it is available.

ccv_nnc_cmd(const uint32_t cmd, ccv_nnc_cmd_vtab_t *const isa, const ccv_nnc_cmd_param_t params, const int flags)

Create a wrapped command with parameters.

Parameters:
  • cmd – The command identifier.

  • isa – If this is a CCV_NNC_CUSTOM_FORWARD / CCV_NNC_CUSTOM_BACKWARD command, this supplies the custom functions.

  • params – The parameters for the command.

  • flags – A reserved field for flags.

Returns:

A wrapped ccv_nnc_cmd_t structure.

ccv_nnc_hint_verify(const ccv_nnc_hint_t hint, const ccv_nnc_cmd_param_t cmd, const ccv_nnc_tensor_param_t a, const ccv_nnc_tensor_param_t b)

Verify whether a hint is compatible with a given command and a given input tensor parameters / output tensor parameters.

Parameters:
  • hint – The hint for a given command. Hint defines things such as paddings, strides etc. for a given command.

  • cmd – The wrapped command.

  • a – The input tensor parameters.

  • b – The output tensor parameters.

Returns:

1 if it passes.

ccv_nnc_hint_auto(const ccv_nnc_cmd_param_t cmd, const ccv_nnc_tensor_param_t a, const ccv_nnc_tensor_param_t b)

Automatically find the best hint for a given input / output (on forward pass only).

Parameters:
  • cmd – The wrapped command.

  • a – The input tensor parameters.

  • b – The output tensor parameters.

Returns:

Best hint we can guess.

void ccv_nnc_hint_tensor_auto(const ccv_nnc_cmd_t cmd, const ccv_nnc_tensor_param_t *const inputs, const int input_size, const ccv_nnc_hint_t hint, ccv_nnc_tensor_param_t *const outputs, const int output_size)

Automatically find the outputs for the given inputs / hint.

Parameters:
  • cmd – The wrapped command.

  • inputs – An array of input tensor parameters.

  • input_size – The size of input array.

  • hint – The hint for the given command.

  • outputs – An array for the output tensor parameters.

  • output_size – The size of the output array.

ccv_nnc_cmd_find_backend(const ccv_nnc_cmd_t cmd, const int tensor_memory, const int tensor_formats, const int tensor_datatypes)

Find a suitable backend for a given command and tensor settings.

Parameters:
  • cmd – The wrapped command.

  • tensor_memory – The tensor memory setup (whether it is CPU or GPU).

  • tensor_formats – The tensor layout format (NCHW, NHWC, CHWN etc.)

  • tensor_datatypes – The datatype of a given tensor (FP32 etc.)

Returns:

The backend identifier for the selected backend.

ccv_nnc_cmd_autotune(const ccv_nnc_cmd_t cmd, const size_t max_workspace_size, const ccv_nnc_hint_t hint, const int flags, ccv_nnc_tensor_t *const *const inputs, const int input_size, ccv_nnc_tensor_t *const *const outputs, const int output_size, ccv_nnc_stream_context_t *const stream_context)

Run autotune to find the best kernel and configuration for the given input.

Parameters:
  • cmd – The original wrapped command.

  • max_workspace_size – The maximum memory allowed for this command to execute.

  • hint – The hint for the given command.

  • flags – The reserved field for flags.

  • inputs – An array of input tensors.

  • input_size – The size of input array.

  • outputs – An array of output tensors.

  • output_size – The size of output array.

  • stream_context – The stream we can do the autotune on. 0 uses default stream.

Returns:

The modified cmd that contains the updated configuration.

ccv_nnc_cmd_bitmask(const ccv_nnc_cmd_t cmd, const int input_size, const int output_size, const uint64_t *const input_bitmasks, const int input_bitmask_size, const uint64_t *const output_bitmasks, const int output_bitmask_size)

Check whether a given tensor input / output pattern can be computed by the given command. The bitmasks encode whether a given input / output tensor is available at a position.

Parameters:
  • cmd – The wrapped command to check.

  • input_size – The intended size of the input tensor array.

  • output_size – The intended size of the output tensor array.

  • input_bitmasks – The input tensor array encoding in bitmap, 0: no tensor, 1: has a tensor.

  • input_bitmask_size – The size of the input bitmask array.

  • output_bitmasks – The output tensor array encoding in bitmap.

  • output_bitmask_size – The size of the output bitmask array.

Returns:

1 if the command can be executed with the given input / output pattern.

ccv_nnc_cmd_aux(const ccv_nnc_cmd_t cmd)

Return auxiliary information related to a particular command with a particular backend. A backend must be set for this method to be useful.

Parameters:

cmd – The wrapped command to check auxiliary information for.

Returns:

The auxiliary information specific to a particular command with a particular backend.

int ccv_nnc_cmd_exec(const ccv_nnc_cmd_t cmd, const ccv_nnc_hint_t hint, const int flags, ccv_nnc_tensor_t *const *const inputs, const int input_size, ccv_nnc_tensor_t *const *const outputs, const int output_size, ccv_nnc_stream_context_t *const stream_context)

Execute a given command.

Parameters:
  • cmd – The wrapped command to be executed.

  • hint – The hint provided for the command.

  • flags – A reserved field for flags.

  • inputs – The input tensor array.

  • input_size – The size of input tensor array.

  • outputs – The output tensor array.

  • output_size – The size of output tensor array.

  • stream_context – The stream which the command will be executed upon.

Returns:

CCV_NNC_EXEC_SUCCESS on success.

ccv_nnc_cmd_is_forward(const ccv_nnc_cmd_t cmd)

Check whether the command is a forward pass or not.

Parameters:

cmd – The wrapped command.

Returns:

1 if it is a forward pass.

ccv_nnc_cmd_is_backward(const ccv_nnc_cmd_t cmd)

Check whether the command is a backward pass or not.

Parameters:

cmd – The wrapped command.

Returns:

1 if it is a backward pass.

ccv_nnc_cmd_attr(const ccv_nnc_cmd_t cmd, const int flags)

Check this command against listed attributes.

Parameters:
  • cmd – The wrapped command.

  • flags – The flags to check against the command (unsupported).

Returns:

1 if the flag is supported by the command.

ccv_nnc_cmd_allow_inplace(const ccv_nnc_cmd_t cmd, const int input_idx, const int input_size, const int output_idx, const int output_size)

Check whether this command allows an in-place operation against a particular input and output (indexed from 0).

Parameters:
  • cmd – The wrapped command.

  • input_idx – The index of the input tensor we want to check.

  • input_size – The total number of inputs.

  • output_idx – The index of the output tensor we want to check.

  • output_size – The total number of outputs.

Returns:

1 if the input tensor can be used as the output tensor.

ccv_nnc_cmd_enforce_inplace(const ccv_nnc_cmd_t cmd, const int input_idx, const int input_size, const int output_idx, const int output_size)

Check whether this command needs to enforce an in-place operation against a particular input and output (indexed from 0).

Parameters:
  • cmd – The wrapped command.

  • input_idx – The index of the input tensor we want to check.

  • input_size – The total number of inputs.

  • output_idx – The index of the output tensor we want to check.

  • output_size – The total number of outputs.

Returns:

1 if the input tensor is required to be used as the output tensor.

void ccv_nnc_set_profiler(int state)

Turn the profiler on or off. Right now, this simply proxies the call to cudaProfilerStart / cudaProfilerStop.

Parameters:

state – 1 is on, 0 is off.

void ccv_nnc_set_memory_efficient(int state)

When there is a choice between approaches, prefer the more memory efficient one and take the performance hit. This is relevant to MPSGraph: if we dispatch all command buffers at full speed, we risk holding a lot of resources until all of them have executed. Alternatively, we can wait for the previous one to finish before proceeding, with obvious performance penalties.

Parameters:

state – 1 is on, 0 is off. Default to off.

ccv_nnc_palettize(const void *input, const int datatype, const int memory_type, const size_t input_length, const int qbits, const int number_in_blocks, void *output, const size_t output_length)

Quantize a given memory region of a given datatype / memory residence into an n-bit palette.

Parameters:
  • input – The input memory region, it can be CCV_64F, CCV_32F or CCV_16F.

  • datatype – The datatype, it can be CCV_64F, CCV_32F or CCV_16F.

  • memory_type – Where the memory resides. Right now only support CPU_MEMORY.

  • input_length – How many elements in the input.

  • qbits – How many bits for the palette. Right now only 4 / 5 / 6 / 7 / 8 bits supported.

  • number_in_blocks – How many elements share a palette.

  • output – The output memory region.

  • output_length – The maximum size of the output.

Returns:

The actual length in bytes of the output.

void ccv_nnc_depalettize(const void *input, const int datatype, const int memory_type, const size_t input_length, const int qbits, const int number_in_blocks, void *output, const size_t output_length)

Dequantize a given memory region of a given datatype / memory residence from the built-in n-bit palette.

Parameters:
  • input – The input memory region.

  • datatype – The datatype, it can be CCV_64F, CCV_32F or CCV_16F.

  • memory_type – Where the memory resides. It can be either CPU_MEMORY or GPU_MEMORY.

  • input_length – The size of the input in bytes.

  • qbits – How many bits for the palette. Right now only 4 / 5 / 6 / 7 / 8 bits supported.

  • number_in_blocks – How many elements share a palette.

  • output – The output memory region, it can be CCV_64F, CCV_32F or CCV_16F.

  • output_length – How many elements in the output.

struct ccv_nnc_cmd_param_t
#include <ccv_nnc.h>

Parameters for command.

Public Members

int dim[CCV_NNC_MAX_DIM_ALLOC]

[size.dim] The window size for the layer. For a fully connected layer, it is 1 because it is a 1x1 convolutional layer with a count of filters.

int count

[convolution.count] The number of filters for the convolutional layer.

[convolution_transpose.count] The number of filters for the convolutional transpose layer.

[bnorm.count] The number of axes selected.

[lnorm.count] The number of axes selected.

[rmsnorm.count] The number of axes selected.

[reduce.count] The number of axes selected.

int groups

[convolution.groups] The number of groups for the convolutional layer.

[convolution_transpose.groups] The number of groups for the convolutional transpose layer.

[gnorm.group] The number of groups that separate the channels.

int dilation[CCV_NNC_MAX_DIM_ALLOC]

[convolution.dilation[]] The dilation factor for the convolutional layer. Defaults to 1.

[convolution_transpose.dilation[]] The dilation factor for the convolutional transpose layer. Defaults to 1.

int output_padding

[convolution_transpose.output_padding] The output padding to resolve ambiguity when treating this as the inverse of convolution.

int hidden_size

[rnn.hidden_size] The number of features in the hidden state h.

int proj_size

[rnn.proj_size] The projection size for the hidden state h.

int num_layers

[rnn.num_layers] The number of layers for RNN.

int bias

[rnn.bias] If 0, the layer won’t use bias weights.

int batch_first

[rnn.batch_first] If 1, the batch dimension comes before the sequence dimension.

int bidirectional

[rnn.bidirectional] Enable bidirectional mode of the RNN.

float dropout

[rnn.dropout] If non-zero, enable dropout at each layer of RNN.

int is_test

[rnn.is_test] Whether to run this kernel in test mode or not.

[bnorm.is_test] Whether in test mode.

int reserved

[pool.reserved] A reserved field.

float kappa

[rnorm.kappa] As in b[i] = a[i] / (rnorm.kappa + rnorm.alpha * sum(a, i - rnorm.size / 2, i + rnorm.size / 2)) ^ rnorm.beta

float alpha

[rnorm.alpha] See **rnorm.kappa**.

[rmsprop.alpha] The alpha hyper-parameter.

float beta

[rnorm.beta] See **rnorm.kappa**.

[smooth_l1.beta] The beta on the smooth L1 loss (or Huber loss)

int axis[CCV_NNC_MAX_DIM_ALLOC]

[bnorm.axis[]] The axes selected to compute mean / variance.

[lnorm.axis[]] The axes selected to compute mean / variance.

[rmsnorm.axis[]] The axes selected to compute mean / variance.

[reduce.axis[]] The axes selected to reduce along.

[transpose.axis[2]] The two axes we’d like to transpose for the input.

float epsilon

[bnorm.epsilon] The epsilon for the standard deviation.

[lnorm.epsilon] The epsilon for the standard deviation.

[gnorm.epsilon] The epsilon for the standard deviation.

[rmsnorm.epsilon] The epsilon for the standard deviation.

[adam.epsilon] The epsilon for the standard deviation.

[lamb.epsilon] The epsilon for the standard deviation.

[rmsprop.epsilon] The epsilon for the standard deviation.

float momentum

[bnorm.momentum] running_mean = running_mean * momentum + mean * (1 - momentum).

[sgd.momentum] For SGD, this follows http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf.

[rmsprop.momentum] The momentum hyper-parameter.

int elementwise_affine

[lnorm.elementwise_affine] Whether it supports scale / bias.

int group_axis

[gnorm.group_axis] The axis selected to be grouped.

int reduce_axis[CCV_NNC_MAX_DIM_ALLOC]

[gnorm.reduce_axis[]] The other axis selected to compute mean / variance.

int reduce_count

[gnorm.reduce_count] The number of other axis selected.

int nesterov

[sgd.nesterov] Nesterov accelerated gradient.

float rate

[sgd.rate] The learning rate.

[adam.rate] The learning rate.

[lamb.rate] The learning rate.

[rmsprop.rate] The learning rate.

[histogram.ratio] The ratio from min to max, only applied to logarithmic.

float scale

[sgd.scale] The scale to be applied to the gradient before doing any minimization.

[adam.scale] The scale to be applied to the gradient before doing any minimization.

[lamb.scale] The scale to be applied to the gradient before doing any minimization.

[rmsprop.scale] The scale to be applied to the gradient before doing any minimization.

[scaled_dot_product_attention.scale] The scale we multiply into the dot product of Q & K.

float decay

[sgd.decay] This is the weight decay parameter, which represents L2 regularization after momentum is applied.

[adam.decay] This is the weight decay parameter, which represents L2 regularization.

[lamb.decay] This is the weight decay parameter, which represents L2 regularization.

[rmsprop.decay] This is the weight decay parameter, which represents L2 regularization after momentum is applied.

float dampening

[sgd.dampening] This usually equals momentum; however, it can be changed.

int step

[adam.step] Step t in adam optimizer.

[lamb.step] Step t in lamb optimizer.

float beta1

[adam.beta1] The beta1 hyper-parameter in adam optimizer.

[lamb.beta1] The beta1 hyper-parameter in lamb optimizer.

float beta2

[adam.beta2] The beta2 hyper-parameter in adam optimizer.

[lamb.beta2] The beta2 hyper-parameter in lamb optimizer.

int amsgrad

[adam.amsgrad] Whether to use amsgrad.

int transpose_a[2]

[blas.transpose_a[2]] The axes we’d like to transpose for input a.

int transpose_b[2]

[blas.transpose_b[2]] The axes we’d like to transpose for input b.

float a[3]

[blas.a[3]] BLAS scalars.

float trim0

[label_smoothing.trim0] The smoothed label for 0.

float trim1

[label_smoothing.trim1] The smoothed label for 1.

float pos_weight

[binary_crossentropy.pos_weight] The pos_weight on the loss: -(pos_weight * y * log(x) + (1 - y) * log(1 - x))

int reduce_op

[mse.reduce_op] Whether to reduce with mean or with sum.

int tanh

[gelu.tanh] Use the tanh approximation.

float p

[dropout.p] Dropout probability.

int entirety

[dropout.entirety] Drop the whole layer with the given probability.

int type

[upsample.type] 0 - nearest, 1 - bilinear.

[histogram.type] The type, can be even, logarithmic, or bins.

[pad.type] The type of pad, can be either zeros or replicating edge.

float width_scale

[upsample.width_scale] The scale for the width. It is between 1 and 2 at the moment.

float height_scale

[upsample.height_scale] The scale for the height. It is between 1 and 2 at the moment.

int align_corners

[upsample.align_corners] Whether to scale to align corners. Thus, for 0…1, if false, it will align to -0.25, 0.25, 0.75, 1.25; if true, it will align to 0, 0.3333, 0.6666, 1.0.

float min

[clamp.min] The minimum; NaN means no min.

[histogram.min] The minimal number, for even or logarithmic.

float max

[clamp.max] The maximum; NaN means no max.

[histogram.max] The maximal number, for even or logarithmic.

float iou_threshold

[nms.iou_threshold] Threshold between 0 to 1 for IoU threshold.

int bins

[histogram.bins] The number of bins, only applied to even.

float negative_slope

[leaky_relu.negative_slope] The negative slope to be applied when activation < 0.

int is_causal

[scaled_dot_product_attention.is_causal] Whether a causal mask is associated with the attention. The attention mask will be cut to triangular if provided.

int upcast

[scaled_dot_product_attention.upcast] Whether we want to run the attention computation at higher precision (from FP16 to FP32).

int end[CCV_NNC_MAX_DIM_ALLOC]

[pad.end] Works together with size.dim: size.dim is how much to add at the beginning and pad.end is how much to add at the end.

struct ccv_nnc_hint_t

Public Members

int dim[CCV_NNC_MAX_DIM_ALLOC]

Stride for each dimension.

int begin[CCV_NNC_MAX_DIM_ALLOC]

Padding at the beginning of a dimension.

int end[CCV_NNC_MAX_DIM_ALLOC]

Padding at the end of a dimension.

struct ccv_nnc_cmd_s

Public Members

uint32_t cmd

The identifier for command.

uint32_t backend

The identifier for backend.

int algorithm

The algorithm selector (as defined by backend).

ccv_nnc_cmd_param_t info

The command parameters.

ccv_nnc_cmd_vtab_t *isa

This is for the types CCV_NNC_CUSTOM_FORWARD / CCV_NNC_CUSTOM_BACKWARD.

struct ccv_nnc_cmd_vtab_s
#include <ccv_nnc.h>

The function prototype for automatically deducing tensor shapes.

Streams

enum [anonymous]

Values:

enumerator CCV_STREAM_CONTEXT_CPU

A CPU based stream context (unsupported).

enumerator CCV_STREAM_CONTEXT_GPU

A GPU based stream context.

typedef void (*ccv_nnc_callback_f)(void *const callback_context)

The callback prototype on the stream context.

typedef void (*ccv_nnc_stream_context_destructor_f)(const ccv_nnc_stream_context_t *const stream, void *const context)

The hook to be called when a stream context is destroyed. At the moment, the stream context is destroyed at the time ccv_nnc_stream_context_free is called, so there are no tricks. This hook is useful because some resources are associated with the stream pointer, and it is good to free those resources upon freeing the stream.

typedef struct ccv_nnc_stream_signal_s ccv_nnc_stream_signal_t

Opaque pointer to the signal object.

typedef ccv_nnc_stream_context_t *(*ccv_nnc_stream_context_neighbor_discovery_f)(const int device_id, void *const context)

The neighbor discovery function that will be called with the device id.

ccv_nnc_stream_context_new(const int type)

Create a new stream context.

Parameters:

type – A combination of CPU / GPU and DEVICE_ID.

Returns:

The newly created stream context.

ccv_nnc_stream_context_type(const ccv_nnc_stream_context_t *const stream_context)

Get the type of the stream context.

Parameters:

stream_context – The stream context we want to inspect.

Returns:

The type of the stream context.

ccv_nnc_stream_context_get_workspace(ccv_nnc_stream_context_t *const stream_context, const size_t workspace_size, const int mem)

Get a stream context local workspace memory. This memory region will be reused the next time you call this method on the same stream context.

Parameters:
  • stream_context – The stream context which provides the workspace memory.

  • workspace_size – The size of the workspace memory.

  • mem – The memory type of the said workspace memory (GPU or CPU).

Returns:

A pointer to the workspace memory.

void ccv_nnc_stream_context_drain(ccv_nnc_stream_context_t *const stream)

Deallocate any workspace memory on the stream context.

Parameters:

stream – The stream context to drain workspace memory.

void ccv_nnc_stream_context_add_callback(ccv_nnc_stream_context_t *const stream, const ccv_nnc_callback_f callback, void *const callback_context)

Add a callback function to be called once the stream has executed to that point.

Parameters:
  • stream – The stream context to add callback.

  • callback – The callback function.

  • callback_context – The context to be called with the callback function.

void ccv_nnc_stream_context_wait(const ccv_nnc_stream_context_t *const stream)

Wait until all tasks submitted (command, graph run etc.) on the stream context have completed.

Parameters:

stream – The stream context to wait on.

int ccv_nnc_stream_context_add_destructor_hook(ccv_nnc_stream_context_t *const stream, ccv_nnc_stream_context_destructor_f destructor, void *const context)

Add a new destructor hook callback when a stream is freed.

Parameters:
  • stream – The stream to be observed.

  • destructor – The new destructor callback method.

  • context – Additional context.

Returns:

An integer identifier to help remove the hook later.

void ccv_nnc_stream_context_remove_destructor_hook(ccv_nnc_stream_context_t *const stream, const int hook_id)

Remove a destructor hook callback.

Parameters:
  • stream – The stream we observe.

  • hook_id – The returned integer when calling the add method.

void ccv_nnc_stream_context_free(ccv_nnc_stream_context_t *const stream_context)

Deallocate the stream context.

Parameters:

stream_context – The stream context to be destroyed.

void ccv_nnc_stream_context_set_seed(ccv_nnc_stream_context_t *const stream_context, uint32_t seed)

Set random seed for stream context.

Parameters:
  • stream_context – The stream context to set the seed for. 0 means use the default stream context.

  • seed – The seed for the stream context.

uint32_t ccv_nnc_stream_context_genrand_uint32(ccv_nnc_stream_context_t *const stream_context)

Generate a uint32_t random number for the stream context. These are usually used as seeds for other high-performance random number generators.

Parameters:

stream_context – The stream context associated with random number generation.

ccv_nnc_stream_signal_new(const int type)

Create a new stream signal.

Parameters:

type – A composed type that denotes whether it is associated with a GPU or CPU stream context, and on which device.

Returns:

The newly created stream signal.

ccv_nnc_stream_signal_type(const ccv_nnc_stream_signal_t *const signal)

Get the type of the stream signal.

Parameters:

signal – The stream signal we want to inspect.

Returns:

The type of the stream signal.

void ccv_nnc_stream_context_emit_signal(ccv_nnc_stream_context_t *const stream, ccv_nnc_stream_signal_t *const signal)

Emit a signal on a stream.

Parameters:
  • stream – The stream context where the signal will be emitted.

  • signal – The signal to be emitted. It has to be on the same device as the stream.

ccv_nnc_stream_signal_t *ccv_nnc_stream_context_emit_signal_new(ccv_nnc_stream_context_t *const stream)

Emit a signal on a stream directly. It will be managed by the stream; you have to use it immediately after it is returned.

Parameters:

stream – The stream context where the signal will be emitted.

Returns:

The new signal emitted on the stream context.

void ccv_nnc_stream_context_wait_signal(const ccv_nnc_stream_context_t *const stream, const ccv_nnc_stream_signal_t *const signal)

Wait a signal on a stream.

Parameters:
  • stream – The stream context that will be blocked by the signal.

  • signal – The signal to be waited on. It can be on a different device than the stream.

ccv_nnc_stream_signal_get_emitter(const ccv_nnc_stream_signal_t *const signal)

Get the stream context this signal is going to be emitted on.

Parameters:

signal – The signal we want to inspect.

Returns:

The most recent stream context you called ccv_nnc_stream_context_emit_signal with.

void ccv_nnc_stream_signal_free(ccv_nnc_stream_signal_t *const signal)

Deallocate the signal.

Parameters:

signal – The signal to be destroyed.

ccv_nnc_device_count(const int type)

Return number of devices.

Parameters:

type – The type of devices (CCV_NNC_STREAM_CONTEXT_GPU / CCV_NNC_STREAM_CONTEXT_CPU)

Returns:

The number of devices.

ccv_nnc_device_remap(const int type, const int source, const int destination)

Remap a source device as the destination device.

Parameters:
  • type – The type of devices (CCV_NNC_STREAM_CONTEXT_GPU / CCV_NNC_STREAM_CONTEXT_CPU)

  • source – The original device id.

  • destination – The new device id.

Returns:

0 if the device remap is successful, -1 if it is not.

void ccv_nnc_stream_context_set_neighbor_discovery(ccv_nnc_stream_context_t *const stream_context, ccv_nnc_stream_context_neighbor_discovery_f discovery, void *const context)

Set the neighbor stream context discovery mechanism. This method exposes how a neighbor should be defined per stream context. It is useful for commands that operate across devices and need to find the correct stream context for those devices. A stream context itself is bound to one device only.

Parameters:
  • stream_context – The stream context that bounds to a discovery mechanism.

  • discovery – The neighbor discovery function to invoke.

  • context – The associated context with the neighbor discovery function.

ccv_nnc_stream_context_find_neighbor(ccv_nnc_stream_context_t *const stream_context, const int device_id)

Find a neighbor stream context on a given device id for current stream context.

Parameters:
  • stream_context – The stream context which we will look for neighbors.

  • device_id – On which device the stream context may exist.

Returns:

0 if no stream context is found; otherwise, the stream context on that device.

CCV_STREAM_GET_CONTEXT(type)
CCV_STREAM_GET_DEVICE(type)
CCV_STREAM_GET_DEVICE_ID(type)
CCV_STREAM_SET_DEVICE_ID(type, device_id)

Micro Ops

enum [anonymous]

Values:

enumerator CCV_NNC_MICRO_UNARY_OP_NEG
enumerator CCV_NNC_MICRO_UNARY_OP_LOG
enumerator CCV_NNC_MICRO_UNARY_OP_EXP
enum [anonymous]

Values:

enumerator CCV_NNC_MICRO_BINARY_OP_PLUS
enumerator CCV_NNC_MICRO_BINARY_OP_MINUS
enumerator CCV_NNC_MICRO_BINARY_OP_MUL
enumerator CCV_NNC_MICRO_BINARY_OP_DIV
enumerator CCV_NNC_MICRO_BINARY_OP_MAX
enumerator CCV_NNC_MICRO_BINARY_OP_MIN
enumerator CCV_NNC_MICRO_BINARY_OP_EQUAL_TO
enumerator CCV_NNC_MICRO_BINARY_OP_LESS_THAN
enum [anonymous]

Values:

enumerator CCV_NNC_MICRO_REDUCE_OP_MAX
enumerator CCV_NNC_MICRO_REDUCE_OP_MIN
enumerator CCV_NNC_MICRO_REDUCE_OP_ARGMAX
enumerator CCV_NNC_MICRO_REDUCE_OP_ARGMIN
enumerator CCV_NNC_MICRO_REDUCE_OP_MEAN
enumerator CCV_NNC_MICRO_REDUCE_OP_SUM
enumerator CCV_NNC_MICRO_REDUCE_OP_PROD
typedef struct ccv_nnc_micro_io_vtab_s ccv_nnc_micro_io_vtab_t

Abstract vtab for different ccv_nnc_micro_io_t.

typedef struct ccv_nnc_micro_io_s *ccv_nnc_micro_io_t

Abstract micro op representation.

typedef struct ccv_nnc_micro_combine_s ccv_nnc_micro_combine_t

The combined op from micro ops.

ccv_nnc_micro_input(const int dimensions)

Create a free-form input that represents a tensor.

Parameters:

dimensions – The maximum dimension of the input.

ccv_nnc_micro_reindex(const char *const *const shape, const int shape_count, const ccv_nnc_micro_io_t *const ss, const int s_count, const char *const *const reindex, const int reindex_count, const ccv_nnc_micro_io_t x)

Use shape and reindex expressions to reindex the given tensor into a different shape. The expressions can bind integer parameters, which start with $.

The expressions follow a specific pattern: integer parameters start with $. Dimensions are represented as dXn, such as dA0, dA1, dA2 … An index into the provided tensor can be represented as i0, i1, i2. These are all 0-indexed.

Constants are supported, such as 235, 431 etc. The operators currently supported are -, +, /, *.

Thus, broadcasting a tensor x[w, h] to y[w, h, h] can be represented as: shape: { “dA0”, “dA1”, “dA1” }, reindex: { “i0”, “i1”, “0” }. For example, transpose can be represented as: shape: { “dA1”, “dA0” }, reindex: { “i1”, “i0” }

Parameters:
  • shape – The shape expressions per axis.

  • shape_count – The dimensions of the output.

  • ss – The tensors to reference shape dimensions.

  • s_count – The number of tensors to reference shape dimensions.

  • reindex – The reindex expressions per axis.

  • reindex_count – The dimensions of the input.

  • x – The input for reindex operation.

Returns:

The reindexed tensor.

ccv_nnc_micro_unary(const uint32_t op, const ccv_nnc_micro_io_t x)

Apply element-wise computations with one tensor.

Parameters:
  • op – The unary operand.

  • x – The input.

Returns:

The result tensor.

ccv_nnc_micro_binary(const uint32_t op, const ccv_nnc_micro_io_t left, const ccv_nnc_micro_io_t right)

Apply pair-wise computations with two tensors. They have to match in shape exactly.

Parameters:
  • op – The binary operand.

  • left – The left input.

  • right – The right input.

Returns:

The result tensor.

ccv_nnc_micro_reduce(const uint8_t op, const int *const axis, const int axis_count, const ccv_nnc_micro_io_t x)

Apply reduction computation against some dimensions and generate the final reduced tensor.

Parameters:
  • op – The reduction operand.

  • axis – The axis to reduce.

  • axis_count – Number of axes.

  • x – The input tensor.

Returns:

The result tensor after reduction.

ccv_nnc_micro_select(const int axis, const ccv_nnc_micro_io_t x, const ccv_nnc_micro_io_t index)

Use the index tensor to select one value from x along the given axis.

Parameters:
  • axis – The axis to select.

  • x – The tensor to be indexed.

  • index – The integer tensor of indexes.

Returns:

The result tensor with values selected from x with index from index tensor.

ccv_nnc_micro_grad(const ccv_nnc_micro_io_t x)

Return the gradient for a particular output. For example, if x is ccv_nnc_micro_unary(exp, input), this represents the gradient of x, not of the input. This method is used to generate the representation of gradients for the ccv_nnc_micro_combine_new method.

Parameters:

x – The tensor to take a gradient of.

Returns:

The result tensor that represents the gradient of x.

ccv_nnc_micro_combine_new(const ccv_nnc_micro_io_t *const inputs, const int input_size, const char *const *const parameters, const int parameter_size, const ccv_nnc_micro_io_t *const outputs, const int output_size, const ccv_nnc_micro_io_t *const ingrads, const int ingrad_size, const ccv_nnc_micro_io_t *const outgrads, const int outgrad_size)

Combine micro ops into one and run some optimization passes. The combined op can then be processed to generate optimized kernels. In particular, we can process the combined op into C code and CUDA code as reference implementations.

Parameters:
  • inputs – The inputs for the combined ops.

  • input_size – The number of the inputs.

  • parameters – The names of the parameters; this determines the order of these parameters.

  • parameter_size – The number of parameters.

  • outputs – The outputs for the combined ops.

  • output_size – The number of the outputs.

  • ingrads – The gradient inputs for the combined ops, including any inputs / outputs if there are any.

  • ingrad_size – The number of ingrads.

  • outgrads – The gradient outputs for the combined ops.

  • outgrad_size – The number of outgrads.

void ccv_nnc_micro_combine_free(ccv_nnc_micro_combine_t *const combine)

Free the combined op.

Parameters:

combine – The op to be freed.

void ccv_nnc_micro_combine_interpret(ccv_nnc_micro_combine_t *const combine, const uint32_t cmd, ccv_nnc_tensor_t *const *const inputs, const int input_size, const ccv_nnc_micro_scalar_t *const values, const int parameter_size, ccv_nnc_tensor_t *const *const outputs, const int output_size)

Run the combined op in interpret mode. This is only useful for debugging internals. Because this is a generic combined op, there is no hint, flags, or stream context, and no full wrapped cmd.

Parameters:
  • combine – The op.

  • cmd – Choice between CMD_CUSTOM_FORWARD and CMD_CUSTOM_BACKWARD.

  • inputs – The input tensors.

  • input_size – The size of input tensors.

  • values – The values corresponding to the parameters passed when calling ccv_nnc_micro_combine_new.

  • parameter_size – How many parameters there are. It must match the count given to ccv_nnc_micro_combine_new.

  • outputs – The output tensors.

  • output_size – The size of output tensors.

char *ccv_nnc_micro_combine_c(ccv_nnc_micro_combine_t *const combine)

Generate C code from the combined op.

Parameters:

combine – The combined op to generate C code from.

Returns:

The generated C code string.

struct ccv_nnc_micro_io_s
struct ccv_nnc_micro_scalar_t