NNC Common Neural Network Primitives

A computation graph is a powerful abstraction for scheduling and managing computations. Oftentimes, though, it can feel too raw. Conceptually, in a computation graph all tensors are equal. But for neural networks, parameters (weights) and activations are different: parameters are the configuration, while activations are the temporary states computed from the given input.

In a computation graph, both weights and activations are represented as ordinary tensors.

Model

A model is the core abstraction in the common neural network primitives (CNNP) interface. It can represent either a single layer or a group of layers. An ordinary neural network layer contains parameters and applies them to the input neurons to generate activations in the output neurons. The model abstraction goes beyond one input and one output: a model can take multiple sets of input neurons and generate activations on multiple sets of output neurons. The main difference between a model and a command in a concrete graph is that a model contains state (its parameters).

The model itself is incredibly flexible. You don’t need to know the shapes of the inputs or outputs to compose a model, and models themselves are composable. The simplest way to compose a new model from a list of models is ccv_cnnp_sequential_new. This function takes a list of models and runs activations through them sequentially (the input of the current model is the output of the prior model). Alternatively, ccv_cnnp_model_new can compose a new model out of a set of model inputs and outputs. More on this in Model IO.
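
For instance, a small multi-layer perceptron can be composed sequentially. The sketch below is written against one version of the CNNP headers: the layer constructors ccv_cnnp_dense and ccv_cnnp_relu are part of the interface, but their exact argument lists (bias flag, trainable flag, name) and the trailing arguments of ccv_cnnp_sequential_new should be treated as assumptions and checked against your headers.

    #include <ccv.h>
    #include <nnc/ccv_nnc.h>
    #include <nnc/ccv_nnc_easy.h>

    /* Sketch: build a small MLP by chaining layers sequentially. */
    static ccv_cnnp_model_t* build_mlp(void)
    {
        ccv_cnnp_model_t* layers[] = {
            ccv_cnnp_dense(128, 0, 1, "fc1"), /* 128 output neurons, with bias, trainable */
            ccv_cnnp_relu("relu1"),
            ccv_cnnp_dense(10, 0, 1, "fc2"), /* 10 output neurons, e.g. class scores */
        };
        return ccv_cnnp_sequential_new(layers, 3, 1, "mlp"); /* 3 models, trainable, named "mlp" */
    }

Note that no input shape appears anywhere in this composition; shapes only enter the picture when the model is compiled.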

The model concept is meta in the sense that a model is not materialized until you call ccv_cnnp_model_compile with the input / output tensor parameters. Internally, this method materializes the model into a symbolic graph with proper shapes. Once a model is compiled, you can either evaluate it against inputs or train it.
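
Continuing the sketch above, compiling the MLP for 32-item batches of 784-dimensional inputs could look like the following. CPU_TENSOR_NHWC, CMD_SGD_FORWARD and CMD_CATEGORICAL_CROSSENTROPY_FORWARD are convenience macros from ccv_nnc; the SGD hyper-parameters and the exact macro argument order are placeholders and assumptions rather than recommendations.

    /* Sketch: materialize the model with concrete input shapes, a minimizer and a loss. */
    static void compile_mlp(ccv_cnnp_model_t* const mlp)
    {
        const ccv_nnc_tensor_param_t input_params = CPU_TENSOR_NHWC(32F, 32, 784); /* batch of 32, 784 features */
        ccv_cnnp_model_compile(mlp, &input_params, 1,
            CMD_SGD_FORWARD(0, 0.001, 1, 0.001, 0.9, 0.9), /* minimizer (placeholder hyper-parameters) */
            CMD_CATEGORICAL_CROSSENTROPY_FORWARD()); /* loss */
    }

After this call the model is backed by a symbolic graph with fully specified shapes, and evaluation or training can begin.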

Model IO

Composing a model with ccv_cnnp_model_new requires model inputs and model outputs. The concept of model inputs / outputs is remarkably similar to that of tensor symbols, but broader. Ordinarily, ccv_cnnp_input gives a ccv_cnnp_model_io_t that represents a tensor used as input. When ccv_cnnp_model_apply is called with a model and a set of inputs, its ccv_cnnp_model_io_t output represents the set of tensors generated by applying those inputs to that model. Thus, a ccv_cnnp_model_io_t can conceptually be either a single tensor or a set of tensors. For the given model inputs and outputs, the set of models used to generate the outputs from the inputs can be traced to compose a new model. This also means a composed model can itself be used to compose an even more complex model. In this way, the model IO abstraction makes it natural to compose ever more complex models.
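
As an example, a model with two inputs whose branches are summed can be composed like this. It is only a sketch: MODEL_IO_LIST is the list-building macro used in ccv’s own examples, ccv_cnnp_sum is assumed to be the element-wise sum model, and as before the trailing arguments of ccv_cnnp_dense and ccv_cnnp_model_new may differ between versions.

    /* Sketch: trace a two-input, one-output model through the model IO abstraction. */
    static ccv_cnnp_model_t* build_two_branch(void)
    {
        const ccv_cnnp_model_io_t a = ccv_cnnp_input(); /* stands for the first input tensor */
        const ccv_cnnp_model_io_t b = ccv_cnnp_input(); /* stands for the second input tensor */
        const ccv_cnnp_model_io_t fa = ccv_cnnp_model_apply(ccv_cnnp_dense(64, 0, 1, "fa"), MODEL_IO_LIST(a));
        const ccv_cnnp_model_io_t fb = ccv_cnnp_model_apply(ccv_cnnp_dense(64, 0, 1, "fb"), MODEL_IO_LIST(b));
        const ccv_cnnp_model_io_t sum = ccv_cnnp_model_apply(ccv_cnnp_sum("sum"), MODEL_IO_LIST(fa, fb));
        /* Every model on a path from (a, b) to sum is traced into the new model. */
        return ccv_cnnp_model_new(MODEL_IO_LIST(a, b), MODEL_IO_LIST(sum), 1, "two_branch");
    }

The returned model behaves like any other model: it can itself be applied with ccv_cnnp_model_apply to compose something larger still.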

Fit, Evaluate, Backward, Apply Gradients

CNNP provides two sets of APIs so that you can control different aspects of the training process yourself. The first is straightforward: ccv_cnnp_model_fit is the one-stop API that handles both the loss computation and the gradient update. If you want finer control over which losses are propagated back, or want to accumulate gradients from multiple batches, you can use ccv_cnnp_model_evaluate, ccv_cnnp_model_backward and ccv_cnnp_model_apply_gradients.
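
A sketch of the finer-grained path is below, assuming the argument conventions used elsewhere in ccv_nnc: tensor arrays with explicit counts (via the TENSOR_LIST macro) plus trailing tensor-tape and stream-context arguments, passed as 0 here. The evaluate-parameter field and the exact signatures are assumptions to verify against the headers.

    /* Sketch: accumulate gradients over several batches, then update the parameters once.
     * x, y_pred and dloss are tensors prepared by the caller; dloss[i] holds the gradient
     * of the loss with respect to y_pred[i]. */
    static void train_step(ccv_cnnp_model_t* const model,
        ccv_nnc_tensor_t* const* const x, ccv_nnc_tensor_t* const* const y_pred,
        ccv_nnc_tensor_t* const* const dloss, const int batch_count)
    {
        int i;
        for (i = 0; i < batch_count; i++)
        {
            /* Forward pass, keeping what the backward pass will need. */
            ccv_cnnp_model_evaluate(model, (ccv_cnnp_evaluate_param_t){
                .requires_grad = 1,
            }, TENSOR_LIST(x[i]), TENSOR_LIST(y_pred[i]), 0, 0);
            /* Backward pass; gradients accumulate across these calls. */
            ccv_cnnp_model_backward(model, TENSOR_LIST(dloss[i]), 0, 0, 0, 0);
        }
        /* One parameter update using the minimizer supplied at compile time. */
        ccv_cnnp_model_apply_gradients(model, 0);
    }

When no such control is needed, ccv_cnnp_model_fit collapses the three calls above into one.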

By separating the ccv_cnnp_model_fit API into three pieces, CNNP models can be integrated with the dynamic graph easily. Under the hood, ccv_dynamic_graph_evaluate calls into ccv_cnnp_model_evaluate, and ccv_dynamic_graph_backward and ccv_dynamic_graph_apply_gradients delegate to their CNNP counterparts in the same way. With this design, using CNNP models with the dynamic graph is now the recommended way to build models.
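
As a rough sketch of that integration, a CNNP model can be run directly inside a dynamic graph. The caveats here are larger: the dynamic graph entry points are spelled with the ccv_nnc_ prefix in the headers, and the tensor-variable helpers and argument order shown are assumptions.

    /* Sketch: forward a CNNP model inside a dynamic graph; under the hood this
     * delegates to ccv_cnnp_model_evaluate. */
    static void dynamic_forward(ccv_cnnp_model_t* const mlp)
    {
        ccv_nnc_dynamic_graph_t* const graph = ccv_nnc_dynamic_graph_new();
        ccv_nnc_tensor_variable_t const x = ccv_nnc_tensor_variable_new(graph, CPU_TENSOR_NHWC(32F, 32, 784));
        ccv_nnc_tensor_variable_t const y = ccv_nnc_tensor_variable_new(graph); /* shape filled in on evaluate */
        /* Fill the tensor behind x (e.g. via ccv_nnc_tensor_from_variable) before evaluating. */
        ccv_nnc_dynamic_graph_evaluate(graph, mlp, 0 /* is_test */,
            TENSOR_VARIABLE_LIST(x), TENSOR_VARIABLE_LIST(y), 0, 0);
        ccv_nnc_dynamic_graph_free(graph);
    }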