In this article, I'll show you how to convert your Keras or TensorFlow model to run on the Neural Compute Stick 2.


I assume that you have a working development environment with the OpenVino toolkit installed and configured. If this is not the case, follow this guide for the Raspberry Pi 3 and this one for Ubuntu.

Starting with a Keras model

Let's say that you start with a Keras model. It can be either a single .h5 file that describes the whole model and weights, or separate files (model.json and weights.h5). You'll have to convert your Keras model to TensorFlow first; here's how to do it.

You can find the whole code, with the creation of a Keras model on my GitHub.

First, load your Keras model.

from keras.models import load_model
from keras import backend as K

# loading keras model
model = load_model(keras_model)
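If your model is instead split into separate architecture and weights files, `model_from_json` plus `load_weights` does the job. Here's a minimal self-contained sketch (the tiny Dense model is just a stand-in for your own model.json and weights.h5):

```python
from keras.models import Sequential, model_from_json
from keras.layers import Dense

# stand-in model: with your own files, skip this part
model = Sequential([Dense(10, input_shape=(784,), activation="softmax")])
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("weights.h5")

# the actual round trip: rebuild the architecture from JSON, then attach weights
with open("model.json") as f:
    model = model_from_json(f.read())
model.load_weights("weights.h5")
```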

Then, convert it to a TF model and save it as a .pb file.

import tensorflow as tf

# create a frozen-graph of the keras model
frozen_graph = freeze_session(K.get_session(),
                              output_names=[out.op.name for out in model.outputs])

# save model as .pb file
tf.train.write_graph(frozen_graph, "TF_model/", "tf_model.pb", as_text=False)

The magic really happens in the freeze_session function:

def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in
                                    tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            # strip device placements so the graph can load anywhere
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def,
                                                      output_names, freeze_var_names)
    return frozen_graph

You now have a TensorFlow model named tf_model.pb in the TF_model/ directory.

Use Model Optimizer to convert the TF model

Now we can use the Model Optimizer to convert the TensorFlow model to an IR file that we can use on the Neural Compute Stick 2:

python3 mo_tf.py --data_type FP16 --framework tf --input_model TF_model/tf_model.pb --model_name IR_model --output_dir IR_model/ --input_shape [1,28,28,1] --input conv2d_1_input --output activation_6/Softmax

A few things here:

  • --data_type is used to specify the precision you want. From what I know, FP32 will not work on the NCS2. Anything below FP16 should work, but you might get poor results due to the loss of precision
  • --input_model is the path to your TF model
  • --model_name is the name of the converted IR file you'll create
  • --output_dir is the directory where your IR file will be saved
  • --input_shape is the shape of your input tensor. In this model's case it's [1,28,28,1] because there's one image of size 28x28 with one channel (grayscale)
  • --input is used to specify the input layer of your model
  • --output is used to specify the output layer of your model

Note: If you don't know the input or output layer names, you can see them in the output of model.summary() when training the Keras model. You can also use Netron: run pip install netron, then netron -b TF_model/tf_model.pb, and head to http://localhost: to see your model and get the names of the input/output layers. model view with Netron
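If Netron isn't an option, you can also list the node names straight from the .pb file with a few lines of TensorFlow. A self-contained sketch (the tiny placeholder-plus-softmax graph below just stands in for your real model; with your own file, skip the first part and parse TF_model/tf_model.pb directly):

```python
import tensorflow as tf

# use the TF1-style graph API (tf.compat.v1 on TensorFlow 2.x)
tf1 = getattr(getattr(tf, "compat", tf), "v1", tf)

# stand-in graph; with your own model, skip this and parse your .pb below
graph = tf1.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, [1, 28, 28, 1], name="conv2d_1_input")
    tf1.nn.softmax(tf1.reshape(x, [1, 784]), name="activation_6/Softmax")
tf1.train.write_graph(graph.as_graph_def(), ".", "tf_model.pb", as_text=False)

# parse the .pb and list every node name; the input and output layers
# you pass to the Model Optimizer are in this list
graph_def = tf1.GraphDef()
with open("tf_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
names = [node.name for node in graph_def.node]
print(names[0], names[-1])
```

The first and last entries are usually the input and output layers you need.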

If everything goes according to plan, the output will show a summary of the arguments you've used, and you should get an output like the one below along with the IR files.

[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /home/mreliptik/Documents/dev/Keras_to_TF_NCS2/IR_model/IR_model.xml
[ SUCCESS ] BIN file: /home/mreliptik/Documents/dev/Keras_to_TF_NCS2/IR_model/IR_model.bin
[ SUCCESS ] Total execution time: 7.88 seconds.

Make a prediction using the Neural Compute Stick 2

Now that we have an IR model, we can run it on the NCS2. The corresponding script is in the GitHub repo.

from openvino.inference_engine import IENetwork, IEPlugin
model_xml = "IR_model/IR_model.xml"
model_bin = "IR_model/IR_model.bin"
# Plugin initialization for specified device
plugin = IEPlugin(device="MYRIAD")

net = IENetwork(model=model_xml, weights=model_bin)

# Loading model to the plugin
exec_net = plugin.load(network=net)

With these few lines, you first read the model IR files, then initialize the device and create a network from the loaded files. The last line loads the network onto the device and returns an executable network ready for inference.

Then simply use exec_net.infer() to run the inference.

res = exec_net.infer(inputs={input_blob: prepimg})
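Here, input_blob is the network's input name and prepimg is the preprocessed image. A minimal sketch of the preprocessing, assuming the 28x28 grayscale input from earlier (the random image is just a stand-in for a real one; note that after conversion the IR expects NCHW layout):

```python
import numpy as np

# stand-in for a real 28x28 grayscale digit; load yours with e.g. OpenCV
img = np.random.randint(0, 256, (28, 28), dtype=np.uint8)

# scale pixels to [0, 1] and reshape to NCHW: (batch, channel, height, width)
prepimg = (img.astype(np.float32) / 255.0).reshape(1, 1, 28, 28)
```

The input name itself can be recovered with input_blob = next(iter(net.inputs)), the usual pattern in the OpenVINO samples.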

Voila! You successfully converted a model to IR and ran it on the Compute Stick!

Final thoughts

The provided Model Optimizer is pretty straightforward to use, though you still have to know what you're doing. I thought Intel would have developed something a bit more user-friendly.

Next up

In the following weeks, I'll be working on my Handpose project to make it run with the NCS2. Stay tuned!

See you around for more!