Cloning Amazon Go…(2/5)
First steps
My first step was to search for a library with TensorFlow C# bindings.
I found the TensorFlowSharp project. At the time of writing this post it works with TensorFlow 1.9, and I tried to build my own working version of its Object Detection example.
First I made a .NET Standard 2.0 library to encapsulate the object detection logic. Then I made a console application that receives an image file and writes a new image file with boxes drawn around the detected objects, together with the estimated score of each one (the percentage of confidence in the detection).
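As a rough sketch of the step that prepares the boxes for drawing: assuming the model returns each box as a normalized [ymin, xmin, ymax, xmax] row plus one score per box (the usual layout for TensorFlow object detection graphs), the detections can be filtered by score and converted to pixel coordinates. The names PixelBox, DetectionFilter and ToPixelBoxes are hypothetical and are not part of TensorFlowSharp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types for illustration, not part of TensorFlowSharp.
public class PixelBox
{
    public int Left, Top, Right, Bottom;
    public float Score;
}

public static class DetectionFilter
{
    // boxes:  one row per detection, assumed to be [ymin, xmin, ymax, xmax] in the 0..1 range.
    // scores: one confidence value per detection, in the 0..1 range.
    public static List<PixelBox> ToPixelBoxes(
        float[,] boxes, float[] scores, int imageWidth, int imageHeight, float minScore)
    {
        var result = new List<PixelBox>();
        for (int i = 0; i < scores.Length; i++)
        {
            // Skip detections the model is not confident about.
            if (scores[i] < minScore) continue;

            // Scale the normalized coordinates to the size of the image.
            result.Add(new PixelBox
            {
                Top = (int)Math.Round(boxes[i, 0] * imageHeight),
                Left = (int)Math.Round(boxes[i, 1] * imageWidth),
                Bottom = (int)Math.Round(boxes[i, 2] * imageHeight),
                Right = (int)Math.Round(boxes[i, 3] * imageWidth),
                Score = scores[i]
            });
        }
        return result;
    }
}
```

For a 640x480 image and a minimum score of 0.5, a box of [0.1, 0.2, 0.5, 0.6] with score 0.9 becomes the rectangle from (128, 48) to (384, 240), while a box with score 0.3 is dropped.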
My first problem was that, starting from the TensorFlowSharp example, the number of objects detected with a fast model was very small compared with the equivalent Python example.
I tried the fast model ssd_mobilenet_v1_coco_2017_11_17 and this is my result:
I could not understand the problem. Searching the internet I found this blog describing the same problem… but I could not find any solution.
The Solution
After a day of attempts, I tried simplifying the code so that it works the same way as the Python example. I made these changes.
This is the original example, from ImageUtil.cs:
// The inception model takes as input the image described by a Tensor in a very
// specific normalized format (a particular image size, shape of the input tensor,
// normalized pixel values etc.).
//
// This function constructs a graph of TensorFlow operations which takes as
// input a JPEG-encoded string and returns a tensor suitable as input to the
// inception model.
//
private static TFGraph ConstructGraphToNormalizeImage(out TFOutput input,
                                                      out TFOutput output,
                                                      TFDataType destinationDataType = TFDataType.Float)
{
    // Some constants specific to the pre-trained model at:
    // https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
    //
    // - The model was trained after with images scaled to 224x224 pixels.
    // - The colors, represented as R, G, B in 1-byte each were converted to
    //   float using (value - Mean)/Scale.
    //
    const int w = 224;
    const int h = 224;
    const float mean = 117;
    const float scale = 1;

    var graph = new TFGraph();
    input = graph.Placeholder(TFDataType.String);
    output = graph.Cast(graph.Div(
        x: graph.Sub(
            x: graph.ResizeBilinear(
                images: graph.ExpandDims(
                    input: graph.Cast(
                        graph.DecodeJpeg(contents: input, channels: 3),
                        DstT: TFDataType.Float),
                    dim: graph.Const(0, "make_batch")),
                size: graph.Const(new int[] { w, h }, "size")),
            y: graph.Const(mean, "mean")),
        y: graph.Const(scale, "scale")), destinationDataType);
    return graph;
}
I changed the code to this:
// This version no longer resizes the image or normalizes the pixel values.
//
// The function constructs a graph of TensorFlow operations which takes as
// input a JPEG-encoded string and returns the decoded image as a batch of
// one, cast to destinationDataType.
//
private static TFGraph ConstructGraphToNormalizeImage(out TFOutput input,
                                                      out TFOutput output,
                                                      TFDataType destinationDataType = TFDataType.Float)
{
    var graph = new TFGraph();
    input = graph.Placeholder(TFDataType.String);
    output = graph.Cast(
        graph.ExpandDims(
            // Decode the JPEG into a height x width x 3 tensor.
            input: graph.Cast(graph.DecodeJpeg(contents: input, channels: 3), DstT: TFDataType.Float),
            // Add a leading batch dimension.
            dim: graph.Const(0, "make_batch")
        ),
        destinationDataType
    );
    return graph;
}
Once I made the change I got better results, and they match the results I got with the Python example.
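One plausible reason the simplified graph helps, stated here as an assumption rather than something this post verifies: the removed operations applied Inception-style preprocessing (a resize to 224x224, then (value - Mean)/Scale with a mean of 117), while object detection graphs such as ssd_mobilenet_v1_coco typically expect the raw decoded pixels. The small sketch below only reproduces that arithmetic in plain C#; NormalizationDemo and Normalize are hypothetical names.

```csharp
using System;

// Hypothetical demo class: plain C# arithmetic, no TensorFlowSharp calls.
public static class NormalizationDemo
{
    // Same formula as the removed graph operations: (value - mean) / scale.
    public static float Normalize(byte value, float mean = 117f, float scale = 1f)
    {
        return (value - mean) / scale;
    }

    public static void Main()
    {
        // Print a few raw pixel values next to their normalized counterparts.
        foreach (byte v in new byte[] { 0, 100, 117, 255 })
        {
            Console.WriteLine($"{v} -> {Normalize(v)}");
        }
    }
}
```

The raw values 0, 100, 117 and 255 become -117, -17, 0 and 138, so every pixel is shifted and the darker ones turn negative. If the detection model is expecting the original 0 to 255 range, it would be looking at a different image from the one that was loaded.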
Next Step
The next step will be to try the same thing on a video stream captured from a camera.