Thursday, 23 August 2018

Tensorflow - The importance of session

Java Tensorflow


I've created a little web server that loads in a frozen graph and processes the images that are sent to it. It can be found here.

When I first set it up I was really disappointed with the performance: the mobile phones were getting better performance than I was...

My code was straight out of the example pages, e.g.
    // A brand new Session is created (and closed again) for every image
    try (Session session = new Session(this.graph)) {
      outputs = session
          .runner()
          .feed("image_tensor", tensor)
          .fetch("detection_scores")
          .fetch("detection_classes")
          .fetch("detection_boxes")
          .run();
    }

It was taking an age. At first I thought it was down to the library not being compiled for my CPU, as it was warning me about that on startup, so I recompiled it using the awesome documentation. It basically boils down to:

    bazel build --config opt //tensorflow/java:tensorflow //tensorflow/java:libtensorflow_jni

It was still roughly the same, though. Not that, then. But how bad was it?

2018-08-23 19:01:31.971  INFO 34476 --- [nio-9000-exec-8] u.c.s.t.c.TensorflowImageEvaluator       : session  time:0
2018-08-23 19:01:31.972  INFO 34476 --- [nio-9000-exec-8] u.c.s.t.c.TensorflowImageEvaluator       : runner setup  time:1
2018-08-23 19:01:36.050  INFO 34476 --- [nio-9000-exec-8] u.c.s.t.c.TensorflowImageEvaluator       : run time:4078
2018-08-23 19:01:36.051  INFO 34476 --- [nio-9000-exec-8] u.c.s.t.c.TensorflowImageEvaluator       : results time:0

That's right, over 4 seconds! I thought this was supposed to be quick... so what was I doing wrong? Basically, I wasn't reusing the session. Once it is reused, just look at what happens on the same image. In this version I'm keeping a set of pre-prepared sessions in a BlockingQueue, so that each session is only ever used by one thread at a time:
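      // Borrow a pre-built session from the pool (blocks until one is free)
      Session session = this.sessions.take();
      try {
        Session.Runner runner = session
            .runner()
            .feed("image_tensor", tensor)
            .fetch("detection_scores")
            .fetch("detection_classes")
            .fetch("detection_boxes");
        start = logTimeDiff("runner setup  time:", start);
        outputs = runner.run();
        start = logTimeDiff("run time:", start);
      } finally {
        // Hand the session back to the pool, even if run() throws
        this.sessions.put(session);
      }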
Just remember to pop the session back in at the end!
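The snippet above calls a `logTimeDiff` helper whose body isn't shown. Here is a guess at what such a helper might look like: it logs a label followed by the milliseconds since `start`, then returns the current time so the result can be fed straight back in as the start of the next step. The class name `TimingDemo` is hypothetical, and `java.util.logging` stands in for whatever logger the real server uses, just to keep the sketch dependency-free.

```java
import java.util.logging.Logger;

public class TimingDemo {
    private static final Logger LOG = Logger.getLogger(TimingDemo.class.getName());

    // Hypothetical helper: logs "<label><elapsed ms>" and returns the current
    // time so it can be chained as the start time of the next measured step.
    static long logTimeDiff(String label, long start) {
        long now = System.currentTimeMillis();
        LOG.info(label + (now - start));
        return now;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread.sleep(20); // stand-in for building the runner
        start = logTimeDiff("runner setup  time:", start);
        Thread.sleep(50); // stand-in for runner.run()
        start = logTimeDiff("run time:", start);
    }
}
```

Chaining `start = logTimeDiff(..., start)` means each log line measures only its own step, which is how the "run time" line isolates the cost of `run()`.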

2018-08-23 19:01:57.739  INFO 34476 --- [io-9000-exec-10] u.c.s.t.c.TensorflowImageEvaluator       : session  time:0
2018-08-23 19:01:57.739  INFO 34476 --- [io-9000-exec-10] u.c.s.t.c.TensorflowImageEvaluator       : runner setup  time:0
2018-08-23 19:01:57.791  INFO 34476 --- [io-9000-exec-10] u.c.s.t.c.TensorflowImageEvaluator       : run time:51
2018-08-23 19:01:57.791  INFO 34476 --- [io-9000-exec-10] u.c.s.t.c.TensorflowImageEvaluator       : results time:0

Turns out it *is* fast after all!

My guess is that it's loading in the graph on the first run, so by reusing the session you only pay that cost once... boom! Fast runs!
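The pool of pre-prepared sessions in a BlockingQueue can be sketched generically. This is a minimal sketch, not the server's actual code: the class `ResourcePool` and its `withResource` method are hypothetical names, and a type parameter `T` replaces `org.tensorflow.Session` so it runs without TensorFlow on the classpath. Every resource is built once up front, each caller borrows one with `take()`, and a `finally` block always puts it back.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: each resource is used by one thread at a time.
public class ResourcePool<T> {
    private final BlockingQueue<T> idle;

    public ResourcePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the expensive setup once per resource
        }
    }

    // Borrows a resource, applies the work, and always returns it to the pool.
    public <R> R withResource(Function<T, R> work) throws InterruptedException {
        T resource = idle.take(); // blocks until a resource is free
        try {
            return work.apply(resource);
        } finally {
            idle.put(resource); // never blocks: we only return what we took
        }
    }

    public int available() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int[] created = {0};
        ResourcePool<StringBuilder> pool = new ResourcePool<>(2, () -> {
            created[0]++;
            return new StringBuilder();
        });
        for (int i = 0; i < 10; i++) {
            pool.withResource(sb -> sb.append('x'));
        }
        // Ten requests were served, but only two resources were ever created
        System.out.println("created=" + created[0] + " available=" + pool.available());
    }
}
```

With TensorFlow available, the factory could be `() -> new Session(graph)` and the work would be the runner chain shown earlier. Sessions hold native resources, so they should be closed when the server shuts down.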