In the last couple of years we have had the chance to work extensively on Multiple Object Tracking (MOT), a very active research topic. Unlike many other computer vision tasks, deep learning is not (yet) the definitive technology for solving it, although the debate is still open. See the MOT Challenge for an up-to-date reference.

Having received significant requests for multiple people tracking, and having investigated the options available in the literature, we have successfully implemented Multiple Hypothesis Tracking (MHT) for our customers. MHT traces its roots back to the seminal paper by D.B. Reid (IEEE Transactions on Automatic Control, 1979).

At first we deployed and tailored the efficient implementation of Cox and Hingorani (IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996) on ARM-based smart cameras for people counting applications with far-infrared sensors. Despite the exponential complexity of the task, our implementation runs smoothly at 30 fps on a single-core 1 GHz ARM Cortex-A8, using less than 10% of the CPU.
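To give a feel for the exponential complexity mentioned above, here is a minimal Python sketch that counts the data association hypotheses a tracker would face without pruning. It is a simplification under stated assumptions: the function names are hypothetical, the number of tracks is held constant over time, and unmatched detections are not further split into new tracks or false alarms (which would make the count even larger). It is not taken from our implementation.

```python
from math import comb, factorial


def assignments_per_frame(num_tracks: int, num_detections: int) -> int:
    """Count one-to-one matchings between tracks and detections.

    Any track may be missed and any detection may stay unmatched, so we
    sum over the number k of matched pairs.
    """
    return sum(
        comb(num_tracks, k) * comb(num_detections, k) * factorial(k)
        for k in range(min(num_tracks, num_detections) + 1)
    )


def hypotheses_after(num_frames: int, num_tracks: int, num_detections: int) -> int:
    """Unpruned hypothesis count, assuming the same scene size in every frame."""
    return assignments_per_frame(num_tracks, num_detections) ** num_frames


if __name__ == "__main__":
    for frames in (1, 5, 10):
        print(frames, hypotheses_after(frames, num_tracks=3, num_detections=3))
```

Even with three people and three detections per frame the count explodes within a few frames, which is why practical MHT implementations rely on gating, pruning and keeping only the best few hypotheses.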

Then we dived deep into the more recent MHT Revisited (Kim et al., Proceedings of ICCV 2015), which casts tracking as a graph optimization task and takes the visual appearance of the tracks into account by means of deep-learning-based features. We re-implemented it from scratch in C++ and, through further computational optimizations (using Gurobi, a mathematical programming solver, and SCIP, a mixed integer programming solver), we made it run at between 30 and 100 fps on an Intel i7, depending on the number of detections. Take a look below at the MHT Revisited results on the public MOT17 challenge datasets.
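As a side note on the graph optimization mentioned above: in Kim et al., as we read the paper, the best global hypothesis is found by solving a maximum weighted independent set problem, where each node is a scored track hypothesis and an edge joins two hypotheses that claim the same detection. The Python sketch below illustrates the idea with an exhaustive search on toy data. The data layout, the function names and the brute-force search are assumptions made for illustration; a real system would hand the problem to a solver instead.

```python
from itertools import combinations


def conflicts(a: dict, b: dict) -> bool:
    """Two track hypotheses conflict if they share a (frame, detection) pair."""
    return bool(a["detections"] & b["detections"])


def best_global_hypothesis(tracks: list) -> tuple:
    """Exhaustive maximum weighted independent set; only viable for tiny inputs."""
    best_score, best_subset = 0.0, ()
    for size in range(1, len(tracks) + 1):
        for subset in combinations(range(len(tracks)), size):
            # Skip subsets containing any pair of incompatible tracks.
            if any(conflicts(tracks[i], tracks[j])
                   for i, j in combinations(subset, 2)):
                continue
            score = sum(tracks[i]["score"] for i in subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


# Toy data: each detection is identified by (frame index, detection index).
tracks = [
    {"score": 5.0, "detections": {(1, 0), (2, 0)}},
    {"score": 4.0, "detections": {(1, 0), (2, 1)}},
    {"score": 3.0, "detections": {(1, 1), (2, 1)}},
    {"score": 2.5, "detections": {(1, 1), (2, 0)}},
]

if __name__ == "__main__":
    print(best_global_hypothesis(tracks))
```

The exhaustive search is exponential in the number of track hypotheses, which is exactly why a dedicated solver and aggressive pruning matter for real-time performance.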