Object tracking with image stitching

In today's world, security is a major concern across all domains and for all customers. With many products and solutions now catering to security requirements, an abundance of data has become a bigger concern than the lack of it was a few years ago.

Technically, security products are categorized as preventive, detective, and responsive. Preventive mechanisms are mostly static, such as locks on access doors. The trend is moving towards detective products, such as CCTV cameras and motion sensors, and responsive mechanisms that act on a detected alarm, such as dialing emergency numbers or raising alarms.

While responsive mechanisms are a more recent trend, detective products have a large installed base. Most detective solutions are software based, so their feature set keeps growing with technology evolution, field requirements, and customer imagination. Customers also expect to upgrade the system with new features at any time without disturbing dependent components.

Of all detective systems, CCTV (closed-circuit television) cameras and sensors play a major role. They are widely used in home automation as well as in industrial and corporate segments. Some of the advanced features include image tracking, zoom in/out, ANPR (automatic number plate recognition), and data analytics such as people counting, crowd prediction, and face recognition to identify persons of interest. Regular cameras typically use lenses covering a 50-70 degree field of view (FoV), while wide-angle cameras cover 180 to 360 degrees. With numerous video streams coming from different cameras, there is a need for a mechanism to play back the data effectively and follow a person or object in the recorded stream. Here we discuss one of our implementations of object tracking with the image stitching technique.

Customer Pain Point

Our customer is one of the leading infrastructure management companies in India, catering to various high-profile IT parks, malls, gated communities, hospitals, etc. across the country. As part of their security system, they have numerous cameras installed at each of these locations. While some of the cameras are intelligent and capable of detecting events, most of them are plain recorders. The security servers can store many hours of recordings and make the data quickly available for playback.

These cameras have different features based on various requirements. Some have a wider angle of view, and some are capable of live person tracking and person counting. Some of the advanced cameras can even identify no-parking violations, one-way driving violations, etc.

With numerous live streams coming from across the campuses (sometimes more than 100 cameras at larger sites), it has become difficult to track them coherently. Even when an event is identified and its root cause is found, following up on it is difficult. For example, tracking a person across different cameras with different points of view is laborious, as feeds from many cameras have to be synchronized and played back.

While there are some premium products, cost is a major challenge in the APAC market. Though a lower-cost solution may not be 100% foolproof, the prospect of lower cost outweighs the missing features. Moreover, many of these premium products work only with a certain set of cameras, and replacing the existing cameras is not viable.

Business Solution

Embien was tasked with creating a low-cost solution to enhance the security system and reduce the manual overhead during a security review. With our expertise in image processing and CCTV domain knowledge, we did a detailed comparative analysis of the new products and solutions in the market with respect to price, features, and AMC (annual maintenance contract). We identified a list of features that could be offered to customers with less time, less cost, and less risk. In this context, risk means a new feature breaking an existing one. We proposed software that can follow an object/person across 360 degrees using a joystick. The customer was excited by the approach, as it makes tracking movements across multiple cameras and times as effortless as playing a video game.

Technical Solution

Embien proceeded with the technical implementation of the object tracking with image stitching solution. The wide-angle view is provided by stitching frames from at least 3 camera recordings, with the cameras placed about 60 degrees apart. This imposes the following requirements:

  • The correct starting frame needs to be identified in all 3 videos so that stitching begins on synchronized frames.
  • Image stitching time should be less than 30 ms per frame.
  • 30 fps needs to be maintained to have a smooth video.
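
The arithmetic behind these requirements can be sketched as follows. This is a minimal illustration, not the deployed code: it assumes each recording has a known start timestamp and a constant frame rate, and the function names are hypothetical.

```python
FPS = 30
FRAME_BUDGET_MS = 1000.0 / FPS  # about 33.3 ms available per output frame


def frame_index(timestamp_s, recording_start_s, fps=FPS):
    """Index of the frame captured at timestamp_s in one recording."""
    return round((timestamp_s - recording_start_s) * fps)


def aligned_start_frames(recording_starts_s, fps=FPS):
    """Frame index in each recording at the latest common start time.

    Assumption: all recordings run at the same constant fps.
    """
    common_start = max(recording_starts_s)
    return [frame_index(common_start, start, fps) for start in recording_starts_s]


def within_budget(stitch_ms, limit_ms=30.0):
    """True if a measured stitching time meets the per-frame limit."""
    return stitch_ms < limit_ms
```

At 30 fps each frame has roughly 33.3 ms, so a 30 ms stitching limit leaves only about 3.3 ms for decoding and display. For three recordings that started at 0.0 s, 0.5 s, and 1.0 s, `aligned_start_frames` skips 30, 15, and 0 frames respectively.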

Both OpenCV and dlib were considered for the implementation, and OpenCV was favored for this solution. OpenCV offers many algorithms for detecting the features used to stitch images; among them we used SIFT (Scale-Invariant Feature Transform), which is robust to changes in scale and rotation. The SIFT algorithm has 4 basic steps.

  • Scale-space extrema detection using the Difference of Gaussians (DoG).
  • Key point localization, where the key point candidates are localized and refined by eliminating low-contrast points.
  • Key point orientation assignment based on the local image gradient.
  • Descriptor generation to compute the local image descriptor for each key point based on image gradient magnitude and orientation.

After finding the key points and descriptors, one can use a brute-force matcher to match features between images. The brute-force matcher compares every descriptor in one image against every descriptor in the other and returns the closest match for each. The matches can then be sorted by distance to retain only the points that correspond to the same features in both images. Using the homography computed from these points, the images can be warped and stitched together.

In our implementation, we arranged the cameras into groups of three with continuous overlap between neighboring views and aligned them to the correct frame number. Thanks to this overlap, the image stitching algorithm can identify the matching points and stitch the videos accordingly. However, identifying the matching points for every frame consumes considerable processing power. Since our solution is based on fixed cameras, we provided a learning stage that uses a few initial frames to identify the overlaps and generate the key points and descriptors. After learning, our application stitches the videos directly. Over time, the lighting conditions and, on some occasions, the angle of view may change. So the learning process is repeated once in a while to keep the images in sync.
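
The learn-once, reuse, and periodically relearn idea can be sketched as below. This is a simplified illustration under stated assumptions: the class name, the `estimate` callable (any function that returns a transform for a pair of frames), and the default intervals are hypothetical, and the sketch keeps only the most recent estimate from the learning frames, whereas a real system might combine them.

```python
class CachedStitcher:
    """Re-estimate the transform only during periodic learning windows."""

    def __init__(self, estimate, learn_frames=5, relearn_every=9000):
        self.estimate = estimate            # expensive feature-matching step
        self.learn_frames = learn_frames    # frames used in each learning window
        self.relearn_every = relearn_every  # e.g. 9000 frames = 5 min at 30 fps
        self.transform = None
        self._count = 0

    def transform_for(self, left, right):
        """Return the transform to use for this pair of frames."""
        phase = self._count % self.relearn_every
        self._count += 1
        if phase < self.learn_frames or self.transform is None:
            # Learning window: pay the full matching cost.
            self.transform = self.estimate(left, right)
        # Outside the window the cached transform is reused as-is.
        return self.transform
```

With fixed cameras, the expensive matching then runs on only a handful of frames per window, and every other frame needs just the warp.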

Image stitching

Following an object/person in a pre-recorded video requires a marriage between image processing and deep-learning-based data analytics. Drawing on our prior experience in deep learning for number plate recognition, we used some of the best algorithms available to detect the face or object in the frame. Tracking the person or object requires external hardware to control the field of view. To create a 360-degree viewing experience, we integrated a USB-based joystick into the system. In this Linux-based product, we rotated the stitched images (left or right) based on the signals received from the joystick.
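
As an illustration of how joystick input can drive the view, the sketch below decodes events in the format used by the Linux joystick interface (an 8-byte record holding a timestamp, a value, an event type, and an axis/button number) and pans a viewport over the stitched panorama. The function names, the step size, and the wrap-around behavior (which assumes the panorama covers a full 360 degrees) are our assumptions, not details of the deployed product.

```python
import struct

JS_EVENT_FORMAT = "IhBB"  # time in ms, value, event type, axis/button number
JS_EVENT_SIZE = struct.calcsize(JS_EVENT_FORMAT)  # 8 bytes
JS_EVENT_AXIS = 0x02
JS_EVENT_INIT = 0x80      # flag set on the synthetic events sent at open
AXIS_MAX = 32767


def parse_axis_event(raw):
    """Return (axis_number, value in [-1, 1]), or None for non-axis events."""
    _time_ms, value, ev_type, number = struct.unpack(JS_EVENT_FORMAT, raw)
    if (ev_type & ~JS_EVENT_INIT) != JS_EVENT_AXIS:
        return None
    return number, max(-1.0, min(1.0, value / AXIS_MAX))


def pan_viewport(offset_px, axis_value, panorama_width, max_step_px=40):
    """New left edge of the viewport after one joystick update.

    The modulo wraps the view around, so panning past one end of the
    panorama continues from the other end.
    """
    return (offset_px + round(axis_value * max_step_px)) % panorama_width
```

In use, an application would read `JS_EVENT_SIZE` bytes at a time from the joystick device (commonly `/dev/input/js0`, though the path depends on the system), pass each record to `parse_axis_event`, and feed horizontal-axis values to `pan_viewport`.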

Configurable options are provided in the UI so that the end customer can add a new camera or remove installed cameras as needed. An intuitive interface helps in grouping cameras with overlapping fields of view and validating the overlaps. Industry-standard interfaces are provided to fetch recordings from the security server.

About Embien

With its expertise in image processing and deep learning, Embien can quickly create low-cost solutions for object tracking with image stitching of video from multiple camera feeds. The solution was successfully deployed in a pilot project and will see wider deployment after successful trials. Feel free to get in touch with us to become part of such success stories.