A premier Indian infrastructure management company, responsible for security at numerous high-traffic IT parks, malls, and hospitals, faced a critical challenge. With 100+ cameras per site, their security teams could not efficiently track individuals or objects across different, disjointed video feeds, creating significant security vulnerabilities.
The core challenge was data overload from non-integrated systems. While numerous cameras were installed, they operated in silos. Tracking a person of interest meant security personnel had to manually pull feeds from multiple recorders, painstakingly synchronize them by time, and watch each stream individually. This process was slow, laborious, and extremely prone to human error, rendering post-incident reviews ineffective.
Furthermore, high-cost premium surveillance products were financially unviable for the APAC market. These solutions often demanded a complete "rip-and-replace" of the client's existing, multi-brand camera hardware, leading to vendor lock-in and exorbitant costs. The client needed an intelligent, low-cost, software-based solution that could integrate with and enhance their current infrastructure.
Embien addressed this challenge by proposing a cost-effective software enhancement instead of a new hardware strategy. Our expertise in image processing and the CCTV domain allowed us to design an "Integrated Object Tracking system with Image Stitching." The core concept was to transform the fragmented user experience into an interactive, 360-degree panoramic view, controlled simply with a USB joystick—much like a video game.
Technical Implementation
The primary technical hurdle was creating this seamless panoramic view in real-time from multiple, independent video streams.
Camera Grouping: Our solution began by logically grouping existing cameras. We identified sets of three adjacent cameras, each typically positioned at 60-degree angles, to create a continuous, overlapping field of view.
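The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the deployed logic: the `Camera` record, its `heading_deg` field, and the `group_adjacent` helper are hypothetical names, and the sketch assumes each camera's mounting direction is known and ignores wrap-around at 360 degrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    cam_id: str         # hypothetical identifier
    heading_deg: float  # hypothetical mounting direction in degrees


def group_adjacent(cameras, group_size=3, max_gap_deg=60.0):
    """Sort cameras by heading and cut them into groups of adjacent units.

    A camera further than max_gap_deg from its predecessor is assumed not to
    overlap with it, so the partial group is dropped and a new one starts.
    """
    ordered = sorted(cameras, key=lambda c: c.heading_deg)
    groups, current = [], []
    for cam in ordered:
        if current and cam.heading_deg - current[-1].heading_deg > max_gap_deg:
            current = []  # gap too wide to stitch across
        current.append(cam)
        if len(current) == group_size:
            groups.append(tuple(current))
            current = []
    return groups
```

Cameras that cannot be placed in a full group of three are simply left out here; a real configuration tool would more likely flag them for the administrator.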
Algorithm & Framework Selection: We evaluated both the OpenCV and Dlib libraries for the implementation and ultimately selected OpenCV for its robust, high-performance features. To stitch the video frames together, we employed the SIFT (Scale-Invariant Feature Transform) algorithm, which finds distinctive features in an image and remains effective under changes in scale and rotation and under moderate lighting variations.
The SIFT Process: Our implementation followed SIFT's four key steps for each frame:
Scale-Space Extrema Detection: Using the Difference of Gaussians (DoG) to find potential interest points.
Keypoint Localization: Refining these points and eliminating low-contrast or poorly localized candidates.
Orientation Assignment: Assigning an orientation to each keypoint based on local image gradients to achieve rotation invariance.
Descriptor Generation: Creating a unique 'fingerprint' (descriptor) for each keypoint.
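To make the first of these steps concrete, the sketch below computes a Difference of Gaussians response on a 1-D signal in plain Python. It is a deliberately reduced illustration: real SIFT works on 2-D images across a pyramid of scales and compares each sample with its neighbours in both space and scale. The function names and the scale ratio `k=2.0` are assumptions chosen for readability.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = math.ceil(3 * sigma)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve the signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma, k=2.0):
    """Coarse blur minus fine blur: responds strongly to blob-like structure."""
    fine = blur(signal, sigma)
    coarse = blur(signal, k * sigma)
    return [c - f for c, f in zip(coarse, fine)]


def strongest_response(dog):
    """Index of the sample with the largest absolute DoG response."""
    return max(range(len(dog)), key=lambda i: abs(dog[i]))
```

For a narrow bright bump on a dark background, the strongest response lands at the centre of the bump, which is the kind of location SIFT would carry forward as a candidate keypoint.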
Real-Time Matching and Warping: Once SIFT generated keypoints and descriptors for frames from adjacent cameras, we used a brute-force matcher to find the corresponding features in the overlapping regions. From these matches, we computed a homography, the geometric transformation needed to "warp" one image into the other's frame of reference, and used it to stitch the images seamlessly into a single, wide-angle video stream.
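The matching and warping steps can be illustrated without any image library. The sketch below is a simplified stand-in, not the production code: it uses short toy descriptors instead of 128-dimensional SIFT descriptors, and it solves for the homography exactly from four correspondences, whereas a real pipeline would fit many matches robustly. OpenCV offers `cv2.BFMatcher`, `cv2.findHomography`, and `cv2.warpPerspective` for these steps, although the source does not state which calls were used. All function names below are hypothetical.

```python
import math


def brute_force_match(desc_a, desc_b):
    """For each descriptor in desc_a, find its nearest neighbour in desc_b."""
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = -1, math.inf
        for j, db in enumerate(desc_b):
            d = math.dist(da, db)  # Euclidean (L2) distance
            if d < best_d:
                best_j, best_d = j, d
        matches.append((i, best_j, best_d))
    return matches


def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src, dst):
    """Homography mapping four src points onto four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, pt):
    """Apply the homography to one pixel coordinate."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

In use, the keypoint coordinates behind the matched descriptor pairs supply `src` and `dst`, and every pixel of one camera's frame is then mapped through `warp_point` (or its vectorized equivalent) onto the shared panoramic canvas.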
Solving the Performance Challenge
The system had a strict performance requirement: maintain a smooth 30 fps video feed. At 30 fps each frame lasts about 33 milliseconds, so the entire stitching process for three camera feeds had to be completed in under 30 milliseconds. Running the computationally expensive SIFT algorithm on every single frame from all three streams would fail this requirement.
Our key innovation was a "Learning Stage." Since the security cameras were in fixed positions, their relative overlaps did not change. Our application first runs this one-time learning stage, in which it performs the heavy computations (identifying overlaps, finding keypoints, and generating SIFT descriptors) and then caches the results.
After this initial learning, the application bypasses the expensive detection phase in normal operation. It uses the cached data to stitch the videos directly, drastically reducing the per-frame processing load and easily achieving the 30 fps target. To account for gradual changes in environmental lighting or minor camera shifts over time, this learning process is configured to repeat periodically, ensuring the stitched image remains accurate.
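The learn-once, reuse, and periodically relearn pattern might look like the following sketch. The `StitchCache` class, the `learn` callable, and the refresh interval are hypothetical; `learn` stands in for whatever routine performs detection, matching, and alignment, and the cached value is treated as opaque.

```python
import time


class StitchCache:
    """Caches the expensive alignment result and refreshes it periodically."""

    def __init__(self, learn, relearn_every_s, clock=time.monotonic):
        self._learn = learn                      # expensive learning routine
        self._relearn_every_s = relearn_every_s  # refresh interval in seconds
        self._clock = clock                      # injectable for testing
        self._alignment = None
        self._learned_at = None
        self.learn_count = 0

    def alignment(self, frames):
        """Return the cached alignment, relearning only when it is stale."""
        now = self._clock()
        stale = (self._learned_at is None
                 or now - self._learned_at >= self._relearn_every_s)
        if stale:
            self._alignment = self._learn(frames)
            self._learned_at = now
            self.learn_count += 1
        return self._alignment
```

On the per-frame path, only the cheap lookup and the warp itself run; the costly learning routine is invoked on the first frame and again whenever the interval has elapsed.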
Tracking and Operator Control
Stitching the video created the panoramic view; the next step was to enable tracking. Leveraging our prior experience in deep learning for projects like Automatic Number Plate Recognition (ANPR), we integrated algorithms to detect faces and objects within the newly stitched frame.
To give the operator intuitive control, we integrated a standard USB joystick. On the Linux-based host system, joystick inputs (panning left or right) were translated into commands to rotate the panoramic stitched image. This allowed the security officer to manually "follow" a person of interest as they moved across the combined field of view, all from a single screen.
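As an illustration of how joystick input can drive the view, the sketch below decodes events in the 8-byte format used by the classic Linux joystick interface and converts the horizontal axis into a clamped pan offset. Whether the deployed system used this interface is an assumption, and `parse_axis_event`, `pan_viewport`, the pan speed, and the dead zone are hypothetical choices.

```python
import struct

JS_EVENT_FORMAT = "IhBB"  # timestamp (ms), value, event type, axis/button number
JS_EVENT_SIZE = struct.calcsize(JS_EVENT_FORMAT)
JS_EVENT_AXIS = 0x02
JS_EVENT_INIT = 0x80      # flag set on synthetic events sent at open time
AXIS_MAX = 32767


def parse_axis_event(raw):
    """Return (axis_number, value in [-1, 1]) or None for non-axis events."""
    _time_ms, value, ev_type, number = struct.unpack(JS_EVENT_FORMAT, raw)
    if (ev_type & ~JS_EVENT_INIT) != JS_EVENT_AXIS:
        return None
    return number, max(-1.0, value / AXIS_MAX)


def pan_viewport(offset_px, axis_value, pano_width, view_width,
                 speed_px=40, dead_zone=0.1):
    """Shift the visible window across the panorama, clamped to its edges."""
    if abs(axis_value) < dead_zone:
        return offset_px  # ignore small stick jitter
    new_offset = offset_px + round(axis_value * speed_px)
    return min(max(new_offset, 0), pano_width - view_width)
```

A host loop would read `JS_EVENT_SIZE` bytes at a time from the joystick device node (conventionally `/dev/input/js0`), pass each record to `parse_axis_event`, and feed axis 0 into `pan_viewport` before cropping the stitched frame for display.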
Finally, we delivered a complete solution with an intuitive UI. This interface allows administrators to configure the system, group cameras, add new devices, and validate the overlaps to ensure a perfect stitch. It uses industry-standard interfaces to fetch recordings from the client's existing security servers.
Maximized ROI: Eliminated the need for costly hardware replacement by integrating with and enhancing the client's existing multi-brand camera infrastructure.
Drastic Reduction in Manual Effort: Replaced the laborious, multi-screen task of synchronizing video files with a single, interactive, video-game-like interface.
Enhanced Situational Awareness: Provided security operators with a seamless 360-degree panoramic view, effectively removing the blind spots between adjacent cameras.
Intuitive Real-Time Tracking: Enabled fluid, real-time object and person tracking using a simple joystick, significantly improving response times to security events.
Low-Cost, High-Impact Solution: Delivered a powerful software-based upgrade at a fraction of the cost of premium, all-in-one surveillance systems, providing immediate value.
Embien successfully transformed a fragmented and inefficient CCTV network into a powerful, cohesive surveillance tool. By applying intelligent image stitching and deep learning, we delivered a low-cost solution that empowers security teams to track threats intuitively and effectively. This pilot project demonstrates our ability to unlock hidden value in existing infrastructure.
If your organization struggles with disjointed video feeds or data silos, contact Embien today to explore a custom image processing and system integration solution.