Anyone who has searched for a single item in the satellite imagery commonly available online today understands why the task invites comparison to the proverbial needle in a haystack. Zooming in close reveals details that are seldom identifiable unless they have been pre-labeled, while zooming out to an aerial image covering square miles can make the search feel anywhere from tedious to hopeless. For professionals, finding a single item, such as a specific vehicle, campsite, home, or building, necessarily involves a coordinated effort: scanning the zoomed-in version of a large image for likely candidates, followed by boots-on-the-ground verification by actual people.
When time is of the essence, artificially intelligent (AI) pattern recognition algorithms are coming to the rescue.
Such AI-driven pattern-recognition assistance to on-the-ground searches is known today as visual active search (VAS). A visual active search uses aerial imagery to direct ground-based personnel to find everything from lost hikers to kidnapped children to illegal shipments of weapons, drugs, rare animals, or humans. The coordinated effort of algorithmic pattern recognition, machine learning, and ground personnel is key to a successful VAS.
"Adding AI to visual searches has the potential to drastically reduce the time and resources spent when working from scanned images over a large area," said applied AI expert Ben Moseley, who is the Carnegie Bosch Associate Professor of Operations Research at Carnegie Mellon University (CMU) "This can potentially speed up the search in numerous applications, such as using imagery to detect wildlife poaching activity, in search-and-rescue scenarios, or identifying illegal trafficking."
Unfortunately, there are usually not enough ground personnel to physically search every likely location on an aerial image. AI, specifically deep reinforcement learning (DRL), can help by learning the "needle" image and pinpointing the most likely grid locations on an aerial image from its training on past search data. Often, however, there is no specifically trained DRL neural network available to pinpoint the most likely grid locations to be searched for a particular "needle."
The answer is to enhance DRL neural networks so they can adapt in real time, according to Yevgeniy Vorobeychik of Washington University in St. Louis. Today, deep reinforcement learning is used mainly as an end-to-end solution: a pretrained neural network divides an aerial image into a fixed number of grid locations, then ranks the likelihood of each grid location containing the desired object based on its past training data. According to Vorobeychik, however, many, if not most, searches do not have a pre-trained neural network available. Instead, Vorobeychik's group, which includes fellow Washington University professor Nathan Jacobs and doctoral candidate Anindya Sarkar, is making use of the real-time results of an ongoing search to adapt a neural network's learning process, using metadata to supervise the choice of the next-most-likely grid location based on what on-the-ground searchers have found so far.
"Our adaptive visual active search algorithm divides the search area into a grid, then learns using two neural networks: a task-specific location prediction module and a task-agnostic search module," said Vorobeychik. "By using a novel meta-learning approach to update the prediction module's initialization parameters with the incoming results of the physical ground-based search so far, the overall performance is greatly improved over today's end-to-end methods which do not make full use of available data during training or actual searches."
The researchers recently described the new method's inner workings and preliminary test results in a paper presented in December at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023, New Orleans). The Washington University researchers have also made their code and data available for download on GitHub.
In a nutshell, the researchers' partially supervised reinforcement learning framework for visual active search is divided into three sequential stages. The first is a standard ResNet-34 neural network pre-trained on the popular ImageNet dataset of more than 14 million pre-labeled photos spanning more than 20,000 object categories. Vorobeychik's group then appended two other learning neural networks to the pre-trained ResNet, both of which build on its outputs: an application-specific grid-location prediction network, followed by an application-agnostic search network. Grid-location searching is supervised by gleaning feedback from the results, favorable or unfavorable, of ongoing searches on the ground. Those search results are fed back as metadata to the prediction neural network, which gradually learns to become more effective at choosing the next grid location to physically search.
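The three-part layout can be pictured roughly as in the following PyTorch sketch, which assumes a frozen ResNet-34 backbone feeding a task-specific prediction head and a task-agnostic search head. The module names and layer sizes are illustrative choices, not taken from the paper or the released code.

import torch
import torch.nn as nn
from torchvision import models

class VASModel(nn.Module):
    def __init__(self, num_grid_cells):
        super().__init__()
        backbone = models.resnet34(weights="IMAGENET1K_V1")
        # Keep everything up to the global average pool; drop the final FC layer.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False  # the pre-trained backbone is held fixed
        # Task-specific prediction head: per-cell "target present" logit.
        self.prediction_head = nn.Linear(512, 1)
        # Task-agnostic search head: scores every cell from the predictions
        # plus the remaining budget.
        self.search_head = nn.Sequential(
            nn.Linear(num_grid_cells + 1, 256),
            nn.ReLU(),
            nn.Linear(256, num_grid_cells),
        )

    def forward(self, grid_patches, remaining_budget):
        # grid_patches: (num_cells, 3, H, W) crops of the satellite image
        feats = self.backbone(grid_patches).flatten(1)         # (num_cells, 512)
        cell_logits = self.prediction_head(feats).squeeze(-1)  # (num_cells,)
        search_in = torch.cat([torch.sigmoid(cell_logits),
                               remaining_budget.view(1)])
        search_scores = self.search_head(search_in)            # which cell to query next
        return cell_logits, search_scores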
In more detail, the algorithm starts by pre-dividing a satellite image of the overall search area into a set of grid sector locations. Each grid sector is classified by the ResNet neural network, whose weights are held fixed; that is, it does not learn. The classification outputs of ResNet are passed to the next part of the algorithm, the prediction module, whose neural network gradually learns from the metadata fed back at each step from the search results for that grid location. As a result, the prediction and search neural networks learn how to home in on grid sector locations that are increasingly likely to contain the objects specific to that application (such as a particular vehicle type, in an "Amber Alert" search for a kidnap victim).
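Pre-dividing the image is the simplest part; a sketch such as the one below slices a satellite image into a hypothetical 4x4 grid of equal-sized cells, whose crops can then be fed to the frozen ResNet.

import torch

def split_into_grid(image, rows=4, cols=4):
    # image: (3, H, W) tensor; returns (rows*cols, 3, H//rows, W//cols) patches,
    # ordered row by row so index k maps back to cell (k // cols, k % cols).
    _, H, W = image.shape
    cell_h, cell_w = H // rows, W // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append(image[:, r*cell_h:(r+1)*cell_h,
                                    c*cell_w:(c+1)*cell_w])
    return torch.stack(patches)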
Overall grid-location prediction and search policies are also part of the algorithm; for instance, the budget available for a VAS is tracked through each time step, to balance the remaining budget against the likelihood of finding the "needle" in the as-yet-unsearched grid locations.
"The search module operates based on the output of the prediction module and the remaining budget as modified by the policy's actions in the preceding time step. Therefore, the search module learns the heuristic of determining which grid to query next when the remaining budget is low as part of the policy training process," said Sarkar.
The search module learns from the physical search results, allowing the policy's actions to accumulate knowledge from previous grid-location searches, and passes that metadata to the prediction neural network to improve its accuracy in pinpointing the next-most-likely grid sector to search. At each time step, the cost of a search is subtracted from the remaining budget, so the algorithm only picks next grid locations that keep the total project within its budgetary limits.
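Putting the pieces together, the budget-aware outer loop might look like the sketch below: each step queries the highest-scoring unsearched cell, records the field result, and deducts the query cost. Here ground_truth_search stands in for the human verification step, model refers to the VASModel sketch above, and the whole function is an illustration under those assumptions rather than the paper's implementation.

import torch

def run_search(model, grid_patches, ground_truth_search, budget, cost_per_query=1.0):
    searched, results, found_cells = set(), [], []
    remaining = torch.tensor(float(budget))
    while remaining.item() >= cost_per_query and len(searched) < len(grid_patches):
        with torch.no_grad():
            _, scores = model(grid_patches, remaining)
            for idx in searched:
                scores[idx] = float("-inf")     # never re-query a searched cell
        next_cell = int(torch.argmax(scores))
        found = ground_truth_search(next_cell)  # boots-on-the-ground feedback
        searched.add(next_cell)
        results.append((next_cell, found))
        if found:
            found_cells.append(next_cell)
        remaining = remaining - cost_per_query  # deduct this step's search cost
        # The (next_cell, found) feedback can now be used to re-adapt the
        # prediction head, as in adapt_prediction_module above.
    return found_cells, results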
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.