Future Forward: New Tools Needed To Take Full Advantage of Big Data
This interview was recorded by Todd Danielson, the editorial director of Informed Infrastructure. You can watch a video of the full interview by visiting bit.ly/3CVk7sl.
It’s often assumed that cloud computing is the solution to properly using Big Data, the massive amount of information available on the internet. What can’t fit on a desktop computer or even much-larger company IT servers surely can fit in the data warehouses that host Big Data. Although it’s true these warehouses can host the data, the datasets are still too large to analyze (and therefore use effectively) through typical IT frameworks.
“It’s not necessarily true that cloud computing scales linearly,” explains Dr. Mike Flaxman. “And where it doesn’t is when you get into Big Data issues.”
Flaxman believes graphics processing units (GPUs) are a key tool for processing and analyzing available data efficiently enough to support better decisions.
“Just as a starting comparison, your typical good desktop machine might have 16 or 24 cores of central processing units (CPUs),” he notes. “A typical graphics card has 20,000 to 40,000 cores. That’s where GPU analytics comes in—you have a lot more ability to parallel process natively.”
A GPU is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. It’s often used in gaming, but it can be effectively applied in data analysis. Imagery analysis is a common GPU application in the infrastructure domain.
As an example, Flaxman notes that two-thirds of the world is cloudy on a given day, so two-thirds of imagery on a remote-sensing website probably isn’t desired or useful. Cloud computing allows users to filter for only cloud-free images, but, depending on the application, that’s likely still too much data for standard computing. GPUs, with their ability to compute in parallel, are a much faster option.
“When the data get big enough, you need to figure out how you’re going to parallel process on them, and cloud doesn’t solve that for you,” says Flaxman. “That’s where you have to start to reach to GPU.”
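The filter-then-parallelize pattern Flaxman describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scene records, the `cloud_cover` field, the 20 percent threshold and the `mean_brightness` analysis are all hypothetical, and a CPU thread pool stands in for the far larger number of cores a GPU would apply to the same independent work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scene metadata; ids, cloud-cover percentages and pixels are made up.
scenes = [
    {"id": "scene-a", "cloud_cover": 5.0, "pixels": [0.2, 0.4, 0.6]},
    {"id": "scene-b", "cloud_cover": 80.0, "pixels": [0.9, 0.9, 0.8]},
    {"id": "scene-c", "cloud_cover": 12.0, "pixels": [0.1, 0.3, 0.5]},
]


def is_cloud_free(scene, max_cloud_cover=20.0):
    """Keep a scene only if its reported cloud cover is at or below the threshold."""
    return scene["cloud_cover"] <= max_cloud_cover


def mean_brightness(scene):
    """Stand-in for a per-scene analysis: average the pixel values."""
    pixels = scene["pixels"]
    return scene["id"], sum(pixels) / len(pixels)


def analyze(all_scenes, max_workers=4):
    """Filter first, then process the remaining scenes independently of one another."""
    usable = [s for s in all_scenes if is_cloud_free(s)]
    # Each scene is handled on its own, so the work can be spread across workers.
    # Threads on a CPU are used here only as an analogy for GPU parallelism.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(mean_brightness, usable))


results = analyze(scenes)
```

Filtering shrinks the dataset, but the remaining scenes still have to be processed one by one unless the work is spread across many workers, which is the step Flaxman says the cloud alone does not provide.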
How It Can Work
To explain how GPU processing of Big Data works in a real application, Flaxman cites the city of Virginia Beach, which took 25 years of U.S. Geological Survey weather data (10 or more terabytes) and extracted just the portion covering Virginia Beach to model hurricane-based flooding.
“This is a good example of using the power of really large geospatial data and modeling projects, but then applying them in a local-use context to figure out which local streets are subject to flooding and what to do about it,” he notes.
He cites another example at the University of Michigan, which applied Big Data to the Flint water crisis. The researchers had partial information on which pipes feeding which houses had lead in their water supply, so they built a machine-learning model to better predict the likelihood of contamination. A public information site was then created where anyone can get an estimate of the likelihood of a lead issue at any individual address.
“It’s a very traditional risk analysis, but this public face gets directly down to the consumer,” notes Flaxman. “If I’m a homeowner, and you tell me the average lead in the water supply is ‘some number’ across my city, that’s a lot less compelling than telling me there’s a 75 percent chance I have lead in my water supply.”
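The difference between a city-wide average and a per-address probability can be shown with a small Python sketch. It is not the University of Michigan model: the logistic form, the two features, the weights and the sample homes are all hypothetical and chosen only to illustrate the idea.

```python
import math

# Hypothetical coefficients for illustration only; not fitted to any real data.
WEIGHTS = {"built_before_1950": 1.8, "known_lead_neighbor": 1.2}
BIAS = -1.5


def lead_probability(home):
    """Logistic model: map a home's features to a probability between 0 and 1."""
    score = BIAS + sum(WEIGHTS[name] * home[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical homes; 1 means the feature is present, 0 means it is absent.
homes = {
    "address-1": {"built_before_1950": 1, "known_lead_neighbor": 1},
    "address-2": {"built_before_1950": 0, "known_lead_neighbor": 0},
}

# One probability per address, plus the single number a city-wide average gives.
per_address = {address: lead_probability(f) for address, f in homes.items()}
city_average = sum(per_address.values()) / len(per_address)
```

With these made-up numbers the average sits at about 50 percent, while the two individual addresses land well above and well below it, which is the kind of detail a homeowner would miss if given only the city-wide figure.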
Time Will Tell
Another area Flaxman believes has extensive public benefit is real-time analysis of where people are and which infrastructure they’re using.
“Emerging issues such as environmental or infrastructure justice can be well addressed by taking a look at who has access to what infrastructure and who doesn’t have access to infrastructure,” he explains. “That is something most engineers have the technical capability to do.”
Telecommunications companies are using GPU analytics to plan where they install assets based on where people are day and night, where they live, work and play. Such data are widely and commercially available.
“It goes well beyond remote sensing, because it has that temporal dimension of when people use an asset and when there’s surge demand on that asset,” adds Flaxman.
About Todd Danielson
Todd Danielson has worked in trade technology media for more than 20 years and is now the editorial director for V1 Media and all of its publications: Informed Infrastructure, Earth Imaging Journal, Sensors & Systems, Asian Surveying & Mapping, and the video news portal GeoSpatial Stream.