Contrasting views of the Lagoon Nebula. Top: Infrared observations from the Paranal Observatory in Chile cut through dust and gas to reveal a crisp view of baby stars within. Bottom: A similar view in visible light appears opaque.

ESO, VVV
For Kirk Borne, the information revolution began 11 years ago while he was working at NASA’s National Space Science Data Center in Greenbelt, Maryland. At a conference, another astronomer asked him if the center could archive a terabyte of data that had been collected from the MACHO sky survey, a project designed to study mysterious cosmic bodies that emit very little light or other radiation. Nowadays, plenty of desktop computers can store a terabyte on a hard drive. But when Borne ran the request up the flagpole, his boss almost choked. “That’s impossible!” he told Borne. “Don’t you realize that the entire data set NASA has collected over the past 45 years is one terabyte?”
“That’s when the lightbulb went off,” says Borne, who is now an associate professor of computational and data sciences at George Mason University. “That single experiment had produced as much data as the previous 15,000 experiments. I realized then that we needed to do something not only to make all that data available to scientists but also to enable scientific discovery from all that information.”
The tools of astronomy have changed drastically over just the past generation, and our picture of the universe has changed with them. Gone are the days of photographic plates that recorded the sky snapshot by painstaking snapshot. Today more than a dozen observatories on Earth and in space let researchers eyeball vast swaths of the universe in multiple wavelengths, from radio waves to gamma rays. And with the advent of digital detectors, computers have replaced darkrooms. These new capabilities provide a much more meaningful way to understand our place in the cosmos, but they have also unleashed a baffling torrent of data. Amazing discoveries might be in sight, yet hidden within all the information.
A new generation of sky surveys promises to catalog literally billions and billions of astronomical objects. Trouble is, there are not enough graduate students in the known universe to classify all of them. When the Large Synoptic Survey Telescope (LSST) on Cerro Pachón in Chile aims its 3.2-billion-pixel digital camera (the world’s largest) at the night sky in 2019, it will capture an area 49 times as large as the moon in each 15-second exposure, 2,000 times a night. Those snapshots will be stitched together over a decade to eventually form a motion picture of half the visible sky. The LSST, producing 30 terabytes of data nightly, will become the centerpiece of what some experts have dubbed the age of petascale astronomy; that’s 10¹⁵ bytes (what Borne jokingly calls “a tonabytes”).
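To see why “petascale” is the right word, here is a minimal back-of-the-envelope sketch in Python, using only the figures quoted above (30 terabytes a night, a decade-long survey); the nights-per-year count is a rough assumption of my own, not an official LSST data budget.

```python
# Back-of-the-envelope arithmetic for the LSST data volume,
# based on the figures quoted in the article.

TB_PER_NIGHT = 30        # ~30 terabytes of raw data each night
NIGHTS_PER_YEAR = 365    # rough assumption: the survey observes most nights
SURVEY_YEARS = 10        # the planned decade-long survey

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * SURVEY_YEARS
total_pb = total_tb / 1_000  # 1 petabyte = 1,000 terabytes = 10**15 bytes

print(f"Raw survey data: ~{total_tb:,} TB, or roughly {total_pb:.0f} PB")
# -> about 110,000 TB, i.e. on the order of a hundred petabytes,
#    which is why astronomers call this the petascale era.
```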
The data deluge is already overwhelming astronomers, who in the past endured fierce competition to get just a little observing time at a major observatory. “For the first time in history, we cannot examine all our data,” says George Djorgovski, an astronomy professor and codirector of the Center for Advanced Computing Research at Caltech. “It’s not just data volume. It’s also the quality and complexity. A major sky survey might detect millions or even billions of objects, and for each object we might measure thousands of attributes in a thousand dimensions. You can get a data-mining package off the shelf, but if you want to deal with a billion data vectors in a thousand dimensions, you’re out of luck even if you own the world’s biggest supercomputer. The challenge is to develop a new scientific methodology for the 21st century.”
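A rough sketch of the arithmetic behind Djorgovski’s point, using the billion-objects, thousand-attributes figures from his remark; the 8-byte storage size and the naive all-pairs comparison are illustrative assumptions of mine, not a description of any particular data-mining package.

```python
# Why a billion objects with a thousand measured attributes is hard.
# Assumes 8-byte floating-point values and a naive pairwise comparison;
# both are illustrative assumptions, not the method any survey actually uses.

n_objects = 1_000_000_000   # a billion catalog entries
n_dims = 1_000              # a thousand attributes per object
bytes_per_value = 8         # one double-precision float per attribute

table_bytes = n_objects * n_dims * bytes_per_value
print(f"Holding the full table: ~{table_bytes / 1e12:.0f} TB of values")

# Many clustering and outlier-detection methods compare objects pairwise.
pairwise_ops = (n_objects * (n_objects - 1) // 2) * n_dims
print(f"Naive all-pairs comparison: ~{pairwise_ops:.1e} operations")
# ~5e20 operations: nearly a week of sustained work on a petaflop machine,
# which is why off-the-shelf data-mining tools break down at this scale.
```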