The Data You Don't See

This may be one of the coolest data analysis stories I’ve come across yet:

During WWII, the US Navy tried to determine where it needed to armor its aircraft to ensure they came back home. It ran an analysis of where planes had been shot up and came up with a detailed map of bullet impacts. As the image below shows, the places that obviously needed to be up-armored were the ones taking most of the bullets: wingtips, the central body, and the elevators. That’s where most of the damage happened.

Abraham Wald, a statistician, disagreed. He thought they should add better armor to the nose area, engines, and mid-body. Which was crazy, of course. That’s not where the planes were getting shot. Except Mr. Wald realized what the others didn’t: the planes were getting shot there too, but they weren’t making it home. What the Navy thought it had done was analyze where aircraft were suffering the most damage. What it had actually done was analyze where aircraft could suffer the most damage without catastrophic failure. All of the places that weren’t hit? Planes that had been shot there had crashed. The Navy wasn’t looking at the whole sample set, only the survivors.
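
If you want to see the effect for yourself, here is a rough Python sketch. It is not Wald's actual method, and every number in it is made up for illustration: the `LETHALITY` table, the `simulate` function, and the assumption that each plane takes exactly one hit, equally likely in any section, are all hypothetical. The point is only that when hits land evenly everywhere, the planes that return still show far fewer hits in the places where a hit tends to be fatal.

```python
import random
from collections import Counter

# Hypothetical numbers, chosen only for illustration: the chance that a
# single hit to each section brings the plane down.
LETHALITY = {
    "engines": 0.60,
    "nose": 0.40,
    "mid-body": 0.30,
    "wingtips": 0.05,
    "central body": 0.05,
    "elevators": 0.05,
}


def simulate(n_planes=100_000, seed=0):
    """Return (hits across all planes, hits seen on planes that returned)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sections = list(LETHALITY)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Assumption: every plane takes exactly one hit, equally likely anywhere.
        section = rng.choice(sections)
        all_hits[section] += 1
        # The plane makes it home only if the hit was not fatal.
        if rng.random() >= LETHALITY[section]:
            survivor_hits[section] += 1
    return all_hits, survivor_hits


all_hits, survivor_hits = simulate()
for section in LETHALITY:
    print(f"{section:12s} all: {all_hits[section]:6d}  returned: {survivor_hits[section]:6d}")
```

In the "all" column the sections come out roughly even, because that is how the hits were dealt. In the "returned" column the engines, nose, and mid-body look comparatively untouched, which is exactly the map the Navy was staring at.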

Moral of the story: the numbers don’t lie, but our interpretation of them often hides the real truth.