The role of media in the age of digital reproduction: how do we deal with the increasing realism of fake digital media?
There was a lot of attention in the media this past week on the results of NVIDIA’s ICLR 2018 submission¹. In the paper, the NVIDIA group describes a few modifications to the way generative adversarial networks are constructed and trained, which let them generate image samples of somewhat higher quality than we’ve seen before.
At a high level, a generative adversarial network (GAN) consists of two competing networks, a generator and a discriminator, where at training time the generator attempts to create samples that fool the discriminator. Ideally, the generator eventually learns to draw samples from something that looks like the true data distribution. Often, however, a generator can get away with producing samples of little variation (so-called mode collapse) when pitted against a weak discriminator, or it can be overwhelmed by a discriminator that becomes effective far too early in training. Efforts are under way to address some of these issues, but for now, while the results are interesting, the networks themselves can be rather finicky.
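For readers who want the formal version, the original GAN paper (Goodfellow et al., 2014) frames this competition as a two-player minimax game. Here D(x) is the discriminator’s estimated probability that x came from the real data, and G(z) maps random noise z to a generated sample:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The "overwhelmed generator" failure shows up directly in this objective: when the discriminator confidently rejects every generated sample early on, the term log(1 − D(G(z))) saturates and the generator receives very little gradient. The same paper suggests training G to maximize log D(G(z)) instead, and many later GAN variants modify this loss further.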
We’re still quite far away from true photorealism or the singularity; the networks themselves don’t understand the semantics of real-world object behavior and are still quite sensitive to the choice of architecture and hyperparameters. However, the fact that the first reaction was to point out how this dovetails with the recent proliferation of fake news got me thinking about how our perception of images and video has changed as the underlying technologies have matured.
Rewinding 50 years, the Vietnam War featured an unprecedented media presence (the number of war correspondents surpassed 400 at the height of the war), with journalists given nearly unlimited access to combat zones. Many carried cameras and audio recorders. Their visceral accounts of atrocities, suffering, and US strategic positions sometimes ran counter to the government’s official narrative, ultimately turning the tide of public opinion against the war. Today, we can look at iconic photographs from the war (such as Nick Ut’s photo of Phan Thi Kim Phuc, “the Napalm girl”, and Eddie Adams’ “General Nguyen Ngoc Loan executing a Viet Cong prisoner in Saigon”) and get a real sense of their far-reaching impact. I would argue that the constraints on analog photography and film, and on their dissemination, were key to this impact: the universal power and believability of these images were only possible because they were (a) difficult to convincingly alter and (b) shared through (relatively) trusted institutions. That is not to say that even contemporaries trusted them universally: senior US government officials often speculated about whether they had been faked, or whether to characterize them as such to the public. But those media artifacts had an “aura” of realism that is simply not possible today.
The advent of digital photography, the rise of the Internet, and the modern-day ubiquity of mobile phones have changed the playing field completely. Digital photographs are treated as primary evidence in court only under certain circumstances. Wartime photojournalism now competes on similar footing with conspiracy theories, especially when automated, click-optimizing Internet platforms are favored over hand curation. It’s become commonplace for dodgy politicians to dismiss unflattering yet truthful reports as “fake news”. Ironically, the ability to record video on a mobile phone should have been empowering, but it is precisely that ease that has stripped video of its former power. Over time, we’ve seen the question shift from “Is this photo depicting the subject unfairly?” to “Is this photo real?”. The introduction of GANs simply pushes the envelope further and may require us to reevaluate our definition of authenticity.
Given that images and video are already somewhat easy to fake²,³, how do we currently decide what to believe? I’d like to start by noting the interesting trend towards verifiability through crowdsourcing. An individual recording of an event may be subject to scrutiny, but until we’re able to generate realistic multi-view videos, having multiple simultaneous views of the same event lends credence to the original video. This “alibi” effect is strengthened further when the other videos come from creators we already trust. More broadly, we can use these observations to reason about how we place our trust in digital media. We generally get our news from institutions that we trust. Each of these is composed of individuals who constantly stake their reputations on the work that they do. The occasional mistake can often be overlooked if it wasn’t made in bad faith or if the right corrective action is taken to retain trust. In addition, we may be inclined to believe something if a number of friends whom we trust highly believe it as well. But what happens when we, and the institutions that we trust, decide to associate only with other like-minded individuals and institutions?
In his essay “The Work of Art in the Age of Mechanical Reproduction” (1936), Walter Benjamin describes how the introduction of mass production techniques, which enable the mass distribution of perfectly reproduced art, diminishes the value of the physical artifact (e.g. the original Mona Lisa painting) and transforms our notion of art and authenticity, potentially increasing the overall value of art by creating a more shared experience. Digital media operates on many of the same principles; it lacks a well-defined concept of ownership and can be distributed to the masses even more easily. Yet digital media can also be altered and reproduced with an ease, and to an extent, that Benjamin never imagined⁴. What makes it powerful, liberating, and dangerous at the same time is that the lines between creators and observers blur; it’s an ecosystem where we (largely) decide what’s made, what’s shared, and what languishes, allowing us to amplify and reinforce our best hopes and worst resentments.
In the end, we’re left with a few possible options. First, we can try reclaiming the concept of ownership. We can devise media formats (perhaps along the lines of subtle but functionally inconsequential watermarking⁵,⁶) that enable a proof of ownership. This would let us trust known users, but what about anonymous sources, or those that are not universally trusted? We may have to define some context-specific notion of authenticity (e.g. photos that have not been edited post hoc in a semantically significant way and are provably reproducible using the same settings on trusted hardware) or devise a way to verifiably capture lineage. This is probably extremely difficult to do in a general setting, but it could perhaps be made to work for specific domains, such as photojournalism. Second, we may explore the new interactive capabilities of evolving platforms (e.g. live streaming, real-time robotic control through VR) to build trust and see the world from other perspectives. Or maybe we can find better ways of connecting face-to-face in real life; perhaps technology could help us with that.
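To make “verifiably capture lineage” a little more concrete, here is a minimal sketch of one well-known building block: a hash chain, where each edit step records a hash of the resulting image plus the hash of the previous record. This is a swapped-in illustration, not watermarking and not any existing provenance standard; the names `EditRecord`, `append_step`, and `verify` and the record fields are hypothetical. A hash chain only makes tampering *evident*: anyone who can rewrite the entire log can recompute every hash. In practice the head of the chain would also have to be signed by a key held by trusted capture hardware, a step omitted here because Python’s standard library has no public-key signatures.

```python
# Hypothetical provenance log (a sketch, not a real format): each edit step
# stores the SHA-256 of the resulting image bytes and the hash of the
# previous record, so altering any earlier record breaks every later link.
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first (capture) record


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class EditRecord:
    description: str   # human-readable step, e.g. "capture" or "crop"
    image_digest: str  # SHA-256 of the image bytes after this step
    prev_hash: str     # record_hash() of the preceding record

    def record_hash(self) -> str:
        # Canonical JSON (sorted keys) so identical records hash identically.
        payload = json.dumps(
            {
                "description": self.description,
                "image_digest": self.image_digest,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        ).encode("utf-8")
        return sha256_hex(payload)


def append_step(chain: list, description: str, image: bytes) -> None:
    # Link the new record to the hash of the last one (or GENESIS if empty).
    prev = chain[-1].record_hash() if chain else GENESIS
    chain.append(EditRecord(description, sha256_hex(image), prev))


def verify(chain: list, final_image: bytes) -> bool:
    # Every record must point at the hash of the record before it...
    prev = GENESIS
    for record in chain:
        if record.prev_hash != prev:
            return False
        prev = record.record_hash()
    # ...and the last record must describe the image we were actually given.
    return bool(chain) and chain[-1].image_digest == sha256_hex(final_image)


if __name__ == "__main__":
    log: list = []
    append_step(log, "capture", b"raw sensor bytes")
    append_step(log, "crop", b"cropped image bytes")
    print(verify(log, b"cropped image bytes"))   # True
    print(verify(log, b"doctored image bytes"))  # False
```

Rewriting the description of the capture step, or swapping in a different final image, makes `verify` return `False`; what the chain cannot tell you by itself is whether the first record was honest, which is exactly where trusted hardware would come in.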