ImageNet Roulette was a project created in support of Kate Crawford and Trevor Paglen’s excellent essay Excavating AI. You could upload a photo (or take one using a webcam) and ImageNet Roulette would attempt to detect any faces and then label them. The labels were frequently absurd, and many of them named professions.
ImageNet is one of the most significant training sets in the history of AI. A major achievement. The labels come from WordNet, the images were scraped from search engines. The 'Person' category was rarely used or talked about. But it's strange, fascinating, and often offensive.
— Kate Crawford (@katecrawford) September 16, 2019
While I do not recall all of the labels that were suggested when I submitted a webcam portrait, the one that I do recall is Widower.
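Labels like Widower come straight out of WordNet: ImageNet’s categories are WordNet synsets, and its ‘Person’ category corresponds to the hyponyms of person.n.01 in WordNet’s noun hierarchy. Here is a minimal sketch, using NLTK (and assuming the wordnet corpus has been downloaded), of the kinds of labels that live under that branch. This is not ImageNet Roulette’s actual code; it only shows where the person labels come from:

```python
# A sketch of WordNet's person hierarchy, the source of ImageNet's
# person labels. Requires `pip install nltk` and a one-off
# nltk.download('wordnet').
from nltk.corpus import wordnet as wn

person = wn.synset('person.n.01')

# Every synset in the hyponym tree under 'person' is a potential label.
person_labels = set(person.closure(lambda s: s.hyponyms()))
print(len(person_labels))  # thousands of person categories

widower = wn.synset('widower.n.01')
print(widower.definition())      # a man whose wife is dead, per WordNet
print(widower in person_labels)  # True: 'widower' sits under person.n.01
```

Nothing about a synset such as widower.n.01 suggests that it should be visually detectable, which, as we will see, is the heart of the matter.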
ImageNet Roulette was not built to be intentionally bad but rather to draw attention to the role of data and labels in training AI models.
August Sander
August Sander was a German portrait photographer working in the first half of the 20th century, whose life’s work People of the 20th Century consisted of portraits organised into a typology. Teju Cole describes this body of work as follows:
In the work Sander produced around and just following the First World War, he created a catalog of images that stood in for an entire generation in Weimar Germany. Farmers, cooks, stevedores, teachers, priests, and manual laborers were all represented in their full dignity, and Sander achieved something like a double-portraiture in each case, because each actual individual was at the same time a representative type.
My first instinct when I came across ImageNet Roulette (back when it was still active) was to see what typologies it might suggest if I provided August Sander portraits as inputs.
ImageNet Roulette vs August Sander
What does a Theosophist even look like? Or, more relevantly, why might a model associate Theosophist with particular images? One obvious possibility is that images of Theosophists in the training set were formally similar because they date from the same era and are similarly toned black-and-white photographs. Already we can see that this approach may tell us as much about the overall context of an image as about the face itself.
A particularly concerning example of this phenomenon is a model that was trained to detect malignant skin lesions:
we noted that the algorithm appeared more likely to interpret images with rulers as malignant. Why? In our dataset, images with rulers were more likely to be malignant; thus the algorithm inadvertently “learned” that rulers are malignant. These biases in AI models are inherent unless specific attention is paid to address inputs with variability.
Automated Classification of Skin Lesions: From Pixels to Practice - Narla, Kuprel, Sarin et al.
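This failure mode is easy to reproduce on synthetic data. The sketch below (all numbers and feature names are invented for illustration; the real study used dermatology images, not two tabular features) trains a logistic regression where a ‘ruler present’ feature happens to co-occur with malignancy, and the model duly leans on it:

```python
# A toy reproduction of the "ruler" effect: when an artefact co-occurs
# with the label, the model learns the artefact. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Weak true signal: an irregularity score for each lesion.
irregularity = rng.normal(0.0, 1.0, n)
malignant = (irregularity + rng.normal(0.0, 2.0, n)) > 0.5

# Spurious artefact: suspicious lesions tend to be photographed next to
# a ruler, so the ruler correlates strongly with malignancy.
ruler = np.where(malignant, rng.random(n) < 0.8, rng.random(n) < 0.1)

X = np.column_stack([irregularity, ruler.astype(float)])
model = LogisticRegression().fit(X, malignant)

print(dict(zip(['irregularity', 'ruler'], model.coef_[0])))
# On this synthetic data the 'ruler' coefficient comes out far larger
# than the genuine signal: the model has "learned" that rulers are
# malignant, exactly as the quoted passage describes.
```

The same mechanism could just as easily associate Theosophist with sepia toning rather than with anything about a face.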
And there it is, the racism. The fair-skinned get labels (however wrong) about what they do, and the only black man is labelled by the colour of his skin. To add further insult, the only undetected face (of those whose gaze meets the camera) is a black woman. Her gaze seems reproachful.
Algorithms for detecting faces, at least historically, were largely built by white men and often did not have sufficiently diverse training data to learn to recognise faces across a variety of skin tones. It is important to note that this is yet another mechanism by which power structures reinforce themselves without centralised control. The engineers involved were not acting on orders to produce an algorithm that favoured one set of skin tones over another, but they managed to arrive at that outcome all the same.
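One way to make this kind of failure visible is simply to measure detection rates across groups. Here is a hedged sketch of such an audit, using OpenCV’s stock Haar-cascade face detector; the directory layout and group names are hypothetical placeholders, not a real benchmark:

```python
# A sketch of a simple fairness audit: what fraction of portraits in
# each group does an off-the-shelf face detector find a face in?
# The directory layout and group names below are hypothetical.
import glob
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detection_rate(paths):
    """Fraction of images in which at least one face is detected."""
    hits = 0
    for path in paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:  # unreadable file; skip it
            continue
        faces = detector.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5)
        hits += len(faces) > 0
    return hits / len(paths)

for group in ['darker_skin', 'lighter_skin']:  # placeholder groups
    paths = glob.glob(f'portraits/{group}/*.jpg')
    print(group, detection_rate(paths))
```

A systematic gap between the two rates is exactly the kind of outcome that nobody ordered but that the training data produced all the same.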
Algorithmic bias has been written about extensively and is an increasingly important subject. I’ll provide some starting points for where to read further in Related Reading below.
With that said, I don’t think that this form of bias is what Crawford and Paglen wanted to highlight with ImageNet Roulette.
Invisible essences
To my mind, the central issues raised by ImageNet Roulette are concerned with meaning.
If we consider some of the labels we have seen above, we should ask in what way these labels could possibly correspond to the surface of an image. Does it even make sense to say that a person looks so much like an orphan/biographer/aviatrix(!)/revisionist that we could conclude that that is what they are? In what way does a dentist, outside of their practice, look like a dentist? What traits should even be detectable from an image?
Training sets for AI, and for computer vision in particular, may use labels that do not correspond to concrete, stable categories at all. For example, race is a socially constructed category, one whose definition varies across both time and place.
The classificatory schema for race recalls many of the deeply problematic racial classifications of the twentieth century. For example, the South African apartheid regime sought to classify the entire population into four categories: Black, White, Colored, or Indian. Around 1970, the South African government created a unified “identity passbook” called The Book of Life, which linked to a centrally managed database created by IBM. These classifications were based on dubious and shifting criteria of “appearance and general acceptance or repute,” and many people were reclassified, sometimes multiple times. The South African system of racial classification was intentionally very different from the American “one-drop” rule, which stated that even one ancestor of African descent made somebody Black, likely because nearly all white South Africans had some traceable black African ancestry.
Crawford and Paglen
What are the assumptions undergirding visual AI systems? First, the underlying theoretical paradigm of the training sets assumes that concepts—whether “corn,” “gender,” “emotions,” or “losers”—exist in the first place, and that those concepts are fixed, universal, and have some sort of transcendental grounding and internal consistency. Second, it assumes a fixed and universal correspondence between images and concepts, appearances and essences. What’s more, it assumes uncomplicated, self-evident, and measurable ties between images, referents, and labels. In other words, it assumes that different concepts—whether “corn” or “kleptomaniacs”—have some kind of essence that unites each instance of them, and that that underlying essence expresses itself visually. Moreover, the theory goes, that visual essence is discernible by using statistical methods to look for formal patterns across a collection of labeled images. Images of people dubbed “losers,” the theory goes, contain some kind of visual pattern that distinguishes them from, say, “farmers,” “assistant professors,” or, for that matter, apples. Finally, this approach assumes that all concrete nouns are created equally, and that many abstract nouns also express themselves concretely and visually (i.e., “happiness” or “anti-Semitism”).
Crawford and Paglen
The Typology of August Sander
August Sander’s manner of organising his portraits has aged poorly.
The top-level categories are:
- The Farmer
- The Skilled Tradesmen
- The Woman
- Classes and Professions
- The Artists
- The City
- The Last People
Each category is further broken down into several portfolios of portraits, such as The Master Craftsman, The Woman in Intellectual and Practical Occupation and People Who Came To My Door. The details of these portfolios, along with their photographs, can be found at the August Sander Stiftung.
It isn’t really clear to me how or why Sander used this particular scheme. It may have been a desire to provide structure to a large body of work, or perhaps an attempt to ensure breadth. Both Sander and his son appear at times in his work. Sander is simply presented as ‘Photographer’. Tellingly, I could not find this self-portrait in People of the 20th Century. Should photographers appear under Skilled Tradesmen, Classes and Professions or perhaps Artists?
Related Reading
- Excavating AI - Crawford and Paglen.
- Working with Faces - Kyle McDonald.