Chatbots and virtual assistants are often biased. Personal assistants, for example, are often feminized, reproducing harmful gender stereotypes about the role of women in society and the type of work women perform. The datasets and algorithms used in these AIs may also be biased, perpetuating existing discrimination and incorrectly interpreting the language of certain ethnic or socioeconomic groups.
1. Combatting the Harassment of Conversational AI
Feminized chatbots are often harassed. Pushing back on harassment presents an opportunity for AI to help stop gender and sexual harassment.
2. De-Biasing Data and Algorithms
Virtual assistants need to be trained on a wide variety of language so that they do not discriminate against language variations, dialects, accents or slang.
Conversational AI agents are increasingly used by companies for customer service and by consumers as personal assistants. These AIs are not scripted by humans but respond to human interlocutors using learning and human-guided algorithms (Brandtzaeg & Følstad, 2017; West, Kraut & Ei Chew, 2019). One challenge is that virtual assistants and chatbots are often gendered as female. Three of the four best-known virtual assistants—Apple’s Siri, Amazon’s Alexa and Microsoft’s recently discontinued Cortana—are styled female through naming practices, voice and personality. Justifications center on the notion that users prefer female voices over male voices, especially when support is being provided (Payne et al., 2013). Gendering virtual assistants as female reinforces harmful stereotypes that assistants—always available, ready to help and submissive—should, by default, be female.
Another challenge is that the algorithms used by the conversational AI agents do not always acknowledge user gender or understand context-bound and culture-bound language. As a result, the conversations may be biased. Users may be addressed as males by default in gender-inflected languages, and the language used by minority groups may be filtered out as hate speech.
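To make the first of these problems concrete, the sketch below shows one way a designer might avoid a masculine-by-default greeting in a gender-inflected language such as Spanish. It is an illustrative Python sketch, not code from any system discussed in this case study; the template names and the `user_profile` structure are hypothetical.

```python
# Illustrative sketch (not from the case study): avoiding a masculine-by-default
# greeting in a gender-inflected language such as Spanish. The template names and
# the `user_profile` structure are hypothetical.

GREETINGS_ES = {
    "feminine": "Bienvenida, {name}.",
    "masculine": "Bienvenido, {name}.",
    # Neutral phrasing avoids gendered agreement entirely.
    "neutral": "Te damos la bienvenida, {name}.",
}

def greet(user_profile: dict) -> str:
    """Pick a greeting based on the user's stated preference.

    Falls back to the neutral form instead of defaulting to masculine
    when no preference has been recorded.
    """
    grammatical_gender = user_profile.get("grammatical_gender", "neutral")
    template = GREETINGS_ES.get(grammatical_gender, GREETINGS_ES["neutral"])
    return template.format(name=user_profile.get("name", ""))

if __name__ == "__main__":
    print(greet({"name": "Alex"}))                                    # neutral fallback
    print(greet({"name": "María", "grammatical_gender": "feminine"}))
```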
Method: Analyzing Gender and Intersectionality in Social Robots
When designing virtual assistants and chatbots, it is important to consider how gendering might perpetuate stereotypes and social inequalities. The designers of virtual assistants should be aware of how robots are gendered, e.g., by naming, voice, etc. (Søraa, 2017—see also Gendering Social Robots). Designers should adopt a participatory research approach to better understand how conversational AI agents can better fit a diverse group of users, based on intersecting traits such as gender, ethnicity, age, religion, etc.
Conversational AIs allow users to converse in natural language. This highly valuable interface requires the AI to respond appropriately to specific queries, but problems arise when users direct sexually charged and abusive language at the AI. Virtual assistants designed with female names and voices are often harassed. These feminized digital voice assistants have often been programmed to respond to harassment with flirty, apologetic and deflecting answers (West, Kraut & Ei Chew, 2019)—see below. They do not fight back.
Researchers at Quartz concluded that these evasive and playful responses reinforce stereotypes of “unassertive, subservient women in service positions...[and may] intensify rape culture by presenting indirect ambiguity as a valid response to harassment” (Fessler, 2017). The problem is that humans often treat AIs in the same way as they treat humans, and if humans become accustomed to harassing AIs, this may further endanger women. In the U.S., one in five women have been raped in their lifetimes (Fessler, 2017).
Rachel Adams and Nóra Ni Loideain have argued that these new technologies reproduce harmful gender stereotypes about the role of women in society and the type of work women perform (Ni Loideain & Adams, 2018 and 2019). They argue further that virtual assistants of this kind indirectly discriminate against women in ways that are illegal under international human rights law and in violation of the United Nations Convention on the Elimination of All Forms of Discrimination Against Women (Adams & Ni Loideain, 2019).
The innovation here is their argument that legal instruments exist that can be applied to address the societal harm of discrimination embodied in conversational AIs. In Europe, these legal instruments include the EU Charter of Fundamental Rights and, potentially, an expanded version of Data Protection Impact Assessments (Adams & Ni Loideain, 2019). In the U.S., the Federal Trade Commission, as the regulatory body broadly mandated with consumer protection, could play a similar role (Ni Loideain & Adams, 2019).
In response to such scrutiny, companies have updated their voice assistants with new responses. Siri now responds to “you’re a bitch” with “I don’t know how to respond to that.” Voice assistants have become less tolerant of abuse. They do not, however, push back; they do not say “no”; they do not label such speech as inappropriate. They tend to deflect or redirect, taking care not to offend customers. Alexa, for example, when called a bitch, responds, “I’m not sure what outcome you expect.” Such a response does not solve the structural problem of “software, made woman, made servant” (Bogost, 2018).
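As an illustration of what pushing back could look like in code, the following minimal Python sketch routes abusive input to a boundary-setting reply instead of a deflection. It is not how Siri, Alexa or any commercial assistant is implemented; the keyword list stands in for a trained abuse classifier, and the responses are placeholders.

```python
# Minimal illustrative sketch of a boundary-setting response policy for abusive
# input. Not how any commercial assistant works; the lexicon and responses are
# placeholders for a real abuse classifier and dialogue policy.

ABUSIVE_TERMS = {"bitch"}  # placeholder lexicon; real systems use trained classifiers

BOUNDARY_RESPONSE = (
    "That language is inappropriate and I won't respond to it. "
    "Let me know if you have a question I can help with."
)

def respond(utterance: str) -> str:
    tokens = {t.strip(".,!?").lower() for t in utterance.split()}
    if tokens & ABUSIVE_TERMS:
        # Name the behaviour and decline, rather than deflecting or flirting.
        return BOUNDARY_RESPONSE
    return "How can I help?"

if __name__ == "__main__":
    print(respond("You're a bitch"))
```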
Bias in training data presents a related challenge. Take, for example, the well-known Microsoft bot Tay, released in 2016. Tay was designed to be a fun millennial girl and to improve its conversational abilities over time by making small talk with human users. In less than 24 hours, however, the bot became offensively sexist, misogynistic and racist. The failure hinged on the lack of sophisticated algorithms to overcome the bias inherent in the data fed into the program.
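One lesson often drawn from Tay is that user-contributed messages should be screened before they are allowed to influence a model that learns online. The hedged sketch below illustrates that idea; the `is_toxic` scorer and its threshold are hypothetical stand-ins for a real moderation model.

```python
# Hedged sketch of the lesson drawn from Tay: screen user-contributed messages
# before letting them influence a model that learns from interaction. The scorer
# and threshold are hypothetical stand-ins for a real toxicity classifier.

from typing import Callable, Iterable, List

def curate_training_batch(
    messages: Iterable[str],
    is_toxic: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only messages whose estimated toxicity is below the threshold."""
    return [m for m in messages if is_toxic(m) < threshold]

if __name__ == "__main__":
    # Toy stand-in scorer: flags messages containing a blocklisted word.
    blocklist = {"hate"}
    toy_scorer = lambda m: 1.0 if blocklist & set(m.lower().split()) else 0.0

    batch = ["I love this chatbot", "I hate group X"]
    print(curate_training_batch(batch, toy_scorer))  # -> ['I love this chatbot']
```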
For conversational AIs to function properly and avoid bias, they must understand something about context—i.e., users’ gender, age, ethnicity, geographic location and other characteristics—and the socio-cultural language associated with them. For example, African American English (AAE) and particularly African American slang may be “blacklisted” and filtered out by algorithms designed to detect rudeness and hate speech. Researchers from the University of Massachusetts, Amherst, analyzed 52.9 million tweets and found that tweets containing African American slang and vernacular were often not classified as English; Twitter’s sentiment analysis tools struggled with them, and its “rudeness” filter tended to misinterpret and delete them (Blodgett & O'Connor, 2017; Schlesinger et al., 2018). Similarly, Twitter algorithms fail to understand the slang of drag queens, where, for example, “love you, bitch” is an effort to reclaim these words for the community; the word “bitch” was filtered out as hateful (Interlab). Worryingly, drag queens’ tweets were often ranked as more offensive than those of white supremacists.
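One way to surface this kind of disparate impact is to measure how often a filter flags benign messages from different dialect groups. The Python sketch below is illustrative only and is not the code used in the studies cited above; the toy filter and sample data are hypothetical.

```python
# Illustrative audit sketch (not the cited studies' code): compare how often a
# content filter flags benign messages from different dialect groups. The toy
# filter and sample data are hypothetical.

from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

def flag_rate_by_group(
    samples: Iterable[Tuple[str, str]],   # (dialect_group, benign_message)
    flag: Callable[[str], bool],
) -> Dict[str, float]:
    counts, flagged = defaultdict(int), defaultdict(int)
    for group, text in samples:
        counts[group] += 1
        flagged[group] += int(flag(text))
    return {g: flagged[g] / counts[g] for g in counts}

if __name__ == "__main__":
    # Toy filter that wrongly treats a reclaimed in-group term as abuse.
    naive_filter = lambda text: "bitch" in text.lower()

    benign_samples = [
        ("drag_community", "love you, bitch"),   # reclaimed, affectionate usage
        ("drag_community", "yes queen!"),
        ("general", "have a great day"),
        ("general", "see you tomorrow"),
    ]
    # A large gap in false-positive rates signals disparate impact.
    print(flag_rate_by_group(benign_samples, naive_filter))
```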
The innovation in this case is understanding that language differs by accent, dialect and community. This is an important aspect of the EU-funded project REBUILD, which developed an ICT-based program to help immigrants integrate into their new communities. The program enables personalized communication between users and virtual assistants to connect immigrants seamlessly to local services.
Method: Analyzing Gender and Intersectionality in Machine Learning
AI technologies used in chatbots and virtual assistants operate within societies alive with issues related to gender, race and other forms of structural oppression. The data used to train AI agents should be analyzed to identify biases. The model and algorithm should also be checked for fairness, making sure no social group is discriminated against or unfairly filtered out.
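As a minimal sketch of such an audit, assuming labelled training data annotated with a demographic or dialect group, the following Python snippet tallies how each group is represented and how labels are distributed across groups. The field names and toy data are hypothetical.

```python
# A minimal sketch, assuming labelled training data with demographic metadata,
# of the audit the method describes: check how groups are represented and how
# labels are distributed across them. Field names and toy data are hypothetical.

from collections import Counter, defaultdict

def audit(records):
    """records: iterable of dicts with 'group' and 'label' keys."""
    group_counts = Counter(r["group"] for r in records)
    label_by_group = defaultdict(Counter)
    for r in records:
        label_by_group[r["group"]][r["label"]] += 1
    return group_counts, dict(label_by_group)

if __name__ == "__main__":
    toy_data = [
        {"group": "aae_speakers", "label": "hate_speech"},
        {"group": "aae_speakers", "label": "ok"},
        {"group": "general", "label": "ok"},
        {"group": "general", "label": "ok"},
        {"group": "general", "label": "ok"},
    ]
    counts, labels = audit(toy_data)
    print(counts)   # representation per group
    print(labels)   # label distribution per group: strong skew may indicate biased annotation
```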
Do virtual assistants and chatbots perpetuate stereotypes and social inequalities? This case study offers strategies designers can deploy to challenge stereotypes, de-bias algorithms, and better accommodate diverse user groups.
Gendered Innovations:
1. Combatting the Harassment of Conversational AI. Virtual assistants designed with female names and voices are often harassed. The problem is that humans who harass AIs may also harass real women. In the U.S., one in five women have been raped in their lifetimes.
AI has the potential to help stop gender and sexual harassment. Over the years, companies have designed voice assistants to be less tolerant of abuse. Siri, for example, now responds to “you’re a bitch” with “I don’t know how to respond to that.” AIs do not, however, push back; they do not say “no.” Virtual assistants tend to deflect or redirect but with care not to offend customers. Alexa, for example, when called a bitch, responds, “I’m not sure what outcome you expect.”
Researchers in Europe have argued that these conversational AIs reproduce harmful gender stereotypes about the role of women in society and the type of work women perform. They argue that legal instruments exist that can be applied to address this inequality. These include, in Europe, the EU Charter of Fundamental Rights and, potentially, an expanded version of Data Protection Impact Assessments, and, in the U.S., the Federal Trade Commission, as the regulatory body broadly mandated to protect consumers.
2. De-Biasing Data and Algorithms. African-American English and particularly African-American slang may be “blacklisted” and filtered out by algorithms designed to detect rudeness and hate speech. Researchers found that tweets that contained African-American slang and vernacular were often not considered English. Twitter’s sentiment analysis tools tend to misinterpret and delete them.
Similarly, Twitter algorithms fail to understand the slang of drag queens, where, for example, “love you, bitch” is an effort to reclaim these words for the community. Too often the word “bitch” was filtered out as hateful.
3. Gender-Neutral Conversation. To combat the harm of feminized virtual assistants, companies and researchers are developing gender-neutral voices and language. One such innovation is “Q,” the first genderless AI voice. Q was developed in Denmark. The database powering the voice was constructed by combining strands of the speech of gender-fluid people. The genderless range is technically defined as between 145 Hz and 175 Hz, a range that is difficult for humans to categorize as either female or male. Designers hope that this approach will offer a viable gender-neutral option for voicing virtual assistants.
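For readers curious how the 145 Hz to 175 Hz criterion could be checked in practice, the hedged sketch below estimates a recording's median fundamental frequency with the open-source librosa library and tests whether it falls in that band. This is not the pipeline used to build Q; the file path and the pitch search range are assumptions.

```python
# Hedged sketch (not the actual "Q" production pipeline): estimate a recording's
# median fundamental frequency and check whether it falls in the 145-175 Hz band
# described as hard to categorize as male or female. Requires numpy and librosa;
# "voice.wav" is a placeholder file path.

import numpy as np
import librosa

GENDER_AMBIGUOUS_BAND = (145.0, 175.0)  # Hz, as reported by Q's designers

def median_f0(path: str) -> float:
    y, sr = librosa.load(path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=80.0, fmax=400.0, sr=sr  # assumed speech pitch search range
    )
    return float(np.nanmedian(f0[voiced_flag]))

if __name__ == "__main__":
    f0 = median_f0("voice.wav")
    low, high = GENDER_AMBIGUOUS_BAND
    print(f"median f0 = {f0:.1f} Hz; within 145-175 Hz band: {low <= f0 <= high}")
```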