Microsoft is backing away from its public support for some AI-driven features, including facial recognition, and acknowledging the discrimination and accuracy issues these offerings create. But the company had years to fix the problems and didn’t. That's akin to a car manufacturer recalling a vehicle rather than fixing it.
Despite concerns that facial recognition technology can be discriminatory, the real issue is that results are inaccurate. (The discriminatory argument plays a role, though, due to the assumptions Microsoft developers made when crafting these apps.)
Let’s start with what Microsoft did and said. Sarah Bird, the principal group product manager for Microsoft's Azure AI, summed up the pullback last month in a Microsoft blog.
“Effective today (June 21), new customers need to apply for access to use facial recognition operations in Azure Face API, Computer Vision, and Video Indexer. Existing customers have one year to apply and receive approval for continued access to the facial recognition services based on their provided use cases. By introducing Limited Access, we add an additional layer of scrutiny to the use and deployment of facial recognition to ensure use of these services aligns with Microsoft’s Responsible AI Standard and contributes to high-value end-user and societal benefit. This includes introducing use case and customer eligibility requirements to gain access to these services.
"Facial detection capabilities–including detecting blur, exposure, glasses, head pose, landmarks, noise, occlusion, and facial bounding box — will remain generally available and do not require an application.”
Look at that second sentence, where Bird highlights this additional hoop for users to jump through “to ensure use of these services aligns with Microsoft’s Responsible AI Standard and contributes to high-value end-user and societal benefit.”
This certainly sounds nice, but is that truly what this change does? Or will Microsoft simply lean on it as a way to stop people from using these services in the scenarios where the inaccuracies are the biggest?
One of the situations Microsoft discussed involves speech recognition, where it found that “speech-to-text technology across the tech sector produced error rates for members of some Black and African American communities that were nearly double those for white users,” said Natasha Crampton, Microsoft’s Chief Responsible AI Officer. “We stepped back, considered the study’s findings, and learned that our pre-release testing had not accounted satisfactorily for the rich diversity of speech across people with different backgrounds and from different regions.”
Another issue Microsoft identified is that people of all backgrounds tend to speak differently in formal versus informal settings. Really? The developers didn’t know that before? I bet they did, but failed to think through the implications of doing nothing about it.
One way to address this is to reexamine the data collection process. People being recorded for voice analysis are, by nature, going to be a bit nervous, and they are likely to speak stiffly and formally. One way to deal with this is to hold much longer recording sessions in as relaxed an environment as possible. After a few hours, some people may forget they are being recorded and settle into casual speaking patterns.
I've seen this play out with how people interact with voice recognition. At first, they speak slowly and tend to over-enunciate. Over time, they slowly fall into what I’ll call "Star Trek" mode and speak as they would to another person.
A similar problem was discovered with emotion-detection efforts.
More from Bird: “In another change, we will retire facial analysis capabilities that purport to infer emotional states and identity attributes such as gender, age, smile, facial hair, hair, and makeup. We collaborated with internal and external researchers to understand the limitations and potential benefits of this technology and navigate the tradeoffs. In the case of emotion classification specifically, these efforts raised important questions about privacy, the lack of consensus on a definition of emotions and the inability to generalize the linkage between facial expression and emotional state across use cases, regions, and demographics. API access to capabilities that predict sensitive attributes also opens up a wide range of ways they can be misused—including subjecting people to stereotyping, discrimination, or unfair denial of services. To mitigate these risks, we have opted to not support a general-purpose system in the Face API that purports to infer emotional states, gender, age, smile, facial hair, hair, and makeup. Detection of these attributes will no longer be available to new customers beginning June 21, 2022, and existing customers have until June 30, 2023, to discontinue use of these attributes before they are retired.”
On emotion detection, facial analysis has historically proven to be much less accurate than simple voice analysis. Voice recognition of emotion has proven quite effective in call center applications, where a customer who sounds very angry can get immediately transferred to a senior supervisor.
To a limited extent, that helps make Microsoft’s point that it is the way the data is used that needs to be restricted. In that call center scenario, if the software is wrong and the customer was not in fact angry, no harm is done; the supervisor simply completes the call normally. Note: the most common voice emotion-detection error I’ve seen is when a customer is angry at the phone tree and its inability to understand simple sentences, and the software concludes the customer is angry at the company. A reasonable mistake.
But again, if the software is wrong, no harm is done.
Bird made a good point that some use cases can still rely on these AI functions responsibly. “Azure Cognitive Services customers can now take advantage of the open-source Fairlearn package and Microsoft’s Fairness Dashboard to measure the fairness of Microsoft’s facial verification algorithms on their own data — allowing them to identify and address potential fairness issues that could affect different demographic groups before they deploy their technology.”
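To make that concrete, here is a minimal sketch (my own illustration, not Microsoft’s tooling) of how a team might use the open-source Fairlearn package to compare a face-verification model’s match decisions across demographic groups. The labels, predictions, and group names below are made-up placeholder data.

```python
# Hedged sketch: per-group accuracy of a face-verification model's match
# decisions, measured with the open-source Fairlearn package.
import pandas as pd
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

# 1 = the probe and gallery images are the same person, 0 = different people
y_true = pd.Series([1, 1, 0, 1, 0, 1, 1, 0])
# What the verification model decided for each pair (illustrative values)
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 0, 0])
# Hypothetical demographic group label for each probe subject
groups = pd.Series(["A", "A", "A", "A", "B", "B", "B", "B"])

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=groups,
)

print(mf.overall)       # aggregate metrics across all samples
print(mf.by_group)      # the same metrics broken out per demographic group
print(mf.difference())  # largest gap between groups, the fairness signal
```

The point of a report like this is the last two lines: a model can look fine in aggregate while one group’s error rate is far worse, which is exactly the kind of gap Bird says customers should find before deploying.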
Bird also said technical issues played a role in some of the inaccuracies. “In working with customers using our Face service, we also realized some errors that were originally attributed to fairness issues were caused by poor image quality. If the image someone submits is too dark or blurry, the model may not be able to match it correctly. We acknowledge that this poor image quality can be unfairly concentrated among demographic groups.”
Among demographic groups? Isn’t that everyone, given that everyone belongs to some demographic group? That sounds like a coy way of saying that non-white people may get poor match results. This is why law enforcement’s use of these tools is so problematic. A key question for IT to ask: What are the consequences if the software is wrong? Is the software one of 50 tools being used, or is it the only one being relied on?
Microsoft said it's working to fix that issue with a new tool. “That is why Microsoft is offering customers a new Recognition Quality API that flags problems with lighting, blur, occlusions, or head angle in images submitted for facial verification,” Bird said. “Microsoft also offers a reference app that provides real-time suggestions to help users capture higher-quality images that are more likely to yield accurate results.”
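For a sense of what that looks like in practice, here is a hedged sketch assuming the azure-cognitiveservices-vision-face Python SDK, where the quality flagging Bird describes surfaces as the qualityForRecognition face attribute. The endpoint, key, and image URL are placeholders, and the exact attribute plumbing may differ in your SDK version.

```python
# Hedged sketch: ask the Azure Face service whether a submitted image is
# good enough (low/medium/high) for facial verification before using it.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

face_client = FaceClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder endpoint
    CognitiveServicesCredentials("<your-key>"),               # placeholder key
)

# Detect faces and request the recognition-quality attribute.
faces = face_client.face.detect_with_url(
    url="https://example.com/photo-to-verify.jpg",  # placeholder image
    return_face_id=False,
    detection_model="detection_03",
    recognition_model="recognition_04",
    return_face_attributes=["qualityForRecognition"],
)

for i, face in enumerate(faces):
    quality = face.face_attributes.quality_for_recognition
    print(f"Face {i}: quality for recognition = {quality}")
    # If quality comes back low, prompt the user to retake the photo
    # (better lighting, less blur, face the camera) instead of running
    # verification on an image that is likely to produce a bad match.
```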
In a New York Times interview, Crampton pointed to another issue: the system’s “so-called gender classifier” was binary, “and that’s not consistent with our values.”
In short, she’s saying that because the system thinks only in terms of male and female, it couldn’t readily label people who identify in other ways. In this case, Microsoft simply opted to stop trying to guess gender, which is likely the right call.