Aug 7, 2023 10:00 AM PT

Has Microsoft cut security corners once too often?

As details about the recent China attack against US government agencies come to light, two details stand out: Microsoft failed to store security keys properly — and the keys were used by attackers even though they'd already expired.


As Microsoft revealed tidbits of its post-mortem investigation into a Chinese attack against US government agencies via Microsoft, two details stand out: the company violated its own policy and did not store security keys within a Hardware Security Module (HSM) — and the keys were successfully used by attackers even though they had expired years earlier. 

This is simply the latest example of Microsoft quietly cutting corners on cybersecurity and then only telling anyone when it gets caught. 

Tenable CEO Amit Yoran wrote a powerful post on LinkedIn last week and described “a repeated pattern of negligent cybersecurity practices…. Microsoft’s lack of transparency applies to breaches, irresponsible security practices and to vulnerabilities, all of which expose their customers to risks they are deliberately kept in the dark about.”

He then referenced his own company’s dealings with Microsoft:

“In March 2023, a member of Tenable’s Research team was investigating Microsoft’s Azure platform and related services. The researcher discovered an issue (detailed here) which would enable an unauthenticated attacker to access cross-tenant applications and sensitive data, such as authentication secrets. To give you an idea of how bad this is, our team very quickly discovered authentication secrets to a bank. They were so concerned about the seriousness and the ethics of the issue that we immediately notified Microsoft. Did Microsoft quickly fix the issue that could effectively lead to the breach of multiple customers' networks and services? Of course not. They took more than 90 days to implement a partial fix – and only for new applications loaded in the service. That means that as of today, the bank I referenced above is still vulnerable, more than 120 days since we reported the issue, as are all of the other organizations that had launched the service prior to the fix.”

The Tenable example could be dismissed as an isolated incident if I hadn’t recently heard from multiple security researchers about other security holes they discovered and their talks with Microsoft about the issues. This is a troubling pattern. 

“Microsoft plays fast and loose when it comes to transparency and their responsibilities in cybersecurity. Their pace for remediation is not world class,” Yoran said in an interview. “Once they patch, they have a history of not disclosing that there ever was a hole. They have a moral responsibility to disclose.”

Back in the 1990s, a common and true adage among enterprise IT execs was the clichéd, “You can never get fired for hiring IBM.” Today, that statement is still true, if you swap out IBM for Microsoft.

Here’s why that is such a problem. It seems all but certain that the cybersecurity corner-cuttings that happened in the China attack were done by some mid-level manager. That manager was confident that opting for a slight cost reduction (along with a small boost in efficiency at the expense of violating Microsoft security policy) would not be a job risk. Had there been a legitimate fear of getting fired or even just having their career advancement halted, that manager would not have chosen to violate security policy.

The sad truth, though, is that the manager confidently knew that Microsoft values margin and market share far more than cybersecurity. Think of any company you believe takes cybersecurity seriously, such as RSA or Boeing. Would a manager there ever dare to openly violate cybersecurity rules? 

If this is all true, why don’t enterprises take their business elsewhere? This brings us back to the “you can’t get fired for hiring Microsoft” adage. If your enterprise uses the Microsoft cloud — or, for that matter, cloud services at Google or Amazon — and there’s a cybersecurity disaster, chances are excellent senior management will blame Microsoft. Had you chosen a smaller company that takes security more seriously — and that company screwed up — there is a good chance you would be blamed for having taken a chance. 

Chris Krebs, former director of the US Cybersecurity and Infrastructure Security Agency (CISA) and now cofounder of Krebs Stamos Group, puts this attack into a broader global context. Krebs said Chinese government attackers were not looking at Microsoft as a software vendor as much as the owner of one of the top three cloud platforms. They see those hyperscale cloud providers as an easy way to access data from a massive number of companies.

And cloud architectures “are insanely complex. You think you know how the cloud works? You don’t,” Krebs said in an interview. But he argued the cloud is a game-changer for cybersecurity for a simple reason: “What is so different is that the cloud is effectively the first technology that the (US) government has not been able to roll out itself,” he said. “They are entirely dependent on the private sector.”

China knows that only too well.

Let’s look at what happened with Microsoft and the China attack.

This is from Microsoft’s explanation:

The China attackers “acquired an inactive MSA consumer signing key and used it to forge authentication tokens for Azure AD enterprise and MSA consumer to access OWA and Outlook.com. All MSA keys active prior to the incident — including the actor-acquired MSA signing key — have been invalidated. Azure AD keys were not impacted. Though the key was intended only for MSA accounts, a validation issue allowed this key to be trusted for signing Azure AD tokens. The actor was able to obtain new access tokens by presenting one previously issued from this API due to a design flaw. This flaw in the GetAccessTokenForResourceAPI has since been fixed to only accept tokens issued from Azure AD or MSA respectively. The actor used these tokens to retrieve mail messages from the OWA API.”

How did an expired key still function? Cybersecurity specialists pointed to various possibilities, including whether caching played a role. But they all agreed that Microsoft didn’t sufficiently test its own environment.

“Why would an expired driver’s license still work in a bar? It’s because they are not checking expiration dates,” said cryptography expert and Harvard lecturer Bruce Schneier. “Why do people leave their doors unlocked? People do things. Someone screwed up and someone didn’t notice.” 

Michael Oberlaender, who has been CISO for eight enterprises and served on the board of the FIDO Alliance, said it’s likely Microsoft had “automated code that is running the sites that did not validate the certificates properly. This was not tested right. If that proper signing key validation — including the scope and function of the key — is not happening in the PKI key chain hierarchy, then it’s not working as intended.”
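To make the kind of check Schneier and Oberlaender describe concrete, here is a minimal Python sketch of a validator that refuses a signing key when it is revoked, past its expiration date, or being used outside its intended scope. It is an illustration only, not Microsoft's code: the `SigningKey` type, the `key_may_sign` function, the scope labels "consumer" and "enterprise," and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SigningKey:
    """Metadata for a signing key (hypothetical structure for illustration)."""
    key_id: str
    scope: str            # assumed labels: "consumer" or "enterprise"
    not_after: datetime   # expiration timestamp, in UTC
    revoked: bool = False


def key_may_sign(key: SigningKey, token_scope: str, now: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) for trusting `key` on a token of `token_scope`."""
    if key.revoked:
        return False, "key revoked"
    # The "expired driver's license" check: reject keys past their date.
    if now >= key.not_after:
        return False, "key expired"
    # The scope check: a consumer key must not validate enterprise tokens.
    if key.scope != token_scope:
        return False, "key scope mismatch"
    return True, "ok"


if __name__ == "__main__":
    # Illustrative dates only, not taken from Microsoft's report.
    now = datetime(2023, 7, 1, tzinfo=timezone.utc)
    old_key = SigningKey("k1", "consumer", datetime(2021, 1, 1, tzinfo=timezone.utc))
    print(key_may_sign(old_key, "enterprise", now))
```

In this sketch, either the expiration check or the scope check alone would have rejected an old consumer key presented for an enterprise token. The attack described above required both to be missing.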

Another security specialist, Prashanth Samudrala, vice president of products at AutoRabit, argued that the expiration date could have become irrelevant if the initial coding was not executed properly.

“During development, developers often hard code access to their systems for machine identities,” Samudrala said. “These automated processes can bypass traditional authentication requirements that break security protocols — Zero Trust mandates or otherwise. And once these scripts are written, they keep going until they are manually shut down.

“There’s no way to know for sure what happened with Microsoft’s outdated encryption key,” Samudrala said, “but this would explain how access could continue after the point of a key expiring. CISOs are becoming increasingly aware of the vulnerabilities posed by all SaaS Applications.” 

The expiration problem was not the only issue. 

“It sure sounds like the key was cached somewhere, so it wasn’t being served up — which would be an opportunity to say ‘No, that key isn’t supposed to be used anymore,’” said Phil Smith III, senior architect, product manager and distinguished technologist for Open Text Cybersecurity. “If it’s being used to decrypt data, it might still be needed — depending on the flow, this caching might have been perfectly reasonable.

“The bigger errors were mixing consumer and .gov credential processes and then allowing the .gov tokens from the old key to be accepted,” he said. “This runs into one of the common differences between consumer encryption and corporate versus gov[ernment] encryption: consumer stuff isn’t as controlled, so it’s a lot harder to say ‘You can’t use this because you left it too long.’ Just because Joe User hadn’t logged since before the key expired doesn’t mean you tell him he can’t now.”

Smith stressed that a common reaction to a key flaw such as the Microsoft one would be to increase the frequency of key rotation. He argued that such a move might be a bad idea.

Although “events like this make the case for rollover in some use cases, it’s just foolish in others — like re-encrypting huge volumes of data just because it was encrypted a while ago, when there’s no reason for the key to have had any significant risk of exposure. This is like being in a bunker during a war and deciding you should take off all your clothes and run to another bunker just because you’ve been in this one awhile: the risk you’re adding during that run/rollover is significant and not necessarily worthwhile,” Smith said.

“The point is that many standards say, ‘Roll keys every n months/years’ without regard for the risk involved,” he said. “If the keys have been distributed to external endpoints, then sure, there needs to be a rollover strategy, because you don’t have any way to assess how careful those folks are. But this needs to be planned from the beginning: ‘Hey, re-protect this 50TB of data by next month’ isn’t realistic. If keys have only gone to hardened, internal endpoints, risk is lower. If the encryption/decryption has only taken place remotely — say, via web services — then there’s little to no risk, since if someone compromised those servers, you’re already toast.”
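Smith's argument amounts to deciding on rollover by where a key has been exposed rather than by the calendar. A rough Python sketch of that idea follows. The category names and guidance strings are hypothetical paraphrases of his remarks, not an established standard or anything he or Microsoft published.

```python
from enum import Enum


class KeyExposure(Enum):
    """Where a key has been distributed (hypothetical categories)."""
    EXTERNAL_ENDPOINTS = "external endpoints"
    HARDENED_INTERNAL = "hardened internal endpoints"
    REMOTE_SERVICE_ONLY = "remote service only"


# Illustrative guidance only; the wording paraphrases the quote above.
ROLLOVER_GUIDANCE = {
    KeyExposure.EXTERNAL_ENDPOINTS: "plan a rollover strategy from the start",
    KeyExposure.HARDENED_INTERNAL: "risk is lower",
    KeyExposure.REMOTE_SERVICE_ONLY: "little to no added risk from key age alone",
}


def rollover_guidance(exposure: KeyExposure) -> str:
    """Look up rollover guidance by exposure, not by the key's age."""
    return ROLLOVER_GUIDANCE[exposure]


if __name__ == "__main__":
    for exposure in KeyExposure:
        print(f"{exposure.value}: {rollover_guidance(exposure)}")
```

The point of the sketch is that the key's age never appears as an input. The decision depends on exposure alone, which is the contrast Smith draws with "roll keys every n months" rules.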

Beyond the expired key that still worked, the biggest issue here is that Microsoft violated its own security rules and did not store the keys in an HSM. The most likely reason? Storing anything in an HSM is labor-intensive, costs more and can degrade performance.

There is “a very small bit of latency drop over the network,” Samudrala said. “Yes, (HSMs) are expensive and, yes, there is a performance degradation. When you have legacy systems, HSMs could be very, very expensive and eat into a product’s roadmap. Companies seek to use cloud-based key management services rather than HSM. Why? (HSMs) are too damn hard, take a lot of time, a lot of costs, a lot of complexity.”

“The importance of Microsoft’s failure to use an HSM cannot be overstated,” said Oberlaender. “Had they stored and managed in an HSM, this whole (China) thing would not have been possible,” he said, adding that corporate communications disconnects might have played a role. “Communications often gets blurry in big enterprises, with different entities often not talking with each other.”

Whatever the reasoning and rationales, Microsoft is starting to be seen as an organization that tolerates sloppy security implementation. Although such a perception is bad for any business, it could be disastrous for Microsoft, specifically because it uses its marketing clout to scream that its environments are ultra-secure for the planet’s largest enterprises.

If Microsoft doesn’t clean up its act quickly — and hope that no more massive breaches get disclosed anytime soon — its contract-saving adage could be flipped on its head. Could Microsoft’s brand be to cybersecurity what Uber, Meta and TikTok are to privacy?