Microsoft “deeply” apologizes for global Azure, Teams outage

On Tuesday, Microsoft apologized for a global outage that affected Azure’s cloud services, including Microsoft Teams, Office 365 and Dynamics 365.

“We understand how incredibly impactful and unacceptable this incident is and apologize deeply,” Microsoft said in a post-incident review report on the outage, which was the result of “authentication errors” across several Microsoft cloud services. “We continue to take steps to improve the Microsoft Azure platform and our processes to help ensure that these incidents do not occur in the future.”

Microsoft referred in the report to the changes made after a September 28, 2020 outage that affected Microsoft 365 users for five hours.

“In the September incident, we indicated our plans to apply additional protections to the SDP (Safe Deployment Process) system of the Azure AD (Active Directory) service to avoid the kind of problems identified here.”

Microsoft said the first phase of the SDP changes is complete and that the second phase is in a “very careful deployment” that is expected to finish by mid-year.

“The initial analysis indicates that, once fully deployed, it will prevent the type of outage that happened today, as well as the related incident in September 2020,” Microsoft said. “In the meantime, additional safeguards have been added to our key removal process, which will remain in place until the end of the second phase of the SDP deployment.”

Microsoft said Tuesday morning that “most services” affected by the global Azure and Teams outage were back online, except for Intune and Microsoft Managed Desktop.

The latest update on the outage came in a tweet posted at 6:34 from the Microsoft 365 status account.

Microsoft’s apology came after a global outage Monday that affected the Teams collaboration application, as well as “multiple” other Azure, Office 365 and Dynamics 365 services.

The problems, which Microsoft reported on Twitter starting at 3:40 p.m. Eastern Time on Monday, could affect any user “worldwide,” the company said at the time.

Even with the outage, some industry executives are urging MSPs to move customers to the cloud faster following the March 2 attack on on-premises Exchange servers by Chinese-sponsored hackers.

This attack affected only on-premises versions of Exchange Server, not Exchange Online or the cloud-based Office 365 email service. Nearly 30,000 U.S. organizations and 60,000 organizations worldwide have had emails stolen as a result of the breach because they were still running on-premises versions of Exchange.

Last week, Microsoft alerted customers to DearCry ransomware attacks stemming from the on-premises Exchange server attack. On March 12, it warned that “human-operated ransomware attacks use Microsoft Exchange vulnerabilities to exploit customers.”

Emmet Tydings, president of AB&T Telecom, a Columbia, Md.-based company that provides internet voice and data services and failover resiliency for MSPs, said it is critical for partners to move customers to the cloud to avoid serious security issues like those that came with the Chinese attack on on-premises Exchange servers.

“MSPs need to move their customers faster to the cloud and also need to shore up their communications infrastructure with circuit diversity and failover,” Tydings said. “Microsoft has stressed that they are better able to provide security in the cloud than with on-premises Exchange.”

Tydings said partners should provide strong internet connectivity with SD-WAN and wireless failover, with carrier plans that use a SIM module, and a cable backup to a main fiber line.

In the event of an outage like the one that hit Microsoft Teams, MSPs would have to fall back on alternative communications platforms such as Zoom or Cisco Webex, he said.

With the global pandemic leading to a more distributed workforce, on-premises Exchange no longer makes sense for customers, according to Tydings.

“The MSPs we work with have been heroes in moving their on-premises customers to the cloud since the pandemic hit,” he said.

Rapid cloud migration has led companies to invest in shipping software products faster, but not in making cloud services more resilient, said Ofer Smadari, co-founder and CEO of Portland, Ore.-based StackPulse, whose platform helps teams detect, respond to and resolve incidents with code-based automation.

“We seem to see the results in the headlines every week as major brands suffer site outages,” Smadari said. “Most companies continue to use traditional IT tools such as ticketing systems, service management tools or communication applications to share information and collaborate to restore service. Companies need to move from an IT management mindset to an engineering mindset, where they build resilience into their applications and business operations and take a more risk-conscious approach, so that they can quickly recover from disruptions and deliver on their promise to their customers.”
