Technology

59378 readers

2328 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each another!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed

Approved Bots

founded 1 year ago

MODERATORS

829

CrowdStrike downtime apparently caused by update that replaced a file with 42kb of zeroes (twiiit.com)

submitted 4 months ago* (last edited 4 months ago) by Aatube@kbin.melroy.org to c/technology@lemmy.world

190 comments fedilink hide all child comments

…according to a Twitter post by the Chief Informational Security Officer of Grand Canyon Education.

So, does anyone else find it odd that the file that caused everything CrowdStrike to freak out, C-00000291-
00000000-00000032.sys was 42KB of blank/null values, while the replacement file C-00000291-00000000-
00000.033.sys was 35KB and looked like a normal, if not obfuscated sys/.conf file?

Also, apparently CrowdStrike had at least 5 hours to work on the problem between the time it was discovered and the time it was fixed.

you are viewing a single comment's thread
view the rest of the comments

[–] ChairmanMeow@programming.dev 11 points 3 months ago (1 children)

If AV suddenly stops working, it could mean the AV is compromised. A BSOD is a desirable outcome in that case. Booting a compromised system anyway is bad code.

[–] CeeBee_Eh@lemmy.world 1 points 3 months ago (1 children)

You know there's a whole other scenario where the system can simply boot the last known good config.

[–] ChairmanMeow@programming.dev 1 points 3 months ago (1 children)

And what guarantees that that "last known good config" is available, not compromised and there's no malicious actor trying to force the system to use a config that has a vulnerability?

[–] CeeBee_Eh@lemmy.world 1 points 3 months ago* (last edited 3 months ago) (1 children)

The following:

An internal backup of previous configs
Encrypted copies
Massive warnings in the system that current loaded config has failed integrity check

There's a load of other checks that could be employed. This is literally no different than securing the OS itself.

This is essentially a solved problem, but even then it's impossible to make any system 100% secure. As the person you replied to said: "this is poor code"

Edit: just to add, failure for the system to boot should NEVER be the desired outcome. Especially when the party implementing that is a 3rd party service. The people who setup these servers are expecting them to operate for things to work. Nothing is gained from a non-booting critical system and literally EVERYTHING to lose. If it's critical then it must be operational.

[–] ChairmanMeow@programming.dev 1 points 3 months ago (1 children)

The 3rd party service is AV. You do not want to boot a potentially compromised or insecure system that is unable to start its AV properly, and have it potentially access other critical systems. That's a recipe for a perhaps more local but also more painful disaster. It makes sense that a critical enterprise system does not boot if something is off. No AV means the system is a security risk and should not boot and connect to other critical/sensitive systems, period.

These sorts of errors should be alleviated through backup systems and prevented by not auto-updating these sorts of systems.

Sure, for a personal PC I would not necessarily want a BSOD, I'd prefer if it just booted and alerted the user. But for enterprise servers? Best not.

[–] CeeBee_Eh@lemmy.world 1 points 3 months ago (1 children)

Sure, for a personal PC I would not necessarily want a BSOD, I’d prefer if it just booted and alerted the user. But for enterprise servers? Best not.

You have that backwards. I work as a dev and system admin for a medium sized company. You absolutely do not want any server to ever not boot. You absolutely want to know immediately that there's an issue that needs to be addressed ASAP, but a loss of service generally means loss of revenue and, even worse, a loss of reputation. If you server is briefly at a lower protection level that's not an issue unless you're actively being targeted and attacked. But if that's the case then getting notified of an issue can get some people to deal with it immediately.

[–] ChairmanMeow@programming.dev 2 points 3 months ago

A single server not booting should not usually lead to a loss of service as you should always run some sort of redundancy.

I'm a dev for a medium-sized PSP that due to our customers does occasionally get targetted by malicious actors, including state actors. We build our services to be highly available, e.g. a server not booting would automatically do a failover to another one, and if that fails several alerts will go off so that the sysadmins can investigate.

Temporary loss of service does lead to reputational damage, but if contained most of our customers tend to be understanding. However, if a malicious actor could gain entry to our systems the damage could be incredibly severe (depending on what they manage to access of course), so much so that we prefer the service to stop rather than continue in a potentially compromised state. What's worse: service disrupted for an hour or tons of personal data leaked?

Of course, your threat model might be different and a compromised server might not lead to severe damage. But Crowdstrike/Microsoft/whatever may not know that, and thus opt for the most "secure" option, which is to stop the boot process.