this post was submitted on 22 Jul 2023
Lemmy.World Announcements
Yes it is. Suddenly your database exists in more than one location, which is extremely difficult to do with reasonable performance.
Going from 3 to 100 is trivial. Going from one to any number greater than one is the challenge.
Define "horrible"?
When Lemmy, or any server-side software, is running on a single server, you generally upgrade the hardware before moving to multiple servers (because upgrading is cheaper). When that stops working and you need to move to another server, it's possible that everything in the database that matters (possibly the entire database) will be sitting in memory on that one machine, with the hottest parts not even in RAM but in the CPU cache.
When you move to multiple servers, suddenly a lot of frequent database operations are on another server, which you can only reach over a network connection. Even the fastest network connection is dog slow compared to CPU cache, and it doesn't really matter how well written your code is: if you haven't done extensive testing in production, with real-world users (and actively malicious bots) placing your systems under high load, you will have to make substantial changes to deal with a database that is suddenly orders of magnitude slower.
The database might still be able to handle the same number of queries per second, but each individual query will take a lot longer, which will have unpredictable results.
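To put rough numbers on that latency gap, here is a back-of-the-envelope sketch. The latency figures are commonly quoted order-of-magnitude values, not measurements of Lemmy or of any particular hardware, and the names (`NS_PER_LOOKUP`, `slowdown`, `request_latency_ms`) are made up for illustration.

```python
# Rough, commonly quoted order-of-magnitude latencies.
# These are assumptions for illustration, not measurements.
NS_PER_LOOKUP = {
    "cpu_cache": 10,                # on-die cache hit, roughly 10 ns
    "ram": 100,                     # main memory reference, roughly 100 ns
    "network_round_trip": 500_000,  # round trip inside one datacenter, roughly 0.5 ms
}


def slowdown(local_tier: str, remote_tier: str) -> float:
    """How many times slower one lookup is on remote_tier than on local_tier."""
    return NS_PER_LOOKUP[remote_tier] / NS_PER_LOOKUP[local_tier]


def request_latency_ms(sequential_queries: int, tier: str) -> float:
    """Time spent waiting on data if one request issues its queries one after another."""
    return sequential_queries * NS_PER_LOOKUP[tier] / 1_000_000


if __name__ == "__main__":
    print("network vs RAM slowdown:", slowdown("ram", "network_round_trip"))
    print("network vs cache slowdown:", slowdown("cpu_cache", "network_round_trip"))
    for tier in NS_PER_LOOKUP:
        print(tier, request_latency_ms(50, tier), "ms for 50 sequential queries")
```

Under these assumed figures, a page that runs 50 queries back to back spends a few microseconds waiting when the data is local and about 25 ms when every query crosses the network. The database can sustain the same throughput in both cases if enough requests run concurrently, but each individual request gets slower.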
The other problem is you need to make sure all of your servers have the same content. Being part of the Fediverse though, Lemmy probably already has a pretty good architecture for that.
Friend...you have zero idea what you're talking about. Database existing in multiple locations? What in the hell are you even talking about? Single db instance, multiple app servers, and single LB. You are absolutely not experienced with this type of work, and need to just stop because you're making an ass out of yourself with these wild ideas that have no basis in practical deployments. Stop embarrassing yourself.
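For readers unfamiliar with the layout named here (single DB instance, multiple app servers, single load balancer), a toy sketch may help. Everything in it is hypothetical: `Database`, `AppServer`, and `LoadBalancer` are in-memory stand-ins, and round-robin is just one routing policy a real load balancer might use.

```python
import itertools


class Database:
    """The single shared database instance (here just a dict)."""

    def __init__(self):
        self.rows = {}


class AppServer:
    """Stateless app server: all persistent state lives in the shared database."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def handle(self, key, value=None):
        # A write stores the value; every request returns the current value.
        if value is not None:
            self.db.rows[key] = value
        return self.name, self.db.rows.get(key)


class LoadBalancer:
    """Hands each incoming request to the next app server in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, key, value=None):
        return next(self._cycle).handle(key, value)


if __name__ == "__main__":
    db = Database()
    lb = LoadBalancer([AppServer("app-1", db), AppServer("app-2", db)])
    print(lb.route("post:1", "hello"))  # written through the first server
    print(lb.route("post:1"))           # read through the second server
```

Because the app servers hold no state of their own, a value written through one server is visible through the other. The trade-off is that every request on every app server depends on that one database.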
What if your application has to keep track of state? Say, for certain write requests, only one instance is allowed to process them, because it needs a cache that it can rely on to be reasonably consistent?
(Granted, I wouldn't know why something like Lemmy needs that. But we had that problem at work, and it was a pain to solve while also supporting multiple app instances.)
In that case, I'd use a message queue. RabbitMQ, or Pulsar, which I use at work: multiple subscribers (using the same subscription name) to one queue of messages that need to be processed. One worker picks up a message, processes it, and marks it as processed. The worker then either passes it into a different queue for further processing, or persists it to the DB.
The nice thing with this is when using the Pulsar paradigm, you can have multiple subscriptions to the same message queue, each one carrying its own state as to which messages are processed or not. So say I get one message from an external system, have one system that is processing it right now, and need to add a second system. In that case I just use a different subscription name for the second system, and it works independently of the first with no issues.
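The subscription behaviour described above can be sketched with a toy in-memory topic. This is not the Pulsar client API: `Topic`, `publish`, and `receive` are hypothetical names, delivery is treated as acknowledgement for brevity, and a new subscription here starts from the oldest message, which is a simplifying assumption.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic; each subscription name keeps its own cursor."""

    def __init__(self):
        self._messages = []
        self._cursor = defaultdict(int)  # subscription name -> index of next message

    def publish(self, message):
        self._messages.append(message)

    def receive(self, subscription):
        """Return the next unprocessed message for this subscription, or None."""
        index = self._cursor[subscription]
        if index >= len(self._messages):
            return None
        # Simplification: handing the message out also marks it as processed.
        self._cursor[subscription] = index + 1
        return self._messages[index]


if __name__ == "__main__":
    topic = Topic()
    for message in ("m1", "m2", "m3"):
        topic.publish(message)

    # Two workers sharing one subscription name split the messages between them.
    print("worker A:", topic.receive("processors"))
    print("worker B:", topic.receive("processors"))

    # A second system with its own subscription name sees every message independently.
    print("audit:", topic.receive("audit"))
```

Workers that share the name `processors` never receive the same message twice, while the `audit` subscription advances on its own and is unaffected by how far the first one has read.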
A distributed lock of any form would work. Memcache, redis, etcd, read access mechanism in an MQ...etc. Only one process would work on whatever it is at a time. Simple.
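A minimal sketch of the lock idea, assuming a shared store with an atomic "set if absent, with expiry" operation (the pattern commonly built on Redis's `SET key value NX PX milliseconds`). The `LockStore` class below is a hypothetical in-memory stand-in so the example runs on its own; in a real deployment the store would be the external service and the check-and-set would have to be atomic on its side.

```python
import time
import uuid


class LockStore:
    """In-memory stand-in for a shared store such as Redis or etcd."""

    def __init__(self, clock=time.monotonic):
        self._locks = {}  # key -> (owner token, expiry time)
        self._clock = clock

    def acquire(self, key, ttl_seconds):
        """Return an owner token if the lock was free or expired, else None."""
        now = self._clock()
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return None  # someone else holds an unexpired lock
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_seconds)
        return token

    def release(self, key, token):
        """Release the lock only if the caller still owns it."""
        holder = self._locks.get(key)
        if holder is not None and holder[0] == token:
            del self._locks[key]
            return True
        return False


if __name__ == "__main__":
    store = LockStore()
    mine = store.acquire("rebuild-cache", ttl_seconds=30)
    print("first worker got the lock:", mine is not None)
    print("second worker got the lock:", store.acquire("rebuild-cache", 30) is not None)
    store.release("rebuild-cache", mine)
    print("lock free again:", store.acquire("rebuild-cache", 30) is not None)
```

The expiry means a crashed worker cannot hold the lock forever, and the owner token stops a slow worker from releasing a lock that has since expired and been taken by someone else.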