RoundSparrow

joined 1 year ago
[–] RoundSparrow@lemmy.ml 9 points 1 year ago* (last edited 1 year ago) (10 children)

I already feel like I have to keep sticking my neck out just to get them to question whether the ORM and a dozen JOIN statements might be a problem... but I guess I'll link it: https://github.com/LemmyNet/lemmy/pull/3900

As stated on my Lemmy user profile, I'm "RocketDerp" on GitHub.

Honestly, the reason I keep making noise is because I'm sick of Lemmy crashing all the time when I come to use it... and it happens on many of the servers I'm on. I really am not trying to piss off the developers. I even said I felt like I was being hazed, and hazing in general might explain how hard they are avoiding the elephant in the ROOM: that the ORM and a dozen JOINs might be the cause! Let alone the avoided addition of Redis or Memcached, which is a second elephant tap-dancing on the second floor... GitHub Issue 2910 was the straw that broke my back weeks ago: it took months for them to address something that could have been fixed in a couple of hours (and this was weeks before the Reddit API deadline at the end of June... and issue 2910 was neglected). The whole thing was a nightmare for me to watch...

[–] RoundSparrow@lemmy.ml 7 points 1 year ago* (last edited 1 year ago) (1 children)

The first optimization is to stop fetching every field and prune the column list. For example, it fetches the public key and private key for every community and user account - then does nothing with them. That's just pushing data between Rust and PostgreSQL for no reason. That kind of thing is pretty obvious... the huge number of things listed after "SELECT".
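As a rough illustration (a made-up schema fragment, not Lemmy's actual Diesel code), pruning is just a matter of telling Diesel which columns to select instead of pulling whole rows:

use diesel::prelude::*;

// Hypothetical, trimmed-down schema fragment purely for illustration.
diesel::table! {
    person (id) {
        id -> Int4,
        name -> Text,
        instance_id -> Int4,
        public_key -> Text,
        private_key -> Nullable<Text>,
    }
}

// Only the columns the post listing actually renders; the key material
// never leaves PostgreSQL.
fn person_summaries(conn: &mut PgConnection) -> QueryResult<Vec<(i32, String, i32)>> {
    use self::person::dsl::*;

    person
        .select((id, name, instance_id))
        .load::<(i32, String, i32)>(conn)
}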

The whole approach is what I recently described as: make a JOIN fusion implosion bomb, then wait for null columns to fall out

There are short-term and long-term solutions. Right now there is already a new feature pending merge that will add one more JOIN: "instance blocking" by each individual user.

Based on the server overloads and resulting crashes, I think one obvious solution would be to remove the post_aggregates table entirely and just put more columns on the post table... I've seen people do stuff like that. But really you have to have a concept of the core foundation.

To me the core foundation of Lemmy data is that people want fresh meat: when world events get into a frenzy, they want to hit F5 and get the LATEST posts and the LATEST comments. The data should have a big wall between the most recent 5 days and everything else. That's the heart of the beast for a platform built around current human events.

From that perspective, where fresh posts and fresh comments mean everything, you can optimize by doing an inner SELECT before any JOIN... or partition the database table into recent and non-recent, or take some out-of-band steps to prepare recent data before this SELECT ever comes up from an API call... and not let PostgreSQL do so much heavy lifting on each page refresh.
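A rough sketch of that first idea (hypothetical table definition, not Lemmy's real schema or query code; assumes Diesel's postgres and chrono features): narrow post_aggregates down to the last 5 days before anything else touches it.

use chrono::{Duration, Utc};
use diesel::prelude::*;

// Made-up, trimmed-down table definition purely for illustration.
diesel::table! {
    post_aggregates (id) {
        id -> Int4,
        post_id -> Int4,
        published -> Timestamp,
    }
}

fn recent_post_ids(conn: &mut PgConnection) -> QueryResult<Vec<i32>> {
    use self::post_aggregates::dsl::*;

    // Everything older than 5 days is excluded before any JOIN,
    // ordering, or pagination gets a chance to run.
    let cutoff = Utc::now().naive_utc() - Duration::days(5);

    post_aggregates
        .filter(published.gt(cutoff))
        .order(published.desc())
        .select(post_id)
        .limit(10)
        .load::<i32>(conn)
}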

[–] RoundSparrow@lemmy.ml 10 points 1 year ago (25 children)

I've largely given up on pull requests... for the sake of sanity. But I waded back in...

I made a pull request today... and I very strategically chose to do it with a minimum of features so that it would just go through... and I got lectured that JOIN is never a concern and that filtering on the core function of the site (presenting fresh meat to readers) was a bad use of the database. I've never seen hazing on a project like this. Memcached and Redis should be discussed every day as "why are we not doing what every website does?", but mum's the word.

[–] RoundSparrow@lemmy.ml 0 points 1 year ago* (last edited 1 year ago)

Community stuff can work well if done right. For example you don’t see Debian repositories constantly crashing.

I don't follow your comment. Are you suggesting I said something negative about open source project communities? I was talking about the Lemmy social media communities who actually comment and fund the 64-core server upgrades without asking why the site crashes with only 57K users... the people who comment and post on Lemmy... not the "open source" programmer community, but the social media community of Lemmy.

[–] RoundSparrow@lemmy.ml 2 points 1 year ago* (last edited 1 year ago) (3 children)

If anyone bothered to actually look at the SQL SELECT that Lemmy runs to list posts every time you hit refresh, it would be blindingly obvious how convoluted it is. Yet the community does not talk about the programming issues and instead keeps raising money for 64-core hardware upgrades without recognizing just how tiny Lemmy's database really is and how 57K users is not a large number at all!

your original one, friend. I wouldn’t have argued this point if you had started here.

I mentioned "ORM" right in my first comment.

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
	   (
	   "post_aggregates" 
	   INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
	   )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE (((((((
  ((("community"."deleted" = false) AND ("post"."deleted" = false)) AND ("community"."removed" = false))
  AND ("post"."removed" = false)) AND ("post_aggregates"."creator_id" = 3)) AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)) AND ("local_user_language"."language_id" IS NOT NULL))
  AND ("community_block"."person_id" IS NULL))
  AND ("person_block"."person_id" IS NULL))
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;
[–] RoundSparrow@lemmy.ml 11 points 1 year ago* (last edited 1 year ago) (6 children)

the people who run Lemmy don’t have the money to support a fleet of failover servers that take over when the main server goes offline.

That has nothing to do with the issue I'm talking about. Every one of those servers would fail with this amount of data in it. It doesn't matter if you had 100 servers on standby.

The Rust database-access logic and the PostgreSQL logic in Lemmy are unoptimized, and there is a serious lack of Diesel programming skill. The site_aggregates table had a mistake where 1500 rows were updated for every single new comment and post - and it only got noticed when lemmy.ca was crashing so hard that they made a complete copy of the data and studied what was going on.

Throwing hardware at it, as you describe, has been the other approach... massive numbers of CPU cores. What's needed is to learn what Reddit did before 2010 with PostgreSQL... as Reddit also used PostgreSQL (and its code was open source).

That’s basically the only reason you don’t see lots of downtime from major corporations: investment in redundancy,

Downtime because your project avoids Redis or Memcached caching at all costs isn't something you commonly see at major corporations. But Lemmy avoids caching any data from PostgreSQL at all costs, and it's been that way for several years. Reddit's scaling lessons from May 17, 2010 spelled it out: "Lesson 5: Memcache."
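A cache layer would not have to be invasive, either. A rough sketch using the redis crate (the function, key name, and 30-second TTL are all made up; nothing like this exists in Lemmy today):

use redis::Commands;

// Rough sketch, not Lemmy code: serve a rendered post listing from Redis
// when present, otherwise run the expensive PostgreSQL query once and
// cache the result for a short TTL.
fn cached_post_listing(
    redis_url: &str,
    cache_key: &str,
    build_from_postgres: impl Fn() -> String, // stand-in for the real query + serialization
) -> redis::RedisResult<String> {
    let client = redis::Client::open(redis_url)?;
    let mut con = client.get_connection()?;

    // Cache hit: no database work at all for this page refresh.
    if let Ok(cached) = con.get::<_, String>(cache_key) {
        return Ok(cached);
    }

    // Cache miss: do the heavy query once, then share it for 30 seconds.
    let fresh = build_from_postgres();
    con.set_ex::<_, _, ()>(cache_key, &fresh, 30)?;
    Ok(fresh)
}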

As I said in my very first comment, server crashing as a way to scale is a very interesting approach.

EDIT: Freudian slip, "memecached" instead of Memcached

[–] RoundSparrow@lemmy.ml 3 points 1 year ago (10 children)

It’s not

It's really odd how many people around here think the server crashes are perfectly normal and are glad to see newcomers driven away.

[–] RoundSparrow@lemmy.ml 2 points 1 year ago

Only way to solve this (imho) is to reinstall Lemmy BUT use another subdomain.

I would agree that this is worth considering as an approach that avoids clashing identities and getting into custom SQL or Rust programming. But there isn't really a procedure in place to decommission the old Lemmy entity... so it's another damned-if-you-do, damned-if-you-don't situation in the 0.18.4 era.

I'm a little surprised that the federation private/public key signing doesn't get upset about all-new keys appearing on the same domain name. I've tried to get details of exactly how a server joins the Lemmy network and gets discovered over on !lemmyfederation@lemmy.ml but haven't gotten any actual discussion on the details.

What do you think? Will this work?

I've seen people nuke their database and start over from empty several times while having problems setting up Nginx and Docker... or whatever part.

I'm glancing at the list of SEQUENCE in Lemmy....

CREATE SEQUENCE public.admin_purge_comment_id_seq
CREATE SEQUENCE public.admin_purge_community_id_seq
CREATE SEQUENCE public.admin_purge_person_id_seq
CREATE SEQUENCE public.admin_purge_post_id_seq
CREATE SEQUENCE public.captcha_answer_id_seq
CREATE SEQUENCE public.comment_aggregates_id_seq
CREATE SEQUENCE public.comment_id_seq
CREATE SEQUENCE public.comment_like_id_seq
CREATE SEQUENCE public.comment_reply_id_seq
CREATE SEQUENCE public.comment_report_id_seq
CREATE SEQUENCE public.comment_saved_id_seq
CREATE SEQUENCE public.community_aggregates_id_seq
CREATE SEQUENCE public.community_block_id_seq
CREATE SEQUENCE public.community_follower_id_seq
CREATE SEQUENCE public.community_id_seq
CREATE SEQUENCE public.community_language_id_seq
CREATE SEQUENCE public.community_moderator_id_seq
CREATE SEQUENCE public.community_person_ban_id_seq
CREATE SEQUENCE public.custom_emoji_id_seq
CREATE SEQUENCE public.custom_emoji_keyword_id_seq
CREATE SEQUENCE public.email_verification_id_seq
CREATE SEQUENCE public.federation_allowlist_id_seq
CREATE SEQUENCE public.federation_blocklist_id_seq
CREATE SEQUENCE public.instance_id_seq
CREATE SEQUENCE public.language_id_seq
CREATE SEQUENCE public.local_site_id_seq
CREATE SEQUENCE public.local_site_rate_limit_id_seq
CREATE SEQUENCE public.local_user_id_seq
CREATE SEQUENCE public.local_user_language_id_seq
CREATE SEQUENCE public.mod_add_community_id_seq
CREATE SEQUENCE public.mod_add_id_seq
CREATE SEQUENCE public.mod_ban_from_community_id_seq
CREATE SEQUENCE public.mod_ban_id_seq
CREATE SEQUENCE public.mod_hide_community_id_seq
CREATE SEQUENCE public.mod_lock_post_id_seq
CREATE SEQUENCE public.mod_remove_comment_id_seq
CREATE SEQUENCE public.mod_remove_community_id_seq
CREATE SEQUENCE public.mod_remove_post_id_seq
CREATE SEQUENCE public.mod_sticky_post_id_seq
CREATE SEQUENCE public.mod_transfer_community_id_seq
CREATE SEQUENCE public.password_reset_request_id_seq
CREATE SEQUENCE public.person_aggregates_id_seq
CREATE SEQUENCE public.person_ban_id_seq
CREATE SEQUENCE public.person_block_id_seq
CREATE SEQUENCE public.person_follower_id_seq
CREATE SEQUENCE public.person_id_seq
CREATE SEQUENCE public.person_mention_id_seq
CREATE SEQUENCE public.person_post_aggregates_id_seq
CREATE SEQUENCE public.post_aggregates_id_seq
CREATE SEQUENCE public.post_id_seq
CREATE SEQUENCE public.post_like_id_seq
CREATE SEQUENCE public.post_read_id_seq
CREATE SEQUENCE public.post_report_id_seq
CREATE SEQUENCE public.post_saved_id_seq
CREATE SEQUENCE public.private_message_id_seq
CREATE SEQUENCE public.private_message_report_id_seq
CREATE SEQUENCE public.received_activity_id_seq
CREATE SEQUENCE public.registration_application_id_seq
CREATE SEQUENCE public.secret_id_seq
CREATE SEQUENCE public.sent_activity_id_seq
CREATE SEQUENCE public.site_aggregates_id_seq
CREATE SEQUENCE public.site_id_seq
CREATE SEQUENCE public.site_language_id_seq
CREATE SEQUENCE public.tagline_id_seq
CREATE SEQUENCE utils.deps_saved_ddl_id_seq

[–] RoundSparrow@lemmy.ml 13 points 1 year ago* (last edited 1 year ago) (40 children)

Let the servers keep crashing, tell everyone to add new instances to help with performance - which puts 1500 rows into database tables that used to have 50 rows and invokes a massive one-vote-one-HTTPS federation overhead... causing more crashing... all the while ignoring the SQL design of the machine-generated ORM statements and the counting logic hidden in background triggers.

... keep users off your server as a method of scaling by crashing. It's one of the more interesting experiences I've had this year! And I spent all of February and March with the release of GPT-4... which was also interesting!

[–] RoundSparrow@lemmy.ml 19 points 1 year ago (2 children)

That feature you linked to is for flairing users... there is a different issue for flairing posts: https://github.com/LemmyNet/lemmy/issues/317

[–] RoundSparrow@lemmy.ml 3 points 1 year ago (2 children)

It is complicated. It's surely a damned-if-you-do, damned-if-you-don't situation. It doesn't sound like you had all that much in terms of local users, communities, posts, and comments - so at least that's in your favor.

[–] RoundSparrow@lemmy.ml 2 points 1 year ago

I haven't looked around at alternatives.

Lemmy has a lot of front-end app development going on and I think that's one of the big strengths. The API can be bloated with a lot of duplicate data in JSON responses but it is usable.

 

I need some help here trying to add a second logging subscriber for a specific target to the Lemmy server Rust code.

Here is the default logging in the app: https://github.com/LemmyNet/lemmy/blob/main/src/lib.rs

Step 1 ==============
I know I have to add another library to log to a file.
cargo add tracing-appender

Step 2 ===============
I know I have to specify how I want the files to work. I found this pile of code:

  // (missing from the snippet: these come from chrono, tracing-appender,
  // tracing-error, and tracing-subscriber)
  use chrono::Local;
  use tracing_appender::{non_blocking, rolling};
  use tracing_error::ErrorLayer;
  use tracing_subscriber::{fmt, layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Registry};

  let env_filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
  let formatting_layer = fmt::layer().pretty().with_writer(std::io::stderr);
  let log_file_name = Local::now().format("%Y-%m-%d").to_string() + "-apub.log";
  let file_appender = rolling::daily("/home/lemmy/logs", log_file_name);
  let (non_blocking_appender, _guard) = non_blocking(file_appender);
  let file_layer = fmt::layer()
      .with_ansi(false)
      .with_writer(non_blocking_appender);

  Registry::default()
      .with(env_filter)
      .with(ErrorLayer::default())
      .with(formatting_layer)
      .with(file_layer)
      .init();

Now this isn't right, because it registers itself as the global default, and I want the file output tied to a target - and I still want the normal Lemmy logging behavior to exist.

I want the macros to work like:

warn!("this is how normal Lemmy server log entries are created in the current code");  
warn!(target: "apubfile", "this logging entry only goes to the apub file logging using tracing-appender");

Can someone work this out? How to have two subscribers, not just the single default, and how to specify the target: string on the subscriber?

Thank you.

EDIT: ok, I found an example of how to have two logs at the same time, one to file and one to console: https://stackoverflow.com/questions/76042603/how-to-unify-the-time-in-the-console-and-the-file-when-using-tracing-appender -- I still need to figure out how to get this into Lemmy's structure and attach to a "target".
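EDIT 2: here is the rough shape I think the layered setup could take (untested sketch; assumes tracing-subscriber 0.3 with the "registry" and "env-filter" features plus tracing-appender 0.2, and "apubfile" is just my placeholder target name):

use tracing_appender::{non_blocking, rolling};
use tracing_subscriber::{
    filter::{filter_fn, EnvFilter},
    fmt,
    layer::SubscriberExt,
    util::SubscriberInitExt,
    Layer,
};

fn init_logging() -> tracing_appender::non_blocking::WorkerGuard {
    // Console logging stays roughly as Lemmy has it today, driven by RUST_LOG,
    // but skips anything aimed at the "apubfile" target.
    let env_filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
    let console_layer = fmt::layer()
        .with_writer(std::io::stderr)
        .with_filter(filter_fn(|metadata| metadata.target() != "apubfile"));

    // Daily-rotated file that only receives events with target "apubfile".
    let file_appender = rolling::daily("/home/lemmy/logs", "apub.log");
    let (file_writer, guard) = non_blocking(file_appender);
    let file_layer = fmt::layer()
        .with_ansi(false)
        .with_writer(file_writer)
        .with_filter(filter_fn(|metadata| metadata.target() == "apubfile"));

    // One global subscriber (the registry), two layers with per-layer filters.
    tracing_subscriber::registry()
        .with(env_filter)
        .with(console_layer)
        .with(file_layer)
        .init();

    // Keep the guard alive for the life of the process, or buffered
    // file writes get dropped.
    guard
}

// Usage stays exactly like the macros above:
//   warn!("normal entry - console only");
//   warn!(target: "apubfile", "goes to /home/lemmy/logs/apub.log.YYYY-MM-DD");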

 

I don't know Rust, but I'm trying to hack on Lemmy 0.18.1 enough to get a better error message out.

error: data did not match any variant of untagged enum AnnouncableActivities

where: crates/apub/src/activities/community/announce.rs, line: 46

https://github.com/LemmyNet/lemmy/blob/0c82f4e66065b5772fede010a879d327135dbb1e/crates/apub/src/activities/community/announce.rs#L46

That seems to be the function parameters themselves?

Is the error caused by RawAnnouncableActivities not matching any variant of the enum AnnouncableActivities in the try_into?

  warn!("zebratrace receive {:?}", self);

That works for adding logging, but I'd like the code to log self only when the enum does not match (i.e., on error). Thank you.
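EDIT: something like this might do it (untested, and the surrounding line is paraphrased from announce.rs rather than copied exactly) - log self only when the conversion fails, then propagate the error as before:

// Hypothetical sketch: dump the RawAnnouncableActivities payload only when
// it fails to convert into the AnnouncableActivities enum.
let activity: AnnouncableActivities = self.clone().try_into().map_err(|e| {
    warn!("zebratrace enum mismatch: {:?} payload: {:?}", e, self);
    e
})?;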

 

Here is from 5 days ago:

https://lemmy.ml/post/1494755

And I just made a test post to the same news link:

https://lemmy.ml/post/1607677

Spot-checking recent ones, they all seem to no longer fetch the summary preview. Although lemmy.ml has been running 0.18.0 for more than 5 days, so maybe some other change?

 

Ok, so I have some code to crawl the postings of a community and compare two servers for missing comments. It looks bad today. Both of these servers are on version 0.18.0 and have been upgraded for several days.

missing 0 unequal 0 11 on https://lemmy.ml/ vs. 11 on https://sh.itjust.works/
missing 35 unequal 1 48 on https://lemmy.ml/ vs. 14 on https://sh.itjust.works/
missing 4 unequal 0 9 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 6 unequal 0 9 on https://lemmy.ml/ vs. 3 on https://sh.itjust.works/
missing 1 unequal 0 1 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 6 unequal 0 12 on https://lemmy.ml/ vs. 6 on https://sh.itjust.works/
missing 3 unequal 0 8 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 3 unequal 0 6 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 22 unequal 0 42 on https://lemmy.ml/ vs. 20 on https://sh.itjust.works/
missing 5 unequal 0 15 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 8 unequal 2 17 on https://lemmy.ml/ vs. 9 on https://sh.itjust.works/
missing 3 unequal 0 3 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 0 unequal 0 10 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 11 unequal 0 24 on https://lemmy.ml/ vs. 13 on https://sh.itjust.works/
missing 1 unequal 0 2 on https://lemmy.ml/ vs. 1 on https://sh.itjust.works/
missing 13 unequal 0 37 on https://lemmy.ml/ vs. 24 on https://sh.itjust.works/
missing 3 unequal 0 7 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 0 unequal 0 10 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 60 unequal 2 186 on https://lemmy.ml/ vs. 126 on https://sh.itjust.works/
missing 10 unequal 2 51 on https://lemmy.ml/ vs. 41 on https://sh.itjust.works/
missing 16 unequal 0 51 on https://lemmy.ml/ vs. 36 on https://sh.itjust.works/
missing 31 unequal 3 128 on https://lemmy.ml/ vs. 97 on https://sh.itjust.works/
missing 0 unequal 0 4 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 2 unequal 0 5 on https://lemmy.ml/ vs. 3 on https://sh.itjust.works/
missing 15 unequal 1 67 on https://lemmy.ml/ vs. 52 on https://sh.itjust.works/
missing 4 unequal 0 53 on https://lemmy.ml/ vs. 49 on https://sh.itjust.works/
missing 0 unequal 0 5 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 0 unequal 0 0 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 1 unequal 0 19 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 0 unequal 0 2 on https://lemmy.ml/ vs. 2 on https://sh.itjust.works/
missing 0 unequal 0 22 on https://lemmy.ml/ vs. 22 on https://sh.itjust.works/
missing 0 unequal 0 16 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 0 unequal 0 7 on https://lemmy.ml/ vs. 7 on https://sh.itjust.works/
missing 3 unequal 0 27 on https://lemmy.ml/ vs. 24 on https://sh.itjust.works/
missing 2 unequal 0 32 on https://lemmy.ml/ vs. 30 on https://sh.itjust.works/
missing 3 unequal 0 21 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 3 unequal 1 16 on https://lemmy.ml/ vs. 13 on https://sh.itjust.works/
missing 3 unequal 1 47 on https://lemmy.ml/ vs. 44 on https://sh.itjust.works/
missing 1 unequal 0 24 on https://lemmy.ml/ vs. 23 on https://sh.itjust.works/

The number of comments is based on loading comments, not the counts at the top of the posting.
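For reference, the comparison logic amounts to something like this (a rough sketch rather than my actual crawler; the /api/v3/comment/list endpoint and the comments[].comment.ap_id field names are from memory of the v3 API, so double-check them):

use std::collections::HashSet;

// Collect the ActivityPub ids of the comments one instance returns for a post.
fn comment_ap_ids(instance: &str, post_id: u64) -> Result<HashSet<String>, Box<dyn std::error::Error>> {
    let url = format!(
        "https://{instance}/api/v3/comment/list?post_id={post_id}&limit=50&max_depth=8&sort=New"
    );
    let body: serde_json::Value = reqwest::blocking::get(url)?.json()?;

    let mut ids = HashSet::new();
    if let Some(comments) = body["comments"].as_array() {
        for c in comments {
            if let Some(ap_id) = c["comment"]["ap_id"].as_str() {
                ids.insert(ap_id.to_string());
            }
        }
    }
    Ok(ids)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The same post has a different local id on each instance, so both ids
    // have to be resolved first; hard-coded here purely for illustration.
    let home = comment_ap_ids("lemmy.ml", 1607677)?;
    let remote = comment_ap_ids("sh.itjust.works", 123456)?;
    println!(
        "missing {} of {} comments",
        home.difference(&remote).count(),
        home.len()
    );
    Ok(())
}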

 

Lemmy.ml's front page has been full of nginx errors (500, 502, etc.) and 404 errors coming from Lemmy itself.

Every new Lemmy install begins with no votes, comments, postings, or users to test against. So problems related to performance, scaling, error handling, and stability under user load cannot easily be reproduced, given that we cannot download the established content of communities.

Either the developers believe the logs are of low quality and not useful for identifying problems in the code and design, or getting these logs in front of the technical community and identifying the underlying patterns of faults is being given too low a priority.

It's also important to make each failure log traceable to where in the code the specific timeout, crash, exception, or resource limit was hit. Users and operations personnel reporting generic, non-unique messages only slows down server operators, programmers, database experts, etc.

There are also a number of problems testing federation, given that multiple servers are involved and nobody wants to bring down servers in front of end-users. It's absolutely critical that failures to federate data between servers be taken seriously, and that the causes of missing data on peer instances be tracked down (through better logging and triangulation) to protocol design issues, code failures, network failures, etc. Major Lemmy sites doing large amounts of data replication are an extremely valuable source of data about errors and performance. Please, for the love of god, share these logs and let us look for the underlying causes of hard-to-reproduce crashes and failures!

I really hope the internal logging and details of the inner workings of the biggest Lemmy instances are shared more openly, with more eyes on how to keep scaling the application as the number of posts, messages, likes, and votes continues to grow each and every day. Thank you.

Three recently created communities: !lemmyperformance@lemmy.ml -- !lemmyfederation@lemmy.ml -- !lemmycode@lemmy.ml

 

(I didn't cross-post, as I encourage comments to all go on one posting)

 

Even though 0.18 is installed on Lemmy.ml, the code is failing internally. And without access to lemmy.ml's server logs, I'm trying to diagnose the whole design from a remote instance.

Lemmy.world, Beehaw, and Lemmy.ml are all throwing "fast nginx 500" errors on their front door on a regular basis. And all are showing symptoms of replication failures sending messages and content to each other (missing posts and comments).

Even AFTER lemmy.ml was upgraded to 0.18, I was able to get stuck 'pending' subscribes on both my own personal remote instance and over at Lemmy.world.

I've been making a fool of myself as best I can in the hope that somebody will step back and actually share their Lemmy application error logs showing where the faulty points are within the code. I highly suspect that PostgreSQL is timing out or that HTTP federation timeouts are happening within the Rust code.

 

INSERT INTO "comment_like" ("person_id", "comment_id", "post_id", "score")
VALUES ($1, $2, $3, $4)
ON CONFLICT ("comment_id", "person_id")
DO UPDATE SET "person_id" = $5, "comment_id" = $6, "post_id" = $7, "score" = $8
RETURNING "comment_like"."id", "comment_like"."person_id", "comment_like"."comment_id", "comment_like"."post_id", "comment_like"."score", "comment_like"."published"

~~The server is showing relatively high execution time for this INSERT statement, like 0.4 seconds mean time. Is this form of blended INSERT with UPDATE and RETURNING slower than doing a direct insert?~~ (was misreading data, these are milliseconds, not seconds)

Every time a remote federation Upvote on a comment comes in to Lemmy, it executes this statement.
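For comparison, Diesel can express the same upsert with a much narrower DO UPDATE SET (a hypothetical sketch with a made-up schema, not Lemmy's actual code; whether it matters at millisecond scale is a separate question):

use diesel::prelude::*;
use diesel::upsert::excluded;

// Made-up, trimmed-down table definition purely for illustration.
diesel::table! {
    comment_like (id) {
        id -> Int4,
        person_id -> Int4,
        comment_id -> Int4,
        post_id -> Int4,
        score -> Int2,
    }
}

fn upsert_comment_like(
    conn: &mut PgConnection,
    person: i32,
    comment: i32,
    post: i32,
    vote: i16,
) -> QueryResult<usize> {
    use self::comment_like::dsl::*;

    diesel::insert_into(comment_like)
        .values((
            person_id.eq(person),
            comment_id.eq(comment),
            post_id.eq(post),
            score.eq(vote),
        ))
        // On a repeat vote, only the score is rewritten instead of every column.
        .on_conflict((comment_id, person_id))
        .do_update()
        .set(score.eq(excluded(score)))
        .execute(conn)
}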

 

A posting on the Instance-specific issues/observations about the upgrade: https://lemmy.ml/post/1444409

KNOWN BUGS

  1. Searching site-wide for "0.18" generates an error. This was working fine in 0.17.4 before Lemmy.ml upgraded: https://lemmy.ml/search?q=0.18&type=All&listingType=All&page=1&sort=TopAll
 

cross-posted from: https://popplesburger.hilciferous.nl/post/9969

After setting up my own Lemmy server, I've been intrigued by the server logs. I was surprised to see some search engines already starting to crawl my instance despite it having very little content.

I've noticed that most requests seem to come in from IPv4 addresses, despite my server having both an IPv4 and an IPv6 address. This made me wonder.

IPv4 addresses are getting more scarce by the day and large parts of the world have to share an IPv4 address to get access to older websites. This often leads to unintended fallout, such as thousands of people getting blocked by an IP ban from a site admin that doesn't know any better, as well as anti-DDoS providers throwing up annoying CAPTCHA pages because of bad traffic coming from the shared IP address. Furthermore, hosting a Lemmy server of your own is impossible behind a shared IP address, so IPv6 is the only option.

IPv6 is the clear way forward. However, many people haven't configured IPv6 for their hosts. People running their own Lemmy instances behind an IPv6 address won't be able to federate with those servers, and that's a real shame.

Looking into it

So, I whipped up this quick Python script:

import requests
import sys
import socket
from progress.bar import Bar

lemmy_host = sys.argv[1]

site_request = requests.get(f"https://{lemmy_host}/api/v3/site").json()

hosts = site_request['federated_instances']['linked']

ipv4_only = []
ipv6_only = []
both = []
error = []

with Bar('Looking up hosts', max=len(hosts)) as bar:
    for host in hosts:
        host = host.strip()

        try:
            dns = socket.getaddrinfo(host, 443)
        except socket.gaierror:
            # DNS lookup failed entirely; count it and skip to the next host
            # so we don't reuse the previous host's results.
            error.append(host)
            bar.next()
            continue

        has_ipv4 = False
        has_ipv6 = False
        for entry in dns:
            (family, _, _, _, _) = entry

            if family == socket.AddressFamily.AF_INET:
                has_ipv4 = True
            elif family == socket.AddressFamily.AF_INET6:
                has_ipv6 = True

        if has_ipv4 and has_ipv6:
            both.append(host)
        elif has_ipv4:
            ipv4_only.append(host)
        elif has_ipv6:
            ipv6_only.append(host)
        else:
            error.append(host)
        
        bar.message = f"Looking up hosts (B:{len(both)} 4:{len(ipv4_only)} 6:{len(ipv6_only)} E:{len(error)})"
        bar.next()

print(f"Found {len(both)} hosts with both protocols, {len(ipv6_only)} hosts with IPv6 only, and {len(ipv4_only)} outdated hosts, failed to look up {len(error)} hosts")

This script fetches the instances a particular Lemmy server federates with (ignoring the blocked hosts) and then looks all of them up through DNS. It shows you the IPv4/IPv6 capabilities of the servers federating with your server.

I've run the script against a few popular servers and the results are in:

Results

Server       IPv6 + IPv4   IPv6 only   IPv4 only   Error   Total
Lemmy.ml     1340          3           1903        215     3461
Beehaw.org   807           0           1105        74      1986
My server    202           0           312         4       518

A bar chart of the table above

A pie chart of the results for Lemmy.nl

A pie chart for the results for Beehaw.org

A pie chart for the results for my server

It seems that over half (55%+) the servers on the Fediverse aren't reachable over IPv6!

I'm running my own server, what can I do?

Chances are you've already got an IPv6 address on your server. All you need to do is find out what it is (ip address show on Linux), add an AAAA record in your DNS entries, and enable IPv6 in your web server of choice (e.g. listen [::]:443 in Nginx). Those running a firewall may need to allow traffic over IPv6 as well, but many modern firewalls apply the same whitelist entries to both protocols these days.

Some of you may be running servers on networks that haven't bothered implementing IPv6 yet. There are still ways to get IPv6 working!

Getting IPv6 through Tunnelbroker

If you've got a publicly reachable IPv4 address that can be pinged from outside, you can use Hurricane Electric's Tunnelbroker to get an IPv6 range, free of charge! You get up to five tunnels per account (each tunnel with a full /64 network) and a routed /48 network for larger installations, giving you up to 65k subnets to play with!

There are lots of guides out there, some for PfSense, some for Linux, some for Windows; there's probably one for your OS of choice.

Getting IPv6 behind CGNAT

Getting an IPv6 network through a tunnel broker service behind CGNAT is (almost) impossible. Many ISPs that employ CGNAT already provide their customers with IPv6 networks, but some of them are particularly cheap about it, especially consumer ISPs.

It's still possible to get IPv6 into your network through a VPN, but for serving content you'll need a server with IPv6 access. You can get a free cloud server from various cloud providers to get started. An easy way forward may be to host your server in the cloud, but if you've got a powerful server at home, you can just use the free server for its networking capabilities.

Free servers are available from all kinds of providers, such as Amazon (free for a year), Azure (free for a year), or Oracle (free without a time limit). Alternatively, a dedicated VPS with IPv6 capabilities can be as cheap as $4-5 per month if you shop around.

You can install a VPN server on your cloud instance, like Wireguard, and that will allow you to use the cloud IPv6 address at home. Configure the VPN to assign an IPv6 address and to forward traffic, and you've got yourself an IPv6 capable server already!

There are guides online about how to set up such a system. This gist will give you the short version.

Final notes

It should be noted that this is a simple analysis based on server counts alone. Most people flock to only a few servers, so most Lemmy users should be able to access IPv6 servers. However, in terms of self hosting, these things can matter!

 

I encourage all instance owners/operators to run the query mentioned in the issue and see how many of these 'pending' rows they have on their server. (FYI, I am RocketDerp on GitHub.)
