this post was submitted on 06 Sep 2024
Link detection is flaky as hell, especially for special characters, which rarely work reliably. URLs themselves don't contain Unicode. They use basic ASCII, and anything beyond that needs to be encoded in some form. The link you posted isn't a spec-compliant link; it only works because Lemmy apps and browsers are nice and do the conversion to the real URL for you. According to the spec:
If you use usernames as identifiers (which, again, are optional) like Lemmy does, databases and external entities will use the percent-encoded URLs, not the readable ones. Unicode domains will have their xn-- (Punycode) form stored as well. It's up to apps and browsers to decode those and turn them back into Unicode. What apps and browsers show you isn't really relevant to the technical interoperability of users.
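To make the two forms concrete, here is a rough Python sketch of the conversion in both directions. The host `bücher.example`, the username `jürgen`, the `/u/` path layout and the helper names `to_wire_url` / `to_display_url` are all made up for illustration; this is not how Lemmy itself does it, just the general idea using the standard library.

```python
from urllib.parse import quote, unquote, urlsplit


def to_wire_url(host: str, username: str) -> str:
    """Build the ASCII-only URL that would be stored and federated (hypothetical helper)."""
    # Python's built-in "idna" codec (IDNA 2003) yields the xn-- form of the host.
    ascii_host = host.encode("idna").decode("ascii")
    # Percent-encode the UTF-8 bytes of the username; safe="" also encodes "/".
    ascii_user = quote(username, safe="")
    # The /u/ path layout is an assumption for this example.
    return f"https://{ascii_host}/u/{ascii_user}"


def to_display_url(wire_url: str) -> str:
    """Turn the ASCII URL back into the readable form an app might show (hypothetical helper)."""
    parts = urlsplit(wire_url)
    readable_host = parts.hostname.encode("ascii").decode("idna")
    return f"{parts.scheme}://{readable_host}{unquote(parts.path)}"


wire = to_wire_url("bücher.example", "jürgen")
print(wire)                  # https://xn--bcher-kva.example/u/j%C3%BCrgen
print(to_display_url(wire))  # https://bücher.example/u/jürgen
```

The first line printed is what databases and other servers would compare against; the second is purely cosmetic.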
ActivityPub itself has wide support for various languages, including having different names and content for different languages. The username (actually preferredUsername) is transmitted through JSON, which is by definition UTF-8, so text from most encodings in use today (not that weird Japanese one and that other Asian encoding that's not Unicode-compatible) will Just Work™, assuming the necessary URL encoding and decoding logic is added in the right places.
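As a rough illustration of that split, here's a small Python sketch of an actor object. `preferredUsername` and `nameMap` are real ActivityPub / ActivityStreams properties, but the host `example.social`, the username, the display names and the `/u/` path are invented for the example, and a real actor would carry more fields (inbox, outbox, keys and so on).

```python
import json
from urllib.parse import quote

username = "ユーザー"  # made-up non-ASCII username

actor = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Person",
    # The identifier is a URL, so the username gets percent-encoded here.
    "id": "https://example.social/u/" + quote(username, safe=""),
    # Inside the JSON body the username can stay as plain Unicode text.
    "preferredUsername": username,
    # Per-language display names (values invented for illustration).
    "nameMap": {"ja": "テストユーザー", "en": "Test User"},
}

# Serialize to UTF-8 bytes the way the document would travel over the wire.
body = json.dumps(actor, ensure_ascii=False).encode("utf-8")

# The receiving side parses the bytes and gets the same strings back.
received = json.loads(body.decode("utf-8"))
print(received["preferredUsername"])
print(received["id"])
```

The only place the encoding step matters is the `id`; everything else rides along in the UTF-8 JSON body unchanged.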
I think Lemmy can be patched to accept Unicode characters in usernames, as the current limitations in the code and in the UI are just choices made during development. I don't think it'll add much, though.