this post was submitted on 09 Mar 2024
62 points (94.3% liked)
Technology
Sure and that's likely a good bit of work.
However, you must consider the alternative which is translating the entire text to dozens of languages and doing the same for any update done to said text. I'd assume that to be even more work by at least one order of magnitude.
Many languages are quite similar to one another. An article written in the hypothetical abstract language and tuned on an abstract level to produce good results in German would likely produce good results in Dutch too, and likely wouldn't need much tweaking for good results in e.g. English. This has the potential to save a ton of work.
The point of the abstract language would be to convey the meaning without requiring a language-specific writing style. The language-specific writing style to convey the specified meaning would be up to the language-specific "renderers".
That's up to the English "renderer" to do. If it decides to use a pronoun for e.g. a subject that identifies as male, it'd use "he". All the abstract language's "sentence" would contain is the concept of a male-identifying subject. (It probably shouldn't even encode the fact that a pronoun is used as usage of pronouns instead of nouns is also language-specific. Though I guess it could be an optional tag.)
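As a rough illustration of that split (not an actual implementation of anything), here is a minimal Python sketch. The names `Subject`, `PRONOUNS` and `refer` are made up for this example, and the pronoun tables are deliberately tiny. The abstract side only carries the concept of the subject; whether a pronoun exists and which one to use is decided per language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Abstract subject: only concepts, no language-specific wording."""
    name: str
    gender: str  # "male", "female" or "unspecified"


# Hypothetical, deliberately incomplete per-language pronoun tables.
PRONOUNS = {
    "en": {"male": "he", "female": "she", "unspecified": "they"},
    "de": {"male": "er", "female": "sie"},
}


def refer(subject: Subject, lang: str, use_pronoun: bool) -> str:
    """Return how a renderer for `lang` refers to the subject."""
    if use_pronoun:
        pronoun = PRONOUNS[lang].get(subject.gender)
        if pronoun is not None:
            return pronoun
    # No suitable pronoun in this language: fall back to the name.
    return subject.name


alex = Subject(name="Alex", gender="male")
print(refer(alex, "en", use_pronoun=True))
print(refer(alex, "de", use_pronoun=True))
```

The optional "use a pronoun here" tag mentioned above would correspond to the `use_pronoun` argument; a renderer is free to ignore it when its language has no fitting form.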
No, that'd simply be a mistake in building the abstract sentence. The concept of a pig was used rather than the concept of edible meat made from pig which would have been the correct subject to use in this sentence.
Mistakes like this will happen and I'd even consider them likely to happen but the cool thing here is that "pig consumption has increased", while obviously slightly wrong, would still be quite comprehensible. That's an insane advantage considering that this would apply to any language for which a generic "renderer" was implemented.
I don't see how that would necessarily be the case. Most sentences on Wikipedia are of descriptive nature and follow rather simple structures; only complicated further for the purpose of aiding text flow. Let's take the first sentence of the Wikipedia article on Lemmy:
This could be represented in a hypothetical abstract sentence like this:
(IDK why I chose lisp to represent this but it felt surprisingly natural.)
What this says is that this sentence explains the concept of Lemmy by equating it with the concept of software which facilitates a combination of multiple purposes.
A language-specific "renderer" such as the English one would then take this abstract representation and turn it into an English sentence:
The concept of an explanation of a thing would then be turned into an explanation sentence. Explanation sentences depend on what it is that is being explained. In this case, the subject is specifically marked as a proper noun, which is usually explained using a structure like "[subject] is [explanation]". (An explanation for a different type of word could use a different structure.) Because it's a proper noun and at the beginning of a sentence, "Lemmy" would be capitalised.
Next comes the explanation part, which is declared as the concept of being software of the kind FOSS facilitating some purpose. The combined concept of an object and its purpose is represented as "[object] for the purpose of [purpose]" in English. The object is FOSS here, specifically software facilitating some purpose, so the English "renderer" can expand this into "free and open-source software for the purpose of facilitating [purpose]".
The purpose given is the purpose of having multiple purposes and this concept simply combines multiple purposes into one.
The purposes are two objects to which a property has been applied. In English, the concept of applying a property is represented as "a [property] [object]", so in this case "a self-hosted news-aggregation platform" and "a self-hosted online discussion forum". These purposes are then combined using the standard English method of combining multiple objects, which is listing them: "a self-hosted news-aggregation platform and a self-hosted online discussion forum". Because both purposes have the same adjective applied, the English "renderer" would likely make the stylistic choice of implicitly applying it to both, which is permitted in English: "a self-hosted news-aggregation platform and online discussion forum".
It would then be able to piece together this English sentence: "Lemmy is free and open-source software for the purpose of facilitating a self-hosted news-aggregation platform and online discussion forum."
You could be even more specific in the abstract sentence in order to get exactly the original sentence but this is also a perfectly valid sentence for explaining Lemmy in English. All just from declaring concepts in an abstract way and transforming that abstract representation into natural language text using static rules.
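The walkthrough above can be sketched in Python. This is only a toy under stated assumptions: the abstract notation from the original comment is not reproduced here, so the `Obj` and `Explain` types, the `article` helper and both render functions are hypothetical stand-ins, and the "rules" cover just this one sentence shape.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Obj:
    """An object concept with an optional property applied to it."""
    noun: str
    prop: str = ""


@dataclass(frozen=True)
class Explain:
    """Explains a proper noun as a kind of thing facilitating purposes."""
    subject: str
    kind: str
    purposes: Tuple[Obj, ...]


def article(word: str) -> str:
    # Naive a/an choice by first letter; real English needs more than this.
    return "an" if word[0].lower() in "aeiou" else "a"


def render_purposes_en(purposes: Tuple[Obj, ...]) -> str:
    # English combines objects by listing them. A property shared by all
    # of them is stated once and implicitly applies to the rest.
    props = {p.prop for p in purposes}
    shared = len(purposes) > 1 and len(props) == 1 and purposes[0].prop != ""
    parts = []
    for i, p in enumerate(purposes):
        if p.prop and not (shared and i > 0):
            parts.append(f"{article(p.prop)} {p.prop} {p.noun}")
        elif shared:
            parts.append(p.noun)
        else:
            parts.append(f"{article(p.noun)} {p.noun}")
    return " and ".join(parts)


def render_en(s: Explain) -> str:
    # Proper noun at the start of a sentence: capitalise it.
    subject = s.subject[0].upper() + s.subject[1:]
    purposes = render_purposes_en(s.purposes)
    return f"{subject} is {s.kind} for the purpose of facilitating {purposes}."


LEMMY = Explain(
    subject="lemmy",
    kind="free and open-source software",
    purposes=(
        Obj("news-aggregation platform", prop="self-hosted"),
        Obj("online discussion forum", prop="self-hosted"),
    ),
)

print(render_en(LEMMY))
```

A renderer for another language would take the same `LEMMY` value and apply its own static rules for explanation sentences, properties and lists.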
It isn't just "a good bit of work", it's an unreasonably large amount of work. It's like draining the ocean with a bucket. I'm talking about tagging hundreds of subtle distinctions for each sentence, and failing to tag those distinctions will output nonsense for at least some languages.
I did consider it. And it's blatantly clear that it's less work overall, and easier to distribute among multiple translators.
For example, if I'm translating some genitive construction from Portuguese to Latin, I don't need to care which side of English's esoteric "of vs. 's" distinction it lies on. Or if I'm expected to use の/no in Japanese in that situation. Or to tag "hey, this is not alienable!" for the sake of Nahuatl. I need to deal with the oddities of exactly two languages: source and target.
Under the proposed system though? Enjoy tagging a single word [jap-no][eng-of][lat-gen][nah-inal]. And that's only for four languages.
(inb4: this shit depends on meaning, so no, code can't handle it. At most code can convert sea[lat-gen] to "maris", but it won't "magically" know if it needs to use the genitive or ablative, or if English would use "of" or "'s".)
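A minimal Python sketch of that distinction, assuming a hypothetical lookup table (`LATIN_FORMS`) that holds only the singular forms of Latin "mare" without macrons. Once the case tag is given, the inflection is a plain lookup; without the tag, this sketch simply refuses to guess, which mirrors the claim above rather than proving it.

```python
from typing import Optional

# Hypothetical lookup table: singular forms of Latin "mare" (sea) only.
LATIN_FORMS = {
    "sea": {
        "nom": "mare",
        "gen": "maris",
        "dat": "mari",
        "acc": "mare",
        "abl": "mari",
    },
}


def inflect_latin(concept: str, case: str) -> str:
    """Mechanical step: concept plus case tag gives the inflected form."""
    return LATIN_FORMS[concept][case]


def render_latin(concept: str, case: Optional[str] = None) -> str:
    """Render a concept; the case must come from a human-supplied tag."""
    if case is None:
        # Picking genitive vs. ablative depends on meaning, which this
        # sketch has no access to, so it fails instead of guessing.
        raise ValueError(f"no case tag given for {concept!r}")
    return inflect_latin(concept, case)


print(render_latin("sea", "gen"))
```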
False dichotomy.
If you're eager to assume (i.e. to make shit up and take it as true), please do not waste my time.
Source: you made it up.
Okay... I've stopped reading here. If your low-hanging fruit example is three closely related languages, then it's blatantly clear that you're ignorant of the sheer scale of the problem.