this post was submitted on 05 Apr 2024
Technology
Maybe unified CSV handling will be possible soon.
I can confidently say that CSV support is one of those problems that even the brightest computer scientists will be pondering for decades to come.
Supporting CSVs sounds like an easy problem, but it's not. It's like a whole different complexity type. Time complexity, space complexity, and now, the dreaded subclass between spec complexity and organisational complexity.
You can't just make the users agree which delimiter to use and how quotes are supposed to work. That's nearly impossible. No no no.
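To make the delimiter problem concrete, here is a small sketch using Python's standard csv module. The sample line is made up for illustration: the same bytes give two different rows depending on which delimiter the reader assumes, and nothing in the file says which one is right.

```python
import csv
import io

# Made-up sample line: two decimal-comma numbers separated by a semicolon.
line = "1,5;2,5\n"

def parse(text, delimiter):
    """Parse the first record of `text` with the given delimiter."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

as_comma = parse(line, ",")      # reader assumes "classic" CSV
as_semicolon = parse(line, ";")  # reader assumes semicolon-separated

print(as_comma)      # ['1', '5;2', '5']
print(as_semicolon)  # ['1,5', '2,5']
```

Both results are valid parses, which is exactly why the users would have to agree up front.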
Commas are too common, we should go with semicolons. And \n and UTF-8 by default. And a header that defines changes from defaults, plus metadata such as data logger model and settings. These are some significant quality-of-life improvements but I'd guess it will take another file extension before that happens.
I just don't like that CSV exists as a format and has no standards currently. If you remove commas from CSV then you're taking the C out of CSV.
SCSV (semicolon separated values) at least sounds like an upgrade to CSV. Or maybe just use something that is flexible but is standard like JSON?
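For what it's worth, a semicolon-separated variant is already within reach of existing tooling. A minimal sketch with Python's csv module, where the helper names write_scsv and read_scsv and the sample rows are invented for this example: fields containing commas need no quoting, and only a field containing a semicolon gets quoted.

```python
import csv
import io

def write_scsv(rows):
    """Serialize rows with ';' as delimiter and '\n' as line terminator."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", lineterminator="\n").writerows(rows)
    return buf.getvalue()

def read_scsv(text):
    """Parse semicolon-separated text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=";"))

# Invented sample data: one field with a comma, one with a semicolon.
rows = [["place", "temp"], ["Berlin, DE", "21,5"], ["a;b", "3"]]
text = write_scsv(rows)
print(text)
# place;temp
# Berlin, DE;21,5
# "a;b";3
```

The catch is the same as before: the reader still has to know the file uses semicolons.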
Yeah, SCSV would work, with a .ssv file extension for FAT compatibility.
JSON is overkill, tabular data is often recorded by 8-bit devices. Yes, you can use a dishwasher to cook salmon, but building a dishwasher is difficult and it can break in many more places. Each piece of salmon also needs to be carefully wrapped.
Yeah, I get what you mean. I'm so overprotective of my dishwasher that I actually pre-scrub plates very quickly so as not to clog it (which is pretty similar to sanitizing inputs before putting them in a database, I guess). 😊 It's still much faster than doing the dishes by hand.
But the point is something simple can run on a simple device with minimal supervision.
At that point why not use TSV?
ASCII 0x1f, unit separator and 0x1e, record separator. There's also 0x1d group separator and 0x1c file separator.
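A rough sketch of what using those separators could look like, in Python. The dump and load helpers are hypothetical names, and rejecting fields that contain a separator is an assumption of this sketch rather than part of any standard. Since the separators are not printable text, no quoting or escaping rules are needed for commas, semicolons, tabs or newlines.

```python
US = "\x1f"  # ASCII unit separator: between fields
RS = "\x1e"  # ASCII record separator: between records

def dump(rows):
    """Join fields with US and records with RS; no quoting rules needed."""
    for row in rows:
        for field in row:
            # Assumption of this sketch: reject fields containing a separator
            # instead of inventing an escaping scheme.
            if US in field or RS in field:
                raise ValueError("field contains a separator control code")
    return RS.join(US.join(row) for row in rows)

def load(text):
    """Split on RS, then on US, to recover the rows."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]

rows = [["name", "note"], ["Ada", "commas, semicolons; tabs\t and\nnewlines are fine"]]
assert load(dump(rows)) == rows
```

The obvious downside is that the result is awkward to view or edit in a plain text editor.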
Both CSV and TSV have been a mistake from the start. It's not like they'd be suitable for binary data anyway, and not using the ASCII control codes specifically made for in-band messaging of record fields means they ate into the printable characters (and yes, \n and \t are printable: they move the print head, and that's a printing action).
If you want binary compatibility, either use bencode or throw ASN.1 at it. The important thing is to have a simple enough data model. Don't try to save code in the base compatibility version; evaluate the whole sheet before export if you have to. Using SQLite as an interchange format is a bit hacky, but honestly defensible, especially with the code (which kinda is the spec) being public domain.
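As a sketch of the SQLite route, using Python's built-in sqlite3 module: the table name readings, its columns, and the export_rows and import_rows helpers are all made up for this example. Delimiters inside text and raw binary blobs survive the round trip without any quoting rules.

```python
import os
import sqlite3
import tempfile

def export_rows(path, rows):
    """Write rows into a fresh SQLite file (hypothetical 'readings' table)."""
    con = sqlite3.connect(path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute("CREATE TABLE readings (sensor TEXT, value REAL, raw BLOB)")
            con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    finally:
        con.close()

def import_rows(path):
    """Read the rows back in insertion order."""
    con = sqlite3.connect(path)
    try:
        query = "SELECT sensor, value, raw FROM readings ORDER BY rowid"
        return con.execute(query).fetchall()
    finally:
        con.close()

# Invented sample data: text full of delimiters, plus arbitrary binary bytes.
rows = [("a;b,c\td", 1.5, b"\x00\x1e\x1f"), ("plain", 2.0, b"")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.sqlite")
    export_rows(path, rows)
    assert import_rows(path) == rows
```

That is a lot more machinery than a text file, so the 8-bit data logger objection above still stands.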