Parquet vs CSV
(lemmy.ml)
Parquet is closely tied to the Apache Software Foundation because it was designed as a storage format for Hadoop.
But many data processing libraries can read and write Parquet files, so you can use the format outside the Hadoop ecosystem.
It's really good for archiving data: the format can store a lot of data in relatively little disk space while still giving OK read performance. Because Parquet stores data column by column (split into row groups), a reader often doesn't need to read the whole file, only the columns it actually uses. A CSV file, by contrast, is plain text and takes up a lot more disk space.