this post was submitted on 02 Feb 2024
95 points (97.0% liked)


Well, not quite, but close. I'm holding a hard disk that has ALL of Wikipedia's text in 10 different languages.

Yes, you can download all of Wikipedia, and yes, it easily fits on a hard drive. Isn't that amazing? Text is incredibly dense compared to images and video: around 22 GiB for English Wikipedia alone and 56 GiB for the 10 languages I downloaded.

I also have all of Wiktionary on the same hard drive. It's around 16.4 GiB.

[–] droning_in_my_ears@lemmy.world 9 points 9 months ago (1 children)

It's only the raw text in JSON Lines files. No media and no markup. I think I downloaded a compressed dump, then used wikiextractor to extract the text.
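
For anyone curious what working with files like that might look like, here is a rough Python sketch that walks one JSON Lines file and tallies the article count and text size. It assumes each line is a JSON object with a `text` field, which I believe is what wikiextractor's JSON output looks like, but treat the field name as an assumption and check your own files. The helper names `iter_articles` and `summarize` are made up for this example.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_articles(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


def summarize(path: Path) -> tuple[int, int]:
    """Return (article count, total UTF-8 bytes of article text)."""
    count = 0
    total_bytes = 0
    for article in iter_articles(path):
        count += 1
        # "text" is an assumed field name; missing fields count as empty.
        total_bytes += len(article.get("text", "").encode("utf-8"))
    return count, total_bytes
```

Calling `summarize(Path("some_extracted_file"))` on one extracted file (the path is a placeholder) gives a quick sense of how much raw text is in it, since the file is read one line at a time rather than loaded whole.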

[–] AbouBenAdhem@lemmy.world 2 points 9 months ago

Does it include each article’s edit history, talk page, etc?