this post was submitted on 01 Mar 2025

e0qdk@reddthat.com 12 points 1 month ago

arXiv has bulk access methods -- you shouldn't need to scrape their website to get the data: https://info.arxiv.org/help/bulk_data.html

If you really want everything (5TB+), that's available from their S3 bucket if you're willing to cover the transfer costs: https://info.arxiv.org/help/bulk_data_s3.html
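
For anyone who wants to gauge the size before paying for transfer, here is a rough Python sketch that pulls only the source manifest and totals the listed archive sizes. Everything specific in it is an assumption on my part, not something stated above: the bucket name `arxiv`, the key `src/arXiv_src_manifest.xml`, and the `<file>`/`<size>` element names should be checked against the linked help page. The download uses boto3 with the requester-pays flag, so it needs AWS credentials and your account gets billed.

```python
import xml.etree.ElementTree as ET

# Assumptions (not from the comment above): bucket name, manifest key, and
# manifest element names. Verify them on the arXiv bulk data help pages.
BUCKET = "arxiv"
MANIFEST_KEY = "src/arXiv_src_manifest.xml"


def summarize_manifest(xml_text):
    """Return (number of archive files, total bytes) listed in a manifest."""
    root = ET.fromstring(xml_text)
    # Each <file> entry is assumed to carry a <size> child giving bytes.
    sizes = [int(f.findtext("size", default="0")) for f in root.iter("file")]
    return len(sizes), sum(sizes)


def fetch_manifest(dest="arXiv_src_manifest.xml"):
    """Download the manifest from the requester-pays bucket (you are billed)."""
    import boto3  # third-party; imported lazily so the parser works without it

    s3 = boto3.client("s3")
    s3.download_file(
        BUCKET,
        MANIFEST_KEY,
        dest,
        ExtraArgs={"RequestPayer": "requester"},  # accept the transfer charges
    )
    return dest
```

To use it, call `fetch_manifest()`, read the saved file, and pass its text to `summarize_manifest()`; the second number is the byte count you would be paying to transfer if you grabbed every archive.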