Elsevier (mander.xyz)

submitted 2 months ago by fossilesque@mander.xyz to c/science_memes@mander.xyz


[-] Passerby6497@lemmy.world 80 points 2 months ago

That's where you print the downloaded PDF to a new PDF. New hash and same content, good luck tracing it back to me fucko.

[-] Syn_Attck@lemmy.today 65 points 2 months ago* (last edited 2 months ago)

Unfortunately that wouldn't work. The identifier is information inside the PDF itself, so it has nothing to do with the file hash (although a hash is one way to track a file).
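To make the hash point concrete, here is a toy Python sketch. Big assumptions: `MARKER` is a made-up stand-in for whatever a publisher embeds, and the two byte strings are not valid PDFs, just placeholders. In a real PDF the identifier may sit inside a compressed stream, so a plain byte search like this would not necessarily find it.

```python
import hashlib

# Hypothetical marker standing in for an embedded identifier (not a real format).
MARKER = b"/TrackingID (abc123)"

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()

# Two toy "files": different surrounding bytes, same embedded marker.
original = b"%PDF-1.7\n" + MARKER + b"\n%%EOF\n"
resaved = b"%PDF-1.4\n% re-saved copy\n" + MARKER + b"\n%%EOF\n"

hashes_differ = sha256_hex(original) != sha256_hex(resaved)
marker_survives = MARKER in resaved

print(hashes_differ, marker_survives)
```

Changing the bytes around the marker changes the hash, but anything reading the content still finds the marker.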

Now that this is known, it's not enough to remove metadata from the PDF itself. Each image inside a PDF, for example, can contain its own metadata. I say this because they're apparently starting a game of whack-a-mole, and it won't stop here.

There are multiple ways of removing ALL metadata from a PDF; here are most of them.

It will be slow-ish and probably make the file larger, but if you're sharing a PDF that only you are supposed to have access to, it's worth it. MAT or exiftool should work.

Edit: as discussed in another comment thread here, PDF/image steganography is another technique they can use.

[-] Passerby6497@lemmy.world 8 points 2 months ago

Wouldn't printing the PDF to a new PDF inherently strip the metadata put there by the publisher?

[-] Syn_Attck@lemmy.today 4 points 2 months ago* (last edited 2 months ago)

Good question. I believe the browser "Print to PDF" function simply saves the loaded PDF to a PDF file locally, so it wouldn't work (if I'm correct.)

I'm not an expert in this field, but you can ask on StackExchange or ask the author of MAT or exiftool. You can also test it yourself (I'll explain how): make a PDF containing a .jpg file that carries your own metadata, open it and print it to PDF, then extract the image and check whether the metadata is still there. Do let us know your findings! I'm on a smartphone so can't do it.

If you do try it yourself, a note from the linked SE page: you won't be able to recover the original file extension when extracting the image. It's unknown, so you either have to know what it is, look at the file headers, or try all extensions. So if you use your own .jpg with your own exif data, rename the extracted file to .jpg when finished (I believe exif is handled differently based on file type).
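For the "look at the file headers" option, a minimal Python sketch could look like this. The names `SIGNATURES` and `guess_extension` are my own for illustration, and the list only covers a handful of common formats, so treat it as a starting point rather than a complete detector.

```python
from typing import Optional

# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]

def guess_extension(data: bytes) -> Optional[str]:
    """Guess a file extension from the first bytes, or None if unrecognized."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return None
```

Usage would be something like `guess_extension(open("extracted_file", "rb").read(16))`, where `extracted_file` is whatever name your extraction tool gave the image.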

There are multiple tools to add exif data to an image but the exiftool website has some easy examples for our purpose.

(do this as the first step before adding to the PDF)

(command line here, but there are exiftool GUIs)

exiftool -artist="Phil Harvey" -copyright="2011 Phil Harvey" YourFile.jpg

This adds "Phil Harvey" as the artist, plus the copyright notice, to the file. If you're on a smartphone and have the time and really have to know, then hypothetically there should be web-based tools for every step needed. I'm just not familiar with any, and it's possible a web-based tool would itself remove the metadata when creating or extracting the PDF.
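For the final check (did the exif data survive the round trip?), exiftool itself is the easy route. If you'd rather not install anything, here is a rough Python sketch that only answers "does this JPEG still have an Exif block at all?". The function name `has_exif_segment` is mine, and the parser is deliberately simplistic: it walks the header segments up to the image data and assumes a well-formed file, so it does not read the actual tags and could be confused by unusual JPEGs.

```python
import struct

def has_exif_segment(jpeg: bytes) -> bool:
    """Return True if the JPEG headers contain an APP1 segment tagged 'Exif'."""
    if not jpeg.startswith(b"\xff\xd8"):
        return False  # no start-of-image marker, so not a JPEG
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return False  # lost sync with the marker structure
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):
            return False  # start of scan / end of image: header section is over
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Run it on the original .jpg and on the extracted one. If the first says True and the second says False, the print-to-PDF step dropped the Exif block, at least for that image and that tool.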

this post was submitted on 20 Jun 2024

1000 points (98.9% liked)