submitted 11 months ago* (last edited 11 months ago) by Set8@lemmy.world to c/piracy@lemmy.dbzer0.com

Around a month ago I posted a poll on this sub asking for feedback on a scraper for Coomer.su and kemono.su that I've been developing. This post is an update on where development is going.

For anybody unaware, I have been working on scraping software that lets you mass download posts from creators on both kemono and Coomer. Neither website has this built in, which I found somewhat stupid, so I set out to create my own tool.

In my previous post, I talked about the basic features the scraping software would have, and many people pointed out that similar software already exists. After taking a look at the software people suggested, I felt it did not meet my expectations and quality standards, so I continued with this project.

The major driving factor of this scraping software is the translator I have integrated directly into the codebase, which translates post titles and descriptions as they are scraped, courtesy of Google Translate. This feature has exceeded my expectations. The only downside is Google's rate limit, which is fair but can kick in if you translate too many words. This typically only happens with post descriptions and takes upwards of 1k words to trigger, so I feel it is okay in its current state. For the time being there is a toggle in the code for translating post descriptions, which defaults to off. I may add automatic service switching in the future, but for right now it should work more than well enough. The translator allows anybody, whatever language they speak, to scrape from the PartySites, which is invaluable if your language isn't widely used on the sites.
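To make the description toggle and the "automatic service switching" idea concrete, here is a minimal sketch. It is in Python rather than the project's C#, it is not the author's code, and it calls no real translation API. `PostTranslator`, `RateLimited`, and `make_limited_backend` are hypothetical names, and the word limits are made-up numbers standing in for whatever a real service enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class RateLimited(Exception):
    """Raised by a backend when its service refuses further requests."""


# A backend is any function that takes text and returns translated text.
Backend = Callable[[str], str]


@dataclass
class PostTranslator:
    backends: List[Backend]
    # Mirrors the toggle described above: descriptions are skipped by default.
    translate_descriptions: bool = False
    _active: int = field(default=0, init=False)

    def _translate(self, text: str) -> str:
        # Try the current backend; on a rate limit, switch to the next one.
        while self._active < len(self.backends):
            try:
                return self.backends[self._active](text)
            except RateLimited:
                self._active += 1
        # Every backend is exhausted: keep the original text.
        return text

    def translate_post(self, title: str, description: str) -> Tuple[str, str]:
        new_title = self._translate(title)
        if self.translate_descriptions:
            description = self._translate(description)
        return new_title, description


def make_limited_backend(tag: str, word_limit: int) -> Backend:
    """Hypothetical stand-in for a real service with a word quota."""
    used = 0

    def backend(text: str) -> str:
        nonlocal used
        words = len(text.split())
        if used + words > word_limit:
            raise RateLimited(tag)
        used += words
        return f"[{tag}] {text}"

    return backend
```

Usage would look something like `PostTranslator([primary, fallback]).translate_post(title, description)`, where `primary` and `fallback` wrap whichever services are actually available. Titles are always translated, while descriptions only count against the quota when the toggle is turned on.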

I've also ported the codebase over to a C# .NET 6 class library for developers, allowing them to create their own scraping software if desired. The project currently has an attached GUI that I am working on refining for the general public.

As I've stated before, the concept of this project is extremely simple, with the codebase itself compiling to a meager 18 KB excluding libraries, so it surprises me that nobody has built this to an acceptable standard yet.

I plan to release this scraper in the coming weeks, once some bugs are sorted out and Discord support is possibly added.

Please let me know what you'd like to see in this, as feedback is always appreciated.

[-] MigratingtoLemmy@lemmy.world 0 points 11 months ago

Could you take a look at deepl.com's API? It's supposed to be better than Google Translate for European languages.

[-] Set8@lemmy.world 2 points 11 months ago

DeepL is a paid API unfortunately.

[-] MigratingtoLemmy@lemmy.world 1 points 11 months ago
this post was submitted on 19 Nov 2023