
Trimming Epubs - OceanOfPDF

yano2mch

Professional Geeky Perv
I'm finding that some converters inject unwanted stuff, such as advertisements, into the epubs they convert.

(First example: King Dante, Deck of Destiny. Second example: Janet Chapman, Sinclair Brothers 1.)

inject.png


None of this is harmful, but some of it is more annoying than the rest: an OceanOfPDF link on every single page, or a "Converted using ABC" line that shouldn't be there (usually several times in each file, likely inserted every few thousand bytes).

I'm working on this and have a prototype script to clean the epubs up, as well as fix other minor problems, so this is likely a work in progress. Providing xdeltas isn't as reliable in this case, since a patch needs a fixed version to work against (I optimize my epubs, so they differ from what I download here or from Anna's Archive), so I doubt I could do that the way I did in the Michael Anderson thread. Instead it would have to be a shell script, or AHK doing the bulk of the work, assuming I can keep it happy. sed scripts and AHK scripts aren't fully interchangeable, but sed scripts are fairly easy to convert to AHK.

Current fixes/problems include:
Empty paragraphs/spans.
After a *** line, the next text paragraph tends to be raw text outside any paragraph tag (with an empty paragraph left at the end).

At the end of most paragraphs there are trailing spaces, for example <p>hey ho! </p>. (Curiously, in a lot of conversions, if the space isn't there, then the text is part of a continuing paragraph...)
Deleting the oceanofpdf.com file and calibre_bookmarks.txt files.
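
To give a rough idea of the first and third fixes, here is a minimal Python sketch. It is not the actual sed/AHK script; the patterns and the function name `tidy_xhtml` are assumptions for illustration, and real converter output will likely need more cases.

```python
import re

# Hypothetical patterns (assumptions, not taken from the real script).
# An empty <p> or <span>, with optional attributes and only whitespace inside.
EMPTY_TAG = re.compile(r"<(p|span)\b[^>]*>\s*</\1>")
# Spaces or tabs sitting directly before a closing </p>.
TRAILING_SPACE = re.compile(r"[ \t]+</p>")


def tidy_xhtml(text: str) -> str:
    """Strip trailing spaces inside paragraphs, then drop empty <p>/<span>."""
    text = TRAILING_SPACE.sub("</p>", text)
    # Repeat until nothing changes, so <p><span></span></p> collapses fully.
    previous = None
    while previous != text:
        previous = text
        text = EMPTY_TAG.sub("", text)
    return text
```

Calling `tidy_xhtml("<p>hey ho! </p>")` gives `<p>hey ho!</p>`. The raw-text-after-`***` problem is not covered here, since it depends on how each converter lays out the scene break.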


If anyone sees any other injected blocks of this kind, or knows fixes for common converter issues, let me know and I'll incorporate them into the script (preferably with an example epub).

If anyone has epubs they really need stripped immediately, or has a huge number of them, let me know and I'll throw a current version of the bash/AHK script together for you to use. (The bash one might need minor tweaking in a Linux environment, since I'm using Cygwin.)
 
Took waaaay too long to get to.

Anyways, here's a version of my script converted to AHK so you can quickly de-oceanify your epubs. If there are other epubs with added tags we don't want, let me know and point me to them so I can add them to the script.

Without further ado... the compiled script, and the source scripts I was using too.

Simple usage: it will create input/output folders and extract a couple of extra programs. Drop the epubs into the input folder, then run the script. If you want the epubs optimized, run the compact.bat script within the output folder; otherwise the script just finishes pretty quickly.
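
For anyone who wants to see the general shape of the input/output workflow without AHK, here is a hedged Python sketch. It only handles the "delete the extra files" step; the function names are made up for illustration, and matching the files by base name anywhere in the archive is an assumption, not necessarily what the attached scripts do.

```python
import zipfile
from pathlib import Path

# File names taken from the post; matching by base name is an assumption.
UNWANTED = {"oceanofpdf.com", "calibre_bookmarks.txt"}


def strip_unwanted(src: Path, dst: Path) -> list[str]:
    """Copy an epub to dst, leaving out the unwanted files. Returns what was dropped."""
    removed = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            base = info.filename.rsplit("/", 1)[-1]
            if base in UNWANTED:
                removed.append(info.filename)
                continue
            # The epub "mimetype" entry is meant to be stored uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info, zin.read(info.filename), compress_type=method)
    return removed


def process_folder(input_dir: Path, output_dir: Path) -> None:
    """Run strip_unwanted on every .epub in input_dir, writing into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for epub in sorted(input_dir.glob("*.epub")):
        removed = strip_unwanted(epub, output_dir / epub.name)
        print(f"{epub.name}: removed {removed}")
```

Running `process_folder(Path("input"), Path("output"))` mirrors the drop-in-and-run usage above. Entry order is preserved, so `mimetype` stays first if it was first in the original.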

The xdeltas should be usable to restore the original file (should you decompress the zip files), but you should only need to worry about that if there are some major f*k-ups.


edit: remembered and checked for invisible hyphens. That's added back in now.
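
A small Python sketch of that kind of check, assuming "invisible hyphens" means soft hyphens (U+00AD) and their entity forms. The pattern and the name `strip_soft_hyphens` are illustrative assumptions, not lifted from the attached scripts.

```python
import re

# Assumption: "invisible hyphen" = soft hyphen, either as the raw character
# or written as &shy; / &#173; / &#xAD; in the markup.
SOFT_HYPHEN = re.compile(r"\u00ad|&shy;|&#173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so words are no longer split invisibly."""
    return SOFT_HYPHEN.sub("", text)
```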
 

Attachments

  • de-ocean_shell.zip (2.5 KB)
  • de-ocean_ahk.zip (932.3 KB)