yano2mch
Professional Geeky Perv
I'm finding that some converters inject unwanted advertisements into the epubs they convert.
(First one, King Dante Deck of destiny. Second one, Janet Chapman Sinclair brothers 1)
None of this is harmful, but some of it is more annoying than the rest: seeing an OceanOfPDF link on every single page, or a "Converted using ABC" line that shouldn't be there. (It usually shows up several times in each file, likely something like every few thousand bytes.)
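As a rough illustration of stripping this kind of injected block, here is a minimal Python sketch. It is not the sed/AHK script itself, and it assumes the ad sits in its own <p> or <div> element that contains a recognisable marker string; the function name, the marker list and the sample markup are all made up for the example, so check what a given converter actually injects before relying on it.

```python
import re

# Hypothetical marker list: text that identifies an injected ad block.
# Keep markers specific so that genuine book text is never matched.
AD_MARKERS = ("OceanofPDF.com",)

def strip_ad_blocks(xhtml, markers=AD_MARKERS):
    """Remove whole <p>/<div> elements whose content contains a marker."""
    for marker in markers:
        pattern = re.compile(
            r"<(p|div)\b[^>]*>"        # opening <p ...> or <div ...>
            r"(?:(?!</?\1\b).)*?"      # content, without crossing another tag of the same kind
            + re.escape(marker) +
            r"(?:(?!</?\1\b).)*?"
            r"</\1>\s*",               # matching close tag plus trailing whitespace
            re.IGNORECASE | re.DOTALL,
        )
        xhtml = pattern.sub("", xhtml)
    return xhtml

# Assumed sample markup, for demonstration only.
sample = (
    '<p class="x">Real text.</p>\n'
    '<div><a href="https://oceanofpdf.com">OceanofPDF.com</a></div>\n'
    '<p>More text.</p>'
)
print(strip_ad_blocks(sample))
```

Matching is case-insensitive, so a link target such as oceanofpdf.com is enough to mark the block for removal. Additional markers (for example the exact "Converted using ..." wording of a particular converter) can be passed in through the markers argument.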
I'm working on a prototype script to clean the epubs up, as well as fix other minor problems, so this is very much a work in progress. Providing xdeltas isn't reliable in this case, since a delta needs a fixed version to work against (I optimize my epubs, so they are different from what I download here or from anne's archive), so I doubt I could do that, unlike with the Michael Anderson thread. Instead it would have to be a shell script, or AHK doing the bulk of the work, assuming I can keep it happy. sed scripts and AHK scripts aren't fully interchangeable, but the sed ones are fairly easy to convert to AHK.
Current fixes/problems include:
empty paragraphs/spans
After a *** line, the next paragraph of text tends to be raw text outside any paragraph tag (followed by an empty paragraph at the end).
At the end of most paragraphs there's a trailing space, for example <p>hey ho! </p>. (Curiously, in a lot of conversions, if the space isn't there, then the text is part of a continuing paragraph...)
Deleting the oceanofpdf.com file and any calibre_bookmarks.txt files.
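Some of the fixes listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the bash/AHK script: the function names are hypothetical, content files are assumed to be UTF-8 with .xhtml/.html/.htm extensions, and the raw-text-after-*** fix is left out because it depends on the exact markup a converter produces. The sketch also does not touch the OPF manifest, so any entry there that points at a deleted file would still need cleaning up.

```python
import re
import zipfile

# File names from the list above, compared against the base name only.
UNWANTED_FILES = {"oceanofpdf.com", "calibre_bookmarks.txt"}

EMPTY_ELEMENT = re.compile(r"<(p|span)\b[^>]*>[ \t\r\n]*</\1>")

def tidy_markup(xhtml):
    """Trim trailing spaces before </p> and drop empty <p>/<span> elements."""
    xhtml = re.sub(r"[ \t]+</p>", "</p>", xhtml)
    # Repeat until stable so that <p><span></span></p> disappears completely.
    while True:
        cleaned = EMPTY_ELEMENT.sub("", xhtml)
        if cleaned == xhtml:
            return cleaned
        xhtml = cleaned

def repack_epub(src, dst):
    """Copy an epub (path or file-like object), skipping unwanted files."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # An epub needs "mimetype" as its first entry, stored uncompressed.
        infos = sorted(zin.infolist(), key=lambda i: i.filename != "mimetype")
        for info in infos:
            name = info.filename
            if info.is_dir():
                continue
            if name.rsplit("/", 1)[-1].lower() in UNWANTED_FILES:
                continue
            data = zin.read(name)
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content documents are UTF-8 encoded.
                data = tidy_markup(data.decode("utf-8")).encode("utf-8")
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(name, data, compress_type=method)
```

Usage would be along the lines of repack_epub("book.epub", "book.clean.epub"), writing to a new file so the original stays untouched until the result has been checked.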
If anyone sees any other injected blocks of this kind, or knows of fixes for common converter issues, let me know and I'll incorporate them into the script (preferably with an example epub).
If anyone has epubs they really need stripped immediately, or has a huge number of them, let me know and I'll throw a current version of the bash/AHK script together for you to use. (The Bash one might need minor tweaking in a Linux environment vs. the Cygwin I'm using.)