
REQUEST: HOW TO: Clean ebooks?

TiffAnne2000

New member
How can I check/clean PDFs and EPUBs for identifying info, like account details or names hidden in the file... i.e. from certain big stores named after long bodies of water?
 
I've never done that but here's how I'd try to do it:
  1. Using calibre, convert the PDF to an EPUB file.
  2. Rename the .epub to .zip.
  3. Open the .zip and look through each file inside it (they're text files unless the PDF is made up of images) for account IDs, etc.
  4. Rename it back to .epub.
  5. Convert it back to a PDF in calibre.
There may be a tool that does this for you.
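The "look through each file" step can be scripted instead of done by hand. Here's a rough sketch, not a finished tool: Python's zipfile module opens an .epub directly (it's already a zip archive), so the rename isn't strictly needed. The function name find_strings_in_epub and the idea of passing in your own name or email as search strings are my assumptions, and it only finds plain-text matches; anything encoded or hidden inside images won't show up.

```python
import zipfile


def find_strings_in_epub(epub_path, needles):
    """Return {file_inside_epub: [needles found]} using a case-insensitive search.

    Hypothetical helper: `needles` are strings you already know to look for,
    such as your own name or email address.
    """
    hits = {}
    lowered = [n.lower() for n in needles]
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Decode loosely so binary members (images, fonts) don't raise
            text = zf.read(name).decode("utf-8", errors="ignore").lower()
            found = [n for n, low in zip(needles, lowered) if low in text]
            if found:
                hits[name] = found
    return hits
```

Calling find_strings_in_epub("book.epub", ["your name", "you@example.com"]) returns a dict of the files that contain a match; an empty dict only means no plain-text match was found.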
 
I took an ebook I purchased from that long river place, removed the DRM using epubor, which gave me an epub, then renamed it to a .zip file.

Inside there was a folder named META-INF containing only a single file named "container.xml". It had no information related to the book.

The other folder was called OEBPS and it had a bunch of xhtml files which, when opened, turned out to be the content of the book.

There was also a file named "content.opf". OPF (Open Packaging Format) is the EPUB package file, and it's also the format calibre uses for metadata.

I opened that with Notepad. The only things in that file were the normal things one would see in the metadata one can view in calibre. The only additional thing was a revision date, from when I ran it through epubor.

There's a manifest section which lists the xhtml files. Then there's a "spine" section, with a toc attribute, which lists the same xhtml files in reading order. The toc attribute points to the file that holds the table of contents.

There is nothing that would seem to relate to any account information, and as I think about it, if any of that were embedded in the file, it would be part of the DRM. After all, the whole idea of DRM is to tie the file to a single account, so it would seem that if the DRM is removed, so is that account information.
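For anyone who'd rather not open these files in Notepad, the same inspection of container.xml and content.opf can be sketched in Python using only the standard library. This is a minimal sketch assuming a standard EPUB layout; the function name read_opf is made up for illustration. It follows container.xml to wherever the .opf file lives (the OEBPS folder name isn't guaranteed), then returns the metadata entries and the spine reading order.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the EPUB container and package formats
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def read_opf(epub_path):
    """Return (opf_path, metadata, spine) for an EPUB file.

    metadata: list of (tag, text, attributes) for each metadata entry
    spine:    list of hrefs in reading order
    """
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml names the package (.opf) file
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    metadata = []
    md = package.find("opf:metadata", OPF_NS)
    if md is not None:
        for el in md:
            tag = el.tag.split("}")[-1]  # drop the "{namespace}" prefix
            metadata.append((tag, (el.text or "").strip(), dict(el.attrib)))

    # The manifest maps ids to files; the spine lists ids in reading order
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = [
        manifest.get(ref.get("idref"))
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
    return opf_path, metadata, spine
```

Printing the metadata list shows every entry in the .opf, including any that calibre's metadata view might not display, so it's a quick way to check whether anything unexpected is in there.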
 
If they were clever, they'd have some metadata that would identify the account that the file came from.
 