REQUEST: HOW TO: Clean ebooks ?

TiffAnne2000 · Dec 5, 2025

How can I check and clean PDFs and EPUBs for identifying stuff, like account info or hidden names, e.g. from certain big stores named after long bodies of water?

KidneyCracker · Dec 5, 2025

I've never done that but here's how I'd try to do it:

Using Calibre, convert the PDF to an EPUB file.
Then rename the .epub to .zip.
Open the .zip file and look through each file inside it for account IDs, etc. (they're text files unless the PDF is made up of images).
Rename it back to .epub.
Convert it back to a PDF in Calibre.

There may be a tool that does this for you.
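
For what it's worth, the manual look-through above could be sketched as a small script. This is only a rough sketch, assuming the EPUB is a plain, unencrypted zip container; the function name `find_strings_in_epub` and the search terms are made up for illustration. Python's `zipfile` module can open an .epub directly, so the rename to .zip isn't needed here.

```python
import zipfile


def find_strings_in_epub(epub_path, needles):
    """Return (file_inside_epub, needle) pairs for every search term found.

    Hypothetical helper: does a case-insensitive substring search over
    every file stored in the EPUB's zip container.
    """
    hits = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            # Decode loosely so binary files (images, fonts) don't raise.
            text = zf.read(name).decode("utf-8", errors="ignore").lower()
            for needle in needles:
                if needle.lower() in text:
                    hits.append((name, needle))
    return hits
```

You'd call it with things you'd expect a store to stamp into a file, such as your own name or email address, e.g. `find_strings_in_epub("book.epub", ["jane example", "jane@example.com"])`. An empty list only means those exact strings weren't found as plain text, not that the file is clean.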

O_P_T · Dec 5, 2025

I took an epub I purchased from that long river place, removed the DRM using epubor, which made it an epub, then renamed it to a zip file.

Inside there was a folder named META-INF containing only a single file named "container.xml". It had no information related to the book.

The other folder was called OEBPS and it had a bunch of xhtml files which, when opened, turned out to be the content of the book.

There was a file named "content.opf". OPF is the EPUB package format, and it's also what calibre uses for the metadata.

I opened that with notepad. The only things in that file were the normal things one would see in the metadata one can view in calibre. The only additional thing there is a revision date, for when I ran it through epubor.
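
Instead of reading the file in Notepad, the same check could be sketched in Python. This is a rough sketch, assuming a standard EPUB layout where META-INF/container.xml points at the .opf file; the function name `read_opf_metadata` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = "http://www.idpf.org/2007/opf"


def read_opf_metadata(epub_path):
    """Return (tag, attributes, text) for each entry in the OPF <metadata>.

    Hypothetical helper: finds the OPF via container.xml, then lists
    every child of its metadata section so nothing is hidden by a viewer.
    """
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))
    metadata = package.find(f"{{{OPF_NS}}}metadata")
    entries = []
    for child in metadata:
        tag = child.tag.split("}")[-1]  # drop the XML namespace prefix
        entries.append((tag, dict(child.attrib), (child.text or "").strip()))
    return entries
```

Printing each returned entry shows every metadata field, including any that a reader app might not display.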

There's a manifest section which lists the files that make up the ebook, including the xhtml files. Then there's another section, a "spine" with a "toc" attribute, which lists the same xhtml files in reading order and points at the table of contents file.
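
To see how the manifest and spine fit together, here's a rough sketch that takes the contents of content.opf and returns the files in spine order. It assumes an EPUB 2 style OPF like the one described; the function name `reading_order` is made up for illustration.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def reading_order(opf_xml):
    """Return (hrefs in spine order, href of the toc file or None).

    Hypothetical helper: the manifest maps ids to files, and the spine
    lists those ids in the order the book is meant to be read.
    """
    package = ET.fromstring(opf_xml)
    manifest = {item.get("id"): item.get("href")
                for item in package.iter(OPF + "item")}
    spine = package.find(OPF + "spine")
    order = [manifest[ref.get("idref")] for ref in spine.iter(OPF + "itemref")]
    # The spine's toc attribute names the manifest id of the contents file.
    return order, manifest.get(spine.get("toc"))
```

Anything in the manifest that never shows up in the spine (and isn't an image, stylesheet, or the toc file) would be the first place I'd look for something odd.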

There is nothing that would seem to relate to any account information, and as I think about it, if any of that was embedded in the file, it would be part of the DRM. After all, the whole idea of DRM is to tie the file to a single account, so it would seem that if the DRM is removed, so is that account information.

KidneyCracker · Dec 5, 2025

If they were clever, they'd have some metadata that would identify the account that the file came from.
