
Calibre - Ebook creation/conversion/editing - Tutorial

So I discovered where the missing 'tt', 'tti', 'ft', 'tf' fault in some files originates.
Apparently it sometimes happens when you convert a PDF to .epub with eCalibre.

Do you think going to word first will alleviate this?

Not sure. I've seen missing L's when converting, usually the second of a doubled L, and it was consistent throughout a document. But a different site and converter kept the L. That may leave me copying the text (losing the formatting), since the missing l's are more annoying.

Another annoyance I've seen is every line becoming its own paragraph... (readable, but uggg... you can't zoom in; it breaks things up so badly)

Is it an OCR issue? I've seen OCR screw up things like that.

I didn't think Calibre could convert an image based PDF, or if it does, it simply puts a single image of each page on a page in the new format.

Mhmmm, OCR (Optical Character Recognition) will try to convert an image into its actual text, though formatting tends to be missing. Calibre and other converters just go image to image and don't try to identify the text or use OCR on image-based books.

So this isn't an OCR problem. An interesting note, though: up-scaling page images of text (ones under 500x1000) before OCR reduces the number of errors, even with plain bilinear scaling and a 70% white threshold. Funny that those images are thrown away shortly after, but it still gives very good results.
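A sketch of that pre-OCR step as a small Bash function, assuming ImageMagick's `convert` and the `tesseract` command-line tool are installed. `ocr_upscaled` is a made-up name, the 200% factor is only an illustrative choice, and the "under 500x1000" size check is left out.

```bash
#!/usr/bin/env bash
# Sketch: upscale a page image (bilinear), push light greys to white, OCR it,
# then throw the temporary image away. Assumes ImageMagick `convert`, the
# `tesseract` CLI and GNU mktemp. ocr_upscaled is a hypothetical helper name;
# the 200% factor is an illustrative guess, not a tested value.
ocr_upscaled() {
  local img="$1" out_base="$2" tmp
  tmp=$(mktemp --suffix=.png)
  # -filter Triangle selects bilinear resampling for -resize.
  # -white-threshold 70% forces every pixel brighter than 70% to pure white.
  convert "$img" -filter Triangle -resize 200% -white-threshold 70% "$tmp"
  tesseract "$tmp" "$out_base"      # tesseract writes "$out_base.txt"
  rm -f "$tmp"                      # the upscaled image is only temporary
}
```

Usage would be something like `ocr_upscaled page-001.png page-001`, giving `page-001.txt`.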
 
As long as we're talking about calibre conversion anomalies, one thing I see from time to time is that some of the letters that end up in the converted Word document are not recognized by spell check as being those letters.

They look perfectly fine, but just about every word in the document has the dotted red line under it, indicating a spelling error. It's not the language/country setting in the Word document; it's just that the letters don't register in the spell check tool as their actual value.

Typically, it's the vowels that are problematic, along with t, l, m, p.

I simply copy one of those letters and paste it into the search box, then type the same letter in the replace box and do a global search and replace. Annoying, but typically not a long process to fix.

If I see it in one book by an author, I will likely see it in most or all of their works, which makes me think it has to do with how they generated the original source document (what software they used) and whether they used any typesetting software to prep it for publication.

It's not Reedsy, which is a company I've seen mentioned by various authors, since their books don't have this issue.

If I can recall a book that had this problem, I'll let you know.
 
I simply copy one of those letters and paste it into the search box, then type the same letter in the replace box and do a global search and replace. Annoying, but typically not a long process to fix.

Tools like sed/awk can help with those. Worst case, I can write a script using AHK (AutoHotkey) that does the same thing, but I'd need the raw Unicode letters. There are a lot of Unicode letters that look like normal characters, and it's probably that.
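To get at those raw Unicode letters, a sketch like the one below could list every distinct non-ASCII character in a text file (for example, a chapter extracted from the epub) by code point, with a count. It assumes GNU `iconv`, `od` and `awk`, and valid UTF-8 input; `list_non_ascii` is a made-up name. Lookalike letters and the four smart quotes show up as separate `U+xxxx` values even when they render identically.

```bash
#!/usr/bin/env bash
# Sketch: report each distinct non-ASCII code point in a UTF-8 file with its
# count. Assumes GNU iconv/od/awk; list_non_ascii is a hypothetical name.
list_non_ascii() {
  # UTF-32BE encodes every character as exactly 4 bytes, most significant
  # byte first, so `od -w4` prints one character per line as 4 hex bytes.
  iconv -f UTF-8 -t UTF-32BE "$1" |
    od -An -v -tx1 -w4 |
    awk '{
      cp = toupper($1 $2 $3 $4)        # e.g. 0000201C for a left double quote
      if (cp > "0000007F") {           # fixed-width hex: string order = numeric
        sub(/^0+/, "", cp)
        while (length(cp) < 4) cp = "0" cp
        print "U+" cp
      }
    }' |
    sort | uniq -c | sort -rn
}
```

Running `list_non_ascii chapter.xhtml` prints lines such as `2 U+201C`; the code point can then go straight into a sed or AHK replacement.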
 
OK, I downloaded the Birthday present 1-12 from the first post in Katt Ford's thread.

I think the problem is inherent in that PDF.

If I converted it to epub or Word using calibre, I had the problem you mentioned. The one that caught my eye was the missing "ft" in the second paragraph, "Bobbi must have heard me coming. A er all".

When I opened the PDF and copied and pasted that bit in Word, this is what showed up.
Weird, because when I opened the PDF in Mozilla I couldn't find any of the errors I was dealing with, so I thought the problem was eCalibre.
But having run into those "generic" question marks before, I can say that not all of them mean the same thing, even though they look the same. The most recent instance I ran into was a case of punctuation: all 4 question marks looked identical, but they stood for the 4 different smart quotation marks (opening and closing, double and single), and eCalibre's find and replace function could tell them apart, even though I couldn't find a way to make it show me the differences...
Maybe not ASCII?
Regardless, it's above my paygrade.
 
I hate "smart" quotes. They just muck things up.

In my sed scripts I remove them. They aren't worth it.

Code:
s:[ā€œā€]:":g;
s:[ā€˜ā€™]:':g;
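One way to run those two expressions on a file is sketched below as a small Bash function (`straighten_quotes` is a made-up name). It spells out each curly quote as its own substitution instead of a `[“”]` bracket expression, because when sed runs in a non-UTF-8 (C/POSIX) locale it treats the bracket as a set of individual bytes and would mangle the text; a literal multi-byte match works in either locale.

```bash
#!/usr/bin/env bash
# Sketch: replace curly ("smart") quotes with straight ASCII quotes.
# One literal s/// per character, so it is safe in non-UTF-8 locales too.
# straighten_quotes is a hypothetical helper name.
straighten_quotes() {
  sed -e 's/“/"/g' -e 's/”/"/g' \
      -e "s/‘/'/g" -e "s/’/'/g" "$@"   # no file arguments = read stdin
}
```

For example, `straighten_quotes chapter.xhtml > chapter.fixed.xhtml`, or pipe text through it.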
 
Question on the calibre metadata - specifically the book comments section.
This data does not appear to be saved in the epub. Is there an easy way to get it in there en masse?
The reason I know this is the case is that I have a massive library of 62k+ books, including Goodreads' best, Amazon's best, NYT best and other best-of lists and booklists I downloaded over the years. Over time I sort through them and paste the Goodreads book data into the comments section, and if it seems to be a book I'd likely read, I move it to the appropriate sub-library I have set up.
But I recently did that to a couple of books I'd just loaded the data into, and the data didn't transfer with the epub... errrrrggggghhhh!
Thoughts?!? Anyone?
 
I'm not so aware of comments.

The tags I'm aware of that we should try to populate are:
dc:title - book title/name
dc:creator - author name, with opf:role="aut"
dc:description - short book description, back cover?
dc:subject - kinks or genres of books for sorting (one per element).

Beyond that... I suppose we could always add other tags, but that doesn't mean readers will acknowledge or honor them.

I assume you mean adding the comments. If there's a uniform structure to the page(s), then sure. 7zip is good about adding files from the command line, so I can add it in a script rather than unpacking and then repacking the archive.

I'd probably use sed. With the metadata extracted into a Bash variable like COMMENT, it would look something like this:
Bash:
sed -E -e '/<\/metadata>/{s:^:<dc\:comment>'"${COMMENT}"'</dc\:comment>\n:}' content.opf > content.opf.tmp

That SHOULD inject the comment as a final line before the metadata ends. Of course, you can repeat it several times to inject multiple comments. Of course, the comment needs to be extracted and then sanitized to make sure it won't cause problems with sed.
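A minimal sketch of the extract, sanitize and inject steps, assuming GNU sed (for `\n` in the replacement) and that `</metadata>` sits on its own line. It writes `<dc:description>` rather than `dc:comment`, since dc:description is the element calibre reads into its "Comments" field. `inject_description` is a made-up name, and it does not check for an existing description.

```bash
#!/usr/bin/env bash
# Sketch: add <dc:description> just before </metadata> in an OPF file.
# Assumes GNU sed and the dc: prefix declared in the OPF.
# inject_description is a hypothetical helper name.
inject_description() {
  local opf="$1" text
  # Sanitize: newlines -> spaces; XML-escape & < >; then escape the
  # characters special in a sed replacement (\ and &) plus our | delimiter.
  text=$(printf '%s' "$2" | tr '\n' ' ' |
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's/[\\&|]/\\&/g')
  # \1 keeps the indentation; & re-emits the original </metadata> line.
  sed -e "s|^\([[:space:]]*\)</metadata>|\1  <dc:description>${text}</dc:description>\n&|" \
      "$opf" > "$opf.tmp" && mv "$opf.tmp" "$opf"
}
```

After that, the edited content.opf still has to go back into the epub with whatever zip tool you script with.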

Just adding the extracted HTML page to the epub (like goodreads.html) to process later would suffice, and then make a batch to process them in bulk...
 
No, I'm not thinking about adding, because they are already there when I send it to the e-reader I use (ReadEra). But apparently they aren't kept in the epub itself. I'm guessing it is kept instead in the metadata.opf file eCalibre makes, but I'm not sure how to go about verifying this, and then easily transferring it to the new library.
Although I am disappointed that eCalibre apparently doesn't store it with the book itself.
And what you describe is at, or above, my current level of competence, and... I'm not sure it's worth the effort.

I'm still trying to digest the implications of this disappointment, and how best to work around it.
I initially separated out my Erotica from my other books, so I didn't out myself to the (assumed vanilla) friends and family I share my library with. Then, finding the lag of working/tweaking with a huge library, I separated out my Fantasy and Sci-Fi sections too. But now it's looking as if I might be better off recombining it all, and that has the same issue: those libraries are also decent-sized and have all the comment data I like, but it won't transfer without another century of cut & paste...

Sorry, thinking in public...
 
I'm guessing it is kept instead in the metadata.opf file eCalibre makes, but I'm not sure how to go about verifying this, and then easily transferring it to the new library.
Although I am disappointed that eCalibre apparently doesn't store it with the book itself.

Yeah, I find myself not trusting calibre as a library manager. I was experimenting with adding tags and wasn't happy with the results. If I can't have it inside the epub/book, then I don't want to use it, since it won't be transferred with the book itself.

There's a metadata.db file in the Calibre Library save/working directory, which I suspect is where it stores the information. It uses SQLite, so you could peruse it, but I haven't incorporated SQLite in any projects, so I'm not sure how much I could automate using the tables it provides.
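As a rough idea of what perusing it could look like, here is a sketch using the `sqlite3` command-line shell. It assumes calibre's usual schema, where each book's comments sit in a `comments` table (columns `book` and `text`) linked to `books.id`; treat that as an assumption to check against your own database (for example with `.schema comments`). `list_calibre_comments` is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: dump id, title and comments from calibre's library database.
# Assumes the sqlite3 CLI and a comments(book, text) table keyed to books.id.
# Run it on a copy of metadata.db, or with calibre closed.
# list_calibre_comments is a hypothetical helper name.
list_calibre_comments() {
  local db="${1:-metadata.db}"
  sqlite3 -separator $'\t' "$db" \
    "SELECT b.id, b.title, c.text
       FROM books AS b
       JOIN comments AS c ON c.book = b.id
      ORDER BY b.title;"
}
```

The comments text is usually stored as HTML, so expect markup (and line breaks) in the output.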


I initially separated out my Erotica from my other books, so I didn't out myself to the (assumed vanilla) friends and family I share my library with.

And then finding the lag of working/tweaking with a huge library I separated out my Fantasy and Sci-Fi sections also. But now it's looking as if I might be better off recombining it all,

Though, if you have books separated in directories of different genres, then applying and adding that tag to all books within wouldn't be that hard.
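For example, a sketch with calibre's own `ebook-meta` command-line tool, which writes the tag into each EPUB file itself rather than only into calibre's database. Note that `--tags` replaces a book's existing tags with the comma-separated list you give it. `tag_folder` is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: set the same tag(s) on every EPUB in one genre folder.
# Assumes calibre's ebook-meta is on PATH. CAUTION: --tags REPLACES the
# book's existing tags. tag_folder is a hypothetical helper name.
tag_folder() {
  local dir="$1" tags="$2" book
  for book in "$dir"/*.epub; do
    [ -e "$book" ] || continue    # glob matched nothing: no EPUBs here
    ebook-meta "$book" --tags "$tags"
  done
}
```

Usage: `tag_folder ~/Books/Fantasy "Fantasy"`, ideally on a copy first.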

--

I came to the conclusion that unless it's blatantly erotica in title and/or cover, you can treat it as non-erotica. There's a lot of non-erotica that has erotic scenes, but they tend to be here or there, or they reference sex but never show the scene.

Piers Anthony, one of my favorite authors (as a child), has several books with sections that vaguely describe something sexual or skip past it. Geis of the Gargoyle comes to mind: plenty of descriptions of topless girls, but it never goes full-out into erotic sex. And 'what the magician married', I think, though I can't find the exact book.

Then there are the books my mom described to me, and how she would skip the sex scenes: Clan of the Cave Bear by Jean Auel, books about the prehistoric life of humans vs. Neandertals, knowledge inherited at birth vs. gained by experience, and how it changed the clans.

Then there's Anne Rice, with vampires, whose descriptions of the act of feeding give it an erotic euphoria on par with, or beyond, sex/orgasm. Or how Lestat would intentionally target whores and husband killers...

Then Eric Vall has Succubus lord... and as it enters any sex scene it leaves you with blue balls, as it immediately bypasses it. (Well, the first book anyway; I haven't gone further than the 4th chapter due to the blue-ballness I got.) Very disappointed here... I finished Metal Mage, and it describes where apparently he and a half-elf finally open up to having sex (and lots of it), and randomly describes how her breasts would heave as he fantasizes about her, but doesn't have the scenes. Personally, I don't feel Vall should qualify so much as erotica literature. Or maybe he wants the first few books of each series to be SFW-ish... don't know.

Another: Laurell K. Hamilton's Anita Blake series. The first few books are a combination of dark horror and vampire hunting. Then around book 5 or so, every 3 chapters there's a huge orgy sex scene. (Yes, there's an in-story reason for the sexual tension and its growing powers, but it's sort of a bait and switch.)

Romance novels in general have erotic scenes, or euphemisms that are laughable. 'He touched her femininity and it felt like sparks from angels' wings' comes to mind, read to me by a friend.


I guess I'm saying, don't worry too hard. Unless the title suggests it's erotica, or the author or publisher is blatantly erotica, it's probably okay.
 
I've noted that most epubs I get from Smashwords have comments and tags embedded in the epub, at least as far as calibre is concerned, because when I import them into calibre, that stuff gets populated. Here's one example that does that.

Maybe you, or yano2mch, can look at this file to see how it does that.
 

I see description and genre tags... nothing stood out from the story. Which 'comments' are you talking about? Or are they all listed down below? (I chopped the description for word-wrap reasons.)

XML:
    <dc:description>A Wife Gets Shared With Three Of Her Husband's Friends! Debbie is happy with her new garden.
  The tall trees surrounding it give her all the privacy she needs for a little sunbathing. There's no privacy from
  her husband and his friends though, and when they come home from the bar to light the barbecue, Debbie
  feigns sleep as they compliment her and talk about what they'd like to do.</dc:description>
    <dc:language>en</dc:language>
    <dc:creator opf:role="aut">Gemma Harris</dc:creator>
    <meta name="calibre:timestamp" content="2020-02-25T04:54:28.814435"/>
    <dc:title>My Husband Shared Me With His Friends</dc:title>
    <dc:contributor opf:role="bkp">Smashwords, Inc.</dc:contributor>
    <dc:subject>watching wife</dc:subject>
    <dc:subject>fuck my wife</dc:subject>
    <dc:subject>sharing wife</dc:subject>
    <dc:subject>wife anal</dc:subject>
    <dc:subject>wife gangbang</dc:subject>
    <dc:subject>wife slut sex</dc:subject>
    <dc:subject>slut sexy wife</dc:subject>
    <dc:subject>hotwife menage</dc:subject>
    <dc:subject>hotwife shared</dc:subject>
    <dc:subject>group sex with hot slut wife</dc:subject>
 

Yes, the <dc:description>A wife Gets Shared... bit is what I'm referring to as "comments".

If you edit the metadata in calibre, that text appears in the box on the right of the editing window, labeled "Comments".

BTW, Word uses the same labels as calibre if you click on File/Info for any document. If you populate those fields and import the Word document into calibre, it reads them and populates the metadata appropriately.
 
Yes, the <dc:description>A wife Gets Shared... bit is what I'm referring to as "comments".

Good to know. That was from the content.opf file (or metadata.opf, whatever it's named). I comment on it in the 4th post regarding the opf, but it's easy enough to chalk it up to a misunderstanding (after all, I'd think a comment is closer to a review: user-generated, not author-generated). With recent OCR work I've been fetching and adding the description tags and other data if I can find them. Often the kinks are missing, and I guess them based on the title and skimming the content while I'm working on it.

But adding the tag isn't hard. I have a template with the description and 5 subjects/genres to fill out, and then I edit the opf file directly (even in calibre's editor), adding it before the </metadata> closing tag.

XML:
    <dc:description></dc:description>
    <dc:subject></dc:subject>
    <dc:subject></dc:subject>
    <dc:subject></dc:subject>
    <dc:subject></dc:subject>
    <dc:subject></dc:subject>
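If filling that template by hand gets tedious, a sketch like this could print it already filled in, ready to paste before `</metadata>`. Genres are passed as one comma-separated string and each becomes its own `<dc:subject>`; `opf_template` and `xml_escape` are made-up names.

```bash
#!/usr/bin/env bash
# Sketch: print a filled-in description + subject block for an OPF file.
# opf_template and xml_escape are hypothetical helper names.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
opf_template() {
  local description="$1" genres="$2" genre
  local -a list
  printf '    <dc:description>%s</dc:description>\n' \
    "$(printf '%s' "$description" | xml_escape)"
  IFS=',' read -r -a list <<< "$genres"      # split on commas
  for genre in "${list[@]}"; do
    genre="${genre#"${genre%%[![:space:]]*}"}"  # trim leading whitespace
    genre="${genre%"${genre##*[![:space:]]}"}"  # trim trailing whitespace
    [ -n "$genre" ] || continue                 # skip empty entries
    printf '    <dc:subject>%s</dc:subject>\n' \
      "$(printf '%s' "$genre" | xml_escape)"
  done
}
```

For example, `opf_template "Back-cover blurb here" "Fantasy, Sci-Fi, Romance"` prints one description line and three subject lines.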
 
Comments: in eCalibre's single-book metadata view, the right-hand pane is labeled "Comments".
And yes, I've noticed that when you load most books this section gets filled out, but usually it isn't as good a description as the one in Goodreads. If nothing else, it's missing the genre tags (which I use to sort the books). But when I overwrite the "original" comments, the new ones I pasted in don't carry over with the book; only the original ones do, if they existed.
I think the "original" comments are, as OPT points out, in the "dc:description", which will obviously carry over, but new comments pasted in won't... Which I find especially annoying with anthologies, because I want the comment section to list all the stories/authors.
 
I'm also wondering what happens to the populated data if you use the "Download metadata" option and actually get some metadata.
I gave up on that function long ago, since even when it delivered, what it delivered usually wasn't as good as the Goodreads info.
 
So I'm doing a little experimenting with eCalibre, and I'm not sure what you mean when you say added tags don't get put into the epub. I just tried changing the tags in a book and let eCalibre save the book. Going back into the book with Edit, I see the changed tags as dc:subject in content.opf.
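To check that without opening the editor, a sketch like this could list the `<dc:subject>` tags (calibre's "Tags") and whether a `<dc:description>` (calibre's "Comments") is actually inside an epub's OPF. It assumes the `unzip` tool and one element per line, as in the OPF excerpts in this thread; `show_opf_metadata` and `opf_fields` are made-up names.

```bash
#!/usr/bin/env bash
# Sketch: show tags and description presence from inside an EPUB's OPF.
# Assumes unzip; show_opf_metadata/opf_fields are hypothetical helper names.
show_opf_metadata() {
  # unzip -p streams matching archive members to stdout without extracting.
  unzip -p "$1" '*.opf' | opf_fields
}
opf_fields() {
  sed -n \
    -e 's|.*<dc:subject[^>]*>\(.*\)</dc:subject>.*|tag: \1|p' \
    -e 's|.*<dc:description[^>]*>.*|description: present|p'
}
```

Running `show_opf_metadata book.epub` before and after a save in calibre shows whether the change really landed in the file.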
 
But now that's just wrong! In my current experiment, the dc:description was blank initially and it loaded the info I added into it, but the books I tried it on first didn't do that...
And it transferred when I changed the data and saved... hmmm... I'm missing something, I wish I could remember which books they were.

Education only ends in the grave - if you're an ?atheist? Maybe?
 
And it transferred when I changed the data and saved... hmmm... I'm missing something, I wish I could remember which books they were.

Well, I notice that Calibre likes to keep an extracted opf file in the directories with the books. That likely means modified opf data is kept outside and not re-injected into the book, since that's the job of the editor, not the library manager.

At least I think that might be the case. That, or it uses its own database.
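If those per-book .opf files are where the edits live, a sketch like the one below could push them into the epub files in bulk using calibre's `ebook-meta` tool and its `--from-opf` option. It assumes the usual library layout (Author/Title (id)/ with metadata.opf beside the book file); `embed_library_metadata` is a made-up name. Try it on a backup copy of the library with calibre closed.

```bash
#!/usr/bin/env bash
# Sketch: copy each book folder's metadata.opf (comments, tags, ...) into the
# EPUB file(s) in that folder, so the data travels with the book.
# Assumes calibre's ebook-meta is on PATH and bash (process substitution).
# embed_library_metadata is a hypothetical helper name.
embed_library_metadata() {
  local library="$1" opf dir book
  while IFS= read -r -d '' opf; do          # NUL-separated: safe for any name
    dir=$(dirname "$opf")
    for book in "$dir"/*.epub; do
      [ -e "$book" ] || continue            # folder holds no EPUB
      ebook-meta "$book" --from-opf "$opf"
    done
  done < <(find "$library" -name metadata.opf -print0)
}
```

Usage: `embed_library_metadata "$HOME/Calibre Library Copy"`; only EPUBs are touched, other formats in the same folder are skipped.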
 
FC1, I just did an experiment.

I had an epub in calibre with no comments or tags in the metadata.

I edited the metadata in calibre, downloading the applicable info.

I saved (converted) the epub to an epub in calibre.

It changed the epub I had initially imported into calibre to a *.original_epub file.

I copied that file to another folder on my PC and then imported it into calibre. It recognized it as being the same title and author as the book already in calibre, and I imported it anyway.

The comments and tags were blank in calibre, just as it had been when I first imported the epub.

I then repeated this with the epub I'd created with calibre.

The comments and tags were there in calibre.

I don't know if this means they would be read by any e-reader software.
 