How to extract images from pdf files

Urikf

Member
Joined
Dec 9, 2023
Messages
108
I just found out that some posters extract images from PDF files. Unfortunately, the majority of modern software doesn't perform this task properly, and you get distorted images. I ran a lot of experiments and at last, after consulting a guru, found a solution. Only two paid programs (PDF Image Extraction Wizard and PDF Explorer) and one freeware command-line tool (xpdf) did this task properly. All of the other software failed. Please be careful.
 
You can take any PDF file from this collection and experiment. I spent a lot of time on this and asked for advice on a specialized computer forum. I doubt you will find anything extra.

https://mega.nz/folder/Y8NGRAyR#k6pZP3VxiUjmav5Wgo0u-g

You can even create your own pdf file from your collection of comics and extract images using different tools. You will see a difference in resolution between original and extracted images.
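
For anyone who wants to script the xpdf route mentioned above, here is a minimal sketch. It assumes the `pdfimages` program from xpdf is installed and on PATH; the function names and the `img` output prefix are made up for illustration. The `-j` flag tells pdfimages to write images stored as JPEG streams directly to .jpg files instead of decoding them.

```python
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path: str, output_prefix: str) -> list[str]:
    """Build the argument list for pdfimages (assumed to be installed).

    -j saves JPEG (DCT) images as .jpg files without re-encoding them;
    images stored in other formats are still written as PBM/PGM/PPM.
    """
    return ["pdfimages", "-j", pdf_path, output_prefix]


def extract_images(pdf_path: str, output_dir: str) -> list[Path]:
    """Run pdfimages and return the files it wrote into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = str(out / "img")  # hypothetical prefix; files come out as img-NNN.ext
    subprocess.run(build_pdfimages_command(pdf_path, prefix), check=True)
    return sorted(out.glob("img-*"))
```

Calling `extract_images("comic.pdf", "extracted")` would then leave the embedded images in the `extracted` folder, ready to compare against the originals.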
 
There is a difference between extracting images and converting to images.

You can use 10 different apps to extract and they will give you the exact same results.

It's when you convert to images that you will get different resolutions. That is because the quality and DPI have to be set when you convert. Some tools have a fixed default that you can't change; others let you set it.
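
To make the DPI point concrete, here is a small sketch of the arithmetic behind a conversion. It relies on the fact that PDF page sizes are measured in points (72 per inch); the function name is made up for illustration, and real converters may round slightly differently.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points, 72 per inch


def rendered_size(page_width_pt: float, page_height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel size a page gets when it is converted (rendered) at the given DPI."""
    width_px = round(page_width_pt / POINTS_PER_INCH * dpi)
    height_px = round(page_height_pt / POINTS_PER_INCH * dpi)
    return width_px, height_px


# The same US Letter page (612 x 792 points) at three common DPI settings:
for dpi in (72, 96, 300):
    print(dpi, rendered_size(612, 792, dpi))
```

The same page comes out at 612x792, 816x1056, or 2550x3300 pixels depending only on the DPI setting, which is why converted images rarely match the originals, while extracted images keep whatever pixel size was embedded.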

In any case, there is no need to pay for software to do this, as there are plenty of tools that do it for free. Here is a list of a few I have used in the past that work very well:

  1. PDF24 Creator (My favorite one. I use it 99.9% of the time)
  2. AVS Document Converter (The one I was using before PDF24)
  3. PDF Candy Desktop (What I was using before AVS... see the pattern :D)
  4. PDFArea PDF Image Extractor (Used that when AVS didn't work)
My recommendation is to use PDF24. It is actively developed and gets regular updates. It's free, fast, and all-in-one. If the PDF has a password, it will automatically remove it before extracting the images.
 
I used AVS Document Converter and the test failed. By the way, I repeat: you can create a PDF from just one JPG file. Please use all of the tools you mentioned above and compare the resolution of the initial image and the extracted image. If at least one of these tools returns THE SAME RESOLUTION, I will give up. I am not saying that these tools don't work. I am saying that all of them CHANGE the original file. Maybe your eye is not able to see the difference, because a term like DPI doesn't make any sense when you create a PDF, but just compare the properties of both files in File/Windows Explorer. That's it.
 
Well, clearly you didn't carefully read what I said. I told you to EXTRACT IMAGES.

I bet you $100 you clicked the Convert Now button instead of the Extract Images button.

1705093484694.png


Just to humour you, I did your test, using 10 images just to be absolutely sure. And of course, as expected, they are the same.

I created a PDF with PDF24 and then extracted the files with AVS:

1705093884615.png
 
Maybe your eye is not able to see the difference, because a term like DPI doesn't make any sense when you create a PDF, but just compare the properties of both files in File/Windows Explorer. That's it.

Also, please don't patronize me. I have held every IT job possible and I am what you would consider a true power user. I have also done all the developer jobs and am now a Software Delivery Manager.

I spoke about the DPI in my initial reply, so I don't get why you decided to be a condescending prick when I am just trying to help. Clearly we can see I know what I am talking about and you don't.
 
Instead of posting an angry reply, please just create a PDF file from one JPG and extract the image. I tried this application and got files of the same size, but when I checked the resolution I found that it was different. After you extract the image, please compare the resolutions. I used this application 6 months ago (the latest version). Therefore you are wrong.

For the record, I worked for more than 6 years as a VB6 programmer and then for more than 4 years as a .NET programmer. I also worked as a database developer. Do you have any additional questions? Do you really think I don't understand how to create bat files and write scripts?

I never write anything unless I am TOTALLY confident. I don't want people to waste their time. OK?

Sorry, this is not a political forum, and I am not interested in proving whether I am right or wrong. I am not here to argue with people. I am just giving information. It is up to you to accept it or reject it. I listened to your reasons, and I will make my decision based on my experience and the opinion of people at the specialized computer forum. That's it. I believe this is the end of the story.
 
You are clearly dense and can't read.

I created a PDF with 10 images in PDF24. I then extracted these images with AVS. I then compared the ORIGINAL FILES USED TO CREATE THE PDF with the ones EXTRACTED FROM THE PDF WITH AVS.

I did a BINARY COMPARE and the files are IDENTICAL except for the modified date, for obvious reasons.
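
For anyone who wants to repeat that binary compare without a dedicated tool, here is a minimal sketch using only the Python standard library. The function names are made up for illustration; it hashes file contents only, so the modified date is ignored.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents in 1 MiB chunks so large scans fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(original: Path, extracted: Path) -> bool:
    """True if both files have identical contents (timestamps are not compared)."""
    return sha256_of(original) == sha256_of(extracted)
```

If `same_bytes(Path("original.jpg"), Path("extracted.jpg"))` returns True, the extraction did not touch the image data at all, including any resolution metadata stored inside the file.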

You also clearly don't know how to create a PDF out of images properly because, again, I bet you the difference happens when you create the PDF. If you knew how to create a PDF without altering the images, then you'd see I am right.

1705114323324.png


Oh shit... you can write scripts??? I must bow to you...

1705114623053.png


Oh wait, are those scripts, but integrated into the Windows context menu? I guess I can write scripts too.

If you can't read English, I can't do anything for you.

Like wow...

1705114455419.png
 
Sorry, you didn't listen to me. I asked you: did you compare the resolution of both files? I wrote that I also didn't see any difference in the files, but the resolution was different. Please explain to me how that happened. The files were identical but the resolution was different. That's the problem.
 
You are right. I had to EXTRACT images, not CONVERT them. When I extracted, I got the same resolution. I surrender. I WAS TOTALLY WRONG. I got the same resolution as from PDF Image Extraction Wizard. I don't understand how I missed this option. By the way, PDF Image Extraction Wizard lets you extract only some of the images. Moreover, you can get the page itself, like "print page", if there are both images and text on the page. It also lets you avoid renumbering pages.
 
All good.

Sorry I got heated, but I have a short fuse when people question my competency. I, like you, want to be sure before I affirm anything.
 
This thread is a godsend. I was stuck converting PDFs using Ghostscript and was looking for extraction tools like these. Thank you both.
 
I've been trying out PDF24.
It's great, but in some of the PDFs the text is separate from the images, so when I extract the images the text bubbles are blank.
I've seen that in the PDF24 reader I can save a selected image as a PNG with the text (it might be a conversion, but at least it's at the right resolution, and it's a PNG). But I cannot find a way to do it in batch and save all the pages of a PDF this way. Does anyone know of a way to do it? (The "usual" batch convert-to-image option is the classic one that depends on the DPI you set, which is not what I'd like.)
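
One way to avoid guessing is to work out which DPI makes a page render at the same pixel size as the image embedded in it. This is only a sketch under a loud assumption: the embedded image is taken to span the full page width, which is typical for scanned comics but not guaranteed. The function name is made up for illustration.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points, 72 per inch


def matching_dpi(image_width_px: int, page_width_pt: float) -> float:
    """DPI at which a rendered page is as wide, in pixels, as the embedded image.

    Assumes the image fills the whole page width.
    """
    return image_width_px * POINTS_PER_INCH / page_width_pt


# A 1275 px wide scan placed on a 612 pt (8.5 inch) wide page:
print(matching_dpi(1275, 612))
```

Feeding the result into a converter's DPI setting should give pages, text included, at roughly the native resolution of the artwork.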
 
The reason is that the PDF was not created from premade images that already contained the text. The text was added IN the PDF.

My suggestion would be to extract the images, then look at the EXIF, IPTC, or XMP properties to see the DPI and quality of the large images.

Then use the PDF24 conversion and set the DPI and quality to what the original was. That way you stay as close to the original as possible. If the quality is not shown, just use 95, as it is the default artists use when creating JPGs. If you find the images are too big, drop the quality to 90, but the most important setting is the DPI. It is usually 96, but I have seen 72 a lot.
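
If no metadata viewer is at hand, the DPI of an extracted JPG can often be read straight from its header. The sketch below only handles the JFIF header and is an assumption-laden illustration: JPEGs that carry only EXIF data store their resolution elsewhere, in which case this returns None. The function name is made up.

```python
import struct
from typing import Optional


def jfif_density(data: bytes) -> Optional[tuple[str, int, int]]:
    """Return (unit, x_density, y_density) from a JPEG's JFIF header, or None."""
    # A JFIF file starts with SOI (FF D8) followed directly by an APP0 segment
    # (FF E0), a 2-byte length, and the identifier "JFIF\0".
    if len(data) < 18 or data[:4] != b"\xff\xd8\xff\xe0" or data[6:11] != b"JFIF\x00":
        return None
    # Bytes 11-12 are the version; then 1 byte of units and two 2-byte densities.
    units, x_density, y_density = struct.unpack(">BHH", data[13:18])
    unit_name = {0: "aspect ratio only", 1: "dpi", 2: "dots per cm"}.get(units, "unknown")
    return unit_name, x_density, y_density
```

Usage would be something like `jfif_density(open("img-000.jpg", "rb").read())`. Note that this value is just a tag stored in the file: changing it does not change the pixels, but it does change the file's bytes.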

1706400527624.png
 
I believe you are right. The text is not part of the image. I think it is an annotation. I tried PDF Image Extraction Wizard and it works. I also believe that AVS Document Converter will work. It would be the PDF-to-image option (not extract), as if you copied the page and pasted it into an image editor like Paint. You don't extract the images; you convert the page. Otherwise, if you want to keep the original image, you are absolutely right: it is necessary to extract the image and then add the text in an image editor. OK?
 
Guess I'm back to manual labor on each of these files. Too bad.

Edit:
Could you confirm that the "extract pdf pages" and/or "remove pdf pages" options in PDF24 do not alter the quality of the remaining pages? (I do expect it to work that way, but better to be certain.)

I'm thinking I will at least clean up the "problematic" PDFs this way, then maybe convert some on a case-by-case basis as the need arises. At least now I know how to check the DPI (and related resolution) of images in PDF files, and I have the tools to do it.
That's what I was missing with the conversion tools I used.

I used to convert them all into PNGs at 300 DPI, then guess at the original resolution based on the pixelisation and convert them into .webp at that guessed resolution (but always higher than the true one, to be safe, which might bring about other problems: if, for example, I end up wanting to upscale them with AI later, the "too big" images would end up looking crappy because the pixelisation would be magnified, even with AI "assistance").
 
Curiously, I found this thread after I'd been using PDF Image Extractor.

The only problems with it so far are:
1) If images are low bpp (1-4 bit?) it can barf.
2) It doesn't seem to offer a CLI option, so you're stuck with choosing a directory/files to extract.
3) Some PDF files are problematic, usually ones using jp2 where there are two or more layers for a single page, likely encoding some type of differential to make the images smaller. Example page from a TOC:
image_p03_4.jp2.png.jpg
image_p03_5.jp2.png.jpg

If someone knows how these are supposed to be decoded, let me know; with ImageMagick I can probably combine them and convert them to proper images.

The common output formats I see, which nearly match the original PDF file (so I'm pretty sure it extracts the embedded images rather than recoding them), are png, jpg, jp2 (JPEG 2000), and jpbig (obfuscated text mostly). Though I'm sure there are more.
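
Since extractors sometimes stack extensions (like the .jp2.png.jpg names above), a quick way to see what a file really is would be to check its first bytes. This is a rough sketch: the function name is made up, and it assumes "jpbig" refers to JBIG2. Images pulled out of a PDF may also be bare streams without these file headers (for example, a raw JPEG 2000 codestream), in which case it reports "unknown".

```python
# Standard file signatures (magic bytes) for the formats mentioned above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),  # JPEG 2000 signature box
    (b"\x97JB2\r\n\x1a\n", "jbig2"),             # standalone JBIG2 file header
]


def sniff_format(data: bytes) -> str:
    """Guess an image format from its leading bytes, ignoring the file extension."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

Reading the first 16 bytes of each extracted file and passing them to `sniff_format` is enough to tell these formats apart.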

Could you confirm that the "extract pdf pages" and/or "remove pdf pages" options in PDF24 do not alter the quality of the remaining pages? (I do expect it to work that way, but better to be certain.)

If worse comes to worst, qpdf handles encoding and managing PDF files, including password removal, protecting/unprotecting documents, compression, and removing pages. As far as I can tell, qpdf doesn't alter image contents, though there is an optimize option for images (I'm not sure what it tries to do).
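
As a rough sketch of the qpdf route, the two helpers below build command lines for keeping a page range and for removing encryption. The helper names are made up for illustration, and qpdf is assumed to be installed; `--pages . RANGE --` and `--decrypt` are standard qpdf options, with `.` standing for the input file itself.

```python
def qpdf_keep_pages(src: str, dst: str, page_range: str) -> list[str]:
    """Command that copies only page_range (e.g. "1-3,7-z") from src into dst.

    In qpdf page ranges, "z" means the last page.
    """
    return ["qpdf", src, "--pages", ".", page_range, "--", dst]


def qpdf_decrypt(src: str, dst: str) -> list[str]:
    """Command that writes an unencrypted copy of src to dst."""
    return ["qpdf", "--decrypt", src, dst]
```

Either list can be passed to `subprocess.run(cmd, check=True)`. For example, `qpdf_keep_pages("in.pdf", "out.pdf", "1-3,7-z")` drops pages 4 to 6 and keeps the rest.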
 