How to search text in PDF files?

Question

1 Answer

Best answer

To search for text inside PDF documents, pdfgrep is a full-featured command-line tool built for exactly this job.

It works much like grep and is invoked in a very similar way.

The simplest way to start is to open a terminal in the directory that contains the file to search and run pdfgrep followed by the search term and the PDF file in question.

In this example we look for the word "status" in a file called manual.pdf:

pdfgrep status manual.pdf

We can make it more interesting with a couple of options: -i makes the search case-insensitive, and -n prints the page number of each match, so we can see where in the document the word appears:

pdfgrep -ni status manual.pdf

The search term is interpreted as a regular expression. In addition, the shell's wildcards let us search for a text string in several PDF files at once:

pdfgrep -ni status *.pdf
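
Since the search term is a regular expression (an extended one by default, as far as I know), a single pattern can cover several words. A rough sketch: the pattern and the sample text below are made up for illustration, and the dry run with grep -E assumes both tools read the pattern the same way. The single quotes keep the shell from interpreting the parentheses and the bar.

```shell
# Illustrative pattern: matches "status" or "state" (an assumed example).
pattern='stat(us|e)'

# Dry run on plain text with grep -E, to check the pattern before touching any PDF.
printf 'Status: ok\nnothing here\nfinal state\n' | grep -Ei "$pattern"

# The same pattern with pdfgrep, only if it is installed and PDFs exist here.
if command -v pdfgrep >/dev/null 2>&1 && ls ./*.pdf >/dev/null 2>&1; then
    pdfgrep -ni "$pattern" ./*.pdf || true
fi
```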

Maybe we just want to count how many times a certain term appears in each file:

pdfgrep -c status *.pdf
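
If a single grand total is wanted instead of one count per file, the counts can be added up with awk. This is only a sketch: it assumes the output is one "file:count" line per file, as grep -c prints for several files, and sum_counts is a helper name made up here.

```shell
# Sum the counts from lines of the form "file:count" read on standard input.
# The last colon-separated field is used, so a bare count per line also works.
sum_counts() {
    awk -F: '{ total += $NF } END { print total + 0 }'
}

# Usage with pdfgrep, if it is available and PDFs exist in the current directory.
if command -v pdfgrep >/dev/null 2>&1 && ls ./*.pdf >/dev/null 2>&1; then
    pdfgrep -c status ./*.pdf | sum_counts
fi
```

For instance, feeding it the two lines a.pdf:3 and b.pdf:4 prints 7.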

A recursive search is also possible with the -r parameter, here combined with the --include option, which restricts the search to files whose names match a pattern (for now, all PDFs):

pdfgrep -ni -r --include "*.pdf" status

We may only be interested in examining PDFs whose names begin with a certain word (e.g. "Python"). That would look like this:

pdfgrep -ni -r --include "Python*.pdf" status

Or maybe the opposite; for that there is the --exclude parameter:

pdfgrep -ni -r --exclude "Python*.pdf" status
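
The same file selection can also be done with find, which may help if an installed pdfgrep turns out to lack -r (I have not checked which releases have it). A sketch only: pdfs_matching and pdfs_not_matching are helper names invented here, -H is assumed to print the file name as in grep, and file names containing newlines are not handled.

```shell
# Print PDFs under directory $1 (default ".") whose names match glob $2.
pdfs_matching() {
    find "${1:-.}" -type f -name "$2"
}

# Print PDFs under directory $1 (default ".") whose names do NOT match glob $2.
pdfs_not_matching() {
    find "${1:-.}" -type f -name '*.pdf' ! -name "$2"
}

# Search each selected file, if pdfgrep is installed.
if command -v pdfgrep >/dev/null 2>&1; then
    pdfs_matching . 'Python*.pdf' | while IFS= read -r file; do
        pdfgrep -niH status "$file" </dev/null
    done || true
fi
```

Swapping pdfs_matching for pdfs_not_matching in the loop gives the counterpart of --exclude.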

There are more options, which you can consult in the manual or in the program's built-in help with:

pdfgrep --help

Installing pdfgrep on Linux

The application is distributed under a free license (GPL v2) and is available in the repositories of several GNU/Linux distributions.

Users of Arch Linux or any of its derivatives (Antergos, Manjaro, Apricity) can find it in the official repositories:

sudo pacman -S pdfgrep

On openSUSE Tumbleweed and Leap it can be installed with 1 Click Install.

On Debian and its derivatives, such as Ubuntu and its offspring (Linux Mint, elementary OS), it can be installed from the terminal with:

sudo apt install pdfgrep

And finally, on Fedora:

su -c "dnf install pdfgrep"
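
Whichever package manager was used, a quick check that the program ended up on the PATH can look like the sketch below; have_cmd is a helper name made up for this example.

```shell
# Report (via exit status) whether a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfgrep; then
    # Print the installed version.
    pdfgrep --version
else
    echo "pdfgrep not found; install it with your distribution's package manager" >&2
fi
```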

There is more information about pdfgrep on the project website.

answered Feb 25, 2020 by backtothefuture (552k points)
edited Feb 25, 2020 by backtothefuture
