
Search text in PDF files with pdfgrep




When we need to search for text inside PDF documents, pdfgrep is a complete command-line tool for the job.

It works much like grep, and the way it is invoked, along with many of its options, will feel very familiar.

The simplest way to start is to open a terminal in the directory that contains the file we want to search and run pdfgrep followed by the search term and the PDF file.

In this example we look for the word "status" in a file called manual.pdf:

pdfgrep status manual.pdf
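Since pdfgrep behaves like grep, it also fits naturally into shell scripts. The sketch below assumes pdfgrep follows grep's exit-status convention (0 when at least one match is found, non-zero otherwise); the function name pdf_mentions is hypothetical and not part of pdfgrep.

```shell
# Hypothetical helper (not part of pdfgrep): report whether a PDF mentions a term.
# Assumes pdfgrep exits with status 0 when it finds at least one match, as grep
# does; any other status (no match, or an error) is reported as "not found".
pdf_mentions() {
    # $1 = search term, $2 = PDF file.
    # The normal output is discarded; only the exit status is used.
    if pdfgrep "$1" "$2" > /dev/null; then
        echo "found: $1 in $2"
    else
        echo "not found: $1 in $2"
    fi
}
```

For example, pdf_mentions status manual.pdf would print "found: status in manual.pdf" when the word appears in the document.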

We can make this more useful by adding a couple of options: -i makes the search case-insensitive, and -n prefixes each match with the number of the page where it was found:

pdfgrep -ni status manual.pdf
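If we only want to know which pages mention the term, the page numbers printed by -n can be extracted with standard tools. This is a sketch assuming that, for a single file, each line of output has the form PAGE:matching text; pdf_pages is a hypothetical helper name.

```shell
# Hypothetical helper: list the distinct pages of a PDF that contain a term.
# Assumes "pdfgrep -n TERM FILE" on a single file prints lines of the form
# "PAGE:matching text".
pdf_pages() {
    # -i: case-insensitive, -n: prefix each match with its page number.
    # cut keeps only the field before the first ':' (the page number);
    # sort -un sorts the pages numerically and drops duplicates.
    pdfgrep -ni "$1" "$2" | cut -d: -f1 | sort -un
}
```

For example, pdf_pages status manual.pdf would print one page number per line, each page listed once even if the word appears on it several times.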

The search term can be a regular expression, and since the shell expands wildcards such as *.pdf, we can also search for a text string in several PDF files at once:

pdfgrep -ni status *.pdf
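The wildcard in *.pdf is handled by the shell, while the search term is interpreted by pdfgrep as a regular expression. Assuming pdfgrep's default syntax is POSIX extended regular expressions (where | means "or"), the hypothetical helper below joins several words into one pattern so a single search finds any of them.

```shell
# Hypothetical helper: join several words into one extended-regex
# alternation, e.g. "status|state|error".
# Note: regex metacharacters inside the words (such as . or +) are not escaped.
join_alternatives() (
    # "$*" joins all arguments with the first character of IFS.
    # The function body runs in a subshell, so changing IFS has no side effects.
    IFS='|'
    printf '%s\n' "$*"
)

# Example: pdfgrep -ni "$(join_alternatives status state error)" *.pdf
```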

Maybe we just want to count how many times a certain term appears in each file:

pdfgrep -c status *.pdf
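With several files, -c reports one count per file. To get a single total across all the PDFs, the counts can be added up with awk. This sketch assumes each line of output has the form FILE:COUNT (or just COUNT when only one file is given); pdf_total_count is a hypothetical name.

```shell
# Hypothetical helper: total number of matches of a term across PDF files.
# Usage: pdf_total_count TERM FILE...
# Assumes "pdfgrep -c" prints "FILE:COUNT" per file (or just "COUNT" for a
# single file), so the count is always the last ':'-separated field.
pdf_total_count() {
    # "total + 0" prints 0 instead of an empty line when there is no output.
    pdfgrep -c "$@" | awk -F: '{ total += $NF } END { print total + 0 }'
}
```

For example, pdf_total_count status *.pdf would print one number for all the PDFs in the directory.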

We can also search recursively with the -r option. Here it is combined with --include, which limits the search to files whose names match a pattern (in this case, all PDFs):

pdfgrep -ni -r --include "*.pdf" status
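To see only which files contain the term, one option is to combine the recursive search with -c and keep the files whose count is greater than zero. This sketch assumes the recursive -c output has one PATH:COUNT line per file; pdf_files_with is a hypothetical helper name.

```shell
# Hypothetical helper: print the paths of PDFs (found by the same recursive
# search as the command above) that contain a term at least once.
# Assumes "pdfgrep -r -c" prints one "PATH:COUNT" line per file.
pdf_files_with() {
    # Keep lines whose last field (the count) is above zero, then strip the
    # trailing ":COUNT" so only the path is printed. Paths containing ':'
    # are preserved because only the final ":digits" is removed.
    pdfgrep -i -r -c --include "*.pdf" "$1" |
        awk -F: '$NF > 0 { sub(/:[0-9]+$/, ""); print }'
}
```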

We may only be interested in PDFs whose names begin with a certain word (e.g. "Python"). It would be something like this:

pdfgrep -ni -r --include "Python*.pdf" status

Or maybe the opposite: to skip those files, we have the --exclude option:

pdfgrep -ni -r --exclude "Python*.pdf" status
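The patterns passed to --include and --exclude are wildcards matched against file names, not regular expressions. Assuming they follow the usual shell glob rules (case-sensitive, with * matching any run of characters, including none), the hypothetical helper below uses the shell's own case statement to preview which names a pattern such as Python*.pdf would select.

```shell
# Hypothetical helper: check whether a file name matches a glob pattern,
# using the same wildcard rules as the shell's case statement.
# Assumes --include/--exclude apply ordinary, case-sensitive globs to file names.
glob_matches() {
    # $1 = file name, $2 = pattern. $2 is left unquoted on purpose so that
    # *, ? and [...] act as wildcards instead of literal characters.
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}
```

For example, glob_matches python_intro.pdf 'Python*.pdf' prints "no", which suggests that a file name starting with a lowercase "python" would not be selected by --include "Python*.pdf".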

There are many more options, described in the manual page (man pdfgrep) and in the program's built-in help:

pdfgrep --help

Installing pdfgrep on Linux

The application is distributed under a free license (GPL v2) and is available in the repositories of several GNU/Linux distributions.

  • Users of Arch Linux or any of its derivatives (Antergos, Manjaro, Apricity) will find it in the official repositories:

sudo pacman -S pdfgrep

  • On openSUSE (Tumbleweed and Leap), it can be installed with 1 Click Install.
  • Debian and its derivatives, such as Ubuntu and the distributions based on it (Linux Mint, elementary OS), can install it from the terminal with:

sudo apt install pdfgrep

  • And finally, on Fedora:

su -c 'dnf install pdfgrep'
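The right command from the list above can also be picked automatically from the ID field of /etc/os-release, which most current distributions provide. This is a hypothetical helper covering only the distributions mentioned here; the exact ID values used (arch, manjaro, antergos, debian, ubuntu, linuxmint, elementary, fedora) are assumptions worth checking on your own system.

```shell
# Hypothetical helper: print the pdfgrep install command for a distribution ID
# (the ID= value from /etc/os-release). Only the distributions listed above
# are covered; the ID strings themselves are assumptions.
pdfgrep_install_cmd() {
    case $1 in
        arch|manjaro|antergos)
            echo "sudo pacman -S pdfgrep" ;;
        debian|ubuntu|linuxmint|elementary)
            echo "sudo apt install pdfgrep" ;;
        fedora)
            echo "su -c 'dnf install pdfgrep'" ;;
        *)
            echo "no install command known for: $1" >&2
            return 1 ;;
    esac
}

# Print the ID field of /etc/os-release. The function body runs in a
# subshell so the sourced variables do not leak into the caller.
current_distro_id() (
    . /etc/os-release && printf '%s\n' "$ID"
)
```

A possible usage: command -v pdfgrep > /dev/null || pdfgrep_install_cmd "$(current_distro_id)" prints the suggested command only when pdfgrep is not already on the PATH.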

More information about pdfgrep is available on the project website.

