When searching for text within PDF documents , pdfgrep is a complete online command tool that allows us to perform this function.
It presents a way of working very similar to that of grep , with which it maintains many similarities in its form of execution.
The simple way to start with the program is to place ourselves in the directory - where our search object is - with the terminal and execute: pdfgrep followed by the search term and the pdf file in question.
In this example we look for the word "status" in a file called manual.pdf :
one |
pdfgrep status manual.pdf |
We can make the thing more interesting, including a couple of options by making it "insensitive" (-i) to the subject of upper and lower case , as well as activating the pager (-n ) to show us where the hell is the word we are looking for :
one |
pdfgrep -ni status manual.pdf |
The tool is compatible with regular expressions , being able to use all types of wildcards such as the one that allows us to search for a text string in several PDF files at once:
one |
pdfgrep -ni status *.pdf |
Maybe we just want to count how many times a certain term appears :
one |
pdfgrep -c status *.pdf |
We also have the option to do a recursive search , for this we use the -r parameter, in this case with the include option that delimits the type of files that are going to be subjected to said search (for now all PDFs):
one |
pdfgrep -ni -r --include "*.pdf" status |
We may only be interested in examining PDFs that begin with a certain word (eg "Python"). It would be something like this:
one |
pdfgrep -ni -r --include "Python*.pdf" status |
Or maybe the opposite, for that we have the exclude parameter:
one |
pdfgrep -ni -r --exclude "Python*.pdf" status |
You have more options, which you can consult in the manual or in the help of the program with:
Installing pdfgrep on Linux
The application is distributed under free license (GPL v2), being available in the repositories of several GNU / LINUX distributions.
- Users of Arch Linux or any of its derivatives ( Antergos, Manjaro, Apricity ) can be found in the official repositories:
one |
sudo pacman -S pdfgrep |
- In openSUSE Tumbleweed Leap and can be installed from 1 click install.
-
Debian , derivatives like Ubuntu and daughters ( Linux Mint, Elementary OS ) can install it from the terminal with:
one |
sudo apt install pdfgrep |
one
two
3
|
su -c
dnf install pdfgrep
|
You have more information about pdfgrep on the project website .