+5 votes
487 views
Pdfgrep commands to search PDF files from the Linux terminal

in Linux / Unix by (551k points)
reopened | 487 views

1 Answer

+3 votes
Best answer

1. Install Pdfgrep on Linux
2. Use Pdfgrep in Linux

Linux operating systems are built around the command line, which offers many options that extend what a distribution can do: running searches, administration tasks, support and much more.

One of these options is searching inside certain types of files in Linux to reach their content easily, which is why today we will talk about pdfgrep, a tool focused on searching PDF files.

What is pdfgrep
Pdfgrep is a command-line utility for searching text in PDF files in a simple and functional way, which saves us from opening each file and searching it with a separate PDF tool.
Some of its features are:
  • Compatible with grep: many grep parameters work, such as -r, -i, -n or -c.
  • Ability to search text in multiple PDF files.
  • Colored output: the GNU grep --color option is supported and enabled by default.
  • Supports the use of regular expressions.
  • Free software



1. Install Pdfgrep on Linux

Step 1

In this case we use Ubuntu, so running the following line is enough. When prompted, enter Y to accept the download and installation of the packages.
 sudo apt install pdfgrep 
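
To confirm that the installation worked, a small check like the following can help. This is only a sketch: check_pdfgrep is a hypothetical helper name chosen for this example, not something pdfgrep provides.

```shell
# check_pdfgrep is a hypothetical helper name, not part of pdfgrep itself.
# It reports whether the pdfgrep binary is reachable through PATH.
check_pdfgrep() {
    if command -v pdfgrep >/dev/null 2>&1; then
        # --version prints the installed pdfgrep version.
        pdfgrep --version
    else
        echo "pdfgrep not found; try: sudo apt install pdfgrep"
        return 1
    fi
}
```

Running check_pdfgrep after the installation should print a version line; if it prints the "not found" message instead, the package was not installed.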


Step 2

Other installation options are:
  • Download the .tar.gz file from the official Pdfgrep site.
Step 3
  • Or execute the following command:
 git clone https://gitlab.com/pdfgrep/pdfgrep.git 
Step 4

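Then, from inside the source directory, run each of the following lines in order (a Git checkout may first need ./autogen.sh to generate the configure script):
 ./configure
 make
 sudo make install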

2. Use Pdfgrep in Linux

Step 1

Once pdfgrep is installed, this will be the syntax to use:
 pdfgrep [OPTION ...] PATTERN [FILE] 
Step 2

The elements are:
  • Option: Sets the attributes added to the search, for example -i or --ignore-case, which ignores the distinction between upper and lower case when matching the pattern against the file.
  • Pattern: The text to search for, interpreted as an extended regular expression.
  • File: The PDF file in which the search is run.
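
To make the three parts of the syntax concrete, here is a minimal sketch. search_pdf is a hypothetical wrapper name invented for this example; the only pdfgrep option it relies on is -i.

```shell
# search_pdf PATTERN FILE
# Hypothetical wrapper: runs a case-insensitive pdfgrep search on one file.
search_pdf() {
    pattern=$1
    file=$2
    if [ ! -f "$file" ]; then
        echo "search_pdf: no such file: $file" >&2
        return 2
    fi
    # OPTION = -i, PATTERN = "$pattern", FILE = "$file".
    # "--" ends option parsing, so a pattern that starts with "-"
    # is not mistaken for an option.
    pdfgrep -i -- "$pattern" "$file"
}
```

For example, search_pdf 'techno(wikis|logy)' TechnoWikis.pdf would match either word, because the pattern is read as an extended regular expression.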
Step 3

We will start with a simple search: looking for the word TechnoWikis in the file TechnoWikis.pdf. To do this, we run the following:
 pdfgrep TechnoWikis TechnoWikis.pdf 
Step 4

In this case the term appears only once in the file. Next we search for the term Windows in an official Microsoft PDF file, which returns every line that contains it.
Step 5

We can see that the searched word is highlighted, which makes it easy to locate. If we add the -in parameters (-i to ignore case and -n to print page numbers), each result is prefixed with the page number where the term was found.
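
Because -n puts the page number in front of each match, the output can be reduced to a plain list of pages. The sketch below assumes the "page:text" output shape that pdfgrep uses for a single file; pages_with_matches is a hypothetical helper name.

```shell
# pages_with_matches is a hypothetical helper.
# It reads "page:text" lines (the shape of `pdfgrep -n` output for one file)
# on stdin and prints each page number once, in ascending order.
pages_with_matches() {
    cut -d: -f1 | sort -n -u
}
```

It could be used as pdfgrep -in windows Manual.pdf | pages_with_matches, where Manual.pdf is a placeholder file name.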
Step 6

Another option with pdfgrep is to list the PDF files that contain a certain term. To do this, we run the following in the directory that holds them:
 pdfgrep TechnoWikis *.pdf 
Step 7

In this way the PDF files where the term TechnoWikis is found are listed.
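
If only the file names are wanted, without the matching lines, one possible approach is to combine the -c option with a small filter. This sketch assumes that pdfgrep -c prints "file:count" lines when given several files; files_with_matches is a hypothetical helper name.

```shell
# files_with_matches is a hypothetical helper.
# It reads "file:count" lines (the assumed shape of `pdfgrep -c` output
# for several files) and prints only the names whose count is above zero.
files_with_matches() {
    # The count must start with 1-9, so a count of 0 never matches.
    sed -n 's/:[1-9][0-9]*$//p'
}
```

It could be used as pdfgrep -c TechnoWikis *.pdf | files_with_matches.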
Step 8

If we want to open the PDF file, we can run the following command, replacing File.pdf with the name of the file:
 xdg-open File.pdf 
Step 9

The general options offered by pdfgrep are:
-i, --ignore-case
Ignore case distinctions both in the pattern and in the input files.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings separated by newlines.
--cache
Use a cache for the rendered text to speed up the operation on large files.
-P, --perl-regexp
Interpret PATTERN as a regular expression compatible with Perl (PCRE).
-H, --with-filename
Print the file name for each match.
-h, --no-filename
Suppress the file name prefix in the output.
-n, --page-number
Prefix each match with the page number where the search term was found.
-c, --count
Suppress normal output and, instead, print the number of matches for each input file.
-p, --page-count
Print the number of matches per page. It implies -n.
--color
Highlight file names, page numbers and matching text in color in the terminal. The accepted values are always, never and auto.
-o, --only-matching
Print only the matching part of a line without any surrounding context.
-r, --recursive
It allows us to recursively search all files (restricted by --include and --exclude) under each directory, following symbolic links only if they are on the command line.
-R, --dereference-recursive
Same as -r, but follow all symbolic links.
-q, --quiet
Suppress all normal output to stdout. Errors are still printed to stderr.
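
Several of these options can be combined in one call. As a minimal sketch, the hypothetical wrapper search_tree below searches a whole directory tree using -r, -i, -n and -H from the list above.

```shell
# search_tree PATTERN [DIR]
# Hypothetical wrapper: recursive, case-insensitive search with page numbers.
# DIR defaults to the current directory.
search_tree() {
    pattern=$1
    dir=${2:-.}
    if [ ! -d "$dir" ]; then
        echo "search_tree: not a directory: $dir" >&2
        return 2
    fi
    # -r recurses into DIR, -i ignores case, -n prefixes the page number,
    # -H prints the file name for every match.
    pdfgrep -r -i -n -H -- "$pattern" "$dir"
}
```

For example, search_tree TechnoWikis ~/Documents would search every PDF under ~/Documents, a placeholder path.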

With this, pdfgrep becomes an ideal solution for working with PDF files in Linux environments.


by (3.5m points)
edited
