Grabbing Screen Text with a Shell Script
Recently, I stumbled upon an OCR tool for Linux. However, I didn’t like the idea of needing a GUI app, so I wrote a shell script connecting the right CLI tools instead.
This article dissects “grab-text”, a simple POSIX-compatible shell script I wrote for grabbing text from your screen.
The examples in this article have been simplified; you can find the full script on GitHub.
The What & Why
The idea is simple: grab text from the current screen. In reality, it can be quite complicated to do it right.
Unlike macOS, which ships a built-in solution, the usual Linux display and window managers offer no equivalent.
In a discussion on Hacker News about a GUI app providing this functionality, I found custom shell solutions in the comments that do most of the work by cobbling together a few CLI tools. Given that these scripts were quite simplistic, I decided to create my own version with a few extras.
How to Solve the Task
The overall task breaks down into the following steps:
- Take a screenshot, preferably letting the user select a region.
- Process the screenshot to improve text detection.
- Perform OCR on the screenshot.
- Copy the result to the clipboard.
These steps delineate the different tool categories we need to connect to achieve the desired outcome:
- Screenshotter
- Image processing
- Text recognition
- Clipboard management
With the groundwork laid out and the necessary tools identified, we can proceed to how to design the actual script.
What Kind of Script Do We Want?
Before diving into the code itself, we need to consider several key questions:
- Is this script intended as a temporary fix, meant to run a handful of times before being discarded, or is it being developed as a long-term solution?
- Should the script be portable, or does it only run on my machine?
The first question affects the level of documentation and error handling a script needs, whereas the second affects the overall approach to the code and the available features.
Documentation & Error Handling
In the case of grab-text, the goal is to serve as a lasting solution.
Therefore, thorough documentation is crucial to ensure that my future self can easily understand and maintain the code without frustration.
When it comes to error management, the script should handle the most common failure modes. So, sufficient error handling from the get-go is a must. Furthermore, the script will most likely be executed from a GUI environment and run in the background, so it must somehow communicate with the user if anything goes wrong.
Script Portability
Writing shell scripts just for yourself can significantly boost your productivity. But making scripts more portable so that other people might use them, regardless of their actual shell environment, is even better!
One thing I recommended in the past is using a more flexible shebang to target bash:
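A minimal sketch of such a shebang, assuming the common env-based form; the variable name and the printed message are only there for illustration:

```shell
#!/usr/bin/env bash
# env searches $PATH for bash instead of hard-coding /bin/bash.
# Assumption: env itself is available at /usr/bin/env, as on most systems.

# Show which bash a PATH lookup resolves to (illustrative only).
gt_interpreter="$(command -v bash || true)"
printf 'bash resolved via PATH: %s\n' "${gt_interpreter:-not found}"
```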
The benefit here is that rather than using #!/bin/bash directly, the script opts for the first instance of bash that appears in the user’s $PATH.
That makes it more portable, for example on macOS, where you probably don’t want the outdated system version but the one you installed via a package manager.
For grab-text, though, I wanted a bit more portability than restricting myself to bash.
If the script doesn’t require any bash-related features, why not use sh instead, making the script POSIX compatible?
Let’s take a look at what features we usually need to refrain from using when going the sh route:
- No shell options via shopt
- No conditionals with [[ ... ]]
- No bash-style arrays
- $(...) instead of backticks
- Function declaration style might need to be adapted
- No local variables
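To make these restrictions concrete, here is a small sketch of POSIX-friendly substitutes. The variable and function names (gt_tools, _gt_greet, and so on) are made up for illustration and do not come from the actual script:

```shell
#!/bin/sh
# Illustrative only: every name below is hypothetical.

# [ ... ] (test) replaces [[ ... ]]; always quote the operands.
gt_mode="select"
if [ "$gt_mode" = "select" ]; then
    gt_mode_ok="yes"
fi

# No arrays: keep a space-separated string and let the shell split it.
gt_tools="scrot maim grim"
gt_count=0
for gt_tool in $gt_tools; do    # intentionally unquoted for word splitting
    gt_count=$((gt_count + 1))
done

# $(...) nests more cleanly than backticks and is part of POSIX.
gt_upper="$(printf '%s' "$gt_mode" | tr '[:lower:]' '[:upper:]')"

# Functions use the name() form (no "function" keyword). Without "local",
# a prefix on variable names helps to avoid clobbering globals.
_gt_greet() {
    _gt_greet_msg="Hello, $1"
    printf '%s\n' "$_gt_greet_msg"
}
```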
Given the overall requirements, sticking to POSIX-compliant sh scripting appears feasible.
Now, it’s finally time to dive into the code!
Refining What the Script Is Supposed to Do
While I’ve already outlined the core task earlier, it doesn’t paint the complete picture of what the script needs to do to achieve its task.
To extend the script’s portability, not just in terms of the used shell but also the used tools, it should support multiple binaries for each category.
With that in mind, and looking at the problem more closely with our “technical glasses”, the overall task looks more like this:
- Detect available binaries.
- Create a working directory and ensure cleanup.
- Take a screenshot.
- Optimize the screenshot.
- Recognize text.
- Copy text to clipboard.
- Notify user of success.
Let’s go over the steps to define what they need to do.
Detect available binaries
There are five categories of tools involved:
- Taking a screenshot
- Optimizing the screenshot
- Recognizing text (OCR)
- Copying to the clipboard
- Notifying the user
Even though I could have added alternative options to each category, I’ve decided to support alternatives only for the screenshot and clipboard tools. The optimization and notification tools, in turn, are optional.
To streamline the script and avoid duplicating code, I’ve created a function to locate a binary:
The function accepts a string of space-separated binaries to look for and stores the first one found in the out-of-function variable FOUND_BIN.
It’s used like this:
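A minimal sketch of such a lookup function together with a call. FOUND_BIN and SCREENSHOT_BINS are named in this article; the function name _gt_find_bin, the variable SCREENSHOT_BIN, and the listed tools are assumptions, and the sketch stores the binary's name rather than its full path:

```shell
#!/bin/sh
# Hypothetical helper: stores the first available binary in FOUND_BIN.
_gt_find_bin() {
    FOUND_BIN=""
    for _gt_candidate in $1; do    # intentionally unquoted: split on spaces
        if command -v "$_gt_candidate" >/dev/null 2>&1; then
            FOUND_BIN="$_gt_candidate"
            return 0
        fi
    done
    return 1
}

# Usage: the tool list is an assumption, not the script's actual list.
SCREENSHOT_BINS="scrot maim grim"
SCREENSHOT_BIN=""
_gt_find_bin "$SCREENSHOT_BINS" && SCREENSHOT_BIN="$FOUND_BIN"
```

The function's return code tells the caller whether anything was found, so a required category can abort the script while an optional one simply continues.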
Create a working directory and ensure cleanup
For every execution, two files are produced: a screenshot and a text file that holds the recognized text. Neither is needed after transferring the text to the clipboard.
Creating a temporary directory is done with mktemp, and the file names are based on invocation time:
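A sketch of this setup, assuming a mktemp that supports -d (common on Linux and macOS, though not part of POSIX). All variable names and the timestamp format are hypothetical:

```shell
#!/bin/sh
# Create a private working directory; bail out if that fails.
GT_TMP_DIR="$(mktemp -d)" || exit 1

# File names derived from the invocation time (format is an assumption).
GT_TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
GT_IMAGE="${GT_TMP_DIR}/${GT_TIMESTAMP}.png"

# tesseract appends ".txt" to the output base name on its own.
GT_TEXT_BASE="${GT_TMP_DIR}/${GT_TIMESTAMP}"
GT_TEXT="${GT_TEXT_BASE}.txt"
```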
There are systems in place to clean up the temporary directory, yet we should tidy up after ourselves immediately to not clutter up the system unnecessarily.
The best way for a script to do that is to set a trap:
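A minimal sketch of the cleanup function and its trap. The name _gt_cleanup comes from this article, while GT_TMP_DIR is a hypothetical variable name:

```shell
#!/bin/sh
# Remove the given directory, but only if the argument is set and really
# is a directory, so an empty variable can never lead to a stray rm.
_gt_cleanup() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        rm -rf -- "$1"
    fi
}

GT_TMP_DIR="$(mktemp -d)" || exit 1

# Single quotes delay expansion until the trap actually fires.
trap '_gt_cleanup "${GT_TMP_DIR}"' EXIT INT TERM
```

Because the function checks its argument first, it is harmless if it runs more than once, for example on INT and again on EXIT.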
The function _gt_cleanup checks that its first argument is set and is a directory, and then removes it recursively.
The trap fires on EXIT as well as on the INT and TERM signals, so whatever happens, the script attempts to remove its files.
Take a screenshot
The call depends on the actual binary used, so a case statement is required:
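A sketch of such a case statement, wrapped in a hypothetical function. The three tools shown (scrot, maim, and grim with slurp) are assumptions chosen for illustration and may differ from the tools the actual script supports:

```shell
#!/bin/sh
# $1: screenshot binary, $2: target png file
_gt_take_screenshot() {
    case "$1" in
        scrot)
            scrot --select "$2"
            ;;
        maim)
            maim --select "$2"
            ;;
        grim)
            # Wayland: slurp provides the interactive region selection.
            grim -g "$(slurp)" "$2"
            ;;
        *)
            printf 'grab-text: unsupported screenshot tool: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```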
To support another screenshot tool, I’d just add it to SCREENSHOT_BINS to be detected and add another case clause with the correct arguments to produce a png file.
Optimize the screenshot
To not force users to install more dependencies than absolutely necessary, this step is optional:
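A sketch of the optional step, assuming ImageMagick's mogrify. The function and variable names as well as the concrete values (zero saturation, 400% size) are assumptions, not the script's actual settings:

```shell
#!/bin/sh
# Empty if mogrify is not installed, which turns the step into a no-op.
OPTIMIZE_BIN="$(command -v mogrify 2>/dev/null)" || OPTIMIZE_BIN=""

# $1: png file, modified in place
_gt_optimize_image() {
    [ -n "$OPTIMIZE_BIN" ] || return 0    # optional dependency: skip quietly
    # -modulate 100,0 keeps brightness and drops saturation to zero;
    # -resize 400% enlarges the image for the OCR engine.
    "$OPTIMIZE_BIN" -modulate 100,0 -resize 400% "$1"
}
```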
If mogrify is available, the png is desaturated and enlarged to improve OCR performance.
Recognize text
The tesseract call includes the languages to be detected, which are defined in the script’s header.
The call itself is what you’d expect:
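A sketch of the call, using tesseract's standard form of input image, output base name, and -l for the languages. The function name, GT_LANGS, and the language codes are assumptions:

```shell
#!/bin/sh
# Languages for recognition, joined with "+" (the values are an assumption).
GT_LANGS="eng+deu"

# $1: input png, $2: output base name (tesseract writes "$2.txt")
_gt_recognize_text() {
    [ -f "$1" ] || return 1    # nothing to recognize without an image
    tesseract "$1" "$2" -l "$GT_LANGS"
}
```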
Copy text to clipboard
As there are multiple possible binaries, a case statement is needed again:
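A sketch of the clipboard step, again wrapped in a hypothetical function. The tools shown (xclip, xsel, wl-copy) are assumptions for illustration and may not match the script's actual list:

```shell
#!/bin/sh
# $1: clipboard binary, $2: text file with the recognized text
_gt_copy_to_clipboard() {
    case "$1" in
        xclip)
            xclip -selection clipboard < "$2"
            ;;
        xsel)
            xsel --clipboard --input < "$2"
            ;;
        wl-copy)
            wl-copy < "$2"
            ;;
        *)
            printf 'grab-text: unsupported clipboard tool: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```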
Notify user of success
The total processing time depends on the screen area chosen. That’s why I decided to add a notification once finished.
To make the code reusable for error handling, too, I created another function:
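A sketch of such a notification helper, assuming libnotify's notify-send. The names _gt_notify and NOTIFY_BIN are hypothetical:

```shell
#!/bin/sh
# Empty if notify-send is not installed; notifications are optional.
NOTIFY_BIN="$(command -v notify-send 2>/dev/null)" || NOTIFY_BIN=""

# $1: summary, $2: urgency (low, normal, or critical; defaults to normal)
_gt_notify() {
    [ -n "$NOTIFY_BIN" ] || return 0    # no notifier available: skip quietly
    "$NOTIFY_BIN" -u "${2:-normal}" "$1"
}
```

A call after a successful run could then look like `_gt_notify "Text copied to clipboard"`.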
It accepts a string for the notification’s summary and, optionally, a second argument for its urgency.
More Considerations
Now that we have all the essential parts in place, it’s time for some improvements, particularly in error handling, as mentioned earlier.
In the same way that we alert the user upon successful completion, it’s just as important to inform them when an error occurs.
Therefore, let’s introduce another function to be called if something goes wrong:
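A minimal sketch of such an error function. The name _gt_fail is hypothetical, and the extra message on stderr is an addition for runs from a terminal:

```shell
#!/bin/sh
# $1: error message
_gt_fail() {
    printf 'grab-text: %s\n' "$1" >&2
    # Try to notify, but never let a failing notifier change the exit code.
    if command -v notify-send >/dev/null 2>&1; then
        notify-send -u critical "grab-text: $1" || true
    fi
    exit 1
}
```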
The function simply tries to notify the user with a critical message and then exits the script with exit code 1.
To use it, just check the previous return code, like after taking the screenshot:
For direct calls, it can be added with the || (double-pipe) operator:
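Both variants, sketched with stand-ins: "$@" takes the place of the real screenshot or OCR command, and _gt_fail is a reduced, hypothetical version of the error function that only prints and exits:

```shell
#!/bin/sh
# Reduced stand-in for the error function (no notification).
_gt_fail() { printf 'grab-text: %s\n' "$1" >&2; exit 1; }

# Variant 1: run the command, then inspect its return code via $?.
_gt_checked_screenshot() {
    "$@"    # stand-in for the real screenshot call
    if [ "$?" -ne 0 ]; then
        _gt_fail "Taking the screenshot failed"
    fi
}

# Variant 2: attach the error function directly with ||.
_gt_checked_ocr() {
    "$@" || _gt_fail "Text recognition failed"
}
```

The second form is shorter and keeps the check next to the command it guards, which is why it suits single direct calls.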
And there you have it!
Conclusion
This journey began after reading a few comments on Hacker News, which led to creating a portable script that fits my needs and, hopefully, the needs of others as well.
The methodical process laid out here for writing a simple shell script, complete with thorough documentation and comprehensive error management, may seem like overkill for a one-off script. But never forget that any “I’ll only need it once” script might turn into a business-critical tool later.
So save yourself some headaches and invest a little bit more time and keystrokes upfront to write a more polished and easier-to-maintain script.
Your future self will thank you!
Resources
Tools
Supported Screenshot tools:
Supported Clipboard tools:
Optional tools:
- ImageMagick (mogrify) for optimizing the screenshot before OCR
- libnotify (notify-send) for status notifications