Motivation
When we download a copyrighted movie from a questionable source, there often turns out to be a catch. One common flaw that I encounter, never mind why, is burned-in subtitles in Turkish. No, I do not speak it. And even if I wanted to watch a movie with subtitles in that language, or in any other, I would certainly prefer to have them rendered by my movie player so that I can customize their look.
So it seems that a video with overlaid subtitles is scarred forever. Before you cover the bottom quarter of your television with black duct tape, let me present a hopefully less invasive approach.
Algorithm
Let us analyse the situation. Each subtitle appears abruptly, remains constant for a few seconds, and then disappears just as abruptly. This distinctive property makes subtitles an easy target to search and destroy. Their colors vary from white to yellow and from gray to black, which may provide a clue but not a reliable trace to follow.
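To make the "appear, hold, disappear" signature concrete, here is a minimal C++ sketch that looks at the brightness history of a single pixel. It is not taken from subre.cpp. The names `findHolds` and `Hold` are hypothetical, as are the thresholds `tol` (allowed wobble while the pixel is constant) and `jump` (what counts as an abrupt change). Using one grayscale value per pixel ignores color entirely, which is consistent with color being an unreliable clue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// A run [begin, end) of frames during which one pixel held a near-constant value.
struct Hold { std::size_t begin; std::size_t end; };

// Hypothetical helper: finds runs that start and end with an abrupt jump (> jump),
// stay within `tol` of their first value, and last at least `minLen` frames.
std::vector<Hold> findHolds(const std::vector<int>& v, int tol, int jump, std::size_t minLen) {
    assert(tol >= 0 && jump >= tol);
    std::vector<Hold> holds;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= v.size(); ++i) {
        // Does the current run [begin, i) end here?
        const bool breaks = i == v.size() || std::abs(v[i] - v[begin]) > tol;
        if (!breaks) continue;
        const bool sharpStart = begin > 0 && std::abs(v[begin] - v[begin - 1]) > jump;
        const bool sharpEnd = i < v.size() && std::abs(v[i] - v[i - 1]) > jump;
        if (sharpStart && sharpEnd && i - begin >= minLen) holds.push_back({begin, i});
        begin = i;
    }
    return holds;
}
```

Consider the series 10, 10, 10, 200, 201, 199, 200, 200, 12, 11 with `tol = 5`, `jump = 50` and `minLen = 4`. The sketch reports a single hold that covers frames 3 to 7. A hold that lasts only two frames is rejected, and that rejection is what the minimum duration is for.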
To make the program run faster, we can limit the area of interest to the bottom quarter of the screen. We also want to impose a minimum duration on a subtitle so that we gain some robustness to noise. Surprisingly, these rules alone yield a very good algorithm for subtitle detection. Not surprisingly, the algorithm is easy to implement. Its execution time is at most quadratic in the number of frames, because in the worst case it may have to try all begin-end pairs of frames. Usually this does not happen, and it runs in linear time, provided that subtitle durations are bounded by a constant.
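The two restrictions reduce to a little arithmetic. This sketch assumes that subtitles never sit above the bottom quarter and that the minimum duration is given in seconds. `RowRange`, `bottomQuarter` and `minDurationFrames` are hypothetical names and do not come from the program.

```cpp
#include <cassert>
#include <cmath>

// Rows [top, top + height) of a frame form the region searched for subtitles.
struct RowRange { int top; int height; };

// Bottom quarter of a frame with the given number of rows.
RowRange bottomQuarter(int frameHeight) {
    assert(frameHeight > 0);
    const int h = frameHeight / 4;
    return {frameHeight - h, h};
}

// Minimum subtitle duration, converted from seconds to a whole number of frames.
int minDurationFrames(double seconds, double fps) {
    assert(seconds > 0 && fps > 0);
    return static_cast<int>(std::ceil(seconds * fps));
}
```

For a 1080-line frame, `bottomQuarter(1080)` gives rows 810 to 1079, and a half-second minimum at 25 fps becomes 13 frames. With OpenCV, that row range can be cut out as a view without copying, using `frame(cv::Rect(0, top, frame.cols, height))`.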
That is: we pick a minimum duration of a subtitle and keep a rolling buffer of that many frames. For each starting position, we find the area of pixels that remain constant in all of the buffered frames and that changed considerably in the frame just before them. We can save some time by maintaining a rolling average. However, after every update of that average we still need to check each buffered frame against it, so going through the whole buffer is unavoidable. (No, I do not want to build a gazillion search trees.) If such a constant area has a suitable shape for a subtitle, most notably a reasonable size, we declare that we have found the beginning of a subtitle and switch to finding its end.
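Below is a rough sketch of the start detection. It is written in plain C++ so that it runs without OpenCV, and it is not the code from subre.cpp. It makes the following assumptions:

* Frames are flattened grayscale crops of the region of interest.
* The class `RollingWindow`, the thresholds `tol` and `jump`, and the helper `onsetArea` are all hypothetical.
* The shape test is reduced to a plain pixel count, which simplifies "a suitable shape".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <vector>

// Hypothetical frame type: flattened grayscale pixels of the region of interest.
using Frame = std::vector<std::uint8_t>;

// Keeps the last `capacity` frames, their per-pixel sums (for a rolling mean)
// and the frame that immediately preceded the buffered ones.
class RollingWindow {
public:
    RollingWindow(std::size_t capacity, std::size_t pixels)
        : capacity_(capacity), sums_(pixels, 0L) {
        assert(capacity > 0);
    }

    // Adds a frame; once the buffer overflows, the oldest frame becomes `before_`.
    void push(const Frame& f) {
        assert(f.size() == sums_.size());
        frames_.push_back(f);
        for (std::size_t i = 0; i < f.size(); ++i) sums_[i] += f[i];
        if (frames_.size() > capacity_) {
            before_ = frames_.front();
            for (std::size_t i = 0; i < before_.size(); ++i) sums_[i] -= before_[i];
            frames_.pop_front();
        }
    }

    // Pixels that stay within `tol` of the rolling mean in every buffered frame
    // and differ by more than `jump` from the frame just before the buffer.
    std::vector<bool> onsetMask(int tol, int jump) const {
        std::vector<bool> mask(sums_.size(), false);
        if (frames_.size() < capacity_ || before_.empty()) return mask;
        for (std::size_t i = 0; i < sums_.size(); ++i) {
            const int mean = static_cast<int>(sums_[i] / static_cast<long>(capacity_));
            bool constant = true;
            for (const Frame& f : frames_)  // the unavoidable pass over the buffer
                if (std::abs(f[i] - mean) > tol) { constant = false; break; }
            mask[i] = constant && std::abs(before_[i] - mean) > jump;
        }
        return mask;
    }

private:
    std::size_t capacity_;
    std::vector<long> sums_;
    std::deque<Frame> frames_;
    Frame before_;
};

// Size of the candidate area; a real shape test would also look at its extent.
std::size_t onsetArea(const std::vector<bool>& mask) {
    return static_cast<std::size_t>(std::count(mask.begin(), mask.end(), true));
}
```

The running sums make each update of the mean cost a constant amount of work per pixel. However, `onsetMask` still walks the whole buffer, which is the cost the paragraph above accepts. A start would be declared when `onsetArea` reaches some minimum size of your choosing.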
When looking for the end of a subtitle, we keep the average intact (so as not to surrender linear time that easily) and just check incoming frames against it. There are three possible outcomes:

* The constant area remains large enough. We append the new frame to our buffer.
* All pixels of the subtitle change considerably. We have found the end, so we remove the subtitle from all the buffered frames and start over.
* The constant area loses pixels frame after frame until it drops below our size limit. This is the unfortunate case. It means we started in a frame that was not actually the beginning of a subtitle, and the only correct option is to flush the starting frame in question and process the whole buffer again.
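The three outcomes of the end search could be sketched as follows. The tracker assumes two things:

* The window mean captured at the onset is kept as a fixed reference, as described above.
* "A considerable change in all pixels" is taken literally. A real implementation might accept a large fraction of changed pixels instead.

`SubtitleTracker`, `Step` and the thresholds are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // hypothetical flattened grayscale frame

enum class Step { Continue, End, FalseStart };

// Follows one detected subtitle. `reference` is the window mean captured at the
// onset and is never updated; `active` marks pixels still considered subtitle.
struct SubtitleTracker {
    std::vector<int> reference;
    std::vector<bool> active;
    std::size_t minArea;

    Step feed(const Frame& f, int tol, int jump) {
        assert(f.size() == reference.size() && active.size() == reference.size());
        std::size_t tracked = 0, changed = 0;
        std::vector<bool> next = active;
        for (std::size_t i = 0; i < f.size(); ++i) {
            if (!active[i]) continue;
            ++tracked;
            const int d = std::abs(f[i] - reference[i]);
            if (d > jump) ++changed;       // considerable change: subtitle gone here
            if (d > tol) next[i] = false;  // no longer constant: leaves the area
        }
        if (tracked > 0 && changed == tracked) return Step::End;
        active = next;  // the area only ever shrinks
        const auto area = static_cast<std::size_t>(std::count(active.begin(), active.end(), true));
        return area < minArea ? Step::FalseStart : Step::Continue;
    }
};
```

What the caller would do with each result:

* On `End`, it would inpaint the `active` pixels in every buffered frame.
* On `FalseStart`, it would drop the oldest buffered frame and rerun the start detection over the rest.
* On `Continue`, it would simply append the frame.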
What remains is an inpainting algorithm. For the time being, we can delegate that task to the OpenCV library.
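In OpenCV this job is done by `cv::inpaint(src, inpaintMask, dst, inpaintRadius, flags)`, with `cv::INPAINT_TELEA` or `cv::INPAINT_NS` as the method. To show what the step does without the library, here is a deliberately naive stand-in, which is not what OpenCV implements. It fills the masked pixels from the outside in, giving each one the mean of its already known 4-neighbors. The function name `naiveInpaint` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive onion-peel fill (a stand-in, NOT OpenCV's algorithm): each pass gives
// every masked pixel that borders known pixels the mean of its known 4-neighbors.
void naiveInpaint(std::vector<std::uint8_t>& img, std::vector<bool> mask, int w, int h) {
    assert(w > 0 && h > 0);
    assert(img.size() == static_cast<std::size_t>(w) * h && mask.size() == img.size());
    const int dx[] = {1, -1, 0, 0};
    const int dy[] = {0, 0, 1, -1};
    bool progress = true;
    while (progress) {  // stops when everything is filled or nothing can be reached
        progress = false;
        std::vector<std::uint8_t> next = img;
        std::vector<bool> nextMask = mask;
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * w + x;
                if (!mask[i]) continue;
                int sum = 0, n = 0;
                for (int k = 0; k < 4; ++k) {
                    const int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    const std::size_t j = static_cast<std::size_t>(ny) * w + nx;
                    if (!mask[j]) { sum += img[j]; ++n; }
                }
                if (n > 0) {
                    next[i] = static_cast<std::uint8_t>(sum / n);
                    nextMask[i] = false;
                    progress = true;
                }
            }
        img.swap(next);
        mask.swap(nextMask);
    }
}
```

Take a five-pixel row 0, x, x, x, 100 with the three middle pixels masked. The sketch turns it into 0, 0, 50, 100, 100, which shows how crudely such a fill smears edges. The OpenCV methods propagate information more carefully, using either Telea's fast marching method or a Navier-Stokes based scheme.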
Program
Download the source here (7.3K). It requires OpenCV and a tiny bit of C++11, so a suitable build command for GCC is:
g++ -O3 subre.cpp -lopencv_videoio -lopencv_photo -lopencv_core -lopencv_imgproc -lopencv_highgui --std=c++11 -o subtitle_remover
If you still have OpenCV 2.x, just remove the first #define in the source file and drop the -lopencv_videoio option:
g++ -O3 subre.cpp -lopencv_photo -lopencv_core -lopencv_imgproc -lopencv_highgui --std=c++11 -o subtitle_remover
Feel free to edit and share. I would be grateful if you acknowledged my authorship, but you are not obliged to do so. It seems I do not know how to cross-compile for Windows, let alone link OpenCV statically. Linux users will presumably know how to build this program themselves.