However, most approaches are relatively costly to run, especially without a graphics card, so I wanted to see what results we could get with simple and fast methods.
You can find the code for the steps below in this repository.
To start, we need to find the mask, which corresponds to the location of the pixels to remove.
Using ffmpeg, we can extract the timestamps of the key frames in the video. Getting only the timings is fast, and we can later cap the number of frames we actually extract. We could also take random frames, but key frames are more likely to be diverse (they can't be too close in time) and are faster to extract, since other frames need to be reconstructed from the closest key frame.
With ffmpeg, we just need to run:
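The exact command is not reproduced in this extract; one plausible version lists the keyframe timestamps by reading packet metadata only, so nothing is decoded (input.mp4 stands in for the source video):

```shell
# Keyframe packets are flagged with "K"; print their timestamps (in seconds).
ffprobe -loglevel error -select_streams v:0 \
        -show_entries packet=pts_time,flags \
        -of csv=print_section=0 input.mp4 \
    | awk -F, '/K/ {print $1}'
```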
Then, for each TIMESTAMP obtained, we can extract the corresponding frame:
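The original snippet is likewise missing here; a sketch of this step, assuming the same input.mp4 and a frames/ output directory:

```shell
# Placing -ss before -i seeks directly to the nearest keyframe, so
# extracting a single frame at a keyframe timestamp is cheap.
ffmpeg -loglevel error -ss "$TIMESTAMP" -i input.mp4 \
       -frames:v 1 "frames/$TIMESTAMP.png"
```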
And lastly, we can aggregate the results of a simple image filter over all the frames to create a mask:
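The original script is not shown in this extract; a minimal NumPy sketch of the idea (the function names are mine): run a Sobel edge filter on every frame, average the responses, and keep the pixels whose edge response is consistently strong, i.e. the static overlay.

```python
import numpy as np


def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D float array via a simple Sobel filter.

    Implemented with circular shifts for brevity, so edges wrap around
    at the image borders; good enough for mask building.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    for dy in range(3):
        for dx in range(3):
            shifted = np.roll(np.roll(gray, dy - 1, axis=0), dx - 1, axis=1)
            gx += kx[dy, dx] * shifted
            gy += ky[dy, dx] * shifted
    return np.hypot(gx, gy)


def build_mask(frames, threshold=0.5):
    """Average Sobel responses over frames and threshold the result.

    frames: iterable of 2-D grayscale arrays with values in [0, 1].
    Returns a uint8 mask (255 = pixel to inpaint). Scene edges move
    between frames and wash out in the average; the overlay's edges
    sit at a fixed position and survive it.
    """
    acc = None
    n = 0
    for gray in frames:
        edges = sobel_magnitude(gray)
        acc = edges if acc is None else acc + edges
        n += 1
    mean = acc / n
    return (mean > threshold * mean.max()).astype(np.uint8) * 255
```

The frames extracted above can be loaded into such arrays with e.g. Pillow, and the resulting mask saved back to mask.png for the next step.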
Bonus: if one wants to do it without Python, it should be doable using ImageMagick's convert with the existing Sobel filter and -evaluate-sequence mean *.png mean.png.
We now have a global mask to inpaint, so we can simply use ffmpeg's removelogo filter to obtain our cleaned-up video:
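Assuming the mask was saved as mask.png, the invocation looks like this (file names are placeholders):

```shell
# removelogo interpolates the pixels covered by the non-zero areas
# of the mask, frame by frame.
ffmpeg -loglevel error -i input.mp4 \
       -vf removelogo=mask.png output.mp4
```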
This runs at around 3× real-time on my aging MacBook Pro (i5-5287U), with a fixed overhead of about 5 seconds for the mask extraction, which is fast! ✨
Clearly the resulting inpainting is not of high quality (we are not even leveraging temporal information across frames), but it is a reasonable enough baseline to compare real-time approaches against.
Author Paul Willot