In this article we will cover the steps to extract text from a video (MP4).

Tesseract is an OCR engine: it can “read” the text embedded in images, and each frame of a video is just an image.

You’ll need a source video (for instance, one downloaded from YouTube with an online converter).

You’ll need to install Tesseract itself, plus a few Python packages such as pytesseract and Pillow (PIL).

Ensure that you have tesseract installed and in your PATH
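One quick way to confirm that the tesseract executable is reachable is to look it up on PATH from Python. This is only a minimal sketch using the standard library; the helper name find_tesseract is made up for illustration and is not part of pytesseract.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; install it or add its folder to PATH")
    else:
        # Some versions print the version banner to stderr, so show whichever is set.
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

If the executable lives somewhere that is not on PATH, pytesseract also lets you point at it directly by setting pytesseract.pytesseract.tesseract_cmd to the full path.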

from PIL import Image
import pytesseract

Once you are familiar with the code in the video below, you can adapt it to suit your needs, or contact us and ask us to do it for you!

See how to extract text using OCR (Tesseract) with Python code

If the text in the video changes at a regular interval, e.g. every 4 seconds, then use the modulus operator on the frame counter to drop all the frames that you don’t need (the duplicates).
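The modulus trick can be sketched roughly as follows. The function names, the 4-second default and the 25 fps fallback are assumptions for illustration, not taken from the video; the frame loop uses OpenCV's VideoCapture.

```python
def should_keep(frame_index, fps, interval_seconds):
    """Return True for the first frame of every interval (e.g. every 4 seconds)."""
    # Number of frames between the ones we keep; never less than 1.
    step = max(1, round(fps * interval_seconds))
    return frame_index % step == 0


def extract_frames(video_path, interval_seconds=4):
    """Yield (frame_index, frame) for one frame per interval. Needs opencv-python."""
    import cv2  # imported here so should_keep() works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    # Assumed fallback of 25 fps if the file does not report a frame rate.
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0
    frame_index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video or read error
                break
            if should_keep(frame_index, fps, interval_seconds):
                yield frame_index, frame
            frame_index += 1
    finally:
        capture.release()
```

For a 25 fps video and a 4-second interval this keeps frames 0, 100, 200 and so on, so only one frame per text change goes on to OCR.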

By default, OpenCV stores images in BGR format. Since pytesseract assumes RGB format, we need to convert from BGR to RGB before passing OpenCV frames to it.
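As a rough illustration of that conversion, the sketch below reverses the channel order with NumPy and then hands the frame to pytesseract. The helper names are hypothetical; with OpenCV available, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) does the same job.

```python
import numpy as np


def bgr_to_rgb(frame_bgr):
    """Reverse the channel order of an H x W x 3 array (BGR -> RGB)."""
    # Slicing with ::-1 flips the last axis; copy into a contiguous array
    # so downstream libraries get a plain, normally laid-out buffer.
    return np.ascontiguousarray(frame_bgr[:, :, ::-1])


def frame_to_text(frame_bgr):
    """Run OCR on a frame as returned by OpenCV. Needs pytesseract and Tesseract."""
    import pytesseract  # imported here so bgr_to_rgb() works on its own

    return pytesseract.image_to_string(bgr_to_rgb(frame_bgr))
```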

How to scrape YouTube videos

YouTube example: a full demonstration of how to use Python to download videos, extract text from them using pytesseract, and crop the saved images using Pillow (PIL). As an aside, I also show how you could get the text/transcript if that is what you need – although the transcript is also auto-generated and not 100% accurate either.
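Cropping before OCR might look something like the sketch below. Which region to crop depends on your video; here it is assumed, purely for illustration, that the text sits in the bottom quarter of the frame. The helper names are hypothetical; Image.crop takes a (left, upper, right, lower) box.

```python
def bottom_strip_box(width, height, fraction=0.25):
    """Return a Pillow crop box (left, upper, right, lower) for the bottom part of an image."""
    upper = int(height * (1 - fraction))
    return (0, upper, width, height)


def crop_and_read(image_path, fraction=0.25):
    """Crop a saved frame to its bottom strip and OCR it. Needs Pillow and pytesseract."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        box = bottom_strip_box(image.width, image.height, fraction)
        cropped = image.crop(box)
        return pytesseract.image_to_string(cropped)
```

Cropping to just the region that contains text tends to cut down on stray characters picked up from the rest of the frame.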

Image to text – how to extract text from a picture:

What this video covers:

⭕ How to download a YouTube video with Python – using youtube-dl

⭕ Extract TEXT from video

⭕ Extract images from video

Using the following:

- Use pip install youtube_dl to get the video

- Use Tesseract to get TEXT
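The download step can be sketched with the youtube_dl package roughly as follows. The option values, the output folder and the helper names are assumptions chosen for illustration; the URL is whatever video you want to process.

```python
def build_options(output_dir="."):
    """Build a youtube_dl options dict that requests an mp4 named after the video id."""
    return {
        "format": "mp4",                             # ask for an mp4 stream
        "outtmpl": f"{output_dir}/%(id)s.%(ext)s",   # output filename template
    }


def download_video(url, output_dir="."):
    """Download a single video. Needs `pip install youtube_dl` and network access."""
    import youtube_dl  # imported here so build_options() works without the package

    with youtube_dl.YoutubeDL(build_options(output_dir)) as downloader:
        downloader.download([url])
```

Once the file is saved, it can be opened with OpenCV to pull out frames and passed to Tesseract for the text.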

