Initially, I aimed to make one big post, but I am splitting it up. This Part 1 is about how I made the frames to generate videos from.
Intro
I have been on and off doing this project for a few months. Many things did not work for me – from ffmpeg to sheer OpenCV/ffmpeg installation on Apple M1 systems – and I am not 100% sure what made it work to this day. I plan to put myself in a situation where I have to do it again, and I will make a guide then. However, right now, I can only say that this video tutorial on running native ffmpeg and this guide from OpenCV themselves are good places to start.
The last time I used OpenCV was in 2017 when I worked on my graduate work. Back then, I processed prerecorded videos with no need to worry about audio. So I had some background prior to this project, but I was in no way ready for the ride.
Buckle up!
Table of contents
- What do I have to create?
- Limitations
- Background
- Phrases
- Audio
- Problems
- Problem 1: GIF on a background
- Problem 2: Putting text on the image
- Problem 3: Text wrapping
- Problem 4: Text blocks positioning
- Results so far
What do I have to create?
The video I have to generate consists of these parts:
- Intro
- video – a GIF on a background;
- audio – prerecorded narration + music.
- Title
- video – text in 2 languages on a background;
- audio – generated audio in 2 languages of the respective title.
- Body
- video – German and Russian phrases on a background;
- audio – generated audio of the respective phrases in German and Russian.
- Credits
- video – information about the project, website etc. on a background;
- audio – prerecorded narration + music.

Limitations
Background
The background here is a static image, and my whole process is written around that assumption.
Since the video is educational, it does not need a distracting element such as a dynamic background. However, I plan to add one as an upgrade in the future.
Phrases
German and Russian phrases are positioned one above the other. The phrases are not terribly long, so the font is expected to be the same size on all of the Body frames.
Audio
The music and two of the audio parts are prerecorded; the others need to be voiced by a library.
Problems
So, there are several distinct problems to solve, and I will talk about them next.
Problem 1: GIF on a background
I already have a post about it, go check it out: https://djangokatya.com/2021/07/06/how-to-add-gif-to-a-static-background-python-opencv/
Basically, the source is this (a JPEG background and an animated logo):
And the output is this – a pretty perfect union. To address the quality, this particular GIF is downsized tremendously for the sake of putting it online.

The plan I wrote down in that post later became obsolete when I discovered ffmpeg.
Problem 2: Putting text on the image
This was familiar ground. This module is used for the Title, Body and eventually Credits.
For text I used PIL, in particular ImageFont. For both German and Russian text I had to find and set up fonts. The properties were different, in particular size and color. The piece of code here represents the German block in black, font size 72:
de_font = ImageFont.truetype(german_font, 72, encoding='UTF-8')
de_color = (0, 0, 0)
Where german_font is a path to my .ttf of Times New Roman.
To draw on the image, I wrap it in a PIL ImageDraw object:
draw = ImageDraw.Draw(img)
And finally, to draw the text on the image, I used .text:
draw.text(position, de_block, de_color, font=de_font)
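Put together, the three snippets above could look like the following minimal sketch. It makes a few assumptions that are not from my actual project: a blank white image stands in for the JPEG background, Pillow's built-in default font stands in for the Times New Roman .ttf (so the font size of 72 is not reproduced), and the values of position and de_block are made up for illustration.

```python
from PIL import Image, ImageDraw, ImageFont

# Assumption: a blank white frame stands in for the real JPEG background
img = Image.new("RGB", (1280, 720), (255, 255, 255))

# Assumption: the built-in default font replaces the .ttf file from the post,
# so the sketch runs without any font file on disk
de_font = ImageFont.load_default()
de_color = (0, 0, 0)  # black

# Hypothetical values, chosen only for illustration
de_block = "Ich lese gerade einen Krimi."
position = (100, 100)  # top-left corner of the text, in pixels

draw = ImageDraw.Draw(img)
draw.text(position, de_block, de_color, font=de_font)
```

With a real font file, the ImageFont.load_default() line would be swapped for the ImageFont.truetype(...) call shown earlier.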
Problem 3: Text wrapping
Then I had to figure out text wrapping, because text longer than a few words runs past the right edge of the image.
First, I used PIL ImageDraw to figure out the frame itself.
Second, I iterated my phrases to split them into pieces that fit in the image:
draw = ImageDraw.Draw(img)
...
current_line = []
current_line_len = []
for word in text:
    word_len, h = draw.textsize(word, font)
    if sum(current_line_len + [word_len]) >= max_length:
        # new line
Where max_length is the width of the frame. With ImageDraw.textsize I managed to measure the size of a line on my particular frame with the particular font I set up in Problem 2.
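For reference, here is a self-contained sketch of the same greedy wrapping idea. It is not my exact code: wrap_text is a hypothetical helper name, the default font is a stand-in for the real one, and it measures with ImageDraw.textlength because ImageDraw.textsize was removed in Pillow 10. Unlike the fragment above, it measures the whole candidate line, so the spaces between words are counted too.

```python
from PIL import Image, ImageDraw, ImageFont


def wrap_text(draw, text, font, max_length):
    """Greedily split text into lines no wider than max_length pixels."""
    lines = []
    current_line = []
    for word in text.split():
        candidate = " ".join(current_line + [word])
        # textlength returns the rendered width of a single line in pixels
        if current_line and draw.textlength(candidate, font=font) > max_length:
            lines.append(" ".join(current_line))  # close the full line
            current_line = [word]                 # start a new line
        else:
            current_line.append(word)
    if current_line:
        lines.append(" ".join(current_line))
    return lines


# Assumption: a small blank frame and the default font, for illustration only
img = Image.new("RGB", (400, 300), (255, 255, 255))
draw = ImageDraw.Draw(img)
font = ImageFont.load_default()
phrase = "Ich lese gerade einen Krimi von Agatha Christie."
lines = wrap_text(draw, phrase, font, max_length=120)
```

A single word that is wider than max_length still gets a line of its own here; my phrases are short enough that this never came up.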
Problem 4: Text blocks positioning
But where exactly should I put these 2 blocks of text?
Horizontal margin
If you want the text to take up only 80% of the horizontal space, then:
- In Problem 3, multiply the image width you use for wrapping text by 0.8 (80%)
- Figure out the ratio
The ratio problem is a different thing entirely. You can spend a considerable amount of time figuring out the formula, but I did it differently. As the phrases are limited in length, I can hardcode the ratio depending on the number of lines.
So I have a dictionary like this, a matrix pretty much, where the keys represent “GER-lines_RU-lines”:
tables_of_sizes = {
'1_1': [2.8, 1.7],
'1_2': [3, 1.7],
'1_3': [3, 1.5],
'2_1': [3, 1.5],
...
}
If there is only 1 German (top block) line and exactly 3 Russian (bottom block) lines, then their ratios are 3 and 1.5 respectively.
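The lookup itself can be sketched roughly like this. The helper name get_ratios and the pre-wrapped example lines are made up for illustration, and the table contains only the four entries shown above, so any other combination of line counts would raise a KeyError in this sketch.

```python
# Only the entries shown in the post; the rest of the table is not reproduced
tables_of_sizes = {
    '1_1': [2.8, 1.7],
    '1_2': [3, 1.7],
    '1_3': [3, 1.5],
    '2_1': [3, 1.5],
}


def get_ratios(de_lines, ru_lines):
    """Return (top_ratio, bottom_ratio) for already wrapped text blocks."""
    # The key has the form "GER-lines_RU-lines", e.g. '1_3'
    key = f"{len(de_lines)}_{len(ru_lines)}"
    top_ratio, bottom_ratio = tables_of_sizes[key]
    return top_ratio, bottom_ratio


# Hypothetical wrapped blocks: 1 German line, 3 Russian lines
ratios = get_ratios(
    ["Ich lese gerade"],
    ["Как раз сейчас", "я читаю", "один детектив"],
)
```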
Then, again using ImageDraw.textsize, I get the text_width and text_height of the whole block. I need the same for the image itself, which the Image.size property provides, so I get image_width and image_height.
Eventually, the position according to the ratio for the top and bottom blocks would be:
# X
(image_width - text_width) / 2
# Y
(image_height - text_height) / respective_ratio
Where respective_ratio is (in this case) 3 for German (top block) and 1.5 for Russian (bottom block).
And then I draw text blocks on the image with ImageDraw.text().
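As a worked example of the two formulas, here is a small sketch in plain Python. The function name block_position and all the sizes are assumptions made for illustration; in practice the text size would come from measuring the wrapped block with ImageDraw, and the image size from Image.size.

```python
def block_position(image_size, text_size, ratio):
    """Centre a text block horizontally and place it vertically by ratio."""
    image_width, image_height = image_size
    text_width, text_height = text_size
    x = (image_width - text_width) / 2        # equal margins left and right
    y = (image_height - text_height) / ratio  # larger ratio -> higher on the frame
    return x, y


# Hypothetical sizes: a 1280x720 frame and two 800x80 text blocks
image_size = (1280, 720)
de_position = block_position(image_size, (800, 80), ratio=3)    # top block
ru_position = block_position(image_size, (800, 80), ratio=1.5)  # bottom block
```

With these numbers both blocks start at x = 240, the German block lands at y ≈ 213 and the Russian block at y ≈ 427.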
Results so far
So, the source data is 2 phrases (both mean “I am reading a detective story by Agatha Christie right now.”):
Ich lese gerade einen Krimi von Agatha Christie.
Как раз сейчас я читаю один детектив Агаты Кристи.
I know which one of them is German, and which one is Russian.
By solving these first 4 problems, I got this frame:

Read further about generating audio, making clips and the resulting video in the upcoming parts. 🙂