Video Generation Journey. Part 1: Making frames (problems 1-4)

Initially, I aimed to make one big post, but I am splitting it up. This Part 1 is about how I made the frames to generate videos from.


I have been working on this project on and off for a few months. Many things did not work for me – from ffmpeg itself to simply installing OpenCV and ffmpeg on Apple M1 systems – and to this day I am not 100% sure what made it work. I plan to put myself in a situation where I have to do it again, and I will write a guide then. For now, I can only say that this video tutorial on running native ffmpeg and this guide from OpenCV themselves are good places to start.

The last time I used OpenCV was in 2017, for my graduate work. Back then, I processed prerecorded videos and did not need to worry about audio. So I had some background before this project, but I was in no way ready for the ride.

Buckle up!

Table of contents

  • What do I have to create?
  • Limitations
    • Background
    • Phrases
    • Audio
  • Problems
    • Problem 1: GIF on a background
    • Problem 2: Putting text on the image
    • Problem 3: Text wrapping
    • Problem 4: Text blocks positioning
  • Results so far

What do I have to create?

The video I have to generate consists of these parts:

  1. Intro
    1. video – a GIF on a background;
    2. audio – prerecorded narration + music.
  2. Title
    1. video – text in 2 languages on a background;
    2. audio – generated audio in 2 languages of the respective title.
  3. Body
    1. video – German and Russian phrases on a background;
    2. audio – generated audio of the respective phrases in German and Russian.
  4. Credits
    1. video – information about the project, website etc. on a background;
    2. audio – prerecorded narration + music.

Limitations

Background

An example of a static background

The background here is a static image, and my whole process is built around that.

Given its educational nature, the video does not call for a distracting element such as a dynamic background. However, I plan to add one in a future upgrade.

Phrases

German and Russian phrases are positioned one above the other. The phrases are not terribly long, so the font is expected to be the same size on all of the Body frames.

Audio

The music and two parts of the audio are prerecorded; the rest needs to be voiced by a library.


So there are several distinct problems to solve, which I will go through next.

Problem 1: GIF on a background

I already have a post about it, go check it out:

Basically, the source is this (a JPEG background and an animated logo):

And the output is this – a pretty perfect union. As for the quality: this particular GIF is heavily downsized so that it can be put online.

The result of adding a GIF to a static background

The plan I wrote down in that post was later made obsolete by my discovery of ffmpeg.
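
For reference, the general idea of putting an animated GIF on a static background can be sketched with Pillow alone. This is only an illustration: it is neither the code from the linked post nor the ffmpeg approach. The function name overlay_gif_frames, the sizes and the in-memory stand-in images are all made up for the example.

```python
from PIL import Image, ImageSequence


def overlay_gif_frames(background, gif, position=(0, 0)):
    """Return one composited RGBA frame per frame of the GIF."""
    base = background.convert("RGBA")
    frames = []
    for frame in ImageSequence.Iterator(gif):
        canvas = base.copy()
        overlay = frame.convert("RGBA")
        # The overlay's own alpha channel is the paste mask, so
        # transparent GIF pixels leave the background visible.
        canvas.paste(overlay, position, overlay)
        frames.append(canvas)
    return frames


# Stand-ins for the real files: a white background and a one-frame red "logo".
background = Image.new("RGB", (320, 180), (255, 255, 255))
logo = Image.new("RGBA", (40, 40), (255, 0, 0, 255))
frames = overlay_gif_frames(background, logo, position=(10, 10))
```

With real files, background and logo would come from Image.open, and each composited frame could be handed on to the video step.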

Problem 2: Putting text on the image

This was familiar ground. This module is used for the Title, the Body and eventually the Credits.

For the text I used PIL, in particular ImageFont. For both the German and the Russian text I had to find and set up fonts. Their properties differ, in particular size and color. The piece of code below sets up the German block in black, font size 72:

        de_font = ImageFont.truetype(german_font, 72, encoding='UTF-8')
        de_color = (0, 0, 0)

Here german_font is the path to my .ttf file of Times New Roman.

To draw on the image, I wrap it in a PIL ImageDraw object:

draw = ImageDraw.Draw(img)

And finally to draw the text on the image I used .text:

draw.text(position, de_block, de_color, font=de_font)
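
Put together, the three snippets above could look like the minimal sketch below. It rests on a few assumptions of mine: a blank white frame stands in for the real background, and Pillow's built-in font stands in for the Times New Roman file, so the size and look will differ from the real frames.

```python
from PIL import Image, ImageDraw, ImageFont

# Assumption: a blank white frame stands in for the real background image.
img = Image.new("RGB", (640, 360), (255, 255, 255))

# Assumption: Pillow's built-in font replaces the .ttf file; with a real
# font this would be ImageFont.truetype(german_font, 72).
de_font = ImageFont.load_default()
de_color = (0, 0, 0)  # black (RGB)
de_block = "Ich lese gerade einen Krimi von Agatha Christie."
position = (20, 20)   # top-left corner of the text, in pixels

draw = ImageDraw.Draw(img)
draw.text(position, de_block, de_color, font=de_font)
```

The built-in font is only a placeholder and may not cover Cyrillic, so the Russian block needs its own .ttf file.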

Problem 3: Text wrapping

Then I had to figure out text wrapping, because text longer than a few words runs over the right edge of the image.

First, I used PIL ImageDraw to figure out the frame itself.

Second, I iterated over my phrases word by word to split them into pieces that fit in the image:

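    draw = ImageDraw.Draw(img)

    lines = []             # finished lines
    current_line = []      # words on the line being built
    current_line_len = []  # pixel width of each of those words

    # assuming text is one phrase as a single string
    for word in text.split():
        word_len, h = draw.textsize(word, font)
        if current_line and sum(current_line_len + [word_len]) >= max_length:
            # new line (one possible way to finish this step)
            lines.append(' '.join(current_line))
            current_line, current_line_len = [], []
        current_line.append(word)
        current_line_len.append(word_len)

    if current_line:
        lines.append(' '.join(current_line))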

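Here max_length is the width of the frame. With ImageDraw.textsize I measured the size of a line on my particular frame with the particular font I set up in Problem 2. Note that textsize was deprecated in Pillow 9.2 and removed in Pillow 10; newer versions offer ImageDraw.textlength and ImageDraw.textbbox instead.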
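
As a self-contained illustration of the same greedy idea, here is a sketch that measures whole candidate lines with ImageDraw.textlength. The function name wrap_text, the small blank frame and the built-in font are my assumptions for the example, not the project's actual code.

```python
from PIL import Image, ImageDraw, ImageFont


def wrap_text(draw, text, font, max_length):
    """Greedily split text into lines no wider than max_length pixels."""
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        # textlength measures the rendered width, spaces included.
        if current and draw.textlength(candidate, font=font) > max_length:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Assumption: a small blank frame and the built-in font, just for the demo.
img = Image.new("RGB", (120, 200), (255, 255, 255))
draw = ImageDraw.Draw(img)
font = ImageFont.load_default()
max_length = img.width * 0.8  # leave a 20% horizontal margin
phrase = "Ich lese gerade einen Krimi von Agatha Christie."
lines = wrap_text(draw, phrase, font, max_length)
```

Measuring the whole candidate line rather than summing word widths also accounts for the spaces between words. A single word wider than max_length would still end up on a line of its own.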

Problem 4: Text blocks positioning

But where exactly should I put these 2 blocks of text?

Horizontal margin

If you want the text to take up only 80% of the horizontal space, then:

  1. In Problem 3, multiply the image width you are using to wrap text by 0.8 (80%)
  2. Figure out the ratio

The ratio problem is a different thing entirely. You can spend a considerable amount of time figuring out a formula, but I did it differently. As the phrases are limited in length, I can hardcode the ratio depending on the number of lines.

So I have a dictionary like this, pretty much a matrix, where the keys represent “GER-lines_RU-lines”:

tables_of_sizes = {
    '1_1': [2.8, 1.7],
    '1_2': [3, 1.7],
    '1_3': [3, 1.5],
    '2_1': [3, 1.5],
    # ...
}

If there is exactly 1 German line (top block) and exactly 3 Russian lines (bottom block), then their ratios are 3 and 1.5.
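
A lookup in such a table could be sketched as follows. The helper ratios_for is a hypothetical name of mine, and the table holds only the four entries shown above; the real one has more combinations.

```python
# Only the four entries shown in the post; the real table is longer.
tables_of_sizes = {
    "1_1": [2.8, 1.7],
    "1_2": [3, 1.7],
    "1_3": [3, 1.5],
    "2_1": [3, 1.5],
}


def ratios_for(de_lines, ru_lines, table=tables_of_sizes):
    """Return (top_ratio, bottom_ratio) for the given wrapped lines."""
    # Key format: "<number of German lines>_<number of Russian lines>"
    key = f"{len(de_lines)}_{len(ru_lines)}"
    if key not in table:
        raise KeyError(f"no hardcoded ratio for {key}")
    top_ratio, bottom_ratio = table[key]
    return top_ratio, bottom_ratio


# 1 German line and 3 Russian lines
de_ratio, ru_ratio = ratios_for(["line"], ["line 1", "line 2", "line 3"])
```

Raising an error for a missing key makes it obvious when a phrase wraps into more lines than the table anticipates.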

Then, again using ImageDraw.textsize, I get the text_width and text_height of the whole block. I need the same for the image itself, from the Image.size property, so I get image_width and image_height.

Eventually, the position of the top and bottom blocks according to the ratio would be:

# X: center the block horizontally
x = (image_width - text_width) / 2

# Y: place the block according to its ratio
y = (image_height - text_height) / respective_ratio

Where respective_ratio is (in this case) 3 for German (top block) and 1.5 for Russian (bottom block).

And then I draw text blocks on the image with ImageDraw.text().

Results so far

So, the source data is 2 phrases (both mean “I am reading a detective story by Agatha Christie right now.”):

Ich lese gerade einen Krimi von Agatha Christie.
Как раз сейчас я читаю один детектив Агаты Кристи.

I know which one of them is German and which one is Russian.

By solving these first 4 problems, I got this frame:

Read further about generating audio, making clips and the resulting video in the upcoming parts. 🙂
