FFmpeg to infinity and beyond!

My love for FFmpeg knows no bounds and whenever I need to do some slick slicing & splicing I always go with FFmpeg (or Avidemux).

Command line is bash and you’ll obviously need FFmpeg.

Turning a video into a high quality GIF

Turning a video that has more than 256 colors into a GIF is an impossible task if you wish to keep all the color information: the GIF format just isn’t designed for that at all.
But what if there was a way to still convert the video into an acceptable GIF?

ffmpeg -i input.mp4 \
    -vf "split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
    -loop 0 output.gif

It is possible to generate a color palette with FFmpeg and use it (thanks to a SuperUser post).

The explanation for the video filters used boils down to:

  • palettegen and paletteuse generate a custom palette and then apply it;
  • split helps make this command a one-liner, because the usual way would be to output the palette to a file before using it;
  • -loop set to 0 means the GIF repeats forever, any other positive number is the loop count, and -1 disables looping so the GIF plays only once.
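
For comparison, here is a rough sketch of the "usual way" with the palette written to a file first. The gif_two_pass and palette_path helper names and the _palette.png naming are my own assumptions for illustration; only the palettegen and paletteuse filters come from the command above.

```bash
#!/usr/bin/env bash
# Two-pass GIF conversion: write the palette to a file, then apply it.
# palette_path and gif_two_pass are hypothetical helper names.

palette_path() {
    # output.gif -> output_palette.png
    printf '%s_palette.png' "${1%.gif}"
}

gif_two_pass() {
    local input="$1" output="$2" palette
    palette="$(palette_path "$output")"
    # Pass 1: analyse the video and write the palette image
    ffmpeg -y -i "$input" -vf "palettegen" "$palette" || return 1
    # Pass 2: feed both the video and the palette to paletteuse
    ffmpeg -y -i "$input" -i "$palette" \
        -filter_complex "[0:v][1:v]paletteuse" \
        -loop 0 "$output"
}

# Usage: gif_two_pass input.mp4 output.gif
```

The one-liner with split gives the same kind of result without leaving a palette file lying around.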

With a complex filter it’s possible to accelerate the GIF too:

ffmpeg -i input.mp4 \
    -filter_complex "split[s0][s1];[s0]palettegen[p];[s1]setpts=0.5*PTS[s1_fast];[s1_fast][p]paletteuse" \
    -loop 0 output.gif

But my source is high resolution, can I scale it down? Yes!

ffmpeg -i input.mp4 \
    -vf "scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
    -loop 0 output.gif

The scale filter keeps the aspect ratio when one dimension is set to -1, like 320:-1 in this case, but that’s not all! We can also reduce the framerate by prepending fps=10 to the filter chain.

While I understand that 10 FPS is pretty low, this is GIF territory so it’s not the best thing ever in 2024+.

Keep in mind that GIFs at 60FPS don’t perform well and take up a lot of space. That’s why you should reduce the framerate with the -r parameter or with the fps filter in the -vf parameter.
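
Putting the framerate and the scaling together, a small sketch that builds the filter string could look like this. The gif_filter helper name is hypothetical, and 10 FPS and 320 pixels are just example values; the ffmpeg call is guarded so it only runs when input.mp4 is actually there.

```bash
#!/usr/bin/env bash
# Build the filter chain: fps -> scale -> split -> palettegen/paletteuse.
# gif_filter is a hypothetical helper, $1 = frames per second, $2 = width.
gif_filter() {
    local fps="$1" width="$2"
    printf 'fps=%s,scale=%s:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse' \
        "$fps" "$width"
}

# Only convert when ffmpeg is installed and the example input exists
if command -v ffmpeg >/dev/null 2>&1 && [ -f input.mp4 ]; then
    ffmpeg -i input.mp4 -vf "$(gif_filter 10 320)" -loop 0 output.gif
fi
```

Dropping frames before scaling means the later filters have less work to do.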

Scroll text on a video

This is one part of the thing I do on the music videos I download from YouTube to use while I’m streaming (of course the audio never makes it in the VOD for obvious DMCA reasons).

For context, I decided to display part of the video on my overlay but have the title scroll (when it’s wider than the video). I won’t be sharing my script but I will be sharing the command line I use with FFmpeg to make text scroll.

# Do not forget to escape quotes, commas and colons too
TITLE="My super song\\: Ayyy LMAO - By the \\\"Senpai\\\""

ffmpeg -i input.mp4 \
    -acodec copy -vcodec h264_nvenc \
    -vf "drawtext=text=\'${TITLE}\':fontfile=./NotoSansJP-VariableFont_wght.ttf:y=(h-text_h)/2:x=w-mod(max(t-0\\,0)*(120)\\, 2*(tw+150)):fontcolor=ffcc00:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#000000AA" \
    -b:v 8M -rc vbr -cq 19 -preset slow -profile:v main \
    output.mp4

To avoid a gap in the scrolling it’s possible to add a second drawtext filter whose start is delayed:

# Do not forget to escape quotes, commas and colons too
TITLE="My super song\\: Ayyy LMAO - By the \\\"Senpai\\\""

ffmpeg -i input.mp4 \
    -acodec copy -vcodec h264_nvenc \
    -vf "drawtext=text=\'${TITLE}\':fontfile=./NotoSansJP-VariableFont_wght.ttf:y=(h-text_h)/2:x=w-mod(max(t\\,0)*(120)\\, 2*(tw+150)):fontcolor=ffcc00:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#000000AA,drawtext=text=\'${TITLE}\':fontfile=./NotoSansJP-VariableFont_wght.ttf:y=(h-text_h)/2:x=w-mod(max(t-(tw+150)/(120)-0\\,0)*(120)\\, 2*(tw+150)):fontcolor=ffcc00:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#000000AA" \
    -b:v 8M -rc vbr -cq 19 -preset slow -profile:v main \
    output.mp4

Of course you’ll need to encode the video and have the appropriate fonts installed for this to work. The original audio has been kept because we do not need to re-encode it.

Let’s break down the filter element by element as it is really messy:

drawtext=
  text=\'${TITLE}\'
  fontfile=./NotoSansJP-VariableFont_wght.ttf
  y=(h-text_h)/2
  x=w-mod(max(t\\,0)*(120)\\, 2*(tw+150))

First we use the filter called drawtext and it will take multiple parameters:

  • text is the input text; as we are passing it from the command line we need to escape the quotes, colons and commas;
  • fontfile is a relative path to a font, in my case I went with NotoSansJP as it has support for Japanese characters too;
  • x and y are the position of the text and instead of using a static position we pass a formula:
    • w and h stand for the width and height of the video;
    • tw or text_w is the width of the rendered text;
    • text_h or th is the height of the rendered text;
    • t represents the timestamp in seconds and can be used for math operations, in this case we use it to make the text scroll at a proper speed;
    • Doing (h-text_h)/2 simply means we are centering the text vertically.
  • The other parameters are not really important.

The trick to having scrolling text with no gap is to repeat the same filter a second time but with a delayed x parameter, as such:

x=w-mod(max(t-(tw+150)/(120)\\,0)*(120)\\, 2*(tw+150))

In this case the 150 we had in the second parameter of mod is also used in the first parameter, together with the text width. In fact there’s no real delay: the second title is just held off screen (at x=w) until the first one has moved far enough. The values can be named as follows:

  • 150: Margin;
  • 120: Speed;
  • 0: Start time offset.

For a simplified x parameter (pseudo code explanation):

MARGIN=150
SPEED=120

# First title
x=
  w-mod(
    max(t, 0) * (${SPEED}), 2 * (tw+${MARGIN})
  )

# Second title
x=
  w-mod(
    max(t - (tw + ${MARGIN}) / (${SPEED}), 0) * (${SPEED}), 2 * (tw + ${MARGIN})
  )
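
To convince myself the math holds, the same expressions can be evaluated outside of FFmpeg. This is only a sketch: W and TW are made-up example values (a 640 pixel wide video and a title that renders 900 pixels wide), and title_one_x and title_two_x are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Evaluate the x expressions with awk to see where each title sits.
# W and TW are assumptions for the example, MARGIN and SPEED match the text.
W=640
TW=900
MARGIN=150
SPEED=120

# x of the first title at time $1 (seconds)
title_one_x() {
    awk -v t="$1" -v w="$W" -v tw="$TW" -v m="$MARGIN" -v s="$SPEED" 'BEGIN {
        elapsed = (t > 0) ? t : 0              # max(t, 0)
        period  = 2 * (tw + m)                 # distance before wrapping
        printf "%d\n", w - ((elapsed * s) % period)
    }'
}

# x of the second title: same formula, started (tw + MARGIN) / SPEED later
title_two_x() {
    awk -v t="$1" -v w="$W" -v tw="$TW" -v m="$MARGIN" -v s="$SPEED" 'BEGIN {
        shifted = t - (tw + m) / s
        elapsed = (shifted > 0) ? shifted : 0  # max(t - delay, 0)
        period  = 2 * (tw + m)
        printf "%d\n", w - ((elapsed * s) % period)
    }'
}

title_one_x 20   # 340
title_two_x 20   # -710
```

With these example values, at 20 seconds the second title ends at -710 + 900 = 190 and the first one starts at 340, so the gap between them is exactly the 150 pixel margin.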

It’s confusing but that’s how it works:

# Do not forget to escape quotes, commas and colons too
TITLE="My super song\\: Ayyy LMAO - By the \\\"Senpai\\\""
MARGIN=150
SPEED=120

ffmpeg -i input.mp4 \
    -acodec copy -vcodec h264_nvenc \
    -vf "drawtext=text=\'${TITLE}\':fontfile=./NotoSansJP-VariableFont_wght.ttf:y=(h-text_h)/2:x=w-mod(max(t-0\\,0)*(${SPEED})\\, 2*(tw+${MARGIN})):fontcolor=ffcc00:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#000000AA,drawtext=text=\'${TITLE}\':fontfile=./NotoSansJP-VariableFont_wght.ttf:y=(h-text_h)/2:x=w-mod(max(t-(tw+${MARGIN})/(${SPEED})-0\\,0)*(${SPEED})\\, 2*(tw+${MARGIN})):fontcolor=ffcc00:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#000000AA" \
    -b:v 8M -rc vbr -cq 19 -preset slow -profile:v main \
    output.mp4

In testing it appeared that sometimes long titles might glitch out a bit and not have a background. Please do some testing and tweak the commands as needed to fit your needs.

Two scrolling titles might not be needed anyway!

Resize the video

Part two of what I do to the YouTube music videos I use on my stream: I resize them to an appropriate size so they don’t take up the whole screen but still keep the proper aspect ratio.

ffmpeg -i input.mp4 \
    -acodec copy -vcodec h264_nvenc \
    -vf "scale=640:-1,crop=in_w:120:0:in_h/2" \
    -b:v 8M -rc vbr -cq 19 -preset slow -profile:v main \
    output.mp4

In this case I’ve used my GPU to do some encoding, feel free to encode however you want. The original audio has been kept because we do not need to re-encode it.

You can combine the filter that adds the scrolling text with the resized output by ordering things as follows:

  • Resize the video;
  • Use a comma , to separate filters;
  • Add the scrolling text.

It’s important to keep things in order and you’ll have this big fat command that I’ve reworked a bit to make it easier to appreciate:

# Do not forget to escape quotes, commas and colons too
TITLE="My super song\\: Ayyy LMAO - By the \\\"Senpai\\\""
MARGIN=150
SPEED=120

# Font and box parameters
FONTFILE="./NotoSansJP-VariableFont_wght.ttf"
FONTCOLOR="ffcc00"
BOXCOLOR="000000AA"
FONTPARAMS="fontcolor=${FONTCOLOR}:fontsize=70:shadowx=2:shadowy=2:box=1:boxborderw=3:boxcolor=#${BOXCOLOR}"

# Title scrolling
TITLE_ONE="y=(h-text_h)/2:x=w-mod(max(t-0\\,0)*(${SPEED})\\, 2*(tw+${MARGIN}))"
TITLE_TWO="y=(h-text_h)/2:x=w-mod(max(t-(tw+${MARGIN})/(${SPEED})-0\\,0)*(${SPEED})\\, 2*(tw+${MARGIN}))"

# Scaling (+lanczos as an example) and cropping
RESIZE="scale=640:-1,crop=in_w:120:0:in_h/2"
RESIZE_LANCZOS="scale=640:-1:flags=lanczos,crop=in_w:120:0:in_h/2"

ffmpeg -i input.mp4 \
    -acodec copy -vcodec h264_nvenc \
    -vf "${RESIZE},drawtext=text=\'${TITLE}\':fontfile=${FONTFILE}:${TITLE_ONE}:${FONTPARAMS},drawtext=text=\'${TITLE}\':fontfile=${FONTFILE}:${TITLE_TWO}:${FONTPARAMS}" \
    -b:v 8M -rc vbr -cq 19 -preset slow -profile:v main \
    output.mp4

And this was my secret behind the resized, cropped music videos with scrolling text.

Homework: Combine EVERYTHING into a GIF

It’s as simple as adding the full filter for generating the palette and the gif from the palette. I kid you not! But this is your homework on how to make that work: just don’t forget to remove audio and video encoding.

Beyond infinity

I’d recommend checking out ffmprovisr, which is a very good guide on how to do some things.

As I don’t support AI I can’t really recommend using ChatGPT to write FFmpeg commands. It might help you with some stuff, but keep in mind that it can be confidently wrong.

Disable NVIDIA overlay only in certain apps

Having the NVIDIA overlay be enabled on unwanted apps can have one of two unwanted side effects:

  • The overlay can’t be used for games: no ShadowPlay recording is possible;
  • A folder will be created in the folder where you save your recordings.

This can get very annoying very quickly when using apps such as Godot, Elgato’s Stream Deck and Wave Link software and such…

NVidia Profile Inspector

To tweak what app should use the overlay you will need the NVidia Profile Inspector.

Select a profile or create one and add the executables to it.

Editing setting 0x809D5F60 and setting it to 0x10000000 will disable the overlay for the given apps. If the setting doesn’t appear you will need to click the button to show the unknown settings.

The overlay should no longer hook into the app on the next launch.

Conclusion

It’s a shame to have to use the NVIDIA Profile Inspector to disable the overlay in some apps.

Source: https://redd.it/89mtzr

Queueing events in VNyan

VNyan is a VTubing software used for animating a 3D model for streaming and content creation. I’ve been using it for the last couple of months because of some of its features such as the nodes system.

Nodes are a way to visually script events based on Twitch, WebSockets or whatever is supported in the app. Lately I’ve been thinking about doing a bit more and giving the possibility to just go wild and queue events such as model swap.

Version 1.2.1b is required. The update introduces a wizard that can be skipped if you already have VNyan set up.

What is a queue?

We want to push items to the end of the array and take items from the top of it.
This is what we call first in first out. It’s the absolute basic idea behind the queueing of events.
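
Outside of VNyan, the same idea can be sketched with a plain bash array. This is only an illustration of first in first out, not something VNyan runs; the enqueue and dequeue function names are my own.

```bash
#!/usr/bin/env bash
# A first in, first out queue built on a bash array.
queue=()

enqueue() {
    queue+=("$1")              # push to the end of the array
}

dequeue() {
    # Store the oldest item in the global variable "item" and remove it
    item="${queue[0]}"
    queue=("${queue[@]:1}")
}

enqueue "throwApple"
enqueue "throwDuck"
dequeue
echo "$item"   # throwApple
```

The first command that went in is the first one to come out, and throwDuck is still waiting in the queue.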

The plan

I want to set up commands to do stuff and have them be processed in order with a short delay between each command. The commands will be normal Twitch chat commands to make things easier.

We will need three things:

  • An interval timer that will be triggered to run when the queue is not empty;
  • The event handling that will pop the first element at the top of the queue and run it;
  • The logic needed to tell when the loop must run and when it is stopped.

In my example I will have two commands:

  • !throwApple which will throw an apple at me;
  • !throwDuck which will throw a rubber duck at me.

This can work for doing model swaps but will require a bit more thought and care since you might want to animate that (through effects or transformations) and you will want to take into account the loading delay.

Queuing in VNyan

First of all we need to create the logic that will handle the events, the entry point will be a trigger and it will start a timer that will call that trigger again until the queue is empty.

The events will be handled by conditions that will be checked: if true you’ll apply your action and you’ll delay another call of the event loop. If false you’ll just flag the event queue as finished and not call it again (until another command is called).

Each command will push an element into the queue and, if the event queue isn’t running, it will call the event trigger and set a flag.
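
As a rough text version of that logic, here is a synchronous simulation in bash. It is not VNyan code: on_command, tick, running and handled are hypothetical names, and the timer delay is only marked by a comment.

```bash
#!/usr/bin/env bash
# Synchronous simulation of the event queue logic (not VNyan code).
queue=()      # pending commands
running=0     # flag: 1 while the event loop is active
handled=()    # what has been processed, in order

# A chat command arrives: push it and start the loop if it is idle
on_command() {
    queue+=("$1")
    if [ "$running" -eq 0 ]; then
        running=1
    fi
}

# One pass of the event loop (what the timer would call)
tick() {
    if [ "${#queue[@]}" -gt 0 ]; then
        local item="${queue[0]}"
        queue=("${queue[@]:1}")
        handled+=("$item")     # stand-in for the real action
    else
        running=0              # queue empty: flag the loop as finished
    fi
}

on_command "throwApple"
on_command "throwDuck"
while [ "$running" -eq 1 ]; do
    tick
    # in VNyan the delay between two calls would go here
done
echo "${handled[*]}"   # throwApple throwDuck
```

The flag is what prevents a second command from starting a second loop while the first one is still working through the queue.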

 

It’s important to note that we use the Enqueue Text Array and the Dequeue Text Array.

Prior to this version dequeueing an item was done by using an Ordered Execution node that would get an item from the queue and remove the item by index.

Making an overhead camera streaming setup for gunpla

I have a passion, or rather an addiction, called Gunpla: it’s about building models from the Mobile Suit Gundam franchise, which is basically a giant robot franchise. Giant robots are great.

While thinking about how to change my streaming content a bit, I thought about the 13 huge boxes of Gunpla I have not built yet and that have stayed unopened for a couple of months. Basically I came up with an excuse to force myself to take time to build.

A passion

I first fell in love with Gunpla back in December 2021 and I have been building until October 2022. I stopped since I had planned to travel to Japan between November and December.

I have currently built over 30 models, mostly from the Mobile Suit Gundam franchise, and while I’m starting to have some pretty crowded shelves I have no intention of stopping.

Call it an addiction if you want.

Mobile Suit Gundam

That one big mecha anime franchise that appears to be really hard to get into until you realize it’s easy. Pick wherever you wanna start and start.

Most series have their own timeline and are standalone such as:

  • Iron Blooded Orphans;
  • Mobile Suit Gundam 00;
  • Mobile Suit Gundam SEED (& SEED Destiny);
  • Mobile Suit Gundam Wing;
  • The Witch from Mercury.

The list does go on.
And then some are considered as the main timeline called Universal Century (UC):

  • Mobile Suit Gundam;
  • Mobile Suit Gundam Z and ZZ;
  • Char’s Counterattack;
  • War in the pocket;
  • Stardust Memories;
  • Mobile Suit Gundam Unicorn;
  • Hathaway’s Flash.

And while this isn’t a complete list, of course you should feel free to pick it up anywhere you like. I started with Hathaway’s Flash and Unicorn; you will probably miss some plot points but whatever gets you started is valid.

For real just pick wherever you wanna start, don’t care about people judging you, care only about getting into it.

Not just models

Gunpla comes from Gundam and Plamo, and Plamo comes from Plastic and Models. The truth is that they are much more than just mere models you look at: they are fun to build and pose.

Unlike Games Workshop’s Warhammer (40K) franchise you get to build your model without needing glue or paint, and the figures can move thanks to over-engineered parts that just move in the most satisfying way.
I never thought I’d be blown away by how a plastic model can move its legs and feet but I still am to this day. Some models have very precise joints that improve the range of movement beyond what you would expect.

Am I fanboying over plastic models? Yeah I am.

I’d personally recommend building the RG Nu Gundam to whoever got into Gunpla and is looking to try the best model there is to build out there at the time of writing.

Setting up a stream camera

Let’s get into the main subject after presenting the hobby and my addiction.

I’m the happy owner of an Avermedia GC551 capture card, a 7 meter long HDMI cable, a Canon 90D and a RØDE VideoMic GO II. This is all you need to get started with showing an image if you know what you are doing.

The basic setup I was going for at the beginning was pretty terrible since I was using a tripod on the side but I’ve since then upgraded to Elgato’s Master Mount L and Flex Arm L to hold my camera in an overhead view… And promptly sent it back because the Flex Arm can’t hold a DSLR when it weighs just 1Kg…

Elgato Flex Arm L used with a camera, as seen on Elgato.com

I should have thought about why they don’t specify how much load it can take and some things are surprising…

How it performs with a DSLR (>1kg)

So if the Elgato Flex Arm isn’t an option, what else can I try? Well there’s this Tarion Camera Destop Arm (spelt like that). Am I throwing money at the problem to see what sticks? Yes…

Up to 3Kg should do it right?!

If we take a closer look it’s shorter, with the vertical pole being only 52cm long compared to the Elgato one going up to 128cm. Is it going to be an issue?

If I’m using my Canon EF 50mm f/1.8 STM Lens I need at least 1 meter of distance between the lens and the table to properly record the surface. Buying a 35mm lens would help but it would increase the cost too much for it to be worth it right now.

After fiddling a bit with it I’m able to clear the distance and find out that I was wrong: I need more than 1 meter of clearance. So now I’m looking at the SmallRig RA-S280A Air-cushioned Light Stand with Arm 3737, it’s called a light stand but it will be perfect for the camera.

SmallRig can take this DSLR

I can also use a counterweight on the handle thanks to a very handy screw-in hook. And what does it look like from the point of view of the stream?

It’s perfect for what I need.

What’s next?

Next up is setting up a microphone and an iPhone 11 to capture my voice and facial expressions. Let’s do a combo.

I’m mounting the phone on a Joby GorillaPod 5K with a K&F Concept CA02 Quick Release Plate clip and on top of that I’m putting the RØDE VideoMic GO II on that clip. The mic is connected to the camera and the sound is then fed into the capture card over HDMI.

The GoPro Hero 11 with the 3-Way Tripod is used as a webcam to complete the body tracking and needs to be facing me.

Adding iPhone facetracking, with webcam body tracking?

Oh yeah, because I’m not streaming in front of my desktop BEEF PC I need to connect a camera to track my body… My laptop is a MacBook Pro M1 Pro, and I’m lucky that I can connect my GoPro as a webcam (as it needs to be centered) and that I can run OpenSeeFace, the facetracking backend for VSeeFace.

OpenSeeFace requires Python 3, I tried using Python 3.10 but one dependency doesn’t seem to exist for my MacBook, so I’ve opted for Python 3.9 and used the following commands to install and run:

# Clone or download the zip
git clone git@github.com:emilianavt/OpenSeeFace.git
cd OpenSeeFace

# Install Python 3.9 & pew for easy venv
brew install python@3.9
pip install pew

# Create an environment for the appropriate Python version
pew new openseeface -p3.9
# Or re-enter your existing venv
pew workon openseeface

# Install dependencies
pip install onnxruntime opencv-python pillow numpy

# Run facetracker.py
python facetracker.py -c 2 -F 30 -W 1920 -H 1080 \
    --model 4 --gaze-tracking 1 --discard-after 0 \
    --scan-every 0 --no-3d-adapt 1 --max-feature-updates 900 \
    --log-output output.log \
    --ip 192.<the_computer_with_vseeface> -p 11573

Now to explain how to select the camera:

  • Parameter -c is for selecting the number associated with the camera;
  • Parameter -F is for the framerate of the camera;
  • Parameters -W and -H are for the camera resolution;
  • Parameter --model improves accuracy, I don’t know why I was able to go with 4 but this is the command line I got from VSeeFace running on Windows and it ran on Mac OS.

Sadly, without running the OpenSeeFace scripts on PC you will have to guess the camera number, frame rate and resolution… Yeah that’s how those things work when there’s no support for Linux and Mac.
My time to shine with a pull request, I should look into that.

For the iPhone I’m running iFacialMocap and this is all you need for your facial expressions, but do set up blendshapes for your model and face. This post isn’t about that.

I only use iFacialMocap and skip the webcam for body tracking, this makes the setup easier but I still wanted to write up all that in case I change my mind.

Conclusion

I didn’t cover the VTubing aspect of it because it’s out of the scope of the video, but I’ll have to write a post about some of that and explain my setup, since my desktop computer is used for streaming but I’m not building anything on my desk.

I need to improve the lighting and maybe do some color calibration to stream with better image quality.

Stop upscaling video to 4K60FPS

Updated 2023/01/03: Added a bit about Frieren EP9.

On YouTube there’s always someone using AI to upscale to 4K and interpolate frames in anime openings. Some people think it looks good because it “moves” more and they are wrong.

I like referencing Noodle’s video, Smoother animation ≠ Better animation. You do not need to watch it to understand this post but I still recommend watching it, as Noodle is an actual animator. I don’t plan on just repeating what this video says.
I’ll be using my own words and examples.

What is upscaling?

When it comes to images and videos upscaling is increasing the dimension of the frame. To illustrate this example let’s take a 102 by 102 pixel image and upscale it using Photoshop and Waifu2x:

Doubling the image’s dimension does harm the quality of the image by doubling and softening the pixels. The 4 times image is unusable and downright bad.

Waifu2x uses a deep convolutional neural network to try to minimize the quality loss and retain the detail from the original image. It works quite nicely but it is far from perfect. Other AI enhancements can be used to upscale images.

In all the illustrated cases we are taking an image that is 102 pixels by 102 and adding more pixels. The only thing we can do is thus to interpolate.

What is interpolating?

Interpolation is a very complicated subject that can be basically watered down to estimating new data based on existing data.

In the case of this post I already referenced interpolating pixels to fill in the blanks when upscaling an image. If we took an image and did not interpolate pixels we would retain the original pixels but spaced out in a grid like this:

Since I’m increasing the size from 102 by 102 pixels to 408 by 408 pixels I’ve blacked out the pixels that will have to be estimated. Different sharpening and scaling algorithms will process the images differently and alter the output.
This example is only illustrating what is missing and what needs to be generated.

The same can be explained for increasing the framerate from 23.976 to 60FPS. Wait a second, 23.976FPS to 60FPS. Decimal numbers?

That’s a fun framerate to multiply to 60. Where are my comfy integer numbers? I like whole numbers. Anything that floats is complicated and scary.

Why interpolating DOESN’T work

While upscaling images with AI can work pretty well, frame rate interpolation doesn’t.

First of all the video frame rate needs to be constant. If it’s defined as 23.976FPS we would need to multiply it by 2.5 to reach 59.94FPS and get as close to 60FPS as possible. But 2.5 is not a whole number, so we will need to alter existing frames while generating 1.5 new frames for each existing frame.

And this breaks everything. Here are two 5-second sequences taken from the opening of Zom 100 on YouTube.

Original video:

AI enhanced video:

Let’s get the total number of frames the lazy way:

ffprobe -v error -select_streams v:0 -count_packets \
    -show_entries stream=nb_read_packets -of csv=p=0 \
    zom100_original_start_5sec.webm
# 127

ffprobe -v error -select_streams v:0 -count_packets \
    -show_entries stream=nb_read_packets -of csv=p=0 \
    zom100_ai_start_5sec.webm
# 323

If my math is right 323 / 127 = 2.543307086614173 making us close to the 2.5 multiplier. But let’s forget numbers because what really counts is the visuals.
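
For the lazy, the same arithmetic can be done in the shell with awk. A small sketch: the ratio helper is a hypothetical name, the frame counts are the two ffprobe results above, and 24000/1001 is the exact value behind the rounded 23.976FPS.

```bash
#!/usr/bin/env bash
# Ratio between two frame counts.
# $1 = original frame count, $2 = interpolated frame count
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", b / a }'
}

ratio 127 323    # 2.543

# 23.976 FPS is really 24000/1001; multiplied by 2.5 it lands on 59.94 FPS
awk 'BEGIN { printf "%.2f\n", 24000 / 1001 * 2.5 }'    # 59.94
```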

Why is it ugly?

The gist of it is that we are creating new frames based on the previous and next frame while also altering existing frames to target the 60FPS that is being uploaded.

Let’s take a look frame by frame: each original frame lasts for 2 seconds while AI enhanced frames last for 1/3 of a second. The left and right videos aren’t synchronized as the frame rate cannot be divided by an integer value.

Obvious artifacting appears very early on around anything that moves. Sometimes it’s just blurring, as if motion blur (on the bike) was introduced, and sometimes the shapes are deformed (such as the numbers).

Another example:

The upscaled image is also sharpened and denoised to some degree which ends up messing up the contrast on the line art.

But 60FPS is nice when gaming

Yes, when playing 3D games each frame is rendered before being displayed on the screen. Videos are different because each frame is already rendered.

This is like the difference between pre-rendered cutscenes and in-engine cutscenes. Pre-rendered cutscenes are designed in a way to be of a certain resolution, frame rate, compression and colorspace.

In-engine cutscenes on the other hand will usually target the game’s resolution, framerate and colorspace without applying any compression. Usually, because some game engines will lock the frame rate to a lower value such as 30FPS because of engine limitations (read: bad engine programming and design).

Rendering 30, 60, 120 or 144 frames a second by taking a picture of a 3D scene that is built right before being displayed is the key to fluidity.
Already rendered content will never be able to do such thing because of the missing information.

Real world example from 60 to 23.976 to 60

Here is NieR Automata at 60FPS:

Reducing the frame rate to 23.976FPS makes the motion look less satisfying:

Now let’s interpolate the framerate to 71.92806FPS and limit it to 60FPS with Flowframes that permits me to double or triple the frame rate (not set it to 60 directly) and notice how the video is choppy:

Here are the settings used:

This is exactly what is being done to those anime openings… Disgusting.

Let’s compare the original native 60FPS footage with the interpolated to 60FPS from 23.976FPS:

The screenshots speak for themselves and express the difference in a much more obvious way, but let’s go a bit further and slow down the video clips while having them side by side and centered on 2B:

Slowed down to 25% does show how bad it gets. Same goes for that Jujutsu Kaisen S2 opening:

It jitters, the text is deformed… I hate it. Anime is not drawn for 60FPS.

Misguided demand exists

Sadly TVs are being sold with some smoothing technology and marketed as good looking. Not sure how people really perceive it as something nice while it should be weirding them out.

I also remember that the Hobbit being 48FPS at the movie theatre was something that did weird out people quite a bit, but was it because of the 3D effect included too? No idea.

NVidia is also working on adding frame rate smoothing to their GPUs when encoding, weird idea…

The reason I made this post is that I’m tired of seeing openings such as Zom100 and Jujutsu Kaisen S2 being upscaled, sharpened to hell and then interpolated to get weird motion and no improvement.

Conclusion

Most people that aim for 60FPS+ content do so because fluidity is important and enhances the pleasure of the visuals. Sadly most people don’t seem to actually pay attention to the composition and miss the botched detail and weird movements.

Do I envy people that are not able to perceive what’s wrong with frame rate interpolating? No I don’t. I just think it’s sad that they are missing out on the destroyed detail.

This trend of upscaling and interpolating the framerate will not die anytime soon and this makes me sad because we have some very nice animations that are designed for a low framerate and will only work that way.

Spoilers ahead but here are some nice animations:

In all of the above scenes we have animations that are not 60FPS and do not need to be. Motion is properly conveyed through direction and effects.

As a closing note I suggest watching Satoshi Kon Editing Space & Time by Every Frame a Painting. At the 4:47 mark they talk about motion, and it’s interesting to see how animation can convey more actions with fewer frames than live action.

Sousou no Frieren EP9

If Japanese animation proved anything in 2023 it’s that anime still doesn’t need to be high framerate. In Sousou no Frieren episode 9 we have multiple fights happening at the same time in the second half of the episode and the animation is perfect.

The fights are either fast paced and smooth or detailed and smooth. No frame is out of place. This is the standard of animation that we would expect from a full blown movie.

Since it’s copyrighted and I don’t want to stretch fair use (especially because Japan doesn’t really practice that) I will only be posting one clip.

Madhouse published some tweets to show the behind the scenes for the keyframes:

Those keyframes are so clean the finished version can only be the best:

Full animated scene from Sakugabooru.

Windows 10’s image viewer is trash, bugged and won’t ever be fixed

I’ve been running Windows 10 since the end of 2017 and while it has worked fine most of the time I still get annoyed by the many quirks and bugs that Windows has had over the years even before Windows 10.

One of those issues is the IME keyboard listing that automatically adds back English US at random times for no good reason. I’ve had that bug since Windows 8 and I want it to just not be a thing anymore. But this post isn’t about this bug.

This post is about the Windows 10 image viewer and how it consistently fails at doing what you would expect from a basic feature such as an image viewer.

Scrolling images through folders doesn’t work

I use the Windows image viewer to browse folder images, it’s integrated into Windows and should be able to handle the kind of files I throw at it such as PNGs, JPEGs and sometimes my CR3 RAWs.

It does need to be perfect when it comes to displaying images, as it is a basic feature of any operating system.

When I open an image from within a folder I usually scroll through the pictures with the mouse wheel. So what’s wrong?

Basically what’s wrong is that if I stop scrolling for a couple of seconds, when I start scrolling the pictures again I’ll be starting back from the picture I first opened in the image viewer.
The image viewer loses track of where it last was. This happens consistently in folders with over 3000 media files…

I’m not the only one

Searching for it brings up a post about the same issue:

The usual response from the Independent Advisor is to reset the apps, then we are told to run DISM and update our system. The Microsoft forums have once again let down the whole world and no one managed to act surprised.

Now I must admit that the wording is not ideal and it also contains another issue that I do not care for. I still want to be able to scroll through my meme collection without going back to the very start and this is kinda forcing me into thinking about what it takes to make a proper image viewer.

I have not done C# in a long time and I think I would enjoy making a lightweight basic image viewer.

IrfanView always works

Back in the early 2000s I came across IrfanView, a graphic viewer that supports many formats and plugins. It’s a very powerful tool that can view, convert, batch process and basically save your day when it comes to media files.
It’s a graphic viewer that even plays music! Talk about going above and beyond.

When I open an image I can scroll through the folder’s images with the mouse wheel AND ALSO the previous and next mouse buttons. I could do that on Windows 7, 8 and 8.1. How come this isn’t a thing anymore on Windows 10?!

It can also rotate images based on EXIF data and it just works. The only thing I need is an image viewer that just works!

Conclusion

Windows is a total letdown when it comes to basic features, but the good thing is that for now we can still install third-party software such as IrfanView, MPC-BE, ShareX and many more.

I hope Microsoft won’t kill this anytime soon but they are kinda hinting at taking control over what can be installed with their Microsoft Store, just by letting such an abomination exist.

I trust Microsoft will do what’s wrong for their consumers because we live in the era of SaaS and never ending advertising.

Export video with alpha channel in DaVinci Resolve

A great alternative to the Adobe video editing suite is DaVinci Resolve because it’s free (a paid version exists, but only a few features are locked behind it).

Exporting in DaVinci Resolve

Let’s say you have a timeline with your elements placed and transparency set. You will now go to the Deliver tab, select QuickTime as the format and GoPro CineForm as the codec.

The most important thing is to select RGB-16 as the type and tick Export Alpha.

This is the final step unless you need to publish the file as a WebM.
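
Before going further, it can be worth checking that the exported file really carries an alpha channel. The sketch below is one way to do it with ffprobe, assuming it was installed alongside FFmpeg. pix_fmt_has_alpha, check_alpha and export.mov are hypothetical names, and the list of pixel format prefixes is an assumption that is not exhaustive.

```shell
# Return success when a pixel format name looks like it includes alpha.
# The pattern list is an assumption and is not exhaustive.
pix_fmt_has_alpha() {
    case "$1" in
        yuva*|gbrap*|rgba*|bgra*|argb*|abgr*|ya8|ya16*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the pixel format of the first video stream and whether it has alpha.
check_alpha() {
    local fmt
    fmt=$(ffprobe -v error -select_streams v:0 \
        -show_entries stream=pix_fmt \
        -of default=noprint_wrappers=1:nokey=1 "$1")
    if pix_fmt_has_alpha "$fmt"; then
        echo "$fmt: alpha channel found"
    else
        echo "$fmt: no alpha channel"
    fi
}

# Usage: check_alpha export.mov
```

This check is mostly useful on the MOV: for WebM files, FFmpeg’s built-in VP8/VP9 decoders may report a format without alpha even when the transparency is there.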

Convert the MOV to WebM

When building alerts and overlays, certain services like StreamElements won’t accept the QuickTime file. You will need to convert it to a WebM, using the VP8 or VP9 codec, to retain the transparency.

Use FFmpeg with the following command line:

ffmpeg -i inputmove.mov -c:v libvpx -pix_fmt yuva420p -b:v 2000k -auto-alt-ref 0 output.webm

Keep in mind you can change the bitrate by tweaking the value following -b:v.
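
The command above uses libvpx, which is FFmpeg’s VP8 encoder. If a service specifically asks for VP9, a variant could look like the sketch below, where the encoder is swapped for libvpx-vp9 and the bitrate becomes an optional argument. mov_to_webm_vp9 is a hypothetical helper name, and whether StreamElements treats both codecs the same is not something this sketch checks.

```shell
# Hypothetical helper: convert a MOV with alpha to a VP9 WebM.
# $1 = input file, $2 = output file, $3 = video bitrate (defaults to 2000k).
mov_to_webm_vp9() {
    local input=$1
    local output=$2
    local bitrate=${3:-2000k}
    ffmpeg -i "$input" -c:v libvpx-vp9 -pix_fmt yuva420p \
        -b:v "$bitrate" -auto-alt-ref 0 "$output"
}

# Usage: mov_to_webm_vp9 inputmove.mov output.webm 4000k
```

Leaving the third argument out keeps the 2000k bitrate from the original command.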

Conclusion

This small trick is something I spent a long time looking for and I have had mixed results with it. But lately I’ve been getting into preparing custom alerts on StreamElements for my own streams and I was hoping to add something nice.

Adding subtitles to your stream

Ever wanted to subtitle your stream in your language and add one or two other languages with automatic translation? Looks like there’s a tool for that.

Step 1: Setting up an Apps Script (Google)

Setting up the Apps Script starts with creating a new script on Apps Script: https://script.google.com

In the editor you will need to paste the following code:

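// Web app entry point: called on every GET request to the deployed URL.
function doGet(e) {
  // Query string parameters: text, source and target (language codes).
  const params = e.parameter;
  const translatedText = LanguageApp.translate(params.text, params.source, params.target);
  // The body is the raw translated string, served with a JSON MIME type.
  const output = ContentService.createTextOutput();
  output.setMimeType(ContentService.MimeType.JSON);
  output.setContent(translatedText);
  return output;
}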

The code will need to be deployed by clicking on the deploy menu on the top left:

The app must be deployed as a web app:

In the form you must let anyone access it:

When it’s deployed you will be provided with the app key. This is what you will need for the next steps:
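
Once deployed, the script can be tried from a terminal before moving on. The sketch below assumes curl is available and that the web app lives at the usual https://script.google.com/macros/s/<deployment ID>/exec address. translate is a hypothetical helper and YOUR_DEPLOYMENT_ID is a placeholder for your own key.

```shell
# Hypothetical helper: ask the deployed Apps Script for a translation.
# $1 = deployment ID, $2 = text, $3 = source language, $4 = target language.
translate() {
    local deployment_id=$1
    # -G sends the data as a query string (a GET request, handled by doGet),
    # --data-urlencode escapes spaces and special characters,
    # -L follows the redirect Apps Script answers with.
    curl -sL -G "https://script.google.com/macros/s/${deployment_id}/exec" \
        --data-urlencode "text=$2" \
        --data-urlencode "source=$3" \
        --data-urlencode "target=$4"
}

# Usage: translate YOUR_DEPLOYMENT_ID "Hello everyone" en ja
```

If the deployment lets anyone access it, this should print the translated text. Anything else, such as a login page, suggests the access setting needs another look.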

Step 2: Setting up the page

This is where the fun begins. There’s a form that is totally in Japanese that will require a couple of edits to be usable. Go here to start filling the form: https://sayonari.coresv.com/ninshikiChan/20210822_config.html

In the input field for Google Script API-KEY paste the one from your Apps Script. If you didn’t copy the key in the previous step you can retrieve it from the editor: in the deploy menu, click on manage deployments and you’ll find it under deployment ID:

Change the languages you want to use lower on the page, feel free to use Google Translate to find out what is what:

At the bottom of the page you will find the URL that you can copy and paste in your web browser. Google Chrome might be the only supported browser at the time of writing.

Step 3: Setting up OBS

Now comes the less fun part. Since the web page needs to listen to your voice you will have to run the page in your own Google Chrome instance. I’m unsure if it works on other browsers since I wasn’t able to make the web app perform every single time.

To set it up all you need is a window capture and some filters. If the content of the web browser doesn’t appear you might need to disable hardware acceleration.

For the filters I usually just crop the top and a bit of the bottom, then I apply a chroma key filter to remove the screen background:

Congratulations, you are now streaming while subtitling at the same time. Subs are the way to go and this is what it looks like:

Troubleshooting

Refreshing the tab might help make the web app work, but if your speech isn’t appearing on screen you should instead try closing and reopening your browser.

I kept the tab loaded before closing Google Chrome and when I opened it again it was working.

Once again, this isn’t flawless and some words might appear wrong or be heard wrong; these are the limits of speech recognition. Unless we slap some machine learning over your accent.

It might also strain your machine’s performance and require relaunching Google Chrome once in a while.

Credits

This is basically a translation of a Japanese page that was shared with me: http://www.sayonari.com/trans_asr/index_asr.html

This is a very useful tool for hearing impaired viewers and it lets people better appreciate a stream in another language.