On YouTube there is always someone using AI to upscale anime openings to 4K and interpolate their frames. Some people think it looks good because it “moves” more, and they are wrong.
I like referencing Noodle’s video Smoother animation ≠ Better animation. You do not need to watch it to understand this post, but I still recommend it, as Noodle is an actual animator. I don’t plan on just repeating what the video says.
I’ll be using my own words and examples.
What is upscaling?
When it comes to images and videos, upscaling means increasing the dimensions of the frame. To illustrate this, let’s take a 102 by 102 pixel image and upscale it using Photoshop and Waifu2x:
Doubling the image’s dimensions does harm its quality by doubling and softening the pixels. The 4 times image is unusable and downright bad.
Waifu2x uses a deep convolutional neural network to try to minimize the quality loss and retain the detail from the original image. It works quite nicely, but it is far from perfect. Other AI enhancements can be used to upscale images.
In all the illustrated cases we are taking an image that is 102 pixels by 102 and adding more pixels. The only thing we can do is thus to interpolate.
What is interpolating?
Interpolation is a very complicated subject that can basically be boiled down to estimating new data based on existing data.
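As a minimal sketch of that idea, here is linear interpolation between two known values, the simplest way of estimating new data from existing data. The pixel values and the `lerp` helper are made up for illustration; real scalers use more elaborate methods.

```python
def lerp(a, b, t):
    """Linearly interpolate between a and b; t=0 gives a, t=1 gives b."""
    return a + (b - a) * t

# Two known neighbouring pixel values (hypothetical, 0 = black, 255 = white).
left, right = 50, 200

# Estimate three new values evenly spaced between the two known ones.
estimated = [lerp(left, right, i / 4) for i in range(1, 4)]
print(estimated)  # [87.5, 125.0, 162.5]
```

None of the three estimated values existed in the source; they are only a plausible guess at what sits between the two real ones.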
In the case of this post I already referenced interpolating pixels to fill in the blanks when upscaling an image. If we took an image and did not interpolate pixels we would retain the original pixels but spaced out in a grid like this:
Since I’m increasing the size from 102 pixels to 408 pixels, I’ve blacked out the pixels that will have to be estimated. Different sharpening and scaling algorithms will process the images differently and alter the output.
This example only illustrates what is missing and what needs to be generated.
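To put a number on how much has to be generated, here is a small sketch of the arithmetic for the 102 to 408 pixel case. It assumes each original pixel lands in the top-left corner of its 4 by 4 cell, which is only one possible layout; the `is_original` helper is hypothetical.

```python
SRC = 102    # source width and height in pixels
SCALE = 4    # 102 -> 408

dst = SRC * SCALE
known = SRC * SRC          # pixels copied straight from the original
total = dst * dst          # pixels in the upscaled image
missing = total - known    # pixels that have to be estimated

def is_original(x, y, scale=SCALE):
    """True if the output pixel (x, y) is a copy of a source pixel."""
    return x % scale == 0 and y % scale == 0

# Text preview of the top-left 8x8 corner: '#' is original, '.' is missing.
preview = ["".join("#" if is_original(x, y) else "." for x in range(8))
           for y in range(8)]
print("\n".join(preview))
print(f"{missing} of {total} pixels ({missing / total:.2%}) must be estimated")
```

With a 4 times upscale, 15 out of every 16 pixels in the output are estimates.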
The same can be explained for increasing the frame rate from 23.976FPS to 60FPS. Wait a second, 23.976FPS. Decimal numbers?
Why interpolating DOESN’T work
Upscaling images with AI can work pretty well. Frame rate interpolation, on the other hand, doesn’t.
First of all, the video frame rate needs to be constant. If it’s defined as 23.976FPS, we would need to multiply it by 2.5 to reach 59.94FPS and get as close to 60FPS as possible. But 2.5 is not a whole number, so we will need to alter existing frames while generating 1.5 frames for each existing frame.
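The following sketch shows why a 2.5 multiplier forces existing frames to be altered. It assumes the rounded rates above correspond to the usual exact values 24000/1001 and 60000/1001, and that output frames are evenly spaced; `source_position` is a hypothetical helper.

```python
from fractions import Fraction

SRC_FPS = Fraction(24000, 1001)   # 23.976... (assumed exact value)
DST_FPS = Fraction(60000, 1001)   # 59.94...  (assumed exact value)

ratio = DST_FPS / SRC_FPS         # exactly 5/2

def source_position(k):
    """Where output frame k falls on the source timeline, in source frames."""
    return k / ratio

for k in range(6):
    pos = source_position(k)
    kind = "existing frame" if pos.denominator == 1 else "must be generated"
    print(f"output {k} -> source position {pos} ({kind})")
```

Only every fifth output frame lands exactly on a source frame (source frames 0, 2, 4, and so on). Source frames 1, 3, 5 fall between two output frames, so under these assumptions half of the original drawings never appear untouched.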
And this breaks everything. Here are two 5-second sequences taken from the opening of Zom 100 on YouTube.
AI enhanced video:
Let’s get the total number of frames the lazy way:
ffprobe -v error -select_streams v:0 -count_packets \
  -show_entries stream=nb_read_packets -of csv=p=0 \
  zom100_original_start_5sec.webm
# 127
ffprobe -v error -select_streams v:0 -count_packets \
  -show_entries stream=nb_read_packets -of csv=p=0 \
  zom100_ai_start_5sec.webm
# 323
If my math is right, 323 / 127 = 2.543307086614173, which puts us close to the 2.5 multiplier. But let’s forget numbers because what really counts is the visuals.
Why is it ugly?
The gist of it is that we are creating new frames based on the previous and next frame while also altering existing frames to target the 60FPS that is being uploaded.
Let’s take a look frame by frame: each original frame lasts for 2 seconds while AI enhanced frames last for 1/3 of a second. The left and right videos aren’t synchronized, as one frame rate is not an integer multiple of the other.
Obvious artifacting appears very early on around anything that moves. Sometimes it’s just blurring, as if motion blur had been introduced (on the bike), and sometimes the shapes are deformed (such as the numbers).
The upscaled image is also sharpened and denoised to some degree, which ends up messing up the contrast on the line art.
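To see where the blur comes from, here is a deliberately naive sketch that builds an in-between frame by blending its two neighbours. This is an assumption for illustration only: AI interpolators estimate motion rather than blend, but whenever that estimate is off the result can drift toward the same kind of smear. The one-dimensional "frames" and the `blend` helper are made up.

```python
def blend(frame_a, frame_b, t=0.5):
    """Mix two frames pixel by pixel; t=0.5 is the halfway frame."""
    return [a * (1 - t) + b * t for a, b in zip(frame_a, frame_b)]

# A single white pixel moving two positions to the right between frames.
frame_a = [0, 255, 0, 0, 0, 0]
frame_b = [0, 0, 0, 255, 0, 0]

halfway = blend(frame_a, frame_b)
ideal = [0, 0, 255, 0, 0, 0]      # what an animator would actually draw

print(halfway)  # [0.0, 127.5, 0.0, 127.5, 0.0, 0.0]
print(ideal)
```

Instead of one solid pixel in the middle, the blended frame contains two half-transparent ghosts at the old and new positions, and the correct position stays empty.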
But 60FPS is nice when gaming
Yes, when playing 3D games each frame is rendered right before being displayed on the screen. Videos are different because each frame is already rendered.
This is like the difference between pre-rendered cutscenes and in-engine cutscenes. Pre-rendered cutscenes are designed for a certain resolution, frame rate, compression and colorspace.
In-engine cutscenes, on the other hand, will usually target the game’s resolution, frame rate and colorspace without applying any compression. I say usually because some game engines will lock the frame rate to a lower value such as 30FPS because of engine limitations, or rather bad engine programming and design.
Rendering 144 frames a second by taking a picture of a 3D scene that is built right before being displayed is the key to fluidity.
Already rendered content will never be able to do such a thing because of the missing information.
Real world example from 60 to 23.976 to 60
Here is NieR Automata at its native 60FPS:
Reducing the frame rate to 23.976FPS makes the motion look less satisfying:
Now let’s interpolate the frame rate to 71.92806FPS and limit it to 60FPS with Flowframes, which only lets me double or triple the frame rate (not set it to 60 directly). Notice how the video is choppy:
Here are the settings used:
This is exactly what is being done to those anime openings… Disgusting.
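One possible source of that choppiness is the limiting step itself. The sketch below assumes the limiter simply shows, at each 60FPS tick, the latest tripled frame available. That is a guess at the mechanism, not a description of what Flowframes does internally; `picked_frame` is a hypothetical helper.

```python
from fractions import Fraction

TRIPLED_FPS = Fraction(24000, 1001) * 3   # about 71.928FPS
OUTPUT_FPS = Fraction(60)

def picked_frame(k):
    """Index of the latest tripled frame available at output tick k."""
    return (k * TRIPLED_FPS) // OUTPUT_FPS

picked = [picked_frame(k) for k in range(12)]
steps = [b - a for a, b in zip(picked, picked[1:])]

print(picked)  # [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13]
print(steps)   # [1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
```

Under this assumption a frame gets skipped roughly every five or six ticks, so the motion advances unevenly even though the output rate is a steady 60FPS.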
Let’s compare the original native 60FPS footage with the interpolated footage:
The screenshots speak for themselves and express the difference in a much more obvious way, but let’s go a bit further and slow down the video clips while having them side by side and centered on 2B:
Slowing it down to 25% does show how bad it gets. The same goes for that Jujutsu Kaisen S2 opening:
It jitters, the text is deformed… I hate it. Anime is not drawn for 60FPS.
Misguided demand exists
Sadly, TVs are being sold with some smoothing technology that is marketed as good looking. I’m not sure how people perceive it as something nice when it should be weirding them out.
I also remember that The Hobbit being shown at 48FPS at the movie theatre did weird people out quite a bit, but was it also because of the 3D effect? No idea.
NVIDIA is also working on adding frame rate smoothing to their GPUs when encoding. Weird idea…
Most people who aim for 60FPS+ content do so because fluidity is important and enhances the pleasure of the visuals. Sadly, most people don’t seem to actually pay attention to the composition and miss the botched detail and weird movements.
Do I envy people that are not able to perceive what’s wrong with frame rate interpolation? No, I don’t. I just think it’s sad that they are missing out on the detail that gets destroyed.
This trend of upscaling and interpolating the frame rate will not die anytime soon, and this makes me sad because we have some very nice animations that are designed for a low frame rate and will only work that way.
Spoilers ahead but here are some nice animations:
- Psycho Pass (movie);
- Tengoku Daimakyou EP1;
- Darker than Black Ryuusei no Gemini;
- Shingeki no Kyojin.
In all of the above scenes we have animations that are not 60FPS and do not need to be. Motion is properly conveyed through direction and effects.
As a closing note, I suggest watching Satoshi Kon Editing Space & Time by Every Frame a Painting. At the 4:47 mark they talk about motion, and it’s interesting to see how animation can convey more action with fewer frames than live action.