Making an overhead camera streaming setup for Gunpla

I have a passion, or rather an addiction, called Gunpla: building plastic models from the Mobile Suit Gundam franchise, which is basically a giant robot franchise. Giant robots are great.

While thinking about how to change up my streaming content a bit, I remembered the 13 boxes of Gunpla I have not built yet, which have been sitting unopened for a couple of months. Basically, I was looking for an excuse to force myself to take time to build.

A passion

I first fell in love with Gunpla back in December 2021 and kept building until October 2022, when I stopped because I had planned to travel to Japan between November and December.

I have currently built over 30 models, mostly from the Mobile Suit Gundam franchise, and while my shelves are starting to get pretty crowded I have no intention of stopping.

Call it an addiction if you want.

Mobile Suit Gundam

That one big mecha anime franchise that appears to be really hard to get into until you realize it’s easy. Pick wherever you wanna start and start.

Most series have their own timeline and are standalone, such as:

  • Iron Blooded Orphans;
  • Mobile Suit Gundam 00;
  • Mobile Suit Gundam SEED (& SEED Destiny);
  • Mobile Suit Gundam Wing;
  • The Witch from Mercury.

The list does go on.
And then there are the series that make up what is considered the main timeline, called the Universal Century (UC):

  • Mobile Suit Gundam;
  • Mobile Suit Gundam Z and ZZ;
  • Char’s Counterattack;
  • War in the Pocket;
  • Stardust Memory;
  • Mobile Suit Gundam Unicorn;
  • Hathaway’s Flash.

And while this isn't a complete list, you should of course feel free to pick it up anywhere you like. I started with Hathaway's Flash and Unicorn; you will probably miss some plot points, but whatever gets you started is valid.

For real just pick wherever you wanna start, don’t care about people judging you, care only about getting into it.

Not just models

Gunpla comes from Gundam and Plamo, and Plamo itself comes from plastic model. The truth is that these kits are much more than mere models you look at: they are fun to build and pose.

Unlike Games Workshop's Warhammer (40K) kits, you get to build your model without needing glue or paint, and the figures can move thanks to over-engineered parts that articulate in the most satisfying way.
I never thought I'd be blown away by how a plastic model can move its legs and feet, but I still am to this day. Some models have very precise articulation that actually extends the range of movement beyond what you would expect.

Am I fanboying over plastic models? Yeah I am.

I'd personally recommend the RG Nu Gundam to anyone who got into Gunpla and is looking to build the best kit out there at the time of writing.

Setting up a stream camera

Now that I've presented the hobby (and my addiction), let's get into the main subject.

I'm the happy owner of an AVerMedia GC551 capture card, a 7-meter HDMI cable, a Canon 90D and a RØDE VideoMic GO II. This is all you need to get an image on screen if you know what you are doing.

The basic setup I started with was pretty terrible since I was using a tripod off to the side. I then upgraded to Elgato's Master Mount L and Flex Arm L to hold my camera in an overhead view… and promptly sent it back because the Flex Arm can't hold a DSLR that weighs just 1 kg…

Elgato Flex Arm L used with a camera, as seen on Elgato.com

I should have thought about why they don't specify how much load the arm can take; some things are still surprising…

How it performs with a DSLR (>1kg)

So if the Elgato Flex Arm isn't an option, what else can I try? Well, there's the Tarion Camera Destop Arm (spelt like that). Am I throwing money at the problem to see what sticks? Yes…

Up to 3 kg should do it, right?!

If we take a closer look it's shorter: the vertical pole is only 52 cm long compared to the Elgato one going up to 128 cm. Is that going to be an issue?

With my Canon EF 50mm f/1.8 STM lens I need at least 1 meter between the lens and the table to properly frame the surface. Buying a 35mm lens would help, but it would increase the cost too much to be worth it right now.
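A quick back-of-the-envelope check explains why: with a thin-lens approximation, the width of table you can frame is roughly the sensor width times the distance divided by the focal length. Assuming the 90D's APS-C sensor is about 22.3 mm wide (treat these numbers as rough estimates):

# Rough framing estimate; sensor width assumed to be ~22.3 mm (APS-C)
awk 'BEGIN {
  sensor = 22.3    # sensor width in mm
  dist   = 1000    # lens-to-table distance in mm
  printf "50mm lens at 1m covers ~%.0f mm of table width\n", sensor * dist / 50
  printf "35mm lens at 1m covers ~%.0f mm of table width\n", sensor * dist / 35
}'
# 50mm lens at 1m covers ~446 mm of table width
# 35mm lens at 1m covers ~637 mm of table width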

After fiddling with it a bit I managed to clear the distance and found out that I was wrong: I need more than 1 meter of clearance. So now I'm looking at the SmallRig RA-S280A Air-Cushioned Light Stand with Arm 3737; it's called a light stand but it will be perfect for the camera.

SmallRig can take this DSLR

I can also use a counterweight on the handle thanks to a very handy screw-in hook. And what does it look like from the point of view of the stream?

It’s perfect for what I need.

What’s next?

Next up is setting up a microphone and an iPhone 11 to capture my voice and facial expressions. Let’s do a combo.

I'm mounting the phone on a Joby GorillaPod 5K with a K&F Concept CA02 quick release plate clip, and on top of that clip I'm putting the RØDE VideoMic GO II. The mic is connected to the camera and the sound is then fed into the capture card over HDMI.

The GoPro Hero 11 with the 3-Way Tripod is used as a webcam to complete the body tracking and needs to be facing me.

Adding iPhone face tracking, with webcam body tracking?

Oh yeah, because I'm not streaming in front of my beefy desktop PC I need to connect a camera to track my body… My laptop is a MacBook Pro M1 Pro; I'm lucky that I can connect my GoPro as a webcam (as it needs to be centered on me) and that I can run OpenSeeFace, the face tracking backend for VSeeFace.

OpenSeeFace requires Python 3. I tried Python 3.10 but one dependency doesn't seem to exist for my MacBook, so I've opted for Python 3.9 and used the following commands to install and run it:

# Clone or download the zip
git clone git@github.com:emilianavt/OpenSeeFace.git
cd OpenSeeFace

# Install Python 3.9 & pew for easy virtualenv management
brew install python@3.9
pip install pew

# Create an environment for the appropriate Python version
pew new openseeface -p3.9
# Or, if it already exists, activate the venv
pew workon openseeface

# Install dependencies
pip install onnxruntime opencv-python pillow numpy

# Run facetracker.py and send tracking data to VSeeFace
python facetracker.py -c 2 -F 30 -W 1920 -H 1080 \
    --model 4 --gaze-tracking 1 --discard-after 0 \
    --scan-every 0 --no-3d-adapt 1 --max-feature-updates 900 \
    --log-output output.log \
    --ip 192.<the_computer_with_vseeface> -p 11573

Now to explain those parameters:

  • Parameter -c selects the camera by its index number;
  • Parameter -F is for the framerate of the camera;
  • Parameters -W and -H are for the camera resolution;
  • Parameter --model selects the tracking model and improves accuracy; I don't know why I was able to go with 4, but this is the command line I got from VSeeFace running on Windows and it ran fine on macOS.

Sadly, without running the OpenSeeFace scripts on a Windows PC you will have to guess the camera number, frame rate and resolution… Yeah, that's how these things go when there's no proper support for Linux and macOS.
My time to shine with a pull request; I should look into that.
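In the meantime, one workaround on macOS (assuming ffmpeg is installed, for example via Homebrew) is to list the AVFoundation capture devices. The indices won't necessarily map one-to-one to what OpenSeeFace sees, but at least you know what is plugged in and which numbers are worth trying for -c:

# List the video and audio capture devices macOS knows about
ffmpeg -f avfoundation -list_devices true -i ""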

For the iPhone I'm running iFacialMocap, and that's all you need for facial expressions. Do set up blendshapes for your model and face, but this post isn't about that.

In the end I only use iFacialMocap and skip the webcam body tracking. It makes the setup easier, but I still wanted to write all of this up in case I change my mind.

Conclusion

I didn't cover the VTubing aspect because it's out of the scope of this post, but I'll have to write about some of that and explain my setup, since my desktop computer handles the streaming while I'm not building anything at my desk.

I need to improve the lighting and maybe do some color calibration to stream a better quality image.

Stop upscaling video to 4K60FPS

Updated 2024/01/03: Added a bit about Frieren EP9.

On YouTube there’s always someone using AI to upscale to 4K and interpolate frames in anime openings. Some people think it looks good because it “moves” more and they are wrong.

I like referencing Noodle's video Smoother animation ≠ Better animation. You do not need to watch it to understand this post, but I still recommend it since Noodle is an actual animator, and I don't plan on just repeating what that video says.
I'll be using my own words and examples.

What is upscaling?

When it comes to images and videos, upscaling means increasing the dimensions of the frame. To illustrate, let's take a 102 by 102 pixel image and upscale it using Photoshop and Waifu2x:


Doubling the image's dimensions harms its quality by doubling up and softening the pixels. The 4× image is downright unusable.

Waifu2x uses a deep convolutional neural network to try to minimize the quality loss and retain the detail of the original image. It works quite nicely, but it is far from perfect. Other AI models can also be used to upscale images.

In all of the illustrated cases we are taking an image that is 102 by 102 pixels and adding more pixels. The only thing we can do is therefore interpolate.
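If you want to reproduce the non-AI part of this yourself, ffmpeg's scale filter lets you choose the interpolation method used when adding those pixels (the file names here are just placeholders):

# 4x upscale with two different interpolation methods:
# "neighbor" only repeats existing pixels (blocky), "lanczos" interpolates (softer)
ffmpeg -i original_102.png -vf "scale=408:408:flags=neighbor" upscaled_neighbor.png
ffmpeg -i original_102.png -vf "scale=408:408:flags=lanczos" upscaled_lanczos.png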

What is interpolating?

Interpolation is a very complicated subject that basically boils down to estimating new data from existing data.

Earlier in this post I already mentioned interpolating pixels to fill in the blanks when upscaling an image. If we took an image and did not interpolate, we would retain the original pixels but spaced out on a grid like this:


Since I’m increasing the size from 102 by 102 pixels to 408 by 408 pixels I’ve blacked out the pixels that will have to be estimated. Different sharpening and scaling algorithms will process the images differently and alter the output.
This example is only illustrating what is missing and what needs to be generated.

The same logic applies to increasing the frame rate from 23.976FPS to 60FPS. Wait a second, 23.976FPS to 60FPS. Decimal numbers?

That’s a fun framerate to multiply to 60. Where are my comfy integer numbers? I like whole numbers. Anything that floats is complicated and scary.
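That 23.976 is actually the NTSC-derived fraction 24000/1001, which you can confirm by asking ffprobe for the declared frame rate of a video stream (the file name is a placeholder):

# Show the declared frame rate of the first video stream as a fraction
ffprobe -v error -select_streams v:0 \
    -show_entries stream=r_frame_rate -of csv=p=0 some_episode.mkv
# 24000/1001  (≈ 23.976)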

Why interpolating DOESN'T work

While upscaling images with AI can work pretty well, frame rate interpolation, on the other hand, doesn't.

First of all, the video frame rate needs to stay constant. If it's defined as 23.976FPS we would need to multiply it by 2.5 to reach 59.94FPS, as close to 60FPS as we can get, but 2.5 is not a whole number, so we end up altering the existing frames while generating 1.5 new frames for each existing one.

And this breaks everything. Here are two 5-second sequences taken from the opening of Zom 100 on YouTube.

Original video:

AI enhanced video:

Let’s get the total number of frames the lazy way:

ffprobe -v error -select_streams v:0 -count_packets \
    -show_entries stream=nb_read_packets -of csv=p=0 \
    zom100_original_start_5sec.webm
# 127

ffprobe -v error -select_streams v:0 -count_packets \
    -show_entries stream=nb_read_packets -of csv=p=0 \
    zom100_ai_start_5sec.webm
# 323

If my math is right, 323 / 127 = 2.543307086614173, which is close to the 2.5 multiplier. But let's forget the numbers, because what really counts is the visuals.

Why is it ugly?

The gist of it is that we are creating new frames based on the previous and next frames, while also altering the existing frames, to hit the 60FPS target of the upload.

Let's take a look frame by frame: each original frame is shown for 2 seconds while the AI enhanced frames are shown for 1/3 of a second. The left and right videos aren't synchronized since one frame rate isn't an integer multiple of the other.

Obvious artifacting appears very early on around anything that moves. Sometimes it's just blurring, as if motion blur had been introduced (on the bike), and sometimes the shapes are deformed (such as the numbers).

Another example:

The upscaled image is also sharpened and denoised to some degree, which ends up messing up the contrast of the line art.
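To get a feel for how much sharpening alone can mangle line art, you can over-sharpen a clip with ffmpeg's unsharp filter; this is only an illustration, not whatever tool those uploads actually use:

# Aggressive sharpening: 5x5 luma matrix with a strength of 1.5
ffmpeg -i opening_clip.mp4 -vf "unsharp=5:5:1.5" opening_clip_oversharpened.mp4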

But 60FPS is nice when gaming

Yes: when playing 3D games, each frame is rendered right before being displayed on screen. Videos are different because every frame has already been rendered.

This is like the difference between pre-rendered cutscenes and in-engine cutscenes. Pre-rendered cutscenes are authored at a specific resolution, frame rate, compression and colorspace.

In-engine cutscenes, on the other hand, will usually target the game's resolution, frame rate and colorspace without applying any compression. Usually, because some game engines lock the frame rate to a lower value such as 30FPS due to engine limitations or questionable engine programming and design.

Rendering 30, 60, 120 or 144 frames a second by taking a fresh picture of a 3D scene right before it is displayed is the key to fluidity.
Already-rendered content will never be able to do that because the information simply isn't there.

Real world example from 60 to 23.976 to 60

Here is NieR Automata at 60FPS:

Reducing the frame rate to 23.976FPS makes the motion look less satisfying:

Now let's interpolate the frame rate to 71.92806FPS and cap it at 60FPS with Flowframes, which only lets me double or triple the frame rate (not set it to 60 directly), and notice how choppy the video becomes:

Here are the settings used:

This is exactly what is being done to those anime openings… Disgusting.
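If you want to reproduce this kind of degradation without Flowframes, ffmpeg can approximate both steps: drop the native footage down to 23.976FPS, then motion-interpolate it back up to 60FPS (the file names are placeholders, and this is not the exact pipeline Flowframes uses):

# Step 1: throw frames away to simulate a 23.976FPS source
ffmpeg -i nier_native_60fps.mp4 -vf "fps=24000/1001" nier_23976.mp4

# Step 2: motion-interpolate back up to 60FPS (slow, and full of artifacts)
ffmpeg -i nier_23976.mp4 -vf "minterpolate=fps=60:mi_mode=mci" nier_fake_60fps.mp4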

Let's compare the original native 60FPS footage with the footage interpolated from 23.976FPS back up to 60FPS:


The screenshots speak for themselves and show the difference in a much more obvious way, but let's go a bit further and slow down the video clips while putting them side by side, centered on 2B:

Slowing down to 25% really shows how bad it gets. The same goes for that Jujutsu Kaisen S2 opening:

It jitters, the text is deformed… I hate it. Anime is not drawn for 60FPS.
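If you want to build this kind of side-by-side, slowed-down comparison yourself, ffmpeg can do it in one pass (the file names are placeholders and both clips need to have the same height):

# Stack the two clips side by side and slow playback down to 25% (video only)
ffmpeg -i native_60fps.mp4 -i interpolated_60fps.mp4 \
    -filter_complex "[0:v][1:v]hstack=inputs=2,setpts=4*PTS" \
    -an side_by_side_quarter_speed.mp4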

Misguided demand exists

Sadly, TVs are being sold with motion smoothing technology and marketed as looking better. I'm not sure how people perceive it as something nice when it should be weirding them out.

I also remember that The Hobbit being shown at 48FPS in movie theatres weirded people out quite a bit, but was that also because of the included 3D effect? No idea.

NVIDIA is also working on adding frame rate smoothing to their GPUs when encoding, which is a weird idea…

The reason I wrote this post is that I'm tired of seeing openings such as Zom 100's and Jujutsu Kaisen S2's being upscaled, sharpened to hell and then interpolated, resulting in weird motion and no real improvement.

Conclusion

Most people who seek out 60FPS+ content do so because fluidity is important to them and enhances the pleasure of the visuals. Sadly, most don't seem to actually pay attention to the composition, and they miss the botched details and weird movements.

Do I envy people who are unable to perceive what's wrong with frame rate interpolation? No I don't. I just think it's sad that they don't notice the detail being destroyed.

This trend of upscaling and interpolating the frame rate will not die anytime soon, and that makes me sad because we have some very nice animation that is designed for low frame rates and only works that way.

Spoilers ahead but here are some nice animations:

In all of the above scenes we have animations that are not 60FPS and do not need to be. Motion is properly conveyed through direction and effects.

As a closing note I suggest watching Satoshi Kon - Editing Space & Time by Every Frame a Painting. At the 4:47 mark they talk about motion; it's interesting to see how animation can convey more action with fewer frames than live action.

Sousou no Frieren EP9

If Japanese animation proved anything in 2023, it's that anime still doesn't need to be high frame rate. In Sousou no Frieren episode 9 we have multiple fights happening at the same time in the second half of the episode, and the animation is perfect.

The fights are either fast-paced and smooth or detailed and smooth. No frame is out of place. This is the standard of animation we would expect from a full-blown movie.

Since it's copyrighted and I don't want to stretch fair use (especially because Japan doesn't really practice it) I will only be posting one clip.

Madhouse published some tweets showing the behind-the-scenes keyframes:

Those keyframes are so clean the finished version can only be the best:

Full animated scene from Sakugabooru.

RErideD: The episode 1 that didn’t try

I came across the first four episodes of RErideD, scheduled to air starting October 2018. I gave it a try and was disappointed.
I’ve only watched the first episode and will not be watching more.

Trailer

Disclaimer

I will be spoiling the first episode; to be honest that isn't much of a spoiler, since it's not really the destination that counts but the experience.
