Academic Project

In this system, video from a single camera is first processed with the MediaPipe framework to estimate 3D body poses, and inverse kinematics is then used to compute joint rotation angles.
ARKit is used to recognize facial expressions and head poses, while the Unity game engine handles rendering and interaction.
The system captures the user's key poses and expressions, generates virtual character animations in real time, and lets the user choose between animation models.
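As a minimal sketch of how the Unity side of this pipeline might apply the captured data each frame: the `AvatarDriver` class and the `jointRotations` / `blendShapeWeights` inputs below are illustrative assumptions (how they are filled from MediaPipe and ARKit is not shown); the facial side is detailed further in the Augmented Reality section below.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical per-frame driver: applies body joint rotations (from MediaPipe + IK)
// and facial expression weights (from ARKit) to a Humanoid avatar.
public class AvatarDriver : MonoBehaviour
{
    public Animator animator;                 // Humanoid rig imported from VRoid Studio
    public SkinnedMeshRenderer faceRenderer;  // mesh that holds the facial blendshapes

    // Filled elsewhere by the capture pipeline (assumed, not shown here).
    public Dictionary<HumanBodyBones, Quaternion> jointRotations =
        new Dictionary<HumanBodyBones, Quaternion>();
    public Dictionary<string, float> blendShapeWeights =
        new Dictionary<string, float>();      // 0..1 expression coefficients

    void LateUpdate()
    {
        // Body: apply the latest joint rotations to the Humanoid bones.
        foreach (var pair in jointRotations)
        {
            Transform bone = animator.GetBoneTransform(pair.Key);
            if (bone != null) bone.localRotation = pair.Value;
        }

        // Face: map 0..1 expression coefficients to Unity's 0..100 blendshape range.
        foreach (var pair in blendShapeWeights)
        {
            int index = faceRenderer.sharedMesh.GetBlendShapeIndex(pair.Key);
            if (index >= 0) faceRenderer.SetBlendShapeWeight(index, pair.Value * 100f);
        }
    }
}
```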
Tools
Unity, C#
Motion Capture

- Input Processing: MobileNetV2 extracts features from live video of the performer.
- Keypoint Detection: MediaPipe detects hand and body keypoints for real-time pose estimation.
- Smoothing: A smoothing filter reduces noise for stable, continuous joint tracking (see the sketches after this list).
- Inverse Kinematics: Converts 2D keypoints into joint angles that drive the character animation (see the sketches after this list).
- Output: A virtual character mimics the performer's movements in real time with high fidelity.
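The section does not name the specific filter, so the following is a hedged sketch of simple exponential (low-pass) smoothing applied to keypoints before the IK step; the `KeypointSmoother` class and the `alpha` value are illustrative, not the project's actual implementation.

```csharp
using UnityEngine;

// Illustrative exponential low-pass filter for noisy keypoints.
// The project only states that filter smoothing is used; the filter choice
// and parameter here are assumptions.
public class KeypointSmoother
{
    private readonly float alpha;   // 0..1, lower = smoother but laggier
    private Vector3[] state;        // last smoothed positions, one per keypoint
    private bool initialized;

    public KeypointSmoother(int keypointCount, float alpha = 0.4f)
    {
        this.alpha = alpha;
        state = new Vector3[keypointCount];
    }

    // Blend each raw keypoint toward the previous smoothed value.
    public Vector3[] Smooth(Vector3[] raw)
    {
        if (!initialized)
        {
            raw.CopyTo(state, 0);   // seed the filter with the first frame
            initialized = true;
        }
        for (int i = 0; i < raw.Length; i++)
            state[i] = Vector3.Lerp(state[i], raw[i], alpha);
        return state;
    }
}
```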
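Likewise, the exact inverse-kinematics formulation is not shown here; below is a minimal sketch of one common analytic approach, deriving a bone rotation from the direction between adjacent keypoints (e.g., shoulder to elbow) and applying it to the corresponding Humanoid bone. The keypoint names and rest direction are placeholders, not the project's actual values.

```csharp
using UnityEngine;

// Illustrative analytic IK step: turn the vector between two detected keypoints
// into a rotation for the parent bone. This is a simplification of a full IK
// solver; twist and multi-joint constraints are ignored.
public static class PoseToRotation
{
    // Rotation that aims the bone's rest direction at the child keypoint.
    public static Quaternion BoneRotation(Vector3 parentKeypoint, Vector3 childKeypoint,
                                          Vector3 boneRestDirection)
    {
        Vector3 targetDirection = (childKeypoint - parentKeypoint).normalized;
        return Quaternion.FromToRotation(boneRestDirection, targetDirection);
    }
}

// Example usage, assuming keypoints[] holds the smoothed positions and
// LeftShoulder / LeftElbow are indices into that array (assumed names):
//
//   Quaternion upperArm = PoseToRotation.BoneRotation(
//       keypoints[LeftShoulder], keypoints[LeftElbow], Vector3.down);
//   animator.GetBoneTransform(HumanBodyBones.LeftUpperArm).rotation = upperArm;
```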
Virtual Character

I used VRoid Studio to design and create a stylized 3D virtual character.
VRoid Studio offers a user-friendly interface and customizable templates, allowing detailed control over facial features, hairstyles, outfits, and expressions. I tuned the facial morph and emotion parameters so the character would remain compatible with expressive animation.

After completing the model, I exported it in FBX format with a Humanoid rig, making it easy to integrate into Unity for animation and real-time interaction. This character served as the visual output for a motion capture pipeline powered by MediaPipe and ARKit.
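As a quick sanity check after importing the FBX with a Humanoid rig, the bone mapping and facial blendshapes can be queried at startup; this is a hedged sketch, and the component setup and bone list are assumptions rather than the project's actual code.

```csharp
using UnityEngine;

// Illustrative import check: confirm that the Humanoid rig mapped the bones we
// intend to drive and that the face mesh exposes blendshapes for expressions.
public class AvatarImportCheck : MonoBehaviour
{
    public Animator animator;                 // Animator on the imported VRoid character
    public SkinnedMeshRenderer faceRenderer;  // face mesh with blendshapes

    void Start()
    {
        // Humanoid bones we plan to drive from the motion-capture data (assumed set).
        var required = new[] { HumanBodyBones.Head, HumanBodyBones.Spine,
                               HumanBodyBones.LeftUpperArm, HumanBodyBones.RightUpperArm };
        foreach (var bone in required)
            if (animator.GetBoneTransform(bone) == null)
                Debug.LogWarning($"Humanoid bone not mapped: {bone}");

        // List the blendshapes available for facial animation.
        Mesh mesh = faceRenderer.sharedMesh;
        for (int i = 0; i < mesh.blendShapeCount; i++)
            Debug.Log($"Blendshape {i}: {mesh.GetBlendShapeName(i)}");
    }
}
```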
Augmented Reality
- Integrated ARKit on iOS to enable real-time facial expression tracking using the device's depth camera and motion sensors
- Detected facial feature points (eyes, eyebrows, mouth, etc.) to capture fine-grained facial movements and expressions
- Recognized and classified expressions (e.g., smile, blink, mouth open) using facial landmark data and expression weight coefficients
- Mapped facial expressions onto a virtual 3D character for real-time animation and interaction
- Applied facial data to control blendshapes and drive expressive character performance in Unity (see the sketch after this list)
- Rendered and visualized results in AR, allowing users to interact with animated characters through a mobile device
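A minimal sketch of the blendshape step, assuming the ARKit expression coefficients (0–1 per blendshape, e.g. `jawOpen`, `eyeBlinkLeft`) have already been obtained, for example via AR Foundation face tracking; the name-remapping table for the VRoid model is illustrative and depends on the actual mesh.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative mapping from ARKit blendshape coefficients (0..1) to the
// character's blendshapes (Unity uses a 0..100 range). How the coefficients are
// read from ARKit / AR Foundation is assumed and not shown here.
public class FaceBlendShapeMapper : MonoBehaviour
{
    public SkinnedMeshRenderer faceRenderer;

    // ARKit blendshape name -> blendshape name on this particular model (assumed names).
    private static readonly Dictionary<string, string> NameMap = new Dictionary<string, string>
    {
        { "jawOpen",        "Fcl_MTH_A" },
        { "eyeBlinkLeft",   "Fcl_EYE_Close_L" },
        { "eyeBlinkRight",  "Fcl_EYE_Close_R" },
        { "mouthSmileLeft", "Fcl_MTH_Fun" },
    };

    // Called once per frame with the latest ARKit expression coefficients.
    public void Apply(Dictionary<string, float> arkitCoefficients)
    {
        foreach (var pair in arkitCoefficients)
        {
            if (!NameMap.TryGetValue(pair.Key, out string targetName)) continue;

            int index = faceRenderer.sharedMesh.GetBlendShapeIndex(targetName);
            if (index >= 0)
                faceRenderer.SetBlendShapeWeight(index, Mathf.Clamp01(pair.Value) * 100f);
        }
    }
}
```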