
Incorporating human supervision can significantly enhance the accuracy of results. This interface is designed to facilitate post-editing of sign language data generated by an automated system.


Post-Editing-Interface

With the rapid advancement of artificial intelligence, the demand for user-friendly interfaces has increased significantly. At the same time, post-editing has become a key part of sign language technology, allowing users to refine outputs from automatic sign language translation and production systems. Moreover, post-editing is an essential tool for modifying real, human-created sign language data to meet specific needs. These considerations inspired the development of a human-centered post-editing interface designed specifically for sign language.

Among the various ways to represent sign language, two significant formats are skeleton poses and videos. Skeleton pose data are typically generated by pose estimation models, which can produce inaccurate keypoints in challenging scenarios such as rapid movement. Videos, on the other hand, are either recorded by signers or generated by sign language production systems and often require further adjustment. To address these limitations, this study introduces an interface with two key components: pose editing and video editing.

Skeleton Pose Editing:

Skeleton poses are a fundamental representation of sign language, widely used in sign language recognition, translation, and production. Accurate skeleton pose data improve the output quality of sign language production systems and the performance of sign language recognition and translation systems. However, existing pose estimators such as DWPose, OpenPose, AlphaPose, and MediaPipe have notable limitations: for instance, they often fail to detect, or accurately locate, hand keypoints during rapid movements.

Our system allows users to upload videos or pose data. For videos, users can choose a pose estimator (DWPose, OpenPose, or MediaPipe). Video frames are displayed for selection, and the corresponding pose is shown for manual editing to correct inaccuracies. Updated pose data can then be saved.
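To illustrate what a manual correction amounts to at the data level, the sketch below patches a single missed hand keypoint in an OpenPose-style frame that uses the flat `[x, y, confidence, ...]` layout. The field name and coordinate values here are illustrative only, not the interface's internal format:

```python
import json

def update_keypoint(pose, index, x, y, confidence=1.0):
    """Overwrite one keypoint in a flat [x, y, c, x, y, c, ...] list."""
    offset = index * 3
    pose[offset:offset + 3] = [x, y, confidence]
    return pose

# Example: a 3-keypoint pose in OpenPose-style flat layout.
# Keypoint 1 was missed by the estimator (zero confidence).
frame = {"people": [{"hand_right_keypoints_2d": [10.0, 20.0, 0.9,
                                                 0.0, 0.0, 0.0,
                                                 30.0, 40.0, 0.8]}]}

# Manually place the missed keypoint (index 1) at (12.5, 22.5),
# as a user would do by dragging it in the editing view.
kp = frame["people"][0]["hand_right_keypoints_2d"]
update_keypoint(kp, 1, 12.5, 22.5)

# The corrected frame can then be written back out as JSON.
print(json.dumps(frame))
```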

video1_15fps.mp4

Video Editing:

Sign language production systems often fall short of meeting audience needs. Many systems generate sign language videos using skeleton poses as intermediate representations, guiding GANs or diffusion models with pose sequences. Common methods like stitching (using a gloss dictionary) and transformer-based encoder-decoder models can produce inadequate results, such as substituting a missing sign with fingerspelling, which may not resonate with viewers. To tackle these issues, we developed a tool that enables users to manually edit synthetic videos with AI support. Additionally, the interface is useful for editing real sign language videos that contain linguistic errors or issues with the signer’s performance. Redoing such videos can be costly due to the need to recreate the same setting, background, and clothing.

The interface lets users select a segment of a video for modification and upload a replacement video or pose sequence to correct the motion. Using ControlNeXt-SVD-v2, the system creates the updated video by combining the initial pose sequence, the new pose sequence, smooth transitions, and a reference image of the signer. Finally, post-processing with the FaceFusion framework improves the quality of the final video.
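The smooth-transition step can be thought of as interpolating between the last pose of the original segment and the first pose of the replacement. The sketch below shows a minimal linear blend, assuming poses are flat coordinate lists; the actual system conditions ControlNeXt-SVD-v2 on the combined pose sequence rather than using this exact function:

```python
def blend_transition(last_pose, first_new_pose, num_frames):
    """Linearly interpolate between two poses of equal length,
    returning num_frames evenly spaced intermediate poses."""
    frames = []
    for i in range(1, num_frames + 1):
        t = i / (num_frames + 1)  # interpolation weight in (0, 1)
        frames.append([(1 - t) * a + t * b
                       for a, b in zip(last_pose, first_new_pose)])
    return frames

# Blend from a pose at (0, 0) toward one at (4, 8) over 3 frames.
transition = blend_transition([0.0, 0.0], [4.0, 8.0], 3)
print(transition)
```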

video_editing.mp4

A synthetic video sample:

synthetic_video.mp4

Running the Interface

Installing Packages and Libraries

Use the following command to install all the necessary packages and libraries:

pip install -r requirements.txt

Cloning the MusePose Repository

Clone the MusePose repository to your S3IT account using the following command. Since inference with the MusePose model requires a high-end GPU, we recommend running video synthesis on S3IT for optimal performance:

git clone https://github.com/TMElyralab/MusePose.git

Before running the interface, make sure to update the config.yaml file by changing the path to your desired directory.
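For orientation, the file might look like the fragment below. The key names are hypothetical placeholders; consult the actual config.yaml in the repository for the real entries:

```yaml
# Hypothetical layout; check config.yaml in this repository for the real keys.
data_dir: /path/to/your/data        # directory for uploaded videos and poses
output_dir: /path/to/your/outputs   # where edited poses and videos are saved
musepose_dir: /path/to/MusePose     # location of the cloned MusePose repository
```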

To run the interface, use the following command:

python main.py
