Technical Inquiry: Face Selection & Speaker Isolation in Multi-Person Photos (v2 & Avatar 4.0)

Hello Engineering Team,

We are developing a video generation platform where users upload photos containing two people (e.g., partners or colleagues in a single frame). We are currently using the v2 API and planning to upgrade to Avatar 4.0.

We have two specific technical challenges regarding face selection:

  1. Regarding Talking Photo (Current API): The system automatically detects and animates only the dominant face.

Question: Is there a parameter (like face_index, roi, or coordinates) to force the engine to animate the non-dominant face instead? We need to programmatically select which person speaks without cropping the image.

  2. Regarding Avatar 4.0 (Target Engine): We understand Avatar 4.0 has advanced multi-face capabilities and often animates both faces.

Requirement: We need to generate a video where ONLY ONE specific person speaks while the other person in the photo remains present but static/silent.

Question: Does Avatar 4.0 support "Speaker Isolation"? Can we specify which face should be active and which should be passive in a single request?

We are looking for a native API solution to control "Who is speaking" in a multi-face source image.
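To make the request concrete, here is a rough sketch of the kind of per-face control we have in mind. It is purely illustrative: the `faces` list, the `face_index` and `role` fields, and the `build_speaker_request` helper are hypothetical names of our own, not parameters we have found in the v2 or Avatar 4.0 documentation, and the sketch only builds a payload without calling any endpoint.

```python
from dataclasses import dataclass, asdict
from typing import Literal


@dataclass
class FaceDirective:
    """Hypothetical per-face instruction; not a documented API object."""
    face_index: int                      # assumed: 0-based index of a detected face
    role: Literal["active", "passive"]   # assumed: "active" speaks, "passive" stays static


def build_speaker_request(image_ref: str, script: str,
                          speaker_index: int, face_count: int = 2) -> dict:
    """Build a hypothetical payload in which exactly one face is the speaker."""
    if not 0 <= speaker_index < face_count:
        raise ValueError("speaker_index must refer to one of the detected faces")
    faces = [
        FaceDirective(i, "active" if i == speaker_index else "passive")
        for i in range(face_count)
    ]
    # All keys below are placeholders for whatever the real schema might be.
    return {
        "image": image_ref,
        "script": script,
        "faces": [asdict(face) for face in faces],
    }


# Example: animate the second person in the photo, keep the first one static.
request_body = build_speaker_request("couple_photo.jpg", "Hello there!", speaker_index=1)
```

If something equivalent already exists under different names (for example, bounding-box coordinates rather than an index), a pointer to the relevant fields would fully answer both questions.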

Best regards,