Image Lipsync (Photo Talk)
Generate talking-head videos from a portrait image and audio using AI lip-sync technology
Tool Overview
The Image Lipsync (Photo Talk) tool generates talking-head videos by combining a portrait image with audio input. It uses AI-driven lip-sync technology to animate the character's mouth, facial expressions, and subtle movements to match the provided audio, creating a realistic speaking video.
Tool Identifier
image-lipsync-api
Parameters
Required Parameters
image
- Type: string (required)
- Description: Portrait image URL for the character to be animated
- Format: Must be a complete HTTP/HTTPS URL (e.g., https://cdn.sjinn.ai/uploads/portrait.jpg)
- Validation: Must be a non-empty, full URL starting with http:// or https://
audio
- Type: string (required)
- Description: Audio URL containing the speech to drive the lip-sync animation
- Format: Must be a complete HTTP/HTTPS URL (e.g., https://cdn.sjinn.ai/uploads/speech.mp3)
- Validation: Must be a non-empty, full URL starting with http:// or https://
- Limits: Audio duration must not exceed 600 seconds (10 minutes)
prompt
- Type: string (required)
- Description: Text description to guide the animation style and character expression
- Validation: Must be a non-empty string
- Example: "A young woman speaking naturally with gentle expressions"
Pricing
- Credits Consumed: 30 * audio_duration_in_seconds credits per task
- Example: A 10-second audio costs 30 * 10 = 300 credits
- Membership Requirement: None
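The pricing rule above can be sketched as a small helper for budgeting before a task is submitted. This is an illustrative sketch only: the function name `estimate_credits` and the constants are hypothetical, and how the service rounds fractional seconds is not documented here, so the sketch applies the formula as written without rounding.

```python
CREDITS_PER_SECOND = 30   # rate from the Pricing section
MAX_AUDIO_SECONDS = 600   # limit from the audio parameter


def estimate_credits(audio_seconds):
    """Estimate credits for one task: 30 * audio duration in seconds.

    Hypothetical helper; the service's rounding of fractional
    seconds is unknown, so none is applied here.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    if audio_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio duration exceeds the 600-second limit")
    return CREDITS_PER_SECOND * audio_seconds


print(estimate_credits(10))   # the 10-second example from the Pricing section
print(estimate_credits(600))  # cost of a clip at the maximum duration
```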
Request Examples
Basic Usage
curl -X POST https://sjinn.ai/api/un-api/create_tool_task \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool_type": "image-lipsync-api",
    "input": {
      "image": "https://cdn.sjinn.ai/uploads/portrait.jpg",
      "audio": "https://cdn.sjinn.ai/uploads/speech.mp3",
      "prompt": "A young woman speaking naturally with gentle expressions"
    }
  }'
Using JavaScript
const response = await fetch('https://sjinn.ai/api/un-api/create_tool_task', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    tool_type: 'image-lipsync-api',
    input: {
      image: 'https://cdn.sjinn.ai/uploads/portrait.jpg',
      audio: 'https://cdn.sjinn.ai/uploads/speech.mp3',
      prompt: 'A young woman speaking naturally with gentle expressions',
    },
  }),
});

const result = await response.json();
console.log('Task ID:', result.data.task_id);
Using Python
import requests

url = 'https://sjinn.ai/api/un-api/create_tool_task'
headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}
data = {
    'tool_type': 'image-lipsync-api',
    'input': {
        'image': 'https://cdn.sjinn.ai/uploads/portrait.jpg',
        'audio': 'https://cdn.sjinn.ai/uploads/speech.mp3',
        'prompt': 'A young woman speaking naturally with gentle expressions'
    }
}

response = requests.post(url, json=data, headers=headers)
result = response.json()
print('Task ID:', result['data']['task_id'])
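Before sending a request like the ones above, the documented parameter rules can be checked locally, which avoids a round trip for inputs that would be rejected. The following is a minimal sketch, not the service's own validation: `validate_lipsync_input` is a hypothetical helper, treating a whitespace-only prompt as empty is an assumption, and the audio duration is an optional argument because it cannot be determined from the URL alone.

```python
from urllib.parse import urlparse

MAX_AUDIO_SECONDS = 600  # limit from the audio parameter


def validate_lipsync_input(image, audio, prompt, audio_seconds=None):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    # image and audio must both be full http:// or https:// URLs
    for name, value in (("image", image), ("audio", audio)):
        parsed = urlparse(value) if isinstance(value, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{name} must be a full URL (http:// or https://)")
    # prompt must be a non-empty string (whitespace-only is rejected here by assumption)
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")
    # duration is only checked when the caller already knows it
    if audio_seconds is not None and audio_seconds > MAX_AUDIO_SECONDS:
        problems.append(
            f"audio duration {audio_seconds}s exceeds the {MAX_AUDIO_SECONDS}s limit"
        )
    return problems


issues = validate_lipsync_input(
    "https://cdn.sjinn.ai/uploads/portrait.jpg",
    "https://cdn.sjinn.ai/uploads/speech.mp3",
    "A young woman speaking naturally with gentle expressions",
    audio_seconds=45,
)
print(issues or "input looks valid")
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.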
Response Examples
Success Response
{
  "success": true,
  "errorMsg": "",
  "error_code": 0,
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
Error Response
{
  "success": false,
  "errorMsg": "image must be a full URL (http:// or https://)",
  "error_code": 400
}

{
  "success": false,
  "errorMsg": "Audio duration exceeds maximum limit of 600 seconds, your audio duration is 720",
  "error_code": 101
}
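The response shapes above suggest checking the `success` flag before reading `data.task_id`. A minimal sketch of such a check follows; the field names come from the examples in this section, while `LipsyncTaskError` and `extract_task_id` are hypothetical names introduced for illustration.

```python
class LipsyncTaskError(Exception):
    """Raised when the API reports success == false (hypothetical helper class)."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


def extract_task_id(payload):
    """Return the task ID from a parsed JSON response, or raise on failure."""
    if payload.get("success"):
        return payload["data"]["task_id"]
    raise LipsyncTaskError(payload.get("error_code"), payload.get("errorMsg", ""))


# Using the error payload shown above for an over-long audio file
error_payload = {
    "success": False,
    "errorMsg": "Audio duration exceeds maximum limit of 600 seconds, your audio duration is 720",
    "error_code": 101,
}
try:
    extract_task_id(error_payload)
except LipsyncTaskError as exc:
    print("Task rejected:", exc.code, exc.message)
```

With the `requests` example, the helper would be called as `extract_task_id(response.json())`.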
Best Practices
- Image Quality: Use a clear, front-facing portrait image for the best lip-sync results. Avoid images with occlusions on the face area.
- Audio Quality: Use clean audio with minimal background noise. Clear speech produces more accurate lip-sync animations.
- Audio Duration: Keep audio within the 600-second limit. Shorter clips (under 60 seconds) tend to produce higher-quality results.
- Prompt Tips: Describe the character's expression and speaking style in the prompt to guide the animation (e.g., "speaking confidently", "with a warm smile").
- Generation Time: Generation time depends on audio duration. Short clips (under 30 seconds) typically complete in 1-3 minutes; longer clips may take 5-10 minutes.
