SJinn-Agent-Guide
What is SJinn Agent?
SJinn Agent is an all-in-one AI-powered creative agent for image and video production. Through natural conversation, it autonomously handles the entire content creation process — from story videos and children's content to cinematic films, anime, UGC, advertisements, podcasts, and virtually any video format you can imagine.
Keywords: Strategy, Planning, Autonomous, End-to-End
SJinn goes beyond simple execution. It understands your ultimate goal, breaks it down into a strategic plan (scripting, storyboarding, asset generation), and brings it to life autonomously. You provide the vision; SJinn directs the entire creation process.
How to Communicate with SJinn Agent
Think of SJinn Agent not as a tool, but as a creative collaborator. Interact with it the way you would with a creative director — describe your vision, and it will bring it to life.
If the output doesn't meet your expectations, simply tell it what needs improvement, and it will regenerate those specific parts.
Using Sora2 or Veo3
By default, SJinn Agent automatically selects a video generation model from Seedance, Kling, and Hailuo. To use Sora2 or Veo3 instead, you have two options:
Option 1: Direct Command
Simply tell the Agent:
"Use Sora2" or "Use Veo3"
Note: Currently, only Sora2 and Veo3 can be manually specified. For all other models, the Agent automatically selects the most suitable one based on your content; requesting a specific one is not supported.
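The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not SJinn's actual implementation or API: the function name `resolve_model` and the `"auto"` return value are hypothetical, and the real Agent interprets natural-language requests rather than exact strings.

```python
from typing import Optional

# Models the guide says can be requested by name.
MANUAL_MODELS = {"sora2": "Sora2", "veo3": "Veo3"}
# Models the guide says the Agent chooses from automatically.
AUTO_MODELS = ("Seedance", "Kling", "Hailuo")


def resolve_model(requested: Optional[str]) -> str:
    """Return the model a request would resolve to under the rule above.

    Hypothetical helper: "auto" stands for the Agent picking one of AUTO_MODELS.
    """
    if requested is None:
        return "auto"
    key = requested.strip().lower().replace(" ", "")
    if key in MANUAL_MODELS:
        return MANUAL_MODELS[key]
    # Any other model name cannot be forced, so selection stays automatic.
    return "auto"
```

Under this sketch, asking for "Kling" has the same effect as asking for nothing: the Agent still decides which of the automatic models fits the content.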
Option 2: Use Templates
Choose from pre-configured templates optimized for these models:
- Veo3 Story Video (Consistent Characters): View Template
- Sora2 Story Video v2 (Consistent Characters): View Template
- Sora2 Extend: View Template
- Veo3 Extend: View Template
About Templates
Templates guide the Agent through specific workflows or provide context tailored to particular video styles. Using the right template ensures more consistent results and higher quality output.
Custom Templates
SJinn supports creating and publishing your own custom templates for others to use. Learn more in our Template Creation Guide.
Content Creation Guides
Story / Anime / Film Videos
Narrative-driven content with plot and characters.
Real Films
For live-action video content, we recommend using the Veo3 Story Template for optimal results.
Cartoon & Kids Stories
For narration/explainer-style content:
If you're creating educational or narration-driven kids content, use this template:
For dialogue-based content:
If your story features character dialogue, the Veo3 Story Template works well:
Anime Films
For anime-style videos, we recommend either using no template or leveraging Sora2 templates for the best visual results.
For high scene continuity:
If maintaining smooth transitions and narrative flow is your priority, use:
For high character consistency:
If preserving consistent character appearance across scenes is critical, use:
UGC Videos
User-generated content style videos.
Coming soon...
Podcast Videos
We provide templates for both solo and two-person podcast videos.
Music Videos
We have built a template that generates a music video from a music track, with visuals that match and stay synchronized with the lyrics.
Music story Video (Sync with lyrics)
Trending Effect Videos
Popular visual effects and styles.
Coming soon...
Available Tools
SJinn Agent comes equipped with a comprehensive toolkit. Below is an overview of the capabilities at your disposal:
Image Tools
- Image_Generation: Generates images from text prompts.
  - Note: Currently uses the Nano-Banana Pro model.
- SJinn_Image_Edit: Generates images based on reference images and prompts.
  - Default Model: Nano-Banana
  - Pro Version: Use SJinn_Image_Edit(Pro) for Nano-Banana Pro
  - Note: Nano-Banana Pro offers enhanced capabilities but may have slightly lower consistency than standard Nano-Banana.
- Image_VQA: Visual Question Answering tool that analyzes images and responds to queries about their content (e.g., describing scenes, identifying objects).
Video Tools
- Image_to_Video_without_audio: Generates video from a first-frame image and prompt.
  - Duration Options: 3s, 5s, 8s, 10s
  - Audio: None
  - Model Selection: Automatic
  - End Frame Support: ✓ First and last frame specification
- Text_to_Video_with_audio (Veo3): Generates video with audio directly from text prompts using Veo3.
  - Duration: 8 seconds (fixed)
- Text_to_Video_with_audio (Sora2): Generates video with audio directly from text prompts using Sora2.
  - Duration: 10 seconds (fixed)
- Image_to_Video_with_audio (Veo3): Generates video with audio from a first-frame image and prompt using Veo3.
  - Duration: 8 seconds (fixed)
  - End Frame Support: ✓ Available
- Image_to_Video_with_audio (Sora2): Generates video with audio from a first-frame image and prompt using Sora2.
  - Duration: 10 seconds (fixed)
- video_frame_extraction: Extracts the first or last frame from a video.
- video_trim: Trims a video or audio file to a specific duration.
- video_lip_sync: Synchronizes lip movements in video with audio input.
  - Best For: Real human subjects with clearly visible mouths occupying a significant portion of the frame.
- ImageLipSyncTool: Generates lip-synced video from images, audio, and prompts.
  - Supports: Single and dual character modes
- ffmpeg_full_compose: Combines multiple video segments, audio tracks, and background music into a single complete video.
- Add_Subtitle_For_Video: Automatically generates and adds subtitles to videos.
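Because the Veo3 and Sora2 tools produce fixed-length clips, it can help to estimate how many clips a video of a given length will need. The Python sketch below does that arithmetic using only the durations listed above. The helper names `clips_needed` and `shortest_option` are hypothetical and are not part of SJinn; the Agent does this planning for you.

```python
import math

# Clip lengths in seconds, copied from the tool list above.
FIXED_DURATION = {"Veo3": 8, "Sora2": 10}
# Duration options of Image_to_Video_without_audio.
SILENT_I2V_OPTIONS = (3, 5, 8, 10)


def clips_needed(target_seconds: float, model: str) -> int:
    """Number of fixed-length clips needed to cover target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / FIXED_DURATION[model])


def shortest_option(min_seconds: float) -> int:
    """Smallest silent image-to-video duration that is at least min_seconds."""
    for option in SILENT_I2V_OPTIONS:
        if option >= min_seconds:
            return option
    raise ValueError("no single clip is that long; split the shot")
```

For example, a 60-second video works out to 8 Veo3 clips or 6 Sora2 clips, and a shot that must last at least 6 seconds maps to the 8-second silent option. Any excess length could then be cut with video_trim.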
Audio Tools
- add_audio_effect_to_video: Adds sound effects to video content.
- Text_to_Speech: Converts text to speech with automatic voice selection based on context and scene requirements.
- background_music_generation: Generates background music based on prompts.
- music_generation: Generates complete music tracks based on prompts.
- speech_to_text: Transcribes audio content to text.
