What is SJinn Agent?

SJinn Agent is an all-in-one AI-powered creative agent for image and video production. Through natural conversation, it autonomously handles the entire content creation process — from story videos and children's content to cinematic films, anime, UGC, advertisements, podcasts, and virtually any video format you can imagine.

Keywords: Strategy, Planning, Autonomous, End-to-End

SJinn goes beyond simple execution. It understands your ultimate goal, breaks it down into a strategic plan (scripting, storyboarding, asset generation), and brings it to life autonomously. You provide the vision; SJinn directs the entire creation process.

How to Communicate with SJinn Agent

Think of SJinn Agent not as a tool, but as a creative collaborator. Interact with it the way you would with a creative director — describe your vision, and it will bring it to life.

If the output doesn't meet your expectations, simply tell it what needs improvement, and it will regenerate those specific parts.

Using Sora2 or Veo3

By default, SJinn Agent automatically selects a video generation model (Seedance, Kling, or Hailuo). To use Sora2 or Veo3, you have two options:

Option 1: Direct Command

Simply tell the Agent:

"Use Sora2"
"Use Veo3"

Note: Currently, only Sora2 and Veo3 can be manually specified. For all other models, the Agent automatically selects the most suitable one based on your content; requesting a specific one is not supported.

Option 2: Use Templates

Choose from pre-configured templates optimized for these models:

Veo3 Story Video (Consistent Characters)
Sora2 Story Video v2 (Consistent Characters)
Sora2 Extend
Veo3 Extend

About Templates

Templates guide the Agent through specific workflows or provide context tailored to particular video styles. Using the right template ensures more consistent results and higher quality output.

Custom Templates

SJinn supports creating and publishing your own custom templates for others to use. Learn more in our Template Creation Guide.

Content Creation Guides

Story / Anime / Film Videos

Narrative-driven content with plot and characters.

Live-Action Films

For live-action video content, we recommend using the Veo3 Story Template for optimal results.

Veo3 Story Video (Consistent Characters)

Cartoon & Kids Stories

For narration/explainer-style content:

If you're creating educational or narration-driven kids content, use this template:

Kids Short Video (Consistent Characters)

For dialogue-based content:

If your story features character dialogue, the Veo3 Story Template works well:

Veo3 Story Video (Consistent Characters)

Anime Films

For anime-style videos, we recommend either using no template or leveraging Sora2 templates for the best visual results.

For high scene continuity:

If maintaining smooth transitions and narrative flow is your priority, use:

Sora2 Extend

For high character consistency:

If preserving consistent character appearance across scenes is critical, use:

Sora2 Story Video v2

UGC Videos

User-generated content style videos.

Coming soon...

Podcast Videos

For podcast videos, we provide templates for both solo and two-person formats:

single podcast

dual-character podcast

Music Videos

For music videos, we provide a template that generates a video from your music track, with visuals matched to and synchronized with the lyrics:

Music story Video (Sync with lyrics)
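
The template recommendations in the guides above can be summarized as a small lookup table. The sketch below is only an illustration in Python: the category and priority labels (such as "anime" or "scene_continuity") and the function name are hypothetical names invented for this example, not part of SJinn. The template names are copied from this guide.

```python
# Hypothetical lookup table summarizing the template recommendations above.
# The keys are illustrative labels invented for this sketch; the values are
# template names exactly as they appear in this guide.
RECOMMENDED_TEMPLATES = {
    ("live_action", "default"): "Veo3 Story Video (Consistent Characters)",
    ("kids", "narration"): "Kids Short Video (Consistent Characters)",
    ("kids", "dialogue"): "Veo3 Story Video (Consistent Characters)",
    ("anime", "scene_continuity"): "Sora2 Extend",
    ("anime", "character_consistency"): "Sora2 Story Video v2",
    ("podcast", "solo"): "single podcast",
    ("podcast", "two_person"): "dual-character podcast",
    ("music_video", "default"): "Music story Video (Sync with lyrics)",
}


def recommend_template(content_type, priority="default"):
    """Return the template this guide suggests, or None if none is listed."""
    return RECOMMENDED_TEMPLATES.get((content_type, priority))


if __name__ == "__main__":
    print(recommend_template("anime", "character_consistency"))
    print(recommend_template("ugc"))  # no template listed yet
```

A missing entry returns None rather than raising an error, which matches the guide: some formats (UGC, for example) have no recommended template yet, and anime can also be created with no template at all.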

Available Tools

SJinn Agent comes equipped with a comprehensive toolkit. Below is an overview of the capabilities at your disposal:

Image Tools

Image_Generation — Generates images from text prompts.
- Note: Currently uses the Nano-Banana Pro model.
SJinn_Image_Edit — Generates images based on reference images and prompts.
- Default Model: Nano-Banana
- Other Models: Use SJinn_Image_Edit(Nano-Banana-pro) for Nano-Banana Pro, or SJinn_Image_Edit(seedream-4.5) for Seedream 4.5.
- Note: Nano-Banana Pro offers enhanced capabilities but may have slightly lower consistency compared to standard Nano-Banana.
Image_VQA — Visual Question Answering tool that analyzes images and responds to queries about their content (e.g., describing scenes, identifying objects).

Video Tools

Image_to_Video_without_audio — Generates video from a first-frame image and prompt.
- Duration Options: 3s, 5s, 8s, 10s
- Audio: None
- Model Selection: Automatic
- End Frame Support: ✓ First and last frame specification
Text_to_Video_with_audio (Veo3) — Generates video with audio directly from text prompts using Veo3.
- Duration: 8 seconds (fixed)
Text_to_Video_with_audio (Sora2) — Generates video with audio directly from text prompts using Sora2.
- Duration: 10 seconds (fixed)
Image_to_Video_with_audio (Veo3) — Generates video with audio from a first-frame image and prompt using Veo3.
- Duration: 8 seconds (fixed)
- End Frame Support: ✓ Available
Image_to_Video_with_audio (Sora2) — Generates video with audio from a first-frame image and prompt using Sora2.
- Duration: 10 seconds (fixed)
video_frame_extraction — Extracts the first or last frame from a video.
video_trim — Trims a video or audio file to a specific duration.
video_lip_sync — Synchronizes lip movements in video with audio input.
- Best For: Real human subjects with clearly visible mouths occupying a significant portion of the frame.
ImageLipSyncTool — Generates lip-synced video from images, audio, and prompts.
- Supports: Single and dual character modes
ffmpeg_full_compose — Combines multiple video segments, audio tracks, and background music into a single complete video.
Add_Subtitle_For_Video — Automatically generates and adds subtitles to videos.
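
Because the Veo3 and Sora2 tools listed above produce fixed-length clips (8 and 10 seconds), a longer video has to be built from several segments that are then combined. The Agent plans this on its own. The Python sketch below only illustrates the arithmetic, and the function and constant names are hypothetical, chosen for this example.

```python
import math

# Fixed clip lengths in seconds, taken from the tool list above.
FIXED_CLIP_SECONDS = {"Veo3": 8, "Sora2": 10}


def clips_needed(target_seconds, model):
    """Return how many fixed-length clips are needed to cover target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / FIXED_CLIP_SECONDS[model])


if __name__ == "__main__":
    for model in FIXED_CLIP_SECONDS:
        print(model, clips_needed(60, model))
```

Under these assumptions, a 60-second video needs 8 Veo3 clips or 6 Sora2 clips. The last clip may run past the target length, which is where a tool such as video_trim would come in.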

Audio Tools

add_audio_effect_to_video — Adds sound effects to video content.
Text_to_Speech — Converts text to speech with automatic voice selection based on context and scene requirements.
background_music_generation — Generates background music based on prompts.
music_generation — Generates complete music tracks based on prompts.
speech_to_text — Transcribes audio content to text.

Additional Tips

1. Video Upscaling

We recommend upscaling only after your final video is fully generated and you have confirmed that no further edits are needed.

Using the Agent:
Simply tell the Agent "upscale this video" and it will automatically invoke the video upscaling tool.

Using the Tool directly:
You can also access the video upscaling tool directly at:
Video Upscaler Tool
