SJinn-Agent-Guide

What is SJinn Agent?

SJinn Agent is an all-in-one AI-powered creative agent for image and video production. Through natural conversation, it autonomously handles the entire content creation process — from story videos and children's content to cinematic films, anime, UGC, advertisements, podcasts, and virtually any video format you can imagine.

Keywords: Strategy, Planning, Autonomous, End-to-End

SJinn goes beyond simple execution. It understands your ultimate goal, breaks it down into a strategic plan (scripting, storyboarding, asset generation), and brings it to life autonomously. You provide the vision; SJinn directs the entire creation process.


How to Communicate with SJinn Agent

Think of SJinn Agent not as a tool, but as a creative collaborator. Interact with it the way you would with a creative director — describe your vision, and it will bring it to life.

If the output doesn't meet your expectations, simply tell it what needs improvement, and it will regenerate those specific parts.


Using Sora2 or Veo3

By default, SJinn Agent selects a video generation model automatically (Seedance, Kling, or Hailuo). To use Sora2 or Veo3 instead, you have two options:

Option 1: Direct Command

Simply tell the Agent:

  • "Use Sora2"
  • "Use Veo3"

Note: Currently, only Sora2 and Veo3 can be specified manually. For all other models, the Agent automatically selects the most suitable one based on your content; requesting a specific one is not supported.
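
The selection rule above can be summarized in a short sketch. This is not SJinn's implementation or API: the function name `resolve_video_model` and the constants are hypothetical, and the code only encodes what this guide states (Sora2 and Veo3 can be requested by name; everything else falls back to automatic selection).

```python
from typing import Optional

# Models the guide says can be requested by name.
MANUALLY_SELECTABLE = {"sora2": "Sora2", "veo3": "Veo3"}

# Models the guide says the Agent chooses among on its own.
AUTO_SELECTED = ("Seedance", "Kling", "Hailuo")


def resolve_video_model(requested: Optional[str]) -> str:
    """Return the requested model if it can be set manually, else "auto".

    Hypothetical helper for illustration only.
    """
    if requested is None:
        return "auto"
    return MANUALLY_SELECTABLE.get(requested.strip().lower(), "auto")
```

Under these assumptions, asking for "Kling" resolves to "auto", because the guide does not support specifying it manually.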

Option 2: Use Templates

Choose from pre-configured templates optimized for these models:


About Templates

Templates guide the Agent through specific workflows or provide context tailored to particular video styles. Using the right template ensures more consistent results and higher quality output.

Custom Templates

SJinn supports creating and publishing your own custom templates for others to use. Learn more in our Template Creation Guide.


Content Creation Guides

Story / Anime / Film Videos

Narrative-driven content with plot and characters.

Live-Action Films

For live-action video content, we recommend using the Veo3 Story Template for optimal results.

Cartoon & Kids Stories

For narration/explainer-style content:

If you're creating educational or narration-driven kids content, use this template:

For dialogue-based content:

If your story features character dialogue, the Veo3 Story Template works well:

Anime Films

For anime-style videos, we recommend either using no template or leveraging Sora2 templates for the best visual results.

For high scene continuity:

If maintaining smooth transitions and narrative flow is your priority, use:

For high character consistency:

If preserving consistent character appearance across scenes is critical, use:

UGC Videos

User-generated content style videos.

Coming soon...

Podcast Videos

Templates are available for both solo and two-person podcast videos:

single podcast

dual-character podcast

Music Videos

We have built a template that generates a music video from your music track, with visuals matched and synchronized to the lyrics.

Music story Video (Sync with lyrics)

Trending Effect Videos

Popular visual effects and styles.

Coming soon...


Available Tools

SJinn Agent comes equipped with a comprehensive toolkit. Below is an overview of the capabilities at your disposal:

Image Tools

  • Image_Generation — Generates images from text prompts.

    • Note: Currently uses the Nano-Banana Pro model.
  • SJinn_Image_Edit — Generates images based on reference images and prompts.

    • Default Model: Nano-Banana
    • Pro Version: Use SJinn_Image_Edit(Pro) for Nano-Banana Pro
    • Note: Nano-Banana Pro offers enhanced capabilities but may have slightly lower consistency compared to standard Nano-Banana.
  • Image_VQA — Visual Question Answering tool that analyzes images and responds to queries about their content (e.g., describing scenes, identifying objects).
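
As a rough guide to choosing between the image generation and editing tools listed above, the trade-offs can be written as a small decision function. The function `pick_image_tool` and its parameters are hypothetical; in practice the Agent picks tools itself, and this sketch only restates the notes in the list.

```python
def pick_image_tool(has_reference_image: bool,
                    prioritize_consistency: bool = True) -> str:
    """Pick an image tool name from the guide's list (illustrative only)."""
    if not has_reference_image:
        # Text-only prompts go to plain generation.
        return "Image_Generation"
    if prioritize_consistency:
        # Standard Nano-Banana is described as slightly more consistent.
        return "SJinn_Image_Edit"
    # Nano-Banana Pro is described as having enhanced capabilities.
    return "SJinn_Image_Edit(Pro)"
```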

Video Tools

  • Image_to_Video_without_audio — Generates video from a first-frame image and prompt.

    • Duration Options: 3s, 5s, 8s, 10s
    • Audio: None
    • Model Selection: Automatic
    • End Frame Support: ✓ First and last frame specification
  • Text_to_Video_with_audio (Veo3) — Generates video with audio directly from text prompts using Veo3.

    • Duration: 8 seconds (fixed)
  • Text_to_Video_with_audio (Sora2) — Generates video with audio directly from text prompts using Sora2.

    • Duration: 10 seconds (fixed)
  • Image_to_Video_with_audio (Veo3) — Generates video with audio from a first-frame image and prompt using Veo3.

    • Duration: 8 seconds (fixed)
    • End Frame Support: ✓ Available
  • Image_to_Video_with_audio (Sora2) — Generates video with audio from a first-frame image and prompt using Sora2.

    • Duration: 10 seconds (fixed)
  • video_frame_extraction — Extracts the first or last frame from a video.

  • video_trim — Trims a video or audio file to a specific duration.

  • video_lip_sync — Synchronizes lip movements in video with audio input.

    • Best For: Real human subjects with clearly visible mouths occupying a significant portion of the frame.
  • ImageLipSyncTool — Generates lip-synced video from images, audio, and prompts.

    • Supports: Single and dual character modes
  • ffmpeg_full_compose — Combines multiple video segments, audio tracks, and background music into a single complete video.

  • Add_Subtitle_For_Video — Automatically generates and adds subtitles to videos.
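
The generation tools above differ mainly in input type, audio, clip length, and end-frame support. The sketch below collects those properties into a lookup table and adds a helper for the clip-length arithmetic. It is an illustration, not an SJinn API: the class, field, and function names are hypothetical, `end_frame_listed` is true only where this guide explicitly lists end-frame support (the guide does not say whether the other tools support it), and the guide does not state whether the Agent plans clips this way.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VideoTool:
    name: str
    needs_image: bool            # requires a first-frame image
    has_audio: bool              # output includes audio
    durations: Tuple[int, ...]   # clip lengths in seconds listed in the guide
    end_frame_listed: bool       # guide explicitly lists end-frame support


TOOLS = (
    VideoTool("Image_to_Video_without_audio", True, False, (3, 5, 8, 10), True),
    VideoTool("Text_to_Video_with_audio (Veo3)", False, True, (8,), False),
    VideoTool("Text_to_Video_with_audio (Sora2)", False, True, (10,), False),
    VideoTool("Image_to_Video_with_audio (Veo3)", True, True, (8,), True),
    VideoTool("Image_to_Video_with_audio (Sora2)", True, True, (10,), False),
)


def find_tools(needs_image: bool, has_audio: bool,
               end_frame: bool = False) -> List[str]:
    """Return the names of tools matching the requested properties."""
    return [
        t.name for t in TOOLS
        if t.needs_image == needs_image
        and t.has_audio == has_audio
        and (t.end_frame_listed or not end_frame)
    ]


def clips_needed(target_seconds: int, clip_seconds: int) -> Tuple[int, int]:
    """Return (clip count, seconds to trim from the last clip)."""
    count = math.ceil(target_seconds / clip_seconds)
    return count, count * clip_seconds - target_seconds
```

For example, `clips_needed(30, 8)` returns `(4, 2)`: a 30-second video built from fixed 8-second Veo3 clips needs four clips, with 2 seconds removed from the last one (for instance with video_trim).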

Audio Tools

  • add_audio_effect_to_video — Adds sound effects to video content.

  • Text_to_Speech — Converts text to speech with automatic voice selection based on context and scene requirements.

  • background_music_generation — Generates background music based on prompts.

  • music_generation — Generates complete music tracks based on prompts.

  • speech_to_text — Transcribes audio content to text.