SJinn Template Creation Guide

This guide provides a complete overview of SJinn Agent's publicly available tools and demonstrates how to leverage them for creating powerful custom templates.

Available Tools Overview

SJinn offers a suite of AI-powered tools across three main categories (Image, Video, and Audio), plus a script generation tool. Understanding these tools is essential for building effective custom templates.

Image Tools

Image_Generation

Generates images based on text prompts.

Note: The current model is Nano-Banana Pro.

SJinn_Image_Edit

Generates images based on reference images and prompts.

Default Model: Nano-Banana
Pro Version: Use SJinn_Image_Edit(Nano-Banana-pro) for Nano-Banana Pro, or SJinn_Image_Edit(seedream-4.5) for seedream 4.5.
Note: Nano-Banana Pro offers enhanced capabilities but may be slightly less consistent than standard Nano-Banana.
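
The Tool(variant) notation used above for selecting a model can be checked before it goes into a template. The following is a minimal sketch, not SJinn's own parser: the ToolRef type and parse_tool_ref function are hypothetical names, and the accepted character set is an assumption based only on the tool names shown in this guide.

```python
import re
from typing import NamedTuple, Optional


class ToolRef(NamedTuple):
    name: str
    variant: Optional[str]  # None means "use the tool's default model"


# Assumed shape: a tool name, optionally followed by one parenthesized variant.
_TOOL_REF = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([^()]+?)\s*\))?\s*$")


def parse_tool_ref(text: str) -> ToolRef:
    """Split 'Tool(variant)' into its name and optional variant."""
    match = _TOOL_REF.match(text)
    if match is None:
        raise ValueError(f"not a tool reference: {text!r}")
    return ToolRef(match.group(1), match.group(2))


print(parse_tool_ref("SJinn_Image_Edit(seedream-4.5)"))
print(parse_tool_ref("SJinn_Image_Edit"))
```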

Image_VQA

Visual Question Answering tool that analyzes images and responds to prompts about their content (e.g., describing image contents, identifying objects).

Video Tools

Image_to_Video_without_audio

Generates video from a first-frame image and prompt.

Duration Options: 3s, 5s, 8s, 10s
Audio: No audio output
Model Selection: Automatic internal selection
End Frame Support: ✓ Supports first and last frame specification

Text_to_Video_with_audio (Veo3)

Generates video with audio directly from text prompts using the Veo3 model.

Duration: Fixed at 8 seconds

Text_to_Video_with_audio (Sora2)

Generates video with audio directly from text prompts using the Sora2 model.

Duration: Fixed at 10 seconds

Image_to_Video_with_audio (Veo3)

Generates video with audio from a first-frame image and prompt using the Veo3 model.

Duration: Fixed at 8 seconds
End Frame Support: ✓ Available

Image_to_Video_with_audio (Sora2)

Generates video with audio from a first-frame image and prompt using the Sora2 model.

Duration: Fixed at 10 seconds
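
Because the video generation tools above produce clips of fixed lengths, a template that targets a given total runtime needs a matching number of clips. The sketch below only does that arithmetic. The durations are copied from the descriptions above; the clips_needed helper is a hypothetical illustration and not part of SJinn.

```python
from __future__ import annotations

import math

# Clip lengths in seconds, copied from the tool descriptions above.
CLIP_DURATIONS = {
    "Image_to_Video_without_audio": (3, 5, 8, 10),
    "Text_to_Video_with_audio (Veo3)": (8,),
    "Text_to_Video_with_audio (Sora2)": (10,),
    "Image_to_Video_with_audio (Veo3)": (8,),
    "Image_to_Video_with_audio (Sora2)": (10,),
}


def clips_needed(tool: str, target_seconds: float) -> tuple[int, int]:
    """Return (clip_count, total_seconds) when using the tool's longest clip."""
    longest = max(CLIP_DURATIONS[tool])
    count = max(1, math.ceil(target_seconds / longest))
    return count, count * longest


# A 30-second target with Veo3 needs four 8-second clips (32 seconds in total).
print(clips_needed("Image_to_Video_with_audio (Veo3)", 30))
```

When the total overshoots the target, as in the Veo3 case, video_trim can cut the result back to the intended length.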

video_frame_extraction

Extracts the first or last frame from a video.

video_trim

Trims a video or audio file to a specific duration.

video_lip_sync

Synchronizes lip movements in video with audio input.

Best For: Real human subjects with unobstructed mouths occupying a significant portion of the video frame.

ImageLipSyncTool

Generates lip-synced video from images, audio, and prompts.

Supports: Single character and dual character modes

ffmpeg_full_compose

Combines multiple video segments, audio tracks, and background music into a single complete video.

Add_Subtitle_For_Video

Automatically generates and adds subtitles to videos.

Audio Tools

add_audio_effect_to_video

Adds sound effects to video content.

Text_to_Speech

Converts text to speech with automatic voice selection based on context and scene requirements.

background_music_generation

Generates background music based on prompts.

music_generation

Generates complete music tracks based on prompts.

speech_to_text

Transcribes audio content to text.

Script Generation Tool

Short_Video_Story_Script_Generator

Generates video scripts based on user input. This tool supports custom prompts because different video types and tools require different script generation approaches.

Default Prompt Template:

As a short video blogger, your task is:

1. Based on the user's input topic (if no topic is provided, generate a topic 
   that is highly likely to go viral), create an entirely new video script.

2. Output the video's style based on user input (if not specified, randomly 
   select from Animation, 3D, or Pixar). Ensure the style is clearly defined.

3. If the user provides a character image, character design is not required; 
   otherwise, perform character design.

4. Automatically storyboard the generated video script. Each scene must include 
   both narration and a visual description. The narration should typically use 
   the exact text from the script.

5. Each scene should not exceed 10 seconds (approximately 30 words of narration). 
   If a scene exceeds 10 seconds, split it into multiple scenes.

User input: {{user_input}}

Output format:
------------------------------
Video Script:

Style Design: Animation, 3D, Pixar, Illustrative, Realistic, Cyberpunk, etc., 
and other distinct style descriptions.

Character Design (Optional):

Storyboard Design:
Scene 1:
Scene 2:
Scene 3:
...
------------------------------

Limits:
1. The video script defaults to 30 seconds (approximately 100 words). If the 
   user requests a different duration, adjust the word count accordingly. 
   Example: 5-minute request = approximately 1000 words.

2. Strictly follow the format specified in Output Format.

3. Each scene should not exceed 10 seconds (approximately 30 words of narration). 
   If a scene exceeds this limit, generate multiple shots within that scene.
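
The word budgets in the default prompt above (about 100 words per 30 seconds of script, and at most about 30 words per 10-second scene) can be estimated ahead of time. This is a rough sketch of that arithmetic only. The function names are hypothetical, the figures are the prompt's own approximations, and the script generator itself decides the real scene boundaries.

```python
from __future__ import annotations

WORDS_PER_30_SECONDS = 100  # "30 seconds (approximately 100 words)"
MAX_WORDS_PER_SCENE = 30    # "10 seconds (approximately 30 words of narration)"


def word_budget(duration_seconds: float) -> int:
    """Approximate script length in words for a requested duration."""
    return round(duration_seconds * WORDS_PER_30_SECONDS / 30)


def split_narration(narration: str, max_words: int = MAX_WORDS_PER_SCENE) -> list[str]:
    """Cut narration into consecutive chunks of at most max_words words."""
    words = narration.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


print(word_budget(300))  # the prompt's 5-minute example
print(len(split_narration("word " * 70)))
```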

Custom Script Generator Usage:

Use Short_Video_Story_Script_Generator(custom_v1) to apply your custom script generation prompt. Your custom prompt must include {{user_input}} as a placeholder, which will be replaced with the user's actual input.
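
A custom prompt can be checked locally for the required placeholder before it is submitted. The sketch below is an illustration only. SJinn performs the real substitution itself, and render_custom_prompt is a hypothetical helper that assumes the placeholder is matched literally as {{user_input}}.

```python
PLACEHOLDER = "{{user_input}}"


def render_custom_prompt(template: str, user_input: str) -> str:
    """Preview a custom prompt with the placeholder filled in."""
    if PLACEHOLDER not in template:
        raise ValueError(f"custom prompt must contain {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, user_input)


custom_prompt = (
    "As a product reviewer, write a 30-second video script.\n"
    "User input: {{user_input}}"
)
print(render_custom_prompt(custom_prompt, "a folding electric bike"))
```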

Creating Custom Templates

To create effective custom templates, follow these two key steps:

1. Describe the Process: Define the sequence of operations and tool calls.
2. Specify Tool Parameters: Set any required or preferred parameters for each tool call.
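
One way to keep both steps explicit is to draft a template as structured data and render it to plain text afterwards. The following is a minimal sketch under that assumption. SJinn templates are ordinary text, so the Step class and render_template function are hypothetical drafting aids, and the output layout simply mirrors the numbered-step examples in this guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str    # tool name as listed in this guide
    action: str  # what the step should produce
    params: dict[str, str] = field(default_factory=dict)  # optional parameters


def render_template(goal: str, steps: list[Step], defaults: dict[str, str]) -> str:
    """Render a goal, numbered tool steps, and default settings as plain text."""
    lines = [goal, ""]
    for number, step in enumerate(steps, start=1):
        line = f"{number}. Use {step.tool} to {step.action}"
        if step.params:
            settings = ", ".join(f"{k}: {v}" for k, v in step.params.items())
            line += f" ({settings})"
        lines += [line + ".", ""]
    lines.append("Default Settings:")
    lines += [f"- {key}: {value}" for key, value in defaults.items()]
    return "\n".join(lines)


print(render_template(
    "Based on user input, generate a video using Veo3.",
    [
        Step("Short_Video_Story_Script_Generator", "generate a script"),
        Step("Image_to_Video_with_audio (Veo3)", "generate a video for each scene",
             {"Duration": "8s"}),
    ],
    {"Aspect Ratio": "16:9"},
))
```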

Template Examples

1. Kids Short Video (Consistent Characters)

Based on the user's input, generate a kids video.

1. Use Short_Video_Story_Script_Generator to generate a kids-style 
   short video script.

2. For character reference:
   - If the user uploads character reference image(s): Use SJinn Image Edit to 
     generate a three-view for each character, to serve as reference images 
     for better consistency.
   - If the user hasn't uploaded images: Use Image_Generation to generate 
     three-view images for each character directly from the user's description.
   - For multiple characters: Generate three-view reference images for 
     each character separately.

3. Use SJinn Image Edit to generate the storyboard images one by one (using 
   the corresponding three-view references for character consistency).

4. Use Image to Video to generate a video for each storyboard.

5. Use Text to Speech to convert each storyboard's text to audio, one 
   storyboard at a time (use the same voice throughout for consistency).

6. Generate background music.

7. Merge all materials into the final video.

Default Settings:
- Aspect Ratio: 9:16 (if the user doesn't specify one)
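
The workflow in this template can be sketched as an ordered list of tool calls, which makes the branching on uploaded reference images and the per-storyboard repetition easier to see. This is an illustration only: plan_kids_video is a hypothetical function, and the choice of Image_to_Video_without_audio, background_music_generation, and ffmpeg_full_compose for the unnamed steps is an assumption, since the template leaves those tools to the agent.

```python
from __future__ import annotations


def plan_kids_video(characters: list[str], scene_count: int,
                    has_reference_images: bool) -> list[tuple[str, str]]:
    """Return the ordered (tool, purpose) calls that the kids template describes."""
    # Uploaded references are edited into three-views; otherwise they are generated.
    ref_tool = "SJinn_Image_Edit" if has_reference_images else "Image_Generation"
    scenes = range(1, scene_count + 1)

    plan = [("Short_Video_Story_Script_Generator", "kids video script")]
    plan += [(ref_tool, f"three-view of {name}") for name in characters]
    plan += [("SJinn_Image_Edit", f"storyboard image {n}") for n in scenes]
    # Assumption: the silent image-to-video tool, since narration comes from TTS.
    plan += [("Image_to_Video_without_audio", f"video for storyboard {n}") for n in scenes]
    plan += [("Text_to_Speech", f"narration for storyboard {n} (same voice)") for n in scenes]
    plan.append(("background_music_generation", "background music"))
    plan.append(("ffmpeg_full_compose", "merge into the final 9:16 video"))
    return plan


for tool, purpose in plan_kids_video(["Mia", "Tom"], scene_count=3,
                                     has_reference_images=False):
    print(f"{tool}: {purpose}")
```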

2. Veo3 Story Video (Consistent Characters)

Based on user input, generate a video using Veo3 while maintaining 
character consistency.

1. Use Short_Video_Story_Script_Generator to 
   generate a script.

2. For character reference:
   - If user uploads character reference image(s): Use SJinn Image Edit to 
     generate a three-view for each character.
   - If user hasn't uploaded images: Use Image_Generation to generate 
     three-view images for each character.
   - For multiple characters: Generate three-view reference images for 
     each character separately.

3. Use SJinn Image Edit to generate images for various scenes (using 
   three-view references for consistency).

4. Use Veo3 Image to Video to generate videos.

5. Use FFmpeg to merge the videos.

Default Settings:
- Aspect Ratio: 16:9
- TTS: Disabled (Veo3 has built-in audio generation)
- LipSync: Disabled (Veo3 has built-in audio generation)

3. Character Aging Transformation Video

Based on user input, generate a video showing a person's transformation 
as they age.

1. If user hasn't uploaded an image: Generate a reference image based 
   on the description.

2. Generate photos of this character at different ages from young to old 
   in real-life scenarios (not just headshots — include environmental 
   context). If user doesn't specify scenes, place the character in 
   everyday life scenarios by default.

3. If user doesn't specify ages: Default to generating photos at ages 
   1, 5, 10, 15, 20, 25, and 35 years old.

4. Create morphing videos between each pair of consecutive photos 
   (1→2, 2→3, 3→4, etc.).

5. Add suitable background music.

6. Merge all segments into one cohesive video.

Default Settings:
- Aspect Ratio: 9:16
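
The pairing in step 4 can be made concrete with the default ages from step 3. In the sketch below, each pair supplies the first and last frame of one morphing clip. The morph_pairs helper is hypothetical, and using Image_to_Video_without_audio for the morphs is an assumption based on its first and last frame support.

```python
from __future__ import annotations

DEFAULT_AGES = [1, 5, 10, 15, 20, 25, 35]  # defaults from step 3


def morph_pairs(ages: list[int]) -> list[tuple[int, int]]:
    """Pair each age photo with the next one: (first frame, last frame)."""
    return list(zip(ages, ages[1:]))


pairs = morph_pairs(DEFAULT_AGES)
for start_age, end_age in pairs:
    print(f"morph: age {start_age} photo -> age {end_age} photo")
print(f"{len(pairs)} clips in total")
```

Seven photos yield six morphing clips, so at 5 seconds per clip the merged video would run about 30 seconds before any trimming.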

Tips for Template Creation

- Be Specific: Clearly define each step in your workflow
- Set Defaults: Provide sensible default values for optional parameters
- Consider Consistency: For character-based videos, always include three-view generation steps
- Optimize for Output: Match aspect ratios and settings to your target platform
- Test Iteratively: Refine your templates based on generated results