SJinn Template Creation Guide
A comprehensive guide to SJinn's open tools and how to create custom templates
This guide provides a complete overview of SJinn Agent's publicly available tools and demonstrates how to leverage them for creating powerful custom templates.
Available Tools Overview
SJinn offers a suite of AI-powered tools across three main media categories (Image, Video, and Audio), plus a script generation tool. Understanding these tools is essential for building effective custom templates.
Image Tools
Image_Generation
Generates images based on text prompts.
Note: The current model is Nano-Banana Pro.
SJinn_Image_Edit
Generates images based on reference images and prompts.
- Default Model: Nano-Banana
- Pro Version: Use SJinn_Image_Edit(Nano-Banana-pro) for Nano-Banana Pro. Use SJinn_Image_Edit(seedream-4.5) for seedream 4.5.
- Note: Nano-Banana Pro offers enhanced capabilities but may have slightly lower consistency than standard Nano-Banana.
Image_VQA
Visual Question Answering tool that analyzes images and responds to prompts about their content (e.g., describing image contents, identifying objects).
Video Tools
Image_to_Video_without_audio
Generates video from a first-frame image and prompt.
- Duration Options: 3s, 5s, 8s, 10s
- Audio: No audio output
- Model Selection: Automatic internal selection
- End Frame Support: ✓ Supports first and last frame specification
Text_to_Video_with_audio (Veo3)
Generates video with audio directly from text prompts using the Veo3 model.
- Duration: Fixed at 8 seconds
Text_to_Video_with_audio (Sora2)
Generates video with audio directly from text prompts using the Sora2 model.
- Duration: Fixed at 10 seconds
Image_to_Video_with_audio (Veo3)
Generates video with audio from a first-frame image and prompt using the Veo3 model.
- Duration: Fixed at 8 seconds
- End Frame Support: ✓ Available
Image_to_Video_with_audio (Sora2)
Generates video with audio from a first-frame image and prompt using the Sora2 model.
- Duration: Fixed at 10 seconds
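The duration and audio constraints of the five generation tools above can be collected in a small lookup table, which makes it easier to decide which tool a template step should name. The following is a minimal sketch: the table values are copied from the descriptions above, while the dictionary keys, the `tools_for_duration` helper, and its parameters are hypothetical names introduced for illustration and are not part of SJinn.

```python
from typing import Dict, List, Optional, Tuple

# (allowed durations in seconds, produces audio), copied from the tool
# descriptions above. The keys are illustrative labels, not confirmed call syntax.
VIDEO_TOOLS: Dict[str, Tuple[Tuple[int, ...], bool]] = {
    "Image_to_Video_without_audio": ((3, 5, 8, 10), False),
    "Text_to_Video_with_audio (Veo3)": ((8,), True),
    "Text_to_Video_with_audio (Sora2)": ((10,), True),
    "Image_to_Video_with_audio (Veo3)": ((8,), True),
    "Image_to_Video_with_audio (Sora2)": ((10,), True),
}


def tools_for_duration(seconds: int, with_audio: Optional[bool] = None) -> List[str]:
    """Return the tools that can produce a clip of exactly `seconds` seconds.

    If `with_audio` is True or False, only tools with matching audio
    behaviour are returned; None means "no preference".
    """
    matches = []
    for name, (durations, has_audio) in VIDEO_TOOLS.items():
        if seconds not in durations:
            continue
        if with_audio is not None and has_audio != with_audio:
            continue
        matches.append(name)
    return matches
```

For example, `tools_for_duration(5)` returns only `Image_to_Video_without_audio`, since the Veo3 and Sora2 tools are fixed at 8 and 10 seconds respectively.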
video_frame_extraction
Extracts the first or last frame from a video.
video_trim
Trims a video or audio file to a specific duration.
video_lip_sync
Synchronizes lip movements in video with audio input.
- Best For: Real human subjects with unobstructed mouths occupying a significant portion of the video frame.
ImageLipSyncTool
Generates lip-synced video from images, audio, and prompts.
- Supports: Single character and dual character modes
ffmpeg_full_compose
Combines multiple video segments, audio tracks, and background music into a single complete video.
Add_Subtitle_For_Video
Automatically generates and adds subtitles to videos.
Audio Tools
add_audio_effect_to_video
Adds sound effects to video content.
Text_to_Speech
Converts text to speech with automatic voice selection based on context and scene requirements.
background_music_generation
Generates background music based on prompts.
music_generation
Generates complete music tracks based on prompts.
speech_to_text
Transcribes audio content to text.
Script Generation Tool
Short_Video_Story_Script_Generator
Generates video scripts based on user input. This tool supports custom prompts because different video types and tools require different script generation approaches.
Default Prompt Template:
As a short video blogger, your task is:
1. Based on the user's input topic (if no topic is provided, generate a topic
that is highly likely to go viral), create an entirely new video script.
2. Output the video's style based on user input (if not specified, randomly
select from Animation, 3D, or Pixar). Ensure the style is clearly defined.
3. If the user provides a character image, character design is not required;
otherwise, perform character design.
4. Automatically storyboard the generated video script. Each scene must include
both narration and a visual description. The narration should typically use
the exact text from the script.
5. Each scene should not exceed 10 seconds (approximately 30 words of narration).
If a scene exceeds 10 seconds, split it into multiple scenes.
User input: {{user_input}}
Output format:
------------------------------
Video Script:
Style Design: Animation, 3D, Pixar, Illustrative, Realistic, Cyberpunk, etc.,
and other distinct style descriptions.
Character Design (Optional):
Storyboard Design:
Scene 1:
Scene 2:
Scene 3:
...
------------------------------
Limits:
1. The video script defaults to 30 seconds (approximately 100 words). If the
user requests a different duration, adjust the word count accordingly.
Example: 5-minute request = approximately 1000 words.
2. Strictly follow the format specified in Output Format.
3. Each scene should not exceed 10 seconds (approximately 30 words of narration).
If a scene exceeds this limit, generate multiple shots within that scene.
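The word budgets in the default prompt (about 100 words per 30 seconds, and at most about 30 words per 10-second scene) can be checked locally before or after script generation. The sketch below is illustrative only: `target_word_count` and `split_into_scenes` are hypothetical helpers, not SJinn functions, and the splitter cuts on word boundaries without regard to sentences, which a real storyboard would want to respect.

```python
from typing import List

WORDS_PER_SCENE = 30  # from the prompt: about 30 words of narration per scene


def target_word_count(duration_seconds: float) -> int:
    """Scale the prompt's default of about 100 words per 30 seconds."""
    return round(duration_seconds * 100 / 30)


def split_into_scenes(narration: str, max_words: int = WORDS_PER_SCENE) -> List[str]:
    """Split narration into chunks of at most `max_words` words each."""
    words = narration.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

With these assumptions, `target_word_count(300)` gives 1000, matching the 5-minute example in the prompt, and a 70-word narration is split into three scenes of 30, 30, and 10 words.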
Custom Script Generator Usage:
Use Short_Video_Story_Script_Generator(custom_v1) to apply your custom script generation prompt. Your custom prompt must include {{user_input}} as a placeholder, which will be replaced with the user's actual input.
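Because a custom prompt without the placeholder would never receive the user's input, it can be worth checking the prompt text before saving it in a template. The following is a minimal local pre-check, assuming plain string substitution; how SJinn performs the replacement internally is not documented here, and `validate_custom_prompt` and `render_prompt` are hypothetical helper names.

```python
PLACEHOLDER = "{{user_input}}"


def validate_custom_prompt(prompt: str) -> None:
    """Raise ValueError if the required placeholder is missing."""
    if PLACEHOLDER not in prompt:
        raise ValueError(f"custom prompt must contain {PLACEHOLDER}")


def render_prompt(prompt: str, user_input: str) -> str:
    """Preview what the prompt looks like once the placeholder is filled in.

    This only mimics the substitution for local testing; the real
    replacement is done by SJinn.
    """
    validate_custom_prompt(prompt)
    return prompt.replace(PLACEHOLDER, user_input)
```

For example, `render_prompt("Write a script about: {{user_input}}", "a cat")` returns `"Write a script about: a cat"`, while a prompt that omits the placeholder raises `ValueError`.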
Creating Custom Templates
To create effective custom templates, follow these two key steps:
1. Describe the Process: Define the sequence of operations and tool calls
2. Specify Tool Parameters: Set any required or preferred parameters for each tool call
Template Examples
1. Kids Short Video (Consistent Characters)
Based on user's inputs, generate a Kids Video.
Use Short Video Script Generator to generate a Kids video type short video script.
For character reference: if the user uploads character reference image(s),
use SJinn Image Edit to generate a three-view for each character as subsequent reference images for better consistency;
if the user hasn't uploaded images,
use Image_Generation to directly generate three-view images for each character based on the user's description as references.
If there are multiple characters in the story, generate three-view reference images for each character separately.
Then use SJinn Image Edit to generate storyboard images one by one (using the corresponding three-view references for character consistency),
then use Image to Video to generate videos for each storyboard, use Text to Speech to convert each storyboard's text to audio,
generate audio for each storyboard separately (note to choose the same voice to maintain consistency),
then generate background music, and merge all materials into the final video.
If user doesn't specify video aspect ratio, default to 9:16.
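The sequence of tool calls this template describes can be written out as an ordered plan, which is a handy way to check that no step is missing before turning the description into template text. This is only a sketch of the workflow above, not how SJinn Agent executes templates: the `plan_kids_video` function is hypothetical, and mapping "Image to Video" to `Image_to_Video_without_audio` and "merge" to `ffmpeg_full_compose` is an assumption based on the tool list earlier in this guide.

```python
from typing import List, Tuple


def plan_kids_video(
    characters: List[str],
    has_uploads: bool,
    scene_count: int,
    aspect_ratio: str = "9:16",  # default stated in the template
) -> List[Tuple[str, str]]:
    """Return (tool, purpose) pairs in the order the template describes."""
    steps = [("Short_Video_Story_Script_Generator", "Kids video script")]

    # Uploaded references are edited into three-views; otherwise they are
    # generated from the user's description.
    ref_tool = "SJinn_Image_Edit" if has_uploads else "Image_Generation"
    for name in characters:
        steps.append((ref_tool, f"three-view reference for {name}"))

    scenes = range(1, scene_count + 1)
    for i in scenes:
        steps.append(("SJinn_Image_Edit", f"storyboard image {i} ({aspect_ratio})"))
    for i in scenes:
        # Assumed mapping: the template only says "Image to Video".
        steps.append(("Image_to_Video_without_audio", f"video for storyboard {i}"))
    for i in scenes:
        steps.append(("Text_to_Speech", f"narration for storyboard {i} (same voice)"))

    steps.append(("background_music_generation", "background music"))
    steps.append(("ffmpeg_full_compose", "merge all materials"))  # assumed mapping
    return steps
```

Printing the result of `plan_kids_video(["Mia", "Leo"], has_uploads=False, scene_count=3)` lists one script step, two reference steps, nine per-scene steps, and the two final steps, in that order.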
2. Veo3 Story Video (Consistent Characters)
Based on user input, generate a video using Veo3 while maintaining
character consistency.
1. Use Short_Video_Story_Script_Generator to
generate a script.
2. For character reference:
- If user uploads character reference image(s): Use SJinn Image Edit to
generate a three-view for each character.
- If user hasn't uploaded images: Use Image_Generation to generate
three-view images for each character.
- For multiple characters: Generate three-view reference images for
each character separately.
3. Use SJinn Image Edit to generate images for various scenes (using
three-view references for consistency).
4. Use Veo3 Image to Video to generate videos.
5. Use FFmpeg to merge the videos.
Default Settings:
- Aspect Ratio: 16:9
- TTS: Disabled (Veo3 has built-in audio generation)
- LipSync: Disabled (Veo3 has built-in audio generation)
3. Character Aging Transformation Video
Based on user input, generate a video showing a person's transformation
as they age.
1. If user hasn't uploaded an image: Generate a reference image based
on the description.
2. Generate photos of this character at different ages from young to old
in real-life scenarios (not just headshots — include environmental
context). If user doesn't specify scenes, place the character in
everyday life scenarios by default.
3. If user doesn't specify ages: Default to generating photos at ages
1, 5, 10, 15, 20, 25, and 35 years old.
4. Create morphing videos between each pair of consecutive photos
(1→2, 2→3, 3→4, etc.).
5. Add suitable background music.
6. Merge all segments into one cohesive video.
Default Settings:
- Aspect Ratio: 9:16
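The pairing in step 4 (1→2, 2→3, 3→4, and so on) means that N photos yield N-1 morphing segments. A minimal sketch of that pairing is shown below; `morph_pairs` is a hypothetical helper, and treating each pair as the first and last frame of an `Image_to_Video_without_audio` call is an assumption based on that tool's end-frame support, not something the template states.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Default ages from step 3 of the template.
DEFAULT_AGES = [1, 5, 10, 15, 20, 25, 35]


def morph_pairs(items: Sequence[T]) -> List[Tuple[T, T]]:
    """Pair each item with its successor: (first frame, last frame)."""
    return list(zip(items, items[1:]))
```

With the default ages, `morph_pairs(DEFAULT_AGES)` produces six pairs, from `(1, 5)` through `(25, 35)`, so the final merge step combines six segments.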
Tips for Template Creation
- Be Specific — Clearly define each step in your workflow
- Set Defaults — Provide sensible default values for optional parameters
- Consider Consistency — For character-based videos, always include three-view generation steps
- Optimize for Output — Match aspect ratios and settings to your target platform
- Test Iteratively — Refine your templates based on generated results
