Commit 22552bfc authored by duanjinfei

update document

parent 26b06316
---
title: AI Face Swap API Usage Guide
slug: eC4eIgRJKwIGIsmMqAH4l
createdAt: Thu Jul 18 2024 06:04:57 GMT+0000 (Coordinated Universal Time)
updatedAt: Thu Jul 18 2024 13:43:11 GMT+0000 (Coordinated Universal Time)
---
# AI Face Swap API Usage Guide
## Introduction
This document shows developers how to use the aonweb library to call the AI Face Swap API.
## Prerequisites
- Node.js environment
- `aonweb` library installed
- Valid Aonet APPID
## Basic Usage
### 1. Import Required Modules
```js
import { AI, AIOptions } from 'aonweb';
```
### 2. Initialize AI Instance
```js
const ai_options = new AIOptions({
  appId: 'your_app_id_here',
  dev_mode: true
});
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data
```js
const data = {
  input: {
    "swap_image": "https://aonweb.ai/pbxt/JoBuzfSVFLb5lBqkf3v9xMnqx3jFCYhM5JcVInFFwab8sLg0/long-trench-coat.png",
    "target_image": "https://replicate.delivery/pbxt/JoBuz3wGiVFQ1TDEcsGZbYcNh0bHpvwOi32T1fmxhRujqcu7/9X2.png"
  }
};
```
### 4. Call the AI Model
```js
const price = 8; // Cost of the AI call
// Top-level await works in ES modules; otherwise wrap this in an async function.
try {
  const response = await aonweb.prediction("/predictions/ai/face-swap", data, price);
  // Handle response
  console.log("Face swap result:", response);
} catch (error) {
  // Error handling
  console.error("Error performing face swap:", error);
}
```
### Parameter Description
- `swap_image`: URL of the image containing the face to be swapped.
- `target_image`: URL of the target image where the face will be swapped.
### Notes
- Ensure that the provided image URLs are publicly accessible.
- The API may take some time to process the images; consider implementing appropriate wait or retry logic.
- Handle possible errors, such as network issues or API limitations.
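The retry advice above can be sketched as a small wrapper with exponential backoff. This is a minimal sketch, assuming that any thrown error is worth retrying; `withRetry` and its options are hypothetical names, not part of aonweb. Because each call is priced, a retried request may be billed again, so keep `retries` low.

```js
// Hypothetical helper (not part of aonweb): retries an async call with
// exponential backoff. Assumes every thrown error is retryable.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      // Wait baseDelayMs, 2x, 4x, ... before the next attempt.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage with the call from step 4:
// const response = await withRetry(() =>
//   aonweb.prediction("/predictions/ai/face-swap", data, price));
```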
### Example Response
The API response will contain the URL of the processed image or other relevant information. Parse and use the response data according to the actual API documentation.
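Because the exact response shape is not documented here, one defensive option is to walk the response and collect anything that looks like an HTTP(S) URL. `collectUrls` is a hypothetical helper, a sketch rather than the library's own parsing.

```js
// Hypothetical helper: recursively collects http(s) URL strings from any
// nested object/array, since the response shape is not specified here.
function collectUrls(value, found = []) {
  if (typeof value === "string") {
    if (/^https?:\/\//.test(value)) found.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectUrls(item, found);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectUrls(item, found);
  }
  return found;
}

// Usage: const [resultUrl] = collectUrls(response);
```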
## Advanced Usage
- Consider implementing a retry mechanism for failed requests.
- Add image validation logic to ensure the provided URLs point to valid image files.
- For production environments, consider implementing rate limiting and caching mechanisms to optimize API usage.
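For the image validation suggested above, a minimal sketch (assuming Node.js 18 or later for the global `fetch` and `AbortSignal.timeout`) sends a `HEAD` request and checks the `Content-Type` header. `isReachableImage` is a hypothetical helper; some servers reject `HEAD` requests, in which case a ranged `GET` may be needed instead.

```js
// Hypothetical helper: returns true only if the URL is http(s), reachable,
// and served with an image/* content type. Requires Node.js 18+.
async function isReachableImage(url, timeoutMs = 5000) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a syntactically valid URL
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") return false;
  try {
    const res = await fetch(parsed, {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    const type = res.headers.get("content-type") || "";
    return res.ok && type.startsWith("image/");
  } catch {
    return false; // network error or timeout
  }
}

// Usage: if (!(await isReachableImage(data.input.swap_image))) { /* reject */ }
```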
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,13 +36,34 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
  input: {
    "task": "image_captioning",
    "image": "https://replicate.delivery/mgxm/f4e50a7b-e8ca-432f-8e68-082034ebcc70/demo.jpg"
  }
};
```
```js
const data = {
  input: {
    "task": "image_text_matching",
    "image": "https://replicate.delivery/mgxm/2c0e8a0d-232d-4cfa-b97a-b3258be2a2e5/demo.jpg",
    "caption": "a dog and a cat are playing in the garden"
  }
};
```
```js
const data = {
  input: {
    "task": "visual_question_answering",
    "image": "https://replicate.delivery/mgxm/32e7126a-dd86-4e8c-8706-5855b8b69cf1/demo.jpg",
    "question": "where is the woman?"
  }
};
```
......@@ -54,33 +75,42 @@ const price = 8; // Cost of the AI call
try {
  const response = await aonweb.prediction("/predictions/ai/blip", data, price);
  // Handle response
  console.log("Blip result:", response);
} catch (error) {
  // Error handling
  console.error("Error running BLIP:", error);
}
```
### Parameter Description
- `task`: String, the type of task the model should perform. One of `image_captioning` (default), `image_text_matching`, or `visual_question_answering`:
  - `image_captioning`: describes the scene in the image.
  - `image_text_matching`: matches the given `caption` against the scene in the image.
  - `visual_question_answering`: answers the given `question` about the scene in the image.
- `image`: String, URL of the image to analyze.
- `caption`: String, a text description of the scene in the image; used by `image_text_matching`.
- `question`: String, the question to ask about the image; used by `visual_question_answering`.
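Because each task needs a different extra field, a small builder can catch a missing `caption` or `question` before a paid call is made. `buildBlipInput` and its field mapping are hypothetical, derived only from the parameter list above.

```js
// Hypothetical mapping of each BLIP task to the extra fields it needs,
// based on the parameter descriptions above.
const BLIP_REQUIRED_FIELDS = {
  image_captioning: [],
  image_text_matching: ["caption"],
  visual_question_answering: ["question"],
};

function buildBlipInput(task, image, extra = {}) {
  if (!Object.prototype.hasOwnProperty.call(BLIP_REQUIRED_FIELDS, task)) {
    throw new Error(`Unsupported BLIP task: ${task}`);
  }
  for (const field of BLIP_REQUIRED_FIELDS[task]) {
    const value = extra[field];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Task "${task}" requires a non-empty "${field}" string`);
    }
  }
  // Spread `extra` first so the explicit task and image always win.
  return { input: { ...extra, task, image } };
}

// Usage:
// const data = buildBlipInput("visual_question_answering", imageUrl, { question: "where is the woman?" });
// const response = await aonweb.prediction("/predictions/ai/blip", data, price);
```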
### Notes
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling images of other people.
### Example Response
The API response will contain the results of the image recognition or other relevant information. Parse and use the response data according to the actual API documentation.
## Advanced Usage
- Unimodal encoders, which separately encode image and text. The image encoder is a vision transformer. The text encoder is the same as BERT. A `[CLS]` token is appended to the beginning of the text input to summarize the sentence.
- Image-grounded text encoder, which injects visual information by inserting a cross-attention layer between the self-attention layer and the feed-forward network for each transformer block of the text encoder. A task-specific `[Encode]` token is appended to the text, and the output embedding of `[Encode]` is used as the multimodal representation of the image-text pair.
- Image-grounded text decoder, which replaces the bidirectional self-attention layers in the text encoder with causal self-attention layers. A special `[Decode]` token is used to signal the beginning of a sequence.
- Image-Text Contrastive Loss (ITC) activates the unimodal encoder. It aims to align the feature spaces of the vision transformer and the text transformer by encouraging positive image-text pairs to have similar representations, in contrast to negative pairs.
- Image-Text Matching Loss (ITM) activates the image-grounded text encoder. ITM is a binary classification task, where the model is asked to predict whether an image-text pair is positive (matched) or negative (unmatched) given their multimodal feature.
- Language Modeling Loss (LM) activates the image-grounded text decoder, which aims to generate textual descriptions conditioned on the images.
......@@ -36,7 +36,27 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
  input: {
    "image": "https://replicate.delivery/pbxt/IJDXbkPoG6rXgFJ08ZhuQHCnX4t62zceE2hgriK8yv3vXEBw/merlion_demo.png",
    "question": "What is this a picture of?",
    "temperature": 1
  }
};
```
```js
const data = {
  input: {
    "image": "https://replicate.delivery/pbxt/IJEPJQ1Cx2l0TVdISAtGBATLGr0bn3sqZfOAY05QXepqDXd5/gg_bridge.jpeg",
    "caption": true,
    "temperature": 1
  }
};
```
```js
const data = {
......@@ -49,6 +69,17 @@ const data = {
};
```
```js
const data = {
  input: {
    "image": "https://replicate.delivery/pbxt/IJETATwaW4ZAi1I86Xx5H0LjD4f4NBVcMvbSbUxCnPBKFQsE/panda.jpeg",
    "context": "question: what animal is this? answer: panda",
    "question": "what country is this animal from?",
    "temperature": 1
  }
};
```
### 4. Call the AI Model
```js
......@@ -65,21 +96,22 @@ try {
### Parameter Description
- `image`: String, URL of the input image to query or caption.
- `caption`: Boolean, set to `true` to generate an image caption instead of answering a question.
- `context`: String, optional; previous questions and answers to use as context when answering the current question.
- `question`: String, the question to ask about the image; leave blank for captioning.
- `temperature`: Number, the temperature used for nucleus sampling.
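The `context` example above uses a `question: ... answer: ...` format. A sketch for carrying earlier turns into a follow-up question might look like this; `buildContext` and `buildBlip2Input` are hypothetical, and joining several turns with a single space is an assumption, since only a single-turn example is shown.

```js
// Hypothetical helper: formats earlier Q&A turns in the
// "question: ... answer: ..." style from the example above.
// Joining multiple turns with a space is an assumption.
function buildContext(history) {
  return history
    .map(({ question, answer }) => `question: ${question.trim()} answer: ${answer.trim()}`)
    .join(" ");
}

// Hypothetical builder: only adds `context` when there is prior history.
function buildBlip2Input(image, question, history = [], temperature = 1) {
  const input = { image, question, temperature };
  if (history.length > 0) input.context = buildContext(history);
  return { input };
}

// Usage:
// const data = buildBlip2Input(pandaUrl, "what country is this animal from?",
//   [{ question: "what animal is this?", answer: "panda" }]);
```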
### Notes
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling images of other people.
### Example Response
The API response will contain the results of the image recognition (a caption or an answer) or other relevant information. Parse and use the response data according to the actual API documentation.
## Advanced Usage
......@@ -88,4 +120,3 @@ The API response will contain the URL of the generated cloned voice or other rel
- Implement voice post-processing features, such as adjusting volume, adding background music, or applying audio effects.
- Integrate a voice storage solution to save and manage the generated voice files.
- Consider implementing a voice recognition feature to convert the generated voice back to text for verification or other purposes.
\ No newline at end of file
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -53,12 +53,27 @@ const data = {
};
```
```js
const data = {
  input: {
    "text": "chat T T S is a text to speech model designed for dialogue applications. \\n[uv_break]it supports mixed language input [uv_break]and offers multi speaker \\ncapabilities with precise control over prosodic elements [laugh]like like \\n[uv_break]laughter[laugh], [uv_break]pauses, [uv_break]and intonation. \\n[uv_break]it delivers natural and expressive speech,[uv_break]so please\\n[uv_break] use the project responsibly at your own risk.[uv_break]",
    "top_k": 20,
    "top_p": 0.7,
    "voice": 2222,
    "prompt": "",
    "skip_refine": 0,
    "temperature": 0.3,
    "custom_voice": 0
  }
};
```
### 4. Call the AI Model
```js
const price = 8; // Cost of the AI call
try {
  const response = await aonweb.prediction("/predictions/ai/chat-tts", data, price);
  // Handle response
  console.log("ChatTTS result:", response);
} catch (error) {
......@@ -69,21 +84,25 @@ try {
### Parameter Description
- `text`: String, the text to synthesize.
- `top_k`: Number, the top-k sampling parameter.
- `top_p`: Number, the top-p sampling parameter.
- `voice`: Number, the voice identifier.
- `prompt`: String, the prompt used to refine the text.
- `skip_refine`: Number, skips the text-refinement step when set (the example uses `0`).
- `temperature`: Number, the sampling temperature.
- `custom_voice`: Number, a custom voice identifier.
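A sketch that fills in the sampling parameters and joins sentences with the `[uv_break]` token seen in the example text. The defaults are copied from that example request and are not documented service defaults; `buildChatTtsInput` is a hypothetical helper, and treating `[uv_break]` as a sentence separator is an assumption.

```js
// Values copied from the example request above; NOT documented defaults.
const CHAT_TTS_DEFAULTS = {
  top_k: 20,
  top_p: 0.7,
  voice: 2222,
  prompt: "",
  skip_refine: 0,
  temperature: 0.3,
  custom_voice: 0,
};

// Hypothetical builder: trims sentences, drops empty ones, and joins them
// with the [uv_break] token (assumed here to act as a pause marker).
function buildChatTtsInput(sentences, overrides = {}) {
  const text = sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join(" [uv_break]");
  if (text === "") throw new Error("text must not be empty");
  // `text` is set last so overrides cannot replace it.
  return { input: { ...CHAT_TTS_DEFAULTS, ...overrides, text } };
}

// Usage:
// const data = buildChatTtsInput(["Hello there.", "How are you?"], { temperature: 0.5 });
// const response = await aonweb.prediction("/predictions/ai/chat-tts", data, price);
```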
### Notes
- Ensure that the provided text is of good quality to achieve the best synthesis results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations.
### Example Response
The API response will contain the URL of the generated text-to-speech output or other relevant information. Parse and use the response data according to the actual API documentation.
## Advanced Usage
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -66,22 +66,22 @@ try {
### Parameter Description
- `image`: String, the image to process.
- `upscale`: Number, the final upsampling scale of the image.
- `face_upsample`: Boolean, whether to upsample restored faces for high-resolution AI-created images.
- `background_enhance`: Boolean, whether to enhance the background image with Real-ESRGAN.
- `codeformer_fidelity`: Number, balances quality (lower values) against fidelity (higher values).
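A sketch of a CodeFormer input builder. The default values (`upscale` 2, both booleans `true`, fidelity 0.5), the integer `upscale`, and the [0, 1] fidelity range are assumptions, not values stated in this guide; `buildCodeFormerInput` is hypothetical.

```js
// Hypothetical builder. Defaults and the [0, 1] fidelity range are
// assumptions, not values documented in this guide.
function buildCodeFormerInput(image, {
  upscale = 2,
  face_upsample = true,
  background_enhance = true,
  codeformer_fidelity = 0.5,
} = {}) {
  if (!Number.isInteger(upscale) || upscale < 1) {
    throw new Error("upscale must be a positive integer");
  }
  // Clamp fidelity into the assumed [0, 1] range:
  // lower favors quality, higher favors fidelity to the input.
  const fidelity = Math.min(1, Math.max(0, codeformer_fidelity));
  return {
    input: { image, upscale, face_upsample, background_enhance, codeformer_fidelity: fidelity },
  };
}
```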
### Notes
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best restoration results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling images of other people.
### Example Response
The API response will contain the URL of the processed image or other relevant information. Parse and use the response data according to the actual API documentation.
## Advanced Usage
......
......@@ -36,7 +36,39 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
  input: {
    "seed": 6,
    "image": "https://replicate.delivery/pbxt/IYQCkyANILbqCWObhtFANxUyuVIMsLw7pyky9eFlz17MBG9c/house.png",
    "scale": 9,
    "steps": 20,
    "prompt": "a modernist house in a nice landscape",
    "structure": "seg",
    "low_threshold": 100,
    "high_threshold": 200,
    "image_resolution": "512"
  }
};
```
```js
const data = {
  input: {
    "seed": 6,
    "image": "https://replicate.delivery/pbxt/IYQDgL9mAeNkgdFCTCb0qXKDHnL8a84VArZBRGhBYRsqc1vn/1200.jpeg",
    "scale": 9,
    "steps": 20,
    "prompt": "a metallic cyborg bird",
    "structure": "canny",
    "low_threshold": 100,
    "high_threshold": 200,
    "image_resolution": "512"
  }
};
```
```js
const data = {
......@@ -46,10 +78,7 @@ const data = {
"scale": 9,
"steps": 20,
"prompt": "a photo of a brightly colored turtle",
"structure": "scribble",
"low_threshold": 100,
"high_threshold": 200,
"image_resolution": "512"
......@@ -73,26 +102,29 @@ try {
### Parameter Description
- `seed`: Number, the random seed used for model inference.
- `image`: String, URL of the image to process.
- `scale`: Number, the scale for classifier-free guidance.
- `steps`: Number, the number of inference steps.
- `prompt`: String, the prompt describing the image to generate.
- `structure`: String, the ControlNet structure to condition on (for example `seg`, `canny`, or `scribble`, as in the examples).
- `low_threshold`: Number, [canny only] the low threshold for line detection.
- `high_threshold`: Number, [canny only] the high threshold for line detection.
- `image_resolution`: String, the resolution of the output image (the smaller dimension is scaled to this value).
- `num_outputs`: Number, the number of images to output (higher values may run out of memory); default: 1.
- `eta`: Number, controls the amount of noise added to the input data during the denoising diffusion process; higher values add more noise.
- `negative_prompt`: String, the negative prompt for model inference.
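A sketch of a ControlNet input builder using values copied from the examples above (not documented defaults). The threshold check assumes the usual Canny convention that the low threshold is below the high one, and is applied only for `canny` since the list marks the thresholds as canny-only; `buildControlNetInput` is hypothetical.

```js
// Hypothetical builder. Defaults are copied from the examples above and
// are NOT documented service defaults.
function buildControlNetInput(image, prompt, options = {}) {
  const input = {
    scale: 9,
    steps: 20,
    structure: "canny",
    low_threshold: 100,
    high_threshold: 200,
    image_resolution: "512",
    ...options,
    image, // always use the explicit arguments
    prompt,
  };
  // Thresholds only matter for canny; assume low must be below high.
  if (input.structure === "canny" && input.low_threshold >= input.high_threshold) {
    throw new Error("low_threshold must be lower than high_threshold");
  }
  return { input };
}

// Usage:
// const data = buildControlNetInput(houseUrl, "a modernist house in a nice landscape", { structure: "seg", seed: 6 });
```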
### Notes
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling images of other people.
### Example Response
```js
output: [
"https://replicate.delivery/pbxt/BKfYyssmWUUDWiukeXLAQc47KUPBwDMCdbxd8esqr3yE06eDB/out-0.png"
]
```
The API response will contain the URLs of the generated images (as in the `output` array above) or other relevant information. Parse and use the response data according to the actual API documentation.
## Advanced Usage
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -86,10 +86,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -69,10 +69,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -64,10 +64,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -74,10 +74,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -67,10 +67,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......@@ -61,10 +61,7 @@ try {
### Parameter Description
### Notes
......
......@@ -36,7 +36,7 @@ const ai_options = new AIOptions({
const aonweb = new AI(ai_options);
```
### 3. Prepare Input Data Example
```js
const data = {
......