Commit d0543d3d authored by duanjinfei

update document

parent 4063a179
......@@ -64,16 +64,16 @@ try {
### Parameter Description
- `img`: String, the text content to be converted into speech.
- `scale`: String, the URL of the audio file used as the voice sample for cloning.
- `version`: String, specifies the language of the text, with "en" indicating English.
- `img`: String, the image file that needs optimization.
- `scale`: Number, the rescaling factor.
- `version`: String, the GFPGAN version. `v1.3` gives better quality; `v1.4` gives more details and better identity.
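As a rough illustration of the three parameters above, the sketch below assembles a request payload and checks `version` against the two values the list names. The helper name `buildFaceRestoreInput`, the default values, the placeholder URL, and the idea that no other `version` values are accepted are all assumptions made for this example, not documented behavior.

```js
// Hypothetical helper: builds the `data` object for the GFPGAN call.
// Only the two versions named in the parameter list are assumed valid.
const KNOWN_VERSIONS = ["v1.3", "v1.4"];

function buildFaceRestoreInput(img, scale = 2, version = "v1.4") {
  if (typeof img !== "string" || img.length === 0) {
    throw new TypeError("img must be a non-empty string");
  }
  if (typeof scale !== "number" || !(scale > 0)) {
    throw new RangeError("scale must be a positive number");
  }
  if (!KNOWN_VERSIONS.includes(version)) {
    throw new RangeError(`version must be one of ${KNOWN_VERSIONS.join(", ")}`);
  }
  return { input: { img, scale, version } };
}

// Placeholder URL, for illustration only.
const data = buildFaceRestoreInput("https://example.com/portrait.png", 2, "v1.3");
console.log(data);
```

Validating on the client like this surfaces an invalid input before the call is made.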
### Notes
- Ensure that the provided audio URL is publicly accessible and of good quality to achieve the best cloning effect.
- The API may take some time to process the input and generate the result, consider implementing appropriate wait or loading states.
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling voice samples of others.
- Adhere to the terms of use and privacy regulations, especially when handling image samples of others.
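The wait-state and error-handling advice in these notes could be sketched as follows. Both helper names (`withTimeout`, `runWithLoadingState`) and the 60-second default are assumptions chosen for illustration; they are not part of the SDK.

```js
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise wins, so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `call` (a function returning a promise) while toggling a loading flag.
async function runWithLoadingState(call, setLoading, ms = 60000) {
  setLoading(true);
  try {
    return await withTimeout(call(), ms);
  } finally {
    // Runs on success, failure, and timeout alike.
    setLoading(false);
  }
}
```

In an application, `call` would wrap the SDK request, for example `() => aonweb.prediction(path, data, price)`, and `setLoading` would update whatever spinner or status flag the UI uses.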
### Example Response
......
......@@ -41,12 +41,16 @@ const aonweb = new AI(ai_options);
```js
const data = {
input: {
"seed": 42,
"steps": 30,
"garm_img": "https://replicate.delivery/pbxt/KgwTlZyFx5aUU3gc5gMiKuD5nNPTgliMlLUWx160G4z99YjO/sweater.webp",
"human_img": "https://replicate.delivery/pbxt/KgwTlhCMvDagRrcVzZJbuozNJ8esPqiNAIJS3eMgHrYuHmW4/KakaoTalk_Photo_2024-04-04-21-44-45.png",
"garment_des": "cute pink top"
}
"crop": false,
"seed": 42,
"steps": 30,
"category": "upper_body",
"force_dc": false,
"garm_img": "https://replicate.delivery/pbxt/KgwTlZyFx5aUU3gc5gMiKuD5nNPTgliMlLUWx160G4z99YjO/sweater.webp",
"mask_img": "https://replicate.delivery/pbxt/KnaDKqnN0h1DDF5CnK7iRSSkFnJrk9kyRiQlcc5gBcy8gpPA/replicate-prediction-wfj8g6sgmxrgp0cf1gnv7btfh8.jpg",
"human_img": "https://replicate.delivery/pbxt/KgwTlhCMvDagRrcVzZJbuozNJ8esPqiNAIJS3eMgHrYuHmW4/KakaoTalk_Photo_2024-04-04-21-44-45.png",
"garment_des": "cute pink top"
}
};
```
......@@ -66,11 +70,15 @@ try {
### Parameter Description
- `seed`: Random seed for generating reproducible results
- `steps`: Number of processing steps
- `garm_img`: URL of the garment image
- `human_img`: URL of the human image
- `garment_des`: Description of the garment
- `seed`: Number, random seed for generating reproducible results
- `steps`: Number, number of steps used for model inference
- `garm_img`: String, garment image; it should match `category` and can be a product image or even a photo of someone
- `human_img`: String, image of the human model; if it is not 3:4, enable `crop`
- `garment_des`: String, description of the garment, e.g. Short Sleeve Round Neck T-shirt
- `crop`: Boolean, whether to use cropping (enable this if your image is not 3:4)
- `category`: String, category of the garment
- `force_dc`: Boolean, use the DressCode version of IDM-VTON (default false, except when `category` is `dresses`)
- `mask_img`: String, mask image; optional, but providing it is faster
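To show how `category`, `force_dc`, and the optional `mask_img` relate, here is a minimal sketch of a payload builder. The function name `buildTryOnInput` and its camelCase argument names are hypothetical; the `force_dc` rule simply mirrors the default described in the list above.

```js
// Hypothetical helper: fills in `force_dc` from `category`, mirroring the
// documented default (false, except when category is "dresses").
function buildTryOnInput({ garmImg, humanImg, garmentDes, category, crop = false, maskImg }) {
  const input = {
    garm_img: garmImg,
    human_img: humanImg,
    garment_des: garmentDes,
    category,
    crop,
    force_dc: category === "dresses",
  };
  // mask_img is optional, so it is only included when provided.
  if (maskImg !== undefined) {
    input.mask_img = maskImg;
  }
  return { input };
}

// Placeholder URLs, for illustration only.
const data = buildTryOnInput({
  garmImg: "https://example.com/sweater.webp",
  humanImg: "https://example.com/person.png",
  garmentDes: "cute pink top",
  category: "upper_body",
});
console.log(data.input.force_dc); // false
```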
### Handling the Response
......
......@@ -69,7 +69,14 @@ try {
### Parameter Description
- `image`: String, URL of the image required for model inference
- `top_p`: Number, sample from the top p percent most likely tokens
- `prompt`: String, prompt for mini-gpt4 regarding the input image
- `num_beams`: Number, number of beams for beam search decoding
- `max_length`: Number, total length of prompt and output in tokens
- `temperature`: Number, temperature for generating tokens; lower values give more predictable results
- `max_new_tokens`: Number, maximum number of new tokens to generate
- `repetition_penalty`: Number, penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it
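Because `max_length` counts both the prompt and the output, `max_new_tokens` cannot usefully exceed it. The sketch below builds a payload and checks that relationship. The helper name `buildMiniGpt4Input` is hypothetical, and the numeric values are illustrative placeholders rather than documented defaults.

```js
// Hypothetical helper. The numbers below are illustrative, not documented defaults.
function buildMiniGpt4Input(image, prompt, overrides = {}) {
  const input = {
    image,
    prompt,
    top_p: 0.9,
    num_beams: 3,
    max_length: 4000,
    temperature: 1.0,
    max_new_tokens: 3000,
    repetition_penalty: 1.0,
    ...overrides,
  };
  // max_length covers prompt plus output, so new tokens alone must fit inside it.
  if (input.max_new_tokens > input.max_length) {
    throw new RangeError("max_new_tokens should not exceed max_length");
  }
  return { input };
}

// Placeholder URL, for illustration only.
const data = buildMiniGpt4Input("https://example.com/photo.png", "What is in this image?", {
  temperature: 0.5,
});
console.log(data.input.temperature); // 0.5
```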
### Notes
......
......@@ -64,7 +64,9 @@ try {
### Parameter Description
- `image`: String, the image file that needs optimization
- `scale`: Number, factor to scale the image by
- `face_enhance`: Boolean, run GFPGAN face enhancement along with upscaling
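Since `scale` is a multiplicative factor, the expected output size can be estimated before making the call. This is a small sketch under the assumption that both dimensions are multiplied by `scale` and rounded to whole pixels; the helper name `scaledSize` and the placeholder URL are hypothetical.

```js
// Hypothetical helper: estimates output dimensions for a given scale factor.
// Assumes the model multiplies both sides by `scale`; rounding is a guess.
function scaledSize(width, height, scale) {
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Example payload using the three documented parameters (placeholder URL).
const data = {
  input: {
    image: "https://example.com/photo.png",
    scale: 4,
    face_enhance: false,
  },
};

console.log(scaledSize(512, 384, data.input.scale)); // { width: 2048, height: 1536 }
```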
### Notes
......
......@@ -57,8 +57,10 @@ const price = 8; // Cost of the AI call
try {
  const response = await aonweb.prediction("/predictions/ai/sadtalker", data, price);
  // Handle response
  console.log("sadtalker result:", response);
} catch (error) {
  // Error handling
  console.error("Error generating sadtalker result:", error);
}
```
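For transient failures such as network issues, the call above could be wrapped in a small retry helper. This is a sketch only: `callWithRetry`, the attempt count, and the fixed delay are assumptions, and the source does not say whether a failed call is still charged, so retrying a paid call deserves care.

```js
// Hypothetical helper: retries an async call a fixed number of times,
// waiting `delayMs` between attempts. The retry policy is an assumption.
async function callWithRetry(call, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

With the snippet above, the request would become `callWithRetry(() => aonweb.prediction("/predictions/ai/sadtalker", data, price))` inside the same `try`/`catch`.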
......
......@@ -58,6 +58,46 @@ const data = {
};
```
```js
const data = {
  input: {
    "width": 1024,
    "height": 1024,
    "prompt": "A studio photo of a rainbow coloured cat",
    "refine": "expert_ensemble_refiner",
    "scheduler": "KarrasDPM",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "high_noise_frac": 0.8,
    "prompt_strength": 0.8,
    "num_inference_steps": 50
  }
};
```
```js
const data = {
  input: {
    "mask": "https://replicate.delivery/pbxt/JF3Ld3yPLVA3JIELHx1uaAV5CQOyr4AoiOfo6mJZn2fofGaT/dog-mask.png",
    "image": "https://replicate.delivery/pbxt/JF3LddQgRiMM9Q4Smyfw7q7BR9Gn0PwkSWvJjKDPxyvr8Ru0/cool-dog.png",
    "width": 1024,
    "height": 1024,
    "prompt": "An orange cat sitting on a bench",
    "refine": "base_image_refiner",
    "scheduler": "KarrasDPM",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "high_noise_frac": 0.8,
    "prompt_strength": 0.8,
    "num_inference_steps": 25
  }
};
```
### 4. Call the AI Model
```js
......@@ -74,14 +114,30 @@ try {
### Parameter Description
- `width`: Number, width of the output image
- `height`: Number, height of the output image
- `image`: String, input image for img2img or inpaint mode
- `prompt`: String, the prompt required for model inference
- `refine`: String, which refine style to use. Enum: ["no_refiner", "expert_ensemble_refiner", "base_image_refiner"]
- `refine_steps`: Number, for `base_image_refiner`, the number of steps to refine; defaults to `num_inference_steps`
- `scheduler`: String, Enum: ["DDIM", "DPMSolverMultistep", "HeunDiscrete", "KarrasDPM", "K_EULER_ANCESTRAL", "K_EULER", "PNDM"]
- `lora_scale`: Number, LoRA additive scale. Only applicable on trained models.
- `num_outputs`: Number, number of images to output
- `guidance_scale`: Number, scale for classifier-free guidance
- `apply_watermark`: Boolean, applies a watermark so that downstream applications can determine whether an image is generated. If you have other provisions for generating or deploying images safely, you can use this to disable watermarking.
- `high_noise_frac`: Number, for `expert_ensemble_refiner`, the fraction of noise to use
- `negative_prompt`: String, the negative prompt
- `seed`: Number, random seed. Leave blank to randomize the seed.
- `prompt_strength`: Number, prompt strength when using img2img or inpaint; 1.0 corresponds to full destruction of the information in the image
- `num_inference_steps`: Number, number of denoising steps
- `mask`: String, input mask for inpaint mode. Black areas will be preserved; white areas will be inpainted.
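The list above implies three ways of calling the model, depending on whether `image` and `mask` are present. The sketch below checks the two enums and reports which mode an input would select. The function name `describeSdxlInput`, the label `"txt2img"`, and the rule that a mask without an image is rejected are assumptions made for this example.

```js
// Enum values copied from the parameter list above.
const REFINE_OPTIONS = ["no_refiner", "expert_ensemble_refiner", "base_image_refiner"];
const SCHEDULERS = [
  "DDIM", "DPMSolverMultistep", "HeunDiscrete", "KarrasDPM",
  "K_EULER_ANCESTRAL", "K_EULER", "PNDM",
];

// Hypothetical helper: validates enum fields and infers the generation mode.
function describeSdxlInput(input) {
  if (input.refine !== undefined && !REFINE_OPTIONS.includes(input.refine)) {
    throw new RangeError(`Unknown refine value: ${input.refine}`);
  }
  if (input.scheduler !== undefined && !SCHEDULERS.includes(input.scheduler)) {
    throw new RangeError(`Unknown scheduler: ${input.scheduler}`);
  }
  // Assumption: inpainting needs a base image to paint into.
  if (input.mask !== undefined && input.image === undefined) {
    throw new Error("mask was provided without image");
  }
  if (input.mask !== undefined) return "inpaint";
  if (input.image !== undefined) return "img2img";
  return "txt2img";
}

console.log(describeSdxlInput({ prompt: "A studio photo of a rainbow coloured cat" })); // txt2img
```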
### Notes
- Ensure that the provided audio URL is publicly accessible and of good quality to achieve the best cloning effect.
- The API may take some time to process the input and generate the result, consider implementing appropriate wait or loading states.
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling voice samples of others.
- Adhere to the terms of use and privacy regulations, especially when handling image samples of others.
### Example Response
......
......@@ -45,12 +45,46 @@ const data = {
"scheduler": "K_EULER",
"num_outputs": 1,
"guidance_scale": 7.5,
"image_dimensions": "512x512",
"num_inference_steps": 50
}
};
```
```js
const data = {
  input: {
    "prompt": "multicolor hyperspace",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "num_inference_steps": 50
  }
};
```
```js
const data = {
  input: {
    "prompt": "a gentleman otter in a 19th century portrait",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "num_inference_steps": 100
  }
};
```
```js
const data = {
  input: {
    "width": 512,
    "height": 512,
    "prompt": "eye",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "num_inference_steps": 50
  }
};
```
### 4. Call the AI Model
```js
......@@ -67,14 +101,22 @@ try {
### Parameter Description
- `prompt`: String, the prompt required for model inference
- `scheduler`: String, Enum: []
- `num_outputs`: Number, number of images to generate
- `guidance_scale`: Number, scale for classifier-free guidance
- `num_inference_steps`: Number, number of denoising steps
- `height`: Number, height of the generated image in pixels; must be a multiple of 64
- `width`: Number, width of the generated image in pixels; must be a multiple of 64
- `negative_prompt`: String, things you do not want to see in the output
- `seed`: Number, random seed. Leave blank to randomize the seed.
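Because `width` and `height` must be multiples of 64, it can help to snap arbitrary sizes before building the request. A minimal sketch follows; the helper name `snapToMultipleOf64` and the choice to round to the nearest multiple (with a floor of 64) are assumptions, not documented behavior.

```js
// Hypothetical helper: rounds a pixel size to the nearest multiple of 64,
// never returning less than 64.
function snapToMultipleOf64(value) {
  return Math.max(64, Math.round(value / 64) * 64);
}

const data = {
  input: {
    prompt: "eye",
    width: snapToMultipleOf64(500),  // 512
    height: snapToMultipleOf64(512), // 512 (already valid)
    num_outputs: 1,
  },
};
console.log(data.input.width, data.input.height);
```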
### Notes
- Ensure that the provided audio URL is publicly accessible and of good quality to achieve the best cloning effect.
- The API may take some time to process the input and generate the result, consider implementing appropriate wait or loading states.
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- The API may take some time to process the input and generate the result; consider implementing appropriate wait or loading states.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling voice samples of others.
- Adhere to the terms of use and privacy regulations, especially when handling image samples of others.
### Example Response
......
......@@ -40,7 +40,79 @@ const aonweb = new AI(ai_options);
```js
const data = {
  input: {
    "audio": "https://replicate.delivery/mgxm/e5159b1b-508a-4be4-b892-e1eb47850bdc/OSR_uk_000_0050_8k.wav",
    "model": "large-v3",
    "translate": false,
    "temperature": 0,
    "transcription": "plain text",
    "suppress_tokens": "-1",
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": true,
    "compression_ratio_threshold": 2.4,
    "temperature_increment_on_fallback": 0.2
  }
};
```
```js
const data = {
  input: {
    "audio": "https://replicate.delivery/pbxt/LJr3aqYueyyKOKkIwWWIH67SyvzrAKfCm5tNVYc3uSt7oWy4/4th-dimension-explained-by-a-high-school-student.mp3",
    "model": "large-v3",
    "language": "auto",
    "translate": false,
    "temperature": 0,
    "transcription": "plain text",
    "suppress_tokens": "-1",
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": true,
    "compression_ratio_threshold": 2.4,
    "temperature_increment_on_fallback": 0.2
  }
};
```
......@@ -61,7 +133,18 @@ try {
### Parameter Description
- `audio`: String, the audio file to transcribe
- `model`: String, Whisper model size (currently only large-v3 is supported)
- `translate`: Boolean, translate the text to English when set to true
- `patience`: Number, optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search
- `temperature`: Number, temperature to use for sampling
- `transcription`: String, the format for the transcription
- `suppress_tokens`: String, comma-separated list of token IDs to suppress during sampling; "-1" suppresses most special characters except common punctuation
- `logprob_threshold`: Number, if the average log probability is lower than this value, treat the decoding as failed
- `no_speech_threshold`: Number, if the probability of the <|nospeech|> token is higher than this value and the decoding has failed due to `logprob_threshold`, consider the segment as silence
- `condition_on_previous_text`: Boolean, if true, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop
- `compression_ratio_threshold`: Number, if the gzip compression ratio is higher than this value, treat the decoding as failed
- `temperature_increment_on_fallback`: Number, amount by which to increase the temperature when falling back because the decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold`
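To make the interaction between `temperature` and `temperature_increment_on_fallback` concrete, the sketch below lists the temperatures that would be tried in turn. It mirrors how the open-source Whisper command-line tool builds its fallback list (start at `temperature`, step by the increment, stop at 1.0); whether this hosted endpoint behaves identically is an assumption, and the function name `temperatureSchedule` is hypothetical.

```js
// Hypothetical helper: temperatures tried in order when decoding keeps failing
// the compression-ratio or log-probability checks.
function temperatureSchedule(temperature, increment) {
  // Without a positive increment there is no fallback, only the base temperature.
  if (!(increment > 0)) return [temperature];
  const schedule = [];
  for (let i = 0; ; i++) {
    const t = temperature + i * increment;
    if (t > 1.0 + 1e-6) break; // small tolerance for floating-point error
    schedule.push(Number(t.toFixed(10))); // trim floating-point noise
  }
  return schedule;
}

// Values from the example input: temperature 0, increment 0.2.
console.log(temperatureSchedule(0, 0.2)); // [0, 0.2, 0.4, 0.6, 0.8, 1]
```

With the example values, a segment that fails at temperature 0 would be retried at 0.2, then 0.4, and so on up to 1.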
### Notes
......