- `image` String, URL of the image required for model inference.
- `top_p` Number, Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches `top_p` (a value between 0 and 1).
- `prompt` String, Prompt for mini-gpt4 about the input image.
- `num_beams` Number, Number of beams for beam search decoding.
- `max_length` Number, Maximum combined length of the prompt and the output, in tokens.
- `temperature` Number, Sampling temperature; lower values give more predictable output.
- `max_new_tokens` Number, Maximum number of new tokens to generate.
- `repetition_penalty` Number, Penalty for repeated words in the generated text; 1 means no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.
- `lora_scale` Number, LoRA additive scale. Only applicable to trained models.
- `num_outputs` Number, Number of images to output.
- `guidance_scale` Number, Scale for classifier-free guidance; higher values make the image follow the prompt more closely.
- `apply_watermark` Boolean, Applies a watermark so that downstream applications can determine whether an image was generated. If you have other provisions for generating or deploying images safely, you can use this to disable watermarking.
- `high_noise_frac` Number, For `expert_ensemble_refiner`, the fraction of the noise schedule (the high-noise denoising steps) handled by the base model before the refiner takes over.
- `negative_prompt` String, Negative prompt: content the generated image should avoid.
- `seed` Number, Random seed. Leave blank to randomize the seed.
- `prompt_strength` Number, Prompt strength when using img2img or inpaint mode. 1.0 corresponds to full destruction of the information in the input image.
- `num_inference_steps` Number, Number of denoising steps.
- `mask` String, Input mask for inpaint mode. Black areas will be preserved; white areas will be inpainted.
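To show how the mini-gpt4 sampling parameters `temperature`, `top_p`, and `repetition_penalty` interact, here is a minimal Python sketch of a single sampling step over a toy vocabulary. It assumes a conventional implementation (temperature scaling, top-p/nucleus filtering, and a CTRL-style repetition penalty as used in common open-source text-generation libraries); the service's actual decoding code is not documented here, and the function name `sample_next_token` is hypothetical. `num_beams` and `max_length` are not modeled.

```python
import math
import random
from typing import Dict, List, Optional


def sample_next_token(
    logits: Dict[str, float],
    previous_tokens: List[str],
    temperature: float = 1.0,
    top_p: float = 0.9,
    repetition_penalty: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the next token from `logits`, a mapping of token -> raw score (hypothetical helper)."""
    if temperature <= 0 or not 0 < top_p <= 1:
        raise ValueError("need temperature > 0 and 0 < top_p <= 1")
    rng = rng or random.Random()
    seen = set(previous_tokens)

    scores = {}
    for tok, score in logits.items():
        # Repetition penalty (CTRL-style, assumed): makes already-generated tokens
        # less likely when repetition_penalty > 1, more likely when it is < 1.
        if tok in seen:
            score = score / repetition_penalty if score > 0 else score * repetition_penalty
        # Temperature: < 1 sharpens the distribution, > 1 flattens it.
        scores[tok] = score / temperature

    # Numerically stable softmax over the adjusted scores.
    max_score = max(scores.values())
    exps = {tok: math.exp(s - max_score) for tok, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)

    # Top-p (nucleus): keep the most likely tokens until their cumulative
    # probability reaches top_p.
    nucleus, mass = [], 0.0
    for prob, tok in ranked:
        nucleus.append((prob, tok))
        mass += prob
        if mass >= top_p:
            break

    # Sample from the nucleus, renormalised by its total probability mass.
    r = rng.random() * mass
    for prob, tok in nucleus:
        r -= prob
        if r <= 0:
            return tok
    return nucleus[-1][1]  # guard against floating-point rounding
```

Passing a seeded `random.Random` makes the choice reproducible, which mirrors what a fixed `seed` does for the image parameters; with a very small `top_p`, the function always returns the single most likely token.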
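The image parameters `guidance_scale` and `prompt_strength` are hard to picture from their one-line descriptions. The following sketch shows the standard classifier-free guidance formula and how img2img pipelines in the style of Hugging Face diffusers typically turn `prompt_strength` into a number of denoising steps. It works on plain lists instead of real tensors, and the function names are hypothetical. Whether this service feeds `negative_prompt` into the unconditional branch (as common SDXL pipelines do) is an assumption.

```python
from typing import List, Tuple


def classifier_free_guidance(
    noise_uncond: List[float],
    noise_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Blend two noise predictions: uncond + scale * (cond - uncond).

    noise_uncond: prediction for the empty prompt (or, by assumption, the negative prompt).
    noise_cond:   prediction for the prompt.
    A scale of 1.0 returns noise_cond unchanged; larger values push the
    result further from the unconditional prediction, towards the prompt.
    """
    if len(noise_uncond) != len(noise_cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


def img2img_steps(num_inference_steps: int, prompt_strength: float) -> Tuple[int, int]:
    """Return (steps_skipped, steps_run) for img2img / inpaint.

    Assumed diffusers-style convention: the input image is noised to the
    level of step `steps_skipped`, and only the remaining `steps_run`
    denoising steps are executed. prompt_strength=1.0 runs every step from
    (effectively) pure noise, so no information from the input image survives.
    """
    if not 0.0 <= prompt_strength <= 1.0:
        raise ValueError("prompt_strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * prompt_strength), num_inference_steps)
    return num_inference_steps - steps_run, steps_run
```

Under this convention, `num_inference_steps=50` with `prompt_strength=0.8` skips the first 10 steps and runs the last 40, so lower strengths both preserve more of the input image and finish faster.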
### Notes
- Ensure that the provided audio URL is publicly accessible and of good quality to achieve the best cloning effect.
- The API may take some time to process the input and generate the result; consider implementing appropriate waiting or loading states.
- Ensure that the provided image URL is publicly accessible and of good quality to achieve the best recognition results.
- Handle possible errors, such as network issues, invalid input, or API limitations.
- Adhere to the terms of use and privacy regulations, especially when handling voice or image samples of others.
- `audio` String, The audio file to transcribe.
- `model` String, Whisper model size (currently only large-v3 is supported).
- `translate` Boolean, When set to True, translate the text into English.
- `patience` Number, Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.
- `temperature` Number, Temperature to use for sampling.
- `transcription` String, Output format for the transcription.
- `suppress_tokens` String, Comma-separated list of token IDs to suppress during sampling; '-1' suppresses most special characters except common punctuation.
- `logprob_threshold` Number, If the average log probability is lower than this value, treat the decoding as failed.
- `no_speech_threshold` Number, If the probability of the `<|nospeech|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, treat the segment as silence.
- `condition_on_previous_text` Boolean, If True, provide the previous output of the model as a prompt for the next window; disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
- `compression_ratio_threshold` Number, If the gzip compression ratio is higher than this value, treat the decoding as failed.
- `temperature_increment_on_fallback` Number, Amount by which to increase the temperature when falling back because the decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold`.
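The threshold and fallback parameters above work together as one retry loop. The sketch below is modeled on the fallback logic of the open-source openai-whisper package: decode at `temperature`; if the result is too repetitive (compression ratio above `compression_ratio_threshold`) or too unlikely (average log probability below `logprob_threshold`), retry at a temperature raised by `temperature_increment_on_fallback`, up to 1.0. A window whose no-speech probability exceeds `no_speech_threshold` while also failing the log-probability check is treated as silence. The `decode` callable and `DecodeResult` type are hypothetical, the compression ratio uses `zlib` (DEFLATE, the same algorithm gzip uses), and the default values are those of the openai-whisper command-line tool, assumed here rather than confirmed for this API.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DecodeResult:  # hypothetical result of decoding one audio window
    text: str
    avg_logprob: float     # average log probability of the decoded tokens
    no_speech_prob: float  # probability of the no-speech token


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by its DEFLATE-compressed length (high = repetitive)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def fallback_temperatures(temperature: float, increment: float) -> List[float]:
    """The starting temperature, then steps of `increment` up to 1.0."""
    if increment <= 0:
        return [temperature]
    n = int((1.0 - temperature) / increment + 1e-6)
    return [temperature] + [temperature + i * increment for i in range(1, n + 1)]


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],  # hypothetical: decodes one window at temperature t
    temperature: float = 0.0,
    temperature_increment_on_fallback: float = 0.2,
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
) -> Optional[DecodeResult]:
    """Return the first acceptable decoding, the last attempt, or None for silence."""
    result = None
    for t in fallback_temperatures(temperature, temperature_increment_on_fallback):
        result = decode(t)
        low_logprob = logprob_threshold is not None and result.avg_logprob < logprob_threshold
        repetitive = (
            compression_ratio_threshold is not None
            and compression_ratio(result.text) > compression_ratio_threshold
        )
        # Silence: the no-speech token is likely AND the log-probability check failed.
        if no_speech_threshold is not None and result.no_speech_prob > no_speech_threshold and low_logprob:
            return None
        if not (low_logprob or repetitive):
            return result
    return result  # every temperature failed: keep the last attempt
```

With the defaults shown, a window that keeps failing is decoded at most six times (temperatures 0.0, 0.2, 0.4, 0.6, 0.8, 1.0), which is one reason transcription of difficult audio can take noticeably longer.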