Using GPT-Vision

Many OpenAI models support a Vision feature that lets you upload images and reference them in your prompts. To create tasks that use this feature, use the Prompt with Files node.

Prompt with Files Node

Setting Up the Input Template

Before using the Prompt with Files node, define an Input Template with the File URL input type. The template is then available when you configure prompts with files.

Allowed File Types

When using GPT-Vision, the following file types are supported:

webp
png
jpeg
pdf – Converted into one image per page:
- The quality may be lower than the original.
- Pages with very large or unusual dimensions may not convert cleanly.
- If you need to preserve very high quality, submit pages individually.
pptx – Converted into one image per slide (quality may be reduced).

Note: Make sure the file extension matches the actual file type.
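
One way to catch a mismatched extension before uploading is to compare it with the file's leading bytes. The sketch below is a minimal illustration, not part of the product: the names `sniff_type` and `extension_matches` are hypothetical, treating `.jpg` as an alias for jpeg is an assumption, and the pptx check only confirms a ZIP container rather than a real presentation.

```python
from pathlib import Path
from typing import Optional

# Well-known leading bytes ("magic numbers") for the supported types.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
    "pptx": b"PK\x03\x04",  # ZIP container; does not prove the file is a pptx
}

# Assumed mapping from file extension to type name (".jpg" is an assumption).
EXTENSIONS = {
    ".webp": "webp",
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".pdf": "pdf",
    ".pptx": "pptx",
}


def sniff_type(header: bytes) -> Optional[str]:
    """Guess the file type from its first bytes, or return None if unknown."""
    # WebP is a RIFF container with the tag "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def extension_matches(filename: str, header: bytes) -> bool:
    """Return True if the extension is supported and agrees with the content."""
    expected = EXTENSIONS.get(Path(filename).suffix.lower())
    return expected is not None and sniff_type(header) == expected
```

In practice you would read the first 12 bytes of the file (for example with `open(path, "rb").read(12)`) and pass them as `header`.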

Image Submission Limits

You can submit up to 10 images in a single request. Tests suggest that output quality may drop as the number of images in one request grows.
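
If you have more images than one request allows, you can split them into several requests. The following is a minimal sketch of that idea; `batch_image_urls` and the example URLs are hypothetical, and how pdf or pptx pages count toward the limit is not covered here.

```python
from typing import Iterator, List

MAX_IMAGES_PER_REQUEST = 10  # limit stated in this section


def batch_image_urls(
    urls: List[str], size: int = MAX_IMAGES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Example: 23 placeholder URLs are split into groups of 10, 10, and 3.
example_urls = [f"https://example.com/img{i}.png" for i in range(23)]
batches = list(batch_image_urls(example_urls))
```

Passing a smaller `size` is a simple way to experiment with fewer images per request if output quality is a concern.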

Models Supporting Vision

The following models support vision:

azure gpt-4o-mini
gpt-4o-mini
azure-4-o
gpt-4-turbo-with-vision (only describes the first picture)
gpt-4-o

Important: If you select a model that does not support vision, it may hallucinate and describe a random image you did not provide. Some models may explicitly indicate they cannot process images.
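
To avoid sending images to a model that cannot process them, a pre-flight check against the list above can help. This is a sketch under assumptions: `check_vision_request` is a hypothetical helper, and the model names are copied from the list in this section, which may change over time.

```python
from typing import List

# Model names as listed in this section.
VISION_MODELS = {
    "azure gpt-4o-mini",
    "gpt-4o-mini",
    "azure-4-o",
    "gpt-4-turbo-with-vision",
    "gpt-4-o",
}

# Listed above as describing only the first picture.
FIRST_IMAGE_ONLY_MODELS = {"gpt-4-turbo-with-vision"}


def check_vision_request(model: str, image_count: int) -> List[str]:
    """Raise ValueError for a non-vision model; return warnings otherwise."""
    if model not in VISION_MODELS:
        raise ValueError(
            f"Model {model!r} is not in the list of vision-capable models"
        )
    warnings: List[str] = []
    if model in FIRST_IMAGE_ONLY_MODELS and image_count > 1:
        warnings.append(
            f"{model} only describes the first image; "
            f"the other {image_count - 1} may not be described"
        )
    return warnings
```

Raising an error for unlisted models is a deliberate choice here: it fails early instead of risking a hallucinated image description.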
