Image to Text (OCR) Node¶
The Image to Text node in the agent builder processes an uploaded image and generates text responses based on the user’s prompt. It can provide descriptions, answer image-related questions, or extract text from the image. The node relies on external LLMs, such as OpenAI and Anthropic models, for image processing and text generation.
A sample use case involves an insurance company assessing vehicle damage to estimate compensation and verify customer claims. The Image to Text node processes the uploaded image of the damaged vehicle, analyzes the extent of the damage, and helps determine repair costs. The File Upload API generates the file source (URL) at the agent endpoint, which is required as input for the node. Any publicly accessible URL (for example, one pointing to a public repository) can also be used as the File Source.
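For reference, the sketch below shows one way a file source URL might be obtained programmatically before configuring the node. This is a minimal illustration only: the endpoint path, host, form field, and response field name are hypothetical placeholders, not the platform's actual File Upload API contract, so consult the File Upload API documentation for the real request format.

```python
import requests

# Hypothetical endpoint and field names -- the real File Upload API
# contract may differ; see the platform's File Upload API docs.
UPLOAD_URL = "https://example-agent-host/api/upload-file"  # placeholder

with open("damaged_vehicle.jpg", "rb") as f:  # placeholder file name
    resp = requests.post(UPLOAD_URL, files={"file": f})
resp.raise_for_status()

# Assume the response carries the generated file source (URL).
file_url = resp.json()["fileUrl"]  # hypothetical response field
print("File Source for the Image to Text node:", file_url)
```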
Important Considerations
- The user can upload only one file at a time for processing.
- Except for image input handling, the OCR node functions like the existing Gen AI node.
- Image uploads and the related settings are handled by the File Upload API.
- Image input preprocessing is supported in the following formats (see the payload sketch after this list):
    - Binary (base64-encoded) for Anthropic models.
    - Binary (base64-encoded) or an image URL for OpenAI models.
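To illustrate the difference between the two input styles, the sketch below builds an Anthropic-style base64 image block and the two OpenAI-style variants (a base64 data URL and a plain image URL). The payload shapes follow the publicly documented OpenAI and Anthropic vision message formats; the file name and URL are placeholders.

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

b64 = encode_image("damaged_vehicle.jpg")  # placeholder file name

# Anthropic models: binary image data, base64-encoded.
anthropic_image_block = {
    "type": "image",
    "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": b64,
    },
}

# OpenAI models: either a base64 data URL or a publicly accessible image URL.
openai_image_block_b64 = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
}
openai_image_block_url = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/damaged_vehicle.jpg"},  # placeholder
}
```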
Steps to Add and Configure the Node¶
To add and configure the node, follow the steps below:
Note
Before proceeding, you must add an external LLM to your account using either Easy Integration or Custom API integration.
1. On the Agents tab, click the name of the agent to which you want to add the node. The Agent Flow page is displayed.
2. Click Go to flow to edit the in-development version of the flow.
3. In the flow builder, click the + icon for Image to Text under AI in the Assets panel. Alternatively, drag the node from the panel onto the canvas, or click AI in the pop-up menu and then click Image to text.
4. Click the added node to open its properties dialog box. The General Settings for the node are displayed.
5. Enter or select the following General Settings:
    - Node Name: Enter an appropriate name for the node. For example, “InsuranceEvaluation.”
    - Model: Select a model from the list of configured models.
      Note
      Only the OpenAI (gpt-4o and gpt-4o-mini) and Anthropic (Claude Sonnet Vision) models are currently supported.
    - File URL: Provide the file URL of the public repository where your image file exists, or the URL returned by the Upload File API at the agent endpoint.
      Note
      - Only PNG, JPEG, and JPG file formats are supported.
      - The file source URL must be valid for the node to function properly; a pre-flight check sketch appears after this settings list.
    - System Prompt: System prompts guide the model’s behavior and response style. Enter a system prompt to define its role for your use case. For example: "You are a vehicle insurance assistant that analyzes uploaded vehicle images to assess damage and estimate repair costs in USD."
    - Prompt: User prompts define specific questions or requests for the model. Provide clear instructions for the model to follow, using context variables for dynamic inputs in the syntax {{context.variable_name}}. Example: "Check the image provided for the damaged parts in the car and select what parts are affected from the list below - {{context.parts_list}}." A substitution sketch follows this settings list.

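As an illustration of the {{context.variable_name}} syntax, the sketch below performs a simple string substitution over a prompt template. The actual resolution is performed by the platform at runtime; the substitution logic and the sample context values here are assumptions for demonstration only.

```python
import re

def render_prompt(template: str, context: dict) -> str:
    """Replace {{context.key}} placeholders with values from a context dict."""
    def resolve(match: re.Match) -> str:
        key = match.group(1)
        return str(context.get(key, match.group(0)))  # leave unknown keys as-is
    return re.sub(r"\{\{context\.(\w+)\}\}", resolve, template)

template = (
    "Check the image provided for the damaged parts in the car and "
    "select what parts are affected from the list below - {{context.parts_list}}."
)
context = {"parts_list": "bumper, hood, left headlight, windshield"}  # sample values
print(render_prompt(template, context))
```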
6. Click the Connections icon and select the Go to Node for the success and failure conditions.
    - On Success > Go to Node: After the current node is successfully executed, go to a selected node in the flow to execute next, such as a Gen AI node, Function node, Condition node, API node, or End node.
    - On Failure > Go to Node: If the execution of the current node fails, go to the End node to display any custom error message from the Image to Text node.
7. Finally, test the flow and fix any issues found: click the Run Flow button at the top-right corner of the flow builder and follow the onscreen instructions.
Standard Error
If no model is selected, the prompt details are not provided, or both, the following error message is displayed: “Proper data needs to be provided in the LLM node.”