Import Datasets

Evaluation Studio provides several ways to import and manage datasets, so that you can evaluate model outputs from uploaded files or from models already running in production.

Here's how you can bring data into the platform:

Import a dataset: Users can upload datasets in CSV format. These datasets may contain:
- Input-output pairs (where each input has a corresponding output)
- Input data only (where outputs need to be generated by a user-defined model)
Evaluation Studio supports three scenarios for handling the datasets:
- Scenario 1: One Input, One Output: The simplest scenario where a dataset has one input column and one output column. Users map the input and output variables to run the evaluation, making it easy to evaluate model predictions.
- Scenario 2: One Input, Multiple Outputs: A more complex scenario in which different models each produce an output for the same input. The dataset has one input column and multiple output columns. Users map the input to the corresponding model outputs for evaluation. Users can also upload ground truth columns to compare against the responses.
- Scenario 3: Input Only: The dataset contains only input data, with no output columns. Users can generate the missing outputs with a pre-trained model: the system automatically creates a new output column based on the input, so that users can evaluate the pre-trained model's generations.
Import production data: Users can import data from models deployed in production. Evaluation Studio lets you filter the import by date range, the source where the model is deployed, and columns from the model traces, which helps you evaluate your deployed model’s generations. For example, users can identify outputs that took longer to generate or consumed more tokens, and use those insights to balance output quality against resource consumption such as time and token usage.
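The three CSV scenarios above differ only in how many output columns are mapped. The following is a minimal sketch of that distinction, not Evaluation Studio's own logic: the function `classify_scenario`, the column names, and the sample rows are all hypothetical, and the column roles are passed in by the caller to mirror the manual mapping step.

```python
import csv
import io


def classify_scenario(csv_text, input_columns, output_columns):
    """Return which of the three dataset scenarios a CSV matches.

    Hypothetical helper: column roles are supplied by the caller,
    nothing is inferred from the data itself.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Every mapped column must exist in the CSV header.
    missing = [c for c in input_columns + output_columns if c not in header]
    if missing:
        raise ValueError(f"columns not found in CSV header: {missing}")
    if len(input_columns) != 1:
        raise ValueError("this sketch expects exactly one input column")

    if not output_columns:
        return "input only"
    if len(output_columns) == 1:
        return "one input, one output"
    return "one input, multiple outputs"


# Hypothetical sample data, one CSV per scenario.
pairs = "question,answer\nWhat is 2+2?,4\n"
multi = "question,model_a,model_b,ground_truth\nWhat is 2+2?,4,four,4\n"
inputs_only = "question\nWhat is 2+2?\n"

print(classify_scenario(pairs, ["question"], ["answer"]))
print(classify_scenario(multi, ["question"], ["model_a", "model_b"]))
print(classify_scenario(inputs_only, ["question"], []))
```

In the second sample, `ground_truth` is left out of the mapped outputs on purpose: it is a reference column to compare against, not a model output.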

Adding Datasets to an Evaluation

Users can add datasets to evaluations within a project. Each dataset represents a collection of inputs and outputs for a specific use case.

Steps to import a dataset:

Navigate to Evaluation Studio.
Click the Projects tab, and choose the relevant project.
Select the specific evaluation to which you want to add datasets.
Choose one of the following import methods to import the dataset for evaluation:
1. Upload from device: Click the Upload file link and select your CSV file saved on your local machine.
2. Import production data: Click Proceed and fill in the required fields in the Import production data dialog:
  1. Models: Choose the model deployed in production (open-source or commercial). You can select any model used in GALE within Agents, Prompts, and endpoints. Only data related to the selected model will be retrieved from Model Traces.
  2. Source: Select the specific source where the model is deployed, such as Agents, Prompts, or endpoints. You can also select the ‘All’ option to import data from all available sources or specify individual sources like specific prompts or agents. For example, you can select a specific agent to see how the model is performing within that agent.
  3. Date: Set the desired date range for the data you want to import. By default, the last 30 days are selected.
  4. Columns: The system automatically fetches the input and output columns by default. If you need more detailed analysis, you can select additional columns such as request ID, input tokens, response time, and other relevant metrics. The selected columns will appear in the evaluation table.
Check the preview of the dataset (first 10 rows). To confirm and finalize the import, click Proceed.

The dataset is imported into Evaluation Studio and linked to the selected evaluation. You can then view your data in a tabular format in the evaluation table.
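The production import filters described above (model, source, date range, and columns) can be pictured as a filter over trace records. This is a rough sketch under stated assumptions, not the platform's API: the record fields, the function names `select_traces` and `preview`, and the sample traces are hypothetical. Only the 30-day default range, the default input and output columns, and the 10-row preview come from the steps above.

```python
from datetime import datetime, timedelta

# Input and output are fetched by default; other columns are opt-in.
DEFAULT_COLUMNS = ["input", "output"]


def select_traces(traces, model, source="All", start=None, end=None,
                  extra_columns=(), now=None):
    """Filter hypothetical trace records by model, source, and date range."""
    end = end or now or datetime.now()
    start = start or end - timedelta(days=30)  # default: last 30 days
    columns = DEFAULT_COLUMNS + [
        c for c in extra_columns if c not in DEFAULT_COLUMNS
    ]

    rows = []
    for trace in traces:
        if trace["model"] != model:
            continue
        if source != "All" and trace["source"] != source:
            continue
        if not (start <= trace["timestamp"] <= end):
            continue
        # Keep only the selected columns for the evaluation table.
        rows.append({c: trace.get(c) for c in columns})
    return rows


def preview(rows, limit=10):
    """Return the first rows, mirroring the 10-row import preview."""
    return rows[:limit]


# Hypothetical trace records.
reference_time = datetime(2024, 6, 1)
traces = [
    {"model": "model-a", "source": "Agents", "timestamp": datetime(2024, 5, 20),
     "input": "hi", "output": "hello", "response_time": 1.2},
    {"model": "model-a", "source": "Prompts", "timestamp": datetime(2024, 5, 25),
     "input": "bye", "output": "goodbye", "response_time": 0.4},
    {"model": "model-a", "source": "Agents", "timestamp": datetime(2024, 3, 1),
     "input": "old", "output": "stale", "response_time": 2.0},
    {"model": "model-b", "source": "Agents", "timestamp": datetime(2024, 5, 21),
     "input": "hey", "output": "hi", "response_time": 0.9},
]

recent = select_traces(traces, "model-a", now=reference_time)
agents_only = select_traces(traces, "model-a", source="Agents",
                            extra_columns=["response_time"], now=reference_time)
print(preview(recent))
print(agents_only)
```

Selecting an extra column such as `response_time` is what makes the later analysis of slow or token-heavy generations possible, since only selected columns appear in the evaluation table.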
Click the + button on the Evaluations page to access additional dataset actions:
- Run a prompt: Run a prompt by selecting a model and its configuration.
- Add an evaluator: Add a quality or safety evaluator to the dataset.

Running a Prompt

The Run a Prompt option enables users to generate customized data based on a specific model and prompt. This feature streamlines data creation and enables easy edits and adjustments for continuous improvements.

For instance, if you want to replace the manual effort of summarizing customer conversations with a fine-tuned model, you can use Evaluation Studio to evaluate its summaries. Start by bringing in your conversations as the input column and deploying the fine-tuned model in GALE. Then, in Evaluation Studio, select 'Run a prompt' and choose your fine-tuned model. In the prompt, you can specify 'summarize the {{input}}', where {{input}} is a variable that refers to the input column. For each row, the variable is filled with that row's conversation, and the model generates the summary based on the prompt instructions. Finally, you can assign the desired evaluators to evaluate the output produced by the fine-tuned model.
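To make the `{{input}}` variable concrete, here is a minimal sketch of how a column variable might be filled in for each row before the prompt is sent to the model. It illustrates the idea only: `render_prompt`, the regular expression, and the sample row are assumptions, not Evaluation Studio's actual templating code.

```python
import re

# Matches variables written as {{name}}, allowing spaces inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template, row):
    """Replace each {{column}} variable with that column's value in the row."""
    def substitute(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"prompt variable {name!r} is not a column in the row")
        return str(row[name])

    return VARIABLE.sub(substitute, template)


# Hypothetical dataset with a single input column.
rows = [{"input": "Customer: my order is late. Agent: I have issued a refund."}]

prompts = [render_prompt("Summarize the {{input}}", row) for row in rows]
print(prompts[0])
```

Raising an error for an unknown variable, rather than leaving it in the text, reflects the instruction to include only mapped variables in the prompt.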

Key Benefits

Efficiency: Generate content for multiple categories quickly and easily.
Customization: Edit prompts and regenerate content to match evolving needs.
Streamlined Workflow: Manage all content generation tasks in one central place.

Steps to run a prompt:

On the Evaluations page, click the + button and select the Run a Prompt option.
In the Run a Prompt dialog:
1. Enter the Column Name for the output data.
2. Choose the appropriate Model and Configuration settings.
3. Type the prompt that describes the data you want to generate, making sure to include any mapped variables.
Click Run to generate a new output column in your data table with the results.

After running the prompt, the following additional options are available:

To modify your prompt or configurations, click Properties.
To refresh the output based on a new prompt or updated data, click Regenerate.
To remove an output column, click Delete. Before deleting, ensure that no evaluators are dependent on this column to avoid any errors.
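The dependency check recommended before deleting a column can be sketched as follows. The data model here is purely hypothetical (a mapping from evaluator name to the columns it reads, and a table stored as a list of row dictionaries), and it is not how Evaluation Studio stores evaluators.

```python
def evaluators_depending_on(column, evaluators):
    """List the evaluators that read from the given column."""
    return [name for name, columns in evaluators.items() if column in columns]


def delete_column(table, column, evaluators):
    """Return a copy of the table without the column, unless it is in use."""
    blockers = evaluators_depending_on(column, evaluators)
    if blockers:
        raise ValueError(
            f"cannot delete {column!r}: used by evaluators {blockers}"
        )
    return [{k: v for k, v in row.items() if k != column} for row in table]


# Hypothetical table and evaluator mapping.
table = [{"input": "chat text", "summary_v1": "short", "summary_v2": "shorter"}]
evaluators = {"coherence": ["input", "summary_v2"]}

# summary_v1 has no dependent evaluators, so it can be removed safely.
trimmed = delete_column(table, "summary_v1", evaluators)
print(trimmed)
```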

Key Points:

A project can contain multiple evaluations, and users can add a dataset to any evaluation.
Users can upload a dataset into Evaluation Studio and run evaluations to measure model performance.
If importing data from production, carefully select the model, source, and date range to ensure you're importing the relevant data.
Running a prompt enables flexible data generation, allowing users to create customized data based on specific instructions.