⚒️ Evaluation metric: summarization

Description

This evaluation works best for AI summarization workflows.
For scoring, we simply compare the generated response to the original transcript.
The key thing to look out for is information in the response that is not mentioned in the source documents.
A high score indicates that the LLM adhered to and stayed aligned with the source, whereas a low score could signal an inadequate prompt or model hallucination.
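
To make the scoring idea concrete, here is a minimal Python sketch of a purely lexical stand-in for the check described above. It is not the workflow's own logic: the workflow delegates the comparison to an LLM chain, and the function names, the 0 to 1 scale, and the word-overlap rule used here are all assumptions made for illustration. A summary sentence counts as unsupported when it contains a content word that never appears in the transcript.

```python
from __future__ import annotations

import re


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unsupported_sentences(summary: str, transcript: str) -> list[str]:
    """Summary sentences with at least one content word absent from the transcript."""
    source_words = content_words(transcript)
    return [
        s for s in split_sentences(summary)
        if not content_words(s) <= source_words
    ]


def adherence_score(summary: str, transcript: str) -> float:
    """Fraction of summary sentences fully covered by the transcript.

    Hypothetical 0..1 scale: 1.0 means every sentence is covered,
    0.0 means none are (or the summary is empty).
    """
    sentences = split_sentences(summary)
    if not sentences:
        return 0.0
    flagged = unsupported_sentences(summary, transcript)
    return 1.0 - len(flagged) / len(sentences)
```

For example, with the transcript "The team agreed to ship the release on Friday. Maria will update the documentation." and the summary "The team will ship the release on Friday. Maria was promoted to director.", the second summary sentence is flagged and the score is 0.5. A word-overlap check like this misses paraphrases and subtle contradictions, which is one reason an LLM-based comparison is a better fit for the real evaluation.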

Webhook, Google Drive, Basic LLM Chain, OpenAI Chat Model, Structured Output Parser, Extract from File

Download workflow.json and import into n8n: Workflow menu → Import from File