⚒️ Compare GPT-4, Claude & Gemini Responses with Contextual AI’s LMUnit Evaluation

2,833 views · ⚒️ Engineering

Description

PROBLEM

Manually evaluating and comparing responses from multiple LLMs (OpenAI GPT-4, Anthropic Claude, Google Gemini) is slow and hard to keep consistent.

This workflow automates LLM response quality evaluation using Contextual AI’s LMUnit, a natural language unit testing framework that provides systematic, fine-grained feedback on response clarity and conciseness.

> Note: LMUnit offers natural language-based evaluation with a 1–5 scoring scale, enabling consistent and interpretable results across different model outputs.
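To make the comparison concrete, here is a minimal Python sketch of the aggregation behind such an evaluation: each model's response is scored against a set of natural language unit tests, and the per-model averages are ranked. It is not the workflow's actual node logic. The unit test wording, the function names, and `stub_scorer` are hypothetical. In practice the scorer would be a call to LMUnit that returns a score on the 1–5 scale; a placeholder stands in here so the sketch runs offline.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical unit tests, phrased in natural language (assumed wording).
UNIT_TESTS = [
    "Is the response clear and easy to understand?",
    "Is the response concise, without unnecessary detail?",
]

# A scorer takes (query, response, unit_test) and returns a score in [1, 5].
Scorer = Callable[[str, str, str], float]


def evaluate(
    query: str,
    responses: Dict[str, str],
    unit_tests: List[str],
    scorer: Scorer,
) -> Dict[str, float]:
    """Return the mean score per model across all unit tests."""
    averages: Dict[str, float] = {}
    for model, response in responses.items():
        scores = [scorer(query, response, test) for test in unit_tests]
        for score in scores:
            # Reject anything outside the 1-5 scale described above.
            if not 1 <= score <= 5:
                raise ValueError(f"score {score} for {model} is outside 1-5")
        averages[model] = mean(scores)
    return averages


def rank(averages: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort models from highest to lowest average score."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def stub_scorer(query: str, response: str, unit_test: str) -> float:
    """Placeholder only, NOT LMUnit: shorter responses score higher."""
    return max(1.0, 5.0 - len(response.split()) / 10)
```

Passing the scorer as a parameter keeps the aggregation independent of how scores are obtained, so the placeholder can be swapped for a real LMUnit call without changing `evaluate` or `rank`.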

How it works

How to set up

How to customize the workflow

🔗 Nodes Used

Chat Trigger, OpenAI, Google Gemini, Anthropic, Chat

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup