AI Model Performance Comparison for Voicebot Meeting Minutes

💡 Core Analysis

Comprehensive evaluation of 8 AI models for generating meeting minutes from voicebot transcripts. Models evaluated include Gemini 2.5 Pro, Gemini Flash variants (with/without thinking), DeepSeek-V3.1, Flash Lite, Minimax-M2, GLM-4.6, and Qwen3-Coder across 8 evaluation dimensions.

🏆 Top 3 Recommendations

🥇 Gemini 2.5 Pro (9.2/10)

Best for: Formal meetings, audits, high-stakes scenarios
Strengths: Most professional and complete, 95% content capture, includes timestamps
Weakness: Slightly verbose
Cost: High, but justified for critical meetings

🥈 Gemini Flash Latest (Thinking) (9.0/10)

Best for: Technical meetings, best cost-performance ratio
Strengths: 95% content capture, innovative checkbox/table format, timestamps, excellent value
Weakness: Table format may be too complex for some users
Cost: Medium with exceptional ROI

🥉 DeepSeek-V3.1 (8.5/10)

Best for: Simple meetings, general use
Strengths: Good quality, reliable, universal application
Weakness: No timestamps
Cost: Medium-low with good quality

📊 Complete Model Rankings

Rank	Model	Score	Rating	Cost (60 meetings/mo)	Free Tier	Value	Summary
1	Gemini 2.5 Pro	9.2/10	⭐⭐⭐	FREE ($2.44*)	✅ Unlimited	🥇	Best choice, professional and complete
2	Gemini Flash Latest (Thinking)	9.0/10	⭐⭐⭐	FREE ($0.61*)	✅ Unlimited	🥇	Best cost-performance, innovative format
3	DeepSeek-V3.1	8.5/10	⭐⭐	$0.30	❌ Paid	🥈	Good quality, no timestamps
4	Gemini Flash Latest (No Thinking)	8.2/10	⭐⭐	FREE ($0.61*)	✅ Unlimited	🥇	Best readability, no timestamps
5	Gemini Flash Lite	7.8/10	⭐	FREE ($0.61*)	✅ Unlimited	🥈	Acceptable, excessive bold
6	Minimax-M2	7.0/10	⭐	$0.24	❌ Paid	🥉	Good format, missing content
7	GLM-4.6	5.5/10	-	$0.14	❌ Paid	❌	Too vague
8	Qwen3-Coder	4.0/10	❌	$0.10	❌ Paid	❌	Not suitable for meeting minutes

*Paid tier costs shown in parentheses for reference

📈 8-Dimension Evaluation Matrix

Model	Date Accuracy	Completeness	Readability	Context Depth	Format	Professionalism	Risk Recognition	Action Items	Timestamps
Gemini 2.5 Pro	9/10	9.5/10	9/10	9.5/10	9/10	9/10	9/10	9/10	✅ Yes
Flash Latest (Think)	9/10	9.5/10	8.5/10	9/10	9.5/10	9/10	9.5/10	9/10	✅ Yes
DeepSeek-V3.1	9/10	9/10	9/10	8.5/10	8.5/10	8.5/10	8.5/10	8.5/10	❌ No
Flash Latest (No)	9/10	9/10	9.5/10	8/10	8.5/10	8/10	8.5/10	8/10	❌ No
Flash Lite	8.5/10	8.5/10	6/10	7.5/10	8/10	7.5/10	8/10	8/10	❌ No
Minimax-M2	8/10	7/10	8/10	6/10	8.5/10	6/10	7/10	7/10	❌ No
GLM-4.6	6/10	6/10	7/10	5.5/10	6.5/10	5.5/10	6/10	6/10	❌ No
Qwen3-Coder	3/10	5/10	5/10	4/10	7/10	4/10	4/10	4/10	❌ No
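The note does not say how the overall scores in the rankings table were derived from the eight dimension scores. As a rough cross-check, the sketch below computes an unweighted mean per model and prints it next to the reported score. The equal weighting is an assumption made here, not the method used in the evaluation, so small differences from the reported scores are expected.

```python
# Dimension scores copied from the matrix above, in column order:
# date accuracy, completeness, readability, context depth, format,
# professionalism, risk recognition, action items.
DIMENSION_SCORES = {
    "Gemini 2.5 Pro": [9, 9.5, 9, 9.5, 9, 9, 9, 9],
    "Flash Latest (Think)": [9, 9.5, 8.5, 9, 9.5, 9, 9.5, 9],
    "DeepSeek-V3.1": [9, 9, 9, 8.5, 8.5, 8.5, 8.5, 8.5],
    "Flash Latest (No)": [9, 9, 9.5, 8, 8.5, 8, 8.5, 8],
    "Flash Lite": [8.5, 8.5, 6, 7.5, 8, 7.5, 8, 8],
    "Minimax-M2": [8, 7, 8, 6, 8.5, 6, 7, 7],
    "GLM-4.6": [6, 6, 7, 5.5, 6.5, 5.5, 6, 6],
    "Qwen3-Coder": [3, 5, 5, 4, 7, 4, 4, 4],
}

# Overall scores as reported in the rankings table.
REPORTED_OVERALL = {
    "Gemini 2.5 Pro": 9.2,
    "Flash Latest (Think)": 9.0,
    "DeepSeek-V3.1": 8.5,
    "Flash Latest (No)": 8.2,
    "Flash Lite": 7.8,
    "Minimax-M2": 7.0,
    "GLM-4.6": 5.5,
    "Qwen3-Coder": 4.0,
}


def unweighted_mean(scores):
    """Equal-weight average of the dimension scores (an assumed weighting)."""
    return sum(scores) / len(scores)


for model, scores in DIMENSION_SCORES.items():
    avg = unweighted_mean(scores)
    print(f"{model}: mean {avg:.2f}, reported {REPORTED_OVERALL[model]:.1f}")
```

If the reported ranking matters for a decision, it may be worth re-scoring with weights that reflect your own priorities (for example, weighting completeness above format).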

🎯 Key Content Capture Comparison

Model	Site Fault Testing	Country Code Priority	SMS Dual-Carrier Risk	Huawei Stats	UAT Schedule	Historical Context
Gemini 2.5 Pro	✅ Complete + background	✅ “Extremely critical”	✅ With history	✅ Yes	✅ Detailed	✅ All
Flash (Think)	✅ Complete + background	✅ Emphasized	✅ With history	✅ Yes	✅ Detailed	✅ All
DeepSeek	✅ Complete	🟡 Mentioned	✅ Complete	✅ Yes	🟡 Brief	🟡 Partial
Flash (No)	✅ Complete + background	✅ Emphasized	✅ With history	✅ Yes	✅ Detailed	✅ Most
Flash Lite	🟡 No background	🟡 Weak	🟡 No history	🟡 Yes	🟡 Brief	❌ Missing
Minimax-M2	❌ Missing	🟡 Weak	🟡 Partial	🟡 Yes	🟡 Brief	❌ Missing
GLM-4.6	❌ Missing	🟡 Weak	🟡 Weak	🟡 Yes	🟡 Brief	❌ Missing
Qwen3	❌ Missing	❌ Missing	❌ Missing	🟡 Yes	❌ Missing	❌ Missing

Legend: ✅ Complete, 🟡 Partial/Weak, ❌ Missing

💰 Detailed Cost Analysis

Test Scenario Parameters

Meeting duration: 60 minutes per meeting
Transcript length: ~18,000 characters (~4,500 tokens input per meeting)
Output length: ~3,500 tokens (comprehensive meeting minutes)
Usage: 2 meetings per day (120 minutes total/day)
Monthly volume: 60 meetings (2 per day × 30 days)
Total monthly tokens: ~480,000 tokens (270K input + 210K output)
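The monthly token volume follows directly from the parameters above. A minimal sketch of the arithmetic, assuming roughly 4 characters per token (the ratio implied by 18,000 characters ≈ 4,500 tokens; real tokenizers vary by model and language):

```python
# Scenario parameters copied from the list above.
TRANSCRIPT_CHARS = 18_000
OUTPUT_TOKENS_PER_MEETING = 3_500
MEETINGS_PER_DAY = 2
DAYS_PER_MONTH = 30

# Assumption: ~4 characters per token, as implied by the figures above.
CHARS_PER_TOKEN = 4


def estimate_input_tokens(chars, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from character count; not a real tokenizer."""
    return round(chars / chars_per_token)


def monthly_token_volume():
    """Return meetings, input, output, and total tokens per month."""
    meetings = MEETINGS_PER_DAY * DAYS_PER_MONTH
    input_tokens = meetings * estimate_input_tokens(TRANSCRIPT_CHARS)
    output_tokens = meetings * OUTPUT_TOKENS_PER_MEETING
    return {
        "meetings": meetings,
        "input": input_tokens,
        "output": output_tokens,
        "total": input_tokens + output_tokens,
    }


print(monthly_token_volume())
```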

Cost Breakdown by Model

Gemini Models (Google AI API)

Gemini 2.5 Pro

Free Tier: ✅ UNLIMITED (Currently free for all input/output)
Paid Tier (if needed):
- Input: $1.25 per 1M tokens × 4,500 tokens = $0.005625 per meeting
- Output: $10.00 per 1M tokens × 3,500 tokens = $0.035 per meeting
- Per meeting: $0.040625
- Daily (2 meetings): $0.08125
- Monthly (60 meetings): $2.44
Rate limits (Free tier): 15 RPM, 1M requests/day (the 15 RPM cap binds first: at most 21,600 requests/day)
Verdict: ✅ 2 meetings/day stays free under the current free tier

Gemini Flash Latest (Thinking & No Thinking)

Free Tier: ✅ UNLIMITED (Currently free for all input/output)
Paid Tier (if needed):
- Input: $0.30 per 1M tokens × 4,500 tokens = $0.00135 per meeting
- Output: $2.50 per 1M tokens × 3,500 tokens = $0.00875 per meeting
- Per meeting: $0.0101
- Daily (2 meetings): $0.0202
- Monthly (60 meetings): $0.61
Rate limits (Free tier): 15 RPM, 1M requests/day (the 15 RPM cap binds first: at most 21,600 requests/day)
Verdict: ✅ 2 meetings/day stays free under the current free tier

Gemini Flash Lite

Same pricing as Flash Latest
Verdict: ✅ 2 meetings/day stays free under the current free tier
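The paid-tier figures for the Gemini models can be reproduced with a small helper. This is a sketch using the per-million-token prices quoted above; the function name and defaults are illustrative, and actual prices should be checked against the provider's current pricing page.

```python
def cost_per_meeting(input_price_per_m, output_price_per_m,
                     input_tokens=4_500, output_tokens=3_500):
    """Cost in USD for one meeting, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Paid-tier prices (USD per 1M tokens) as quoted in this note.
PAID_PRICES = {
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini Flash Latest": (0.30, 2.50),
}

MEETINGS_PER_MONTH = 60

for model, (input_price, output_price) in PAID_PRICES.items():
    per_meeting = cost_per_meeting(input_price, output_price)
    monthly = per_meeting * MEETINGS_PER_MONTH
    print(f"{model}: ${per_meeting:.6f} per meeting, ${monthly:.2f} per month")
```

Output tokens dominate the cost in both cases, so shortening the minutes would lower the bill more than shortening the transcript.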

Ollama Cloud Models (All Others)

Note: Ollama is primarily a local tool. If using cloud hosting providers for these models:

DeepSeek-V3.1

Typical Cloud Pricing: $0.27/1M input, $1.10/1M output (via third-party)
Per meeting: (4,500 × $0.27/1M) + (3,500 × $1.10/1M) = $0.001215 + $0.00385 = $0.005065
Daily (2 meetings): $0.0101
Monthly (60 meetings): $0.30
Verdict: ✅ Very affordable, ~$0.30/month

Minimax-M2

Typical Cloud Pricing: ~$0.50/1M tokens (estimated, blended input/output)
Per meeting: 8,000 tokens × $0.50/1M = $0.004
Monthly (60 meetings): $0.24
Verdict: ✅ Affordable but quality issues

GLM-4.6

Typical Cloud Pricing: ~$0.30/1M tokens (estimated, blended input/output)
Per meeting: 8,000 tokens × $0.30/1M = $0.0024
Monthly (60 meetings): $0.14
Verdict: ⚠️ Cheap but lowest quality

Qwen3-Coder

Typical Cloud Pricing: ~$0.20/1M tokens (estimated, blended input/output)
Per meeting: 8,000 tokens × $0.20/1M = $0.0016
Monthly (60 meetings): $0.10
Verdict: ❌ Cheap but unsuitable for meeting minutes
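For the three models priced with a single blended rate, the monthly figures reduce to total tokens × rate × meetings. A minimal sketch, using the estimated blended prices quoted above (these are the note's estimates, not confirmed provider prices):

```python
# 4,500 input + 3,500 output tokens per meeting, per the test scenario.
TOKENS_PER_MEETING = 4_500 + 3_500

# Estimated blended prices (USD per 1M tokens) as quoted in this note.
BLENDED_PRICE_PER_M = {
    "Minimax-M2": 0.50,
    "GLM-4.6": 0.30,
    "Qwen3-Coder": 0.20,
}


def blended_monthly_cost(price_per_m, meetings=60):
    """Monthly cost in USD when input and output share one blended rate."""
    return TOKENS_PER_MEETING * price_per_m / 1_000_000 * meetings


for model, price in BLENDED_PRICE_PER_M.items():
    print(f"{model}: ${blended_monthly_cost(price):.2f} per month")
```

A blended rate hides the input/output split, so these numbers are less precise than the DeepSeek-V3.1 estimate, which prices input and output separately.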

💎 Cost-Performance Summary

Model	Per Meeting	Daily (2x)	Monthly (60x)	Free Tier Status	Quality	Value Rank
Gemini 2.5 Pro	FREE ($0.041*)	FREE ($0.08*)	FREE ($2.44*)	✅ Unlimited Free	9.2/10	🥇 #1
Flash (Think)	FREE ($0.010*)	FREE ($0.02*)	FREE ($0.61*)	✅ Unlimited Free	9.0/10	🥇 #1
Flash (No)	FREE ($0.010*)	FREE ($0.02*)	FREE ($0.61*)	✅ Unlimited Free	8.2/10	🥇 #1
Flash Lite	FREE ($0.010*)	FREE ($0.02*)	FREE ($0.61*)	✅ Unlimited Free	7.8/10	🥈 #2
DeepSeek-V3.1	$0.0051	$0.010	$0.30	❌ Paid	8.5/10	🥈 #2
Minimax-M2	$0.004	$0.008	$0.24	❌ Paid	7.0/10	🥉 #3
GLM-4.6	$0.0024	$0.005	$0.14	❌ Paid	5.5/10	❌ Poor
Qwen3-Coder	$0.0016	$0.003	$0.10	❌ Paid	4.0/10	❌ Avoid

*Paid tier costs shown in parentheses for reference

🎯 Critical Findings for 2 Meetings/Day

✅ FREE TIER WINNERS (Unlimited)

All Gemini models stay completely FREE:

Gemini 2.5 Pro: Best quality (9.2/10) + FREE = ULTIMATE CHOICE
Gemini Flash (Thinking): Best features (9.0/10) + FREE = BEST INNOVATION
Gemini Flash (No Thinking): Best readability (8.2/10) + FREE = SIMPLEST CHOICE
Gemini Flash Lite: Good enough (7.8/10) + FREE = BACKUP OPTION

Why Gemini wins:

Google’s free tier has no token-based usage caps on these models, only rate limits
Rate limit: 15 requests/minute (900/hour) - far exceeds 2 meetings/day
Daily limit: 1M requests/day - effectively unlimited at this volume
✅ 2 meetings/day = 2 requests/day (~60 requests/month) ≪ 1M requests/day limit
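The headroom implied by the 15 RPM figure can be worked out directly. The sketch below assumes one API request per meeting, which is what the ~60 requests/month figure implies; a pipeline that chunks transcripts or retries on failure would use more.

```python
RPM_LIMIT = 15  # free-tier requests per minute, as quoted in this note


def max_requests_per_day(rpm_limit=RPM_LIMIT):
    """Upper bound on daily requests if the per-minute limit is saturated."""
    return rpm_limit * 60 * 24


def headroom_factor(requests_per_day, rpm_limit=RPM_LIMIT):
    """How many times the actual daily load fits under the RPM-derived cap."""
    return max_requests_per_day(rpm_limit) / requests_per_day


# Assumption: one request per meeting, two meetings per day.
REQUESTS_PER_DAY = 2

print(f"Per hour at limit: {RPM_LIMIT * 60}")
print(f"Per day at limit: {max_requests_per_day()}")
print(f"Headroom at {REQUESTS_PER_DAY} requests/day: "
      f"{headroom_factor(REQUESTS_PER_DAY):,.0f}x")
```

Note that a sustained 15 RPM tops out well below 1M requests/day, so the per-minute limit is the one to watch if usage grows.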

💰 Paid Options (Ollama Cloud)

If you need non-Gemini models:

DeepSeek-V3.1: $0.30/month (good quality, affordable)
Minimax-M2: $0.24/month (acceptable, but quality issues)
GLM-4.6/Qwen3: $0.10-$0.14/month (cheap but quality too poor)

📊 Revised Cost-Performance Rankings

Considering 2 meetings/day scenario:

🥇 TIER S (Free + Excellent):

Gemini 2.5 Pro - FREE + 9.2/10 quality + timestamps
Gemini Flash (Thinking) - FREE + 9.0/10 + innovative format + timestamps

🥈 TIER A (Free + Good):

Gemini Flash (No Thinking) - FREE + 8.2/10 + best readability
Gemini Flash Lite - FREE + 7.8/10 + acceptable quality

🥉 TIER B (Paid but Affordable):

DeepSeek-V3.1 - $0.30/month + 8.5/10 (only if avoiding Google)

❌ TIER C (Not Recommended):

Minimax-M2, GLM-4.6, Qwen3-Coder (low quality despite low cost)

💡 Strategic Recommendations

For Your Use Case (2 meetings/day):

Primary Choice: Gemini 2.5 Pro
- Free under the current tier
- Highest quality (9.2/10)
- Includes timestamps
- Zero cost risk
Alternative Choice: Gemini Flash (Thinking)
- Free under the current tier
- Innovative format (9.0/10)
- Best cost-performance ratio
- Zero cost risk
Budget Strategy: Stick with Gemini family
- All Gemini models are FREE in current tier
- Even if Google eventually adds limits, 2 meetings/day is minimal usage
- Superior quality compared to paid alternatives
- No need to consider Ollama Cloud models
Risk Mitigation:
- Monitor Google’s free tier policy changes
- Current limits (15 RPM, 1M/day) are extremely generous
- Even if converted to paid, costs would be minimal ($0.61-$2.44/month)
- Ollama Cloud alternatives cost $0.10-$0.30/month but lower quality

🚨 Important Notes

Google Gemini Free Tier:

✅ Explicitly permits commercial use
✅ Full 1 million token context window
✅ No daily token limits, only rate limits (15 RPM)
✅ Daily request quotas reset at midnight Pacific Time
⚠️ Policy subject to change (monitor Google AI Studio)

Ollama Cloud Considerations:

Ollama is primarily for local deployment (free if self-hosted)
Cloud pricing via third-party providers (Together AI, Replicate, etc.)
Costs shown are estimates based on typical provider pricing
If self-hosting: FREE but requires infrastructure

🎓 Final Verdict

For 2×60-minute meetings/day (~18,000 chars each):

ABSOLUTE WINNER: Gemini 2.5 Pro (FREE Tier)

✅ Stays within the free tier limits
✅ Best quality (9.2/10)
✅ Includes timestamps
✅ Professional format
✅ Zero cost (saves $2.44/month vs paid tier)
✅ Minimal risk (worst case is the $2.44/month paid tier)

No reason to use paid alternatives when highest quality is free.

📋 Use Case Recommendations

Scenario	First Choice	Second Choice	Acceptable	Avoid
Formal audit meetings	2.5 Pro	Flash Think	-	All others
Technical decision meetings	Flash Think	2.5 Pro	DeepSeek	Flash Lite and below
Internal technical discussions	Flash Think	DeepSeek	Flash No	GLM/Qwen3
Quick internal sharing	Flash No	Flash Think	DeepSeek	GLM/Qwen3
Management reports	2.5 Pro	Flash Think	DeepSeek	All others
Simple meetings	Flash No	Minimax	Flash Lite	Qwen3
Extreme budget	Flash No	Flash Lite	Minimax	-
Non-technical meetings	Flash No	Minimax	Flash Lite	Qwen3

🚨 Major Issues by Model

Model	Main Issues	Severity
Gemini 2.5 Pro	Slightly verbose	🟢 Minor
Flash (Think)	Table may be too complex	🟢 Minor
DeepSeek	Missing timestamps	🟡 Medium
Flash (No)	Missing timestamps, less detail	🟡 Medium
Flash Lite	Excessive bold, 30% missing content	🟡 Medium
Minimax-M2	30% missing content, priority errors	🟠 Severe
GLM-4.6	Too vague, 40% missing content	🔴 Critical
Qwen3-Coder	Speaker errors, 50% missing, inconsistent	🔴 Critical

💎 Feature Comparison Matrix

Feature	2.5 Pro	Flash Think	DeepSeek	Flash No	Flash Lite	Minimax	GLM	Qwen3
Timestamp markers	✅	✅	❌	❌	❌	❌	❌	❌
Checkbox format	❌	✅	❌	🟡	❌	❌	❌	❌
Table format	❌	✅	❌	✅	❌	❌	❌	❌
Date accuracy	✅	✅	✅	✅	✅	✅	🟡	❌
Pending items section	✅	✅	✅	✅	✅	✅	🟡	🟡
Risk section	✅	✅	✅	✅	✅	🟡	🟡	🟡
Historical incident records	✅	✅	🟡	✅	❌	❌	❌	❌
Moderate bold usage	✅	✅	✅	✅	❌	✅	✅	✅

🎓 Selection by Need

🏆 Highest Quality

Gemini 2.5 Pro (9.2)
Gemini Flash Latest Thinking (9.0)

💎 Best Cost-Performance

Gemini Flash Latest Thinking (9.0) ⭐⭐⭐⭐⭐
Gemini Flash Latest No Thinking (8.2) ⭐⭐⭐⭐⭐
DeepSeek-V3.1 (8.5) ⭐⭐⭐⭐

📖 Best Readability

Gemini Flash Latest No Thinking (9.5/10)
DeepSeek-V3.1 (9/10)
Gemini 2.5 Pro (9/10)

🔍 Highest Completeness

Gemini 2.5 Pro (95%)
Gemini Flash Latest Thinking (95%)
DeepSeek-V3.1 (90%)

⏱️ Requires Timestamps

Gemini 2.5 Pro ✅
Gemini Flash Latest Thinking ✅
All others ❌

📝 Next Steps

Immediate Actions

For critical meetings: Deploy Gemini 2.5 Pro
For technical meetings: Deploy Gemini Flash Latest (Thinking)
For budget optimization: Test DeepSeek-V3.1 vs Flash Latest (No Thinking)

Testing Recommendations

Run side-by-side comparison on actual meeting transcripts
Measure timestamp importance for your specific use case
Test Flash (Thinking) table format with end users
Validate content capture percentage requirements

Decision Criteria

If timestamps are critical: Choose only from top 2
If budget is constrained: Choose from top 4
If simplicity matters: Avoid Flash (Thinking) tables
If completeness matters: Stay in top 3
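The decision criteria above can be read as rank cutoffs plus one exclusion. The sketch below is one possible encoding, not part of the original evaluation; the function name and flags are illustrative, and the model order is taken from the rankings table.

```python
# Models in rank order, from the Complete Model Rankings table.
RANKED_MODELS = [
    "Gemini 2.5 Pro",
    "Gemini Flash Latest (Thinking)",
    "DeepSeek-V3.1",
    "Gemini Flash Latest (No Thinking)",
    "Gemini Flash Lite",
    "Minimax-M2",
    "GLM-4.6",
    "Qwen3-Coder",
]


def shortlist(timestamps_critical=False, budget_constrained=False,
              simplicity_matters=False, completeness_matters=False):
    """Apply the decision criteria as rank cutoffs, strictest one winning."""
    cutoff = len(RANKED_MODELS)
    if timestamps_critical:
        cutoff = min(cutoff, 2)  # only the top 2 emit timestamps
    if completeness_matters:
        cutoff = min(cutoff, 3)  # stay in the top 3
    if budget_constrained:
        cutoff = min(cutoff, 4)  # choose from the top 4
    candidates = RANKED_MODELS[:cutoff]
    if simplicity_matters:
        # Avoid the table-heavy output of Flash (Thinking).
        candidates = [m for m in candidates
                      if m != "Gemini Flash Latest (Thinking)"]
    return candidates


print(shortlist(timestamps_critical=True, simplicity_matters=True))
print(shortlist(budget_constrained=True))
```

When several criteria apply, this version simply takes the strictest cutoff; a different tie-breaking rule may suit your team better.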

🏷️ Tags Analysis

Content Analysis:

Type: reference (Technical comparison data with actionable recommendations)
Topics:
- AI - Core subject: AI model evaluation
- productivity - Meeting minutes optimization
- tools - Model selection for specific use cases
- data-science - Quantitative performance analysis
Characteristics:
- technical - Detailed metrics, performance analysis
- actionable - Clear recommendations and decision framework
Priority: medium - Valuable for model selection decisions

Why These Tags: This is a technical reference document providing actionable insights for selecting AI models for meeting minute generation. The comprehensive scoring, use case mapping, and cost-performance analysis make it valuable for both immediate decision-making and future reference. Tagged with AI (primary domain), productivity (application area), tools (selection context), and data-science (analytical approach).

Suggested Bases Filters:

Find similar technical comparisons: type = reference AND tags contains "AI" AND tags contains "technical"
Find actionable AI insights: tags contains "AI" AND tags contains "actionable" AND status = inbox
Find by priority: priority = medium AND status = inbox AND tags contains "tools"
Find productivity + AI content: tags contains "AI" AND tags contains "productivity"

/semantic-search "AI model comparison evaluation"
/semantic-search "meeting minutes automation productivity"

Captured: 2025-11-13
Status: inbox (needs processing)
Next Action: Review comparison data and implement model selection for voicebot system