AI Model Performance Comparison for Voicebot Meeting Minutes

💡 Core Analysis

Comprehensive evaluation of 8 AI models for generating meeting minutes from voicebot transcripts. Models evaluated include Gemini 2.5 Pro, Gemini Flash variants (with/without thinking), DeepSeek-V3.1, Flash Lite, Minimax-M2, GLM-4.6, and Qwen3-Coder across 8 evaluation dimensions.

🏆 Top 3 Recommendations

🥇 Gemini 2.5 Pro (9.2/10)

  • Best for: Formal meetings, audits, high-stakes scenarios
  • Strengths: Most professional and complete, 95% content capture, includes timestamps
  • Weakness: Slightly verbose
  • Cost: High, but justified for critical meetings

🥈 Gemini Flash Latest (Thinking) (9.0/10)

  • Best for: Technical meetings, best cost-performance ratio
  • Strengths: 95% content capture, innovative checkbox/table format, timestamps, very high value for the cost
  • Weakness: Table format may be too complex for some users
  • Cost: Medium with exceptional ROI

🥉 DeepSeek-V3.1 (8.5/10)

  • Best for: Simple meetings, general use
  • Strengths: Good quality, reliable, universal application
  • Weakness: No timestamps
  • Cost: Medium-low with good quality

📊 Complete Model Rankings

| Rank | Model | Score | Rating | Cost (60 meetings/mo) | Free Tier | Value | Summary |
|---|---|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 9.2/10 | ⭐⭐⭐ | FREE ($2.44*) | ✅ Unlimited | 🥇 | Best choice, professional and complete |
| 2 | Gemini Flash Latest (Thinking) | 9.0/10 | ⭐⭐⭐ | FREE ($0.61*) | ✅ Unlimited | 🥇 | Best cost-performance, innovative format |
| 3 | DeepSeek-V3.1 | 8.5/10 | ⭐⭐ | $0.30 | ❌ Paid | 🥈 | Good quality, no timestamps |
| 4 | Gemini Flash Latest (No Thinking) | 8.2/10 | ⭐⭐ | FREE ($0.61*) | ✅ Unlimited | 🥇 | Best readability, no timestamps |
| 5 | Gemini Flash Lite | 7.8/10 | ⭐ | FREE ($0.61*) | ✅ Unlimited | 🥈 | Acceptable, excessive bold |
| 6 | Minimax-M2 | 7.0/10 | ⭐ | $0.24 | ❌ Paid | 🥉 | Good format, missing content |
| 7 | GLM-4.6 | 5.5/10 | - | $0.14 | ❌ Paid | ❌ | Too vague |
| 8 | Qwen3-Coder | 4.0/10 | ❌ | $0.10 | ❌ Paid | ❌ | Not suitable for meeting minutes |

*Paid tier costs shown in parentheses for reference

📈 8-Dimension Evaluation Matrix

| Model | Date Accuracy | Completeness | Readability | Context Depth | Format | Professionalism | Risk Recognition | Action Items | Timestamps |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 9/10 | 9.5/10 | 9/10 | 9.5/10 | 9/10 | 9/10 | 9/10 | 9/10 | ✅ Yes |
| Flash Latest (Think) | 9/10 | 9.5/10 | 8.5/10 | 9/10 | 9.5/10 | 9/10 | 9.5/10 | 9/10 | ✅ Yes |
| DeepSeek-V3.1 | 9/10 | 9/10 | 9/10 | 8.5/10 | 8.5/10 | 8.5/10 | 8.5/10 | 8.5/10 | ❌ No |
| Flash Latest (No) | 9/10 | 9/10 | 9.5/10 | 8/10 | 8.5/10 | 8/10 | 8.5/10 | 8/10 | ❌ No |
| Flash Lite | 8.5/10 | 8.5/10 | 6/10 | 7.5/10 | 8/10 | 7.5/10 | 8/10 | 8/10 | ❌ No |
| Minimax-M2 | 8/10 | 7/10 | 8/10 | 6/10 | 8.5/10 | 6/10 | 7/10 | 7/10 | ❌ No |
| GLM-4.6 | 6/10 | 6/10 | 7/10 | 5.5/10 | 6.5/10 | 5.5/10 | 6/10 | 6/10 | ❌ No |
| Qwen3-Coder | 3/10 | 5/10 | 5/10 | 4/10 | 7/10 | 4/10 | 4/10 | 4/10 | ❌ No |
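
The note does not say how the eight dimension scores roll up into the overall score. As a rough cross-check only, the sketch below assumes an unweighted mean of the eight numeric columns in the matrix above. The dictionary and variable names are illustrative.

```python
from statistics import mean

# Eight dimension scores per model, in the column order of the matrix above
# (Timestamps is a yes/no column and is left out).
SCORES = {
    "Gemini 2.5 Pro": [9, 9.5, 9, 9.5, 9, 9, 9, 9],
    "Flash Latest (Think)": [9, 9.5, 8.5, 9, 9.5, 9, 9.5, 9],
    "DeepSeek-V3.1": [9, 9, 9, 8.5, 8.5, 8.5, 8.5, 8.5],
    "Flash Latest (No)": [9, 9, 9.5, 8, 8.5, 8, 8.5, 8],
    "Flash Lite": [8.5, 8.5, 6, 7.5, 8, 7.5, 8, 8],
    "Minimax-M2": [8, 7, 8, 6, 8.5, 6, 7, 7],
    "GLM-4.6": [6, 6, 7, 5.5, 6.5, 5.5, 6, 6],
    "Qwen3-Coder": [3, 5, 5, 4, 7, 4, 4, 4],
}

# Assumption: every dimension carries equal weight.
unweighted = {model: mean(scores) for model, scores in SCORES.items()}

# Highest mean first; ties keep the order in which models are listed.
ranking = sorted(unweighted, key=unweighted.get, reverse=True)

for model in ranking:
    print(f"{model}: {unweighted[model]:.3f}")
```

Under this assumption the ordering matches the published ranking, but both top models average 9.125 rather than the stated 9.2 and 9.0, so the original scoring evidently weights the dimensions differently.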

🎯 Key Content Capture Comparison

| Model | Site Fault Testing | Country Code Priority | SMS Dual-Carrier Risk | Huawei Stats | UAT Schedule | Historical Context |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | ✅ Complete + background | ✅ “Extremely critical” | ✅ With history | ✅ Yes | ✅ Detailed | ✅ All |
| Flash (Think) | ✅ Complete + background | ✅ Emphasized | ✅ With history | ✅ Yes | ✅ Detailed | ✅ All |
| DeepSeek | ✅ Complete | 🟡 Mentioned | ✅ Complete | ✅ Yes | 🟡 Brief | 🟡 Partial |
| Flash (No) | ✅ Complete + background | ✅ Emphasized | ✅ With history | ✅ Yes | ✅ Detailed | ✅ Most |
| Flash Lite | 🟡 No background | 🟡 Weak | 🟡 No history | 🟡 Yes | 🟡 Brief | ❌ Missing |
| Minimax-M2 | ❌ Missing | 🟡 Weak | 🟡 Partial | 🟡 Yes | 🟡 Brief | ❌ Missing |
| GLM-4.6 | ❌ Missing | 🟡 Weak | 🟡 Weak | 🟡 Yes | 🟡 Brief | ❌ Missing |
| Qwen3 | ❌ Missing | ❌ Missing | ❌ Missing | 🟡 Yes | ❌ Missing | ❌ Missing |

Legend: ✅ Complete, 🟡 Partial/Weak, ❌ Missing

💰 Detailed Cost Analysis

Test Scenario Parameters

  • Meeting duration: 60 minutes per meeting
  • Transcript length: ~18,000 characters (~4,500 tokens input per meeting)
  • Output length: ~3,500 tokens (comprehensive meeting minutes)
  • Usage: 2 meetings per day (120 minutes total/day)
  • Monthly volume: 60 meetings (2 per day × 30 days)
  • Total monthly tokens: ~480,000 tokens (270K input + 210K output)
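
The monthly totals follow directly from the per-meeting figures. A minimal sketch of that arithmetic, using only the scenario values listed above (the constant and function names are illustrative):

```python
# Scenario parameters copied from the list above.
CHARS_PER_MEETING = 18_000
INPUT_TOKENS_PER_MEETING = 4_500
OUTPUT_TOKENS_PER_MEETING = 3_500
MEETINGS_PER_DAY = 2
DAYS_PER_MONTH = 30


def monthly_token_volume() -> dict[str, int]:
    """Scale the per-meeting token counts up to one month of usage."""
    meetings = MEETINGS_PER_DAY * DAYS_PER_MONTH
    input_tokens = meetings * INPUT_TOKENS_PER_MEETING
    output_tokens = meetings * OUTPUT_TOKENS_PER_MEETING
    return {
        "meetings": meetings,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


volume = monthly_token_volume()
# Characters-per-token ratio implied by the ~4,500-token estimate.
chars_per_token = CHARS_PER_MEETING / INPUT_TOKENS_PER_MEETING
print(volume, chars_per_token)
```

The implied ratio of 4 characters per token is a rule of thumb. Actual counts depend on the tokenizer and the transcript language, so the totals are estimates.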

Cost Breakdown by Model

Gemini Models (Google AI API)

Gemini 2.5 Pro

  • Free Tier: ✅ Unlimited tokens (currently free for all input and output; rate limits apply)
  • Paid Tier (if needed):
    • Input: $1.25 per 1M tokens × 4,500 tokens = $0.005625 per meeting
    • Output: $10.00 per 1M tokens × 3,500 tokens = $0.035 per meeting
    • Per meeting: $0.040625
    • Daily (2 meetings): $0.08125
    • Monthly (60 meetings): $2.44
  • Rate limits (Free tier): 15 RPM, 1M requests/day
  • Verdict: ✅ 2 meetings/day stays free under the current free-tier policy

Gemini Flash Latest (Thinking & No Thinking)

  • Free Tier: ✅ Unlimited tokens (currently free for all input and output; rate limits apply)
  • Paid Tier (if needed):
    • Input: $0.30 per 1M tokens × 4,500 tokens = $0.00135 per meeting
    • Output: $2.50 per 1M tokens × 3,500 tokens = $0.00875 per meeting
    • Per meeting: $0.0101
    • Daily (2 meetings): $0.0202
    • Monthly (60 meetings): $0.61
  • Rate limits (Free tier): 15 RPM, 1M requests/day
  • Verdict: ✅ 2 meetings/day stays free under the current free-tier policy

Gemini Flash Lite

  • Priced here the same as Flash Latest (paid-tier reference: $0.61/month)
  • Verdict: ✅ 2 meetings/day stays free under the current free-tier policy
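
The paid-tier reference figures for the Gemini models come from multiplying each token count by its per-million price. A small sketch reproducing them, assuming the prices quoted above and the 4,500/3,500 token split from the test scenario (function and variable names are illustrative):

```python
def cost_per_meeting(input_price_per_m: float, output_price_per_m: float,
                     input_tokens: int = 4_500, output_tokens: int = 3_500) -> float:
    """Return the USD cost of one meeting given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# (input $/1M, output $/1M) as quoted in this section.
GEMINI_PRICES = {
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini Flash Latest": (0.30, 2.50),
}

monthly_costs = {}
for model, (price_in, price_out) in GEMINI_PRICES.items():
    per_meeting = cost_per_meeting(price_in, price_out)
    monthly_costs[model] = per_meeting * 60  # 60 meetings per month
    print(f"{model}: ${per_meeting:.6f}/meeting, "
          f"${per_meeting * 2:.4f}/day, ${monthly_costs[model]:.2f}/month")
```

Keeping input and output prices separate matters here: output tokens dominate the bill for both models even though the transcript is longer than the minutes.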

Ollama Cloud Models (All Others)

Note: Ollama is primarily a local tool. If using cloud hosting providers for these models:

DeepSeek-V3.1

  • Typical Cloud Pricing: $0.27/1M input, $1.10/1M output (via third-party)
  • Per meeting: (4,500 × $0.27/1M) + (3,500 × $1.10/1M) = $0.001215 + $0.00385 = $0.005065
  • Daily (2 meetings): $0.0101
  • Monthly (60 meetings): $0.30
  • Verdict: ✅ Very affordable, ~$0.30/month

Minimax-M2

  • Typical Cloud Pricing: ~$0.50/1M tokens (estimated, blended input/output)
  • Per meeting: 8,000 tokens × $0.50/1M = $0.004
  • Monthly (60 meetings): $0.24
  • Verdict: ✅ Affordable but quality issues

GLM-4.6

  • Typical Cloud Pricing: ~$0.30/1M tokens (estimated, blended input/output)
  • Per meeting: 8,000 tokens × $0.30/1M = $0.0024
  • Monthly (60 meetings): $0.14
  • Verdict: ⚠️ Cheap but lowest quality

Qwen3-Coder

  • Typical Cloud Pricing: ~$0.20/1M tokens (estimated, blended input/output)
  • Per meeting: 8,000 tokens × $0.20/1M = $0.0016
  • Monthly (60 meetings): $0.10
  • Verdict: ❌ Cheap but unsuitable for meeting minutes
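
For Minimax-M2, GLM-4.6, and Qwen3-Coder the note uses a single blended price rather than separate input and output rates. A sketch of that simpler estimate, assuming the blended prices above (which the note itself marks as estimates) and 8,000 total tokens per meeting; the names are illustrative:

```python
# 4,500 input + 3,500 output tokens per meeting.
TOKENS_PER_MEETING = 4_500 + 3_500

# Estimated blended USD price per 1M tokens, as listed above.
BLENDED_PRICE_PER_M = {
    "Minimax-M2": 0.50,
    "GLM-4.6": 0.30,
    "Qwen3-Coder": 0.20,
}


def blended_monthly_cost(price_per_m: float, meetings: int = 60) -> float:
    """Monthly USD cost when input and output share one per-token price."""
    per_meeting = TOKENS_PER_MEETING * price_per_m / 1_000_000
    return per_meeting * meetings


blended_costs = {
    model: blended_monthly_cost(price)
    for model, price in BLENDED_PRICE_PER_M.items()
}
for model, cost in blended_costs.items():
    print(f"{model}: ${cost:.2f}/month")
```

A blended rate hides the usual gap between input and output pricing, so these figures are coarser than the DeepSeek-V3.1 and Gemini estimates.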

💎 Cost-Performance Summary

| Model | Per Meeting | Daily (2x) | Monthly (60x) | Free Tier Status | Quality | Value Rank |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | FREE ($0.041*) | FREE ($0.08*) | FREE ($2.44*) | ✅ Unlimited Free | 9.2/10 | 🥇 #1 |
| Flash (Think) | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 9.0/10 | 🥇 #1 |
| Flash (No) | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 8.2/10 | 🥇 #1 |
| Flash Lite | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 7.8/10 | 🥈 #2 |
| DeepSeek-V3.1 | $0.0051 | $0.010 | $0.30 | ❌ Paid | 8.5/10 | 🥈 #2 |
| Minimax-M2 | $0.004 | $0.008 | $0.24 | ❌ Paid | 7.0/10 | 🥉 #3 |
| GLM-4.6 | $0.0024 | $0.005 | $0.14 | ❌ Paid | 5.5/10 | ❌ Poor |
| Qwen3-Coder | $0.0016 | $0.003 | $0.10 | ❌ Paid | 4.0/10 | ❌ Avoid |

*Paid tier costs shown in parentheses for reference

🎯 Critical Findings for 2 Meetings/Day

✅ FREE TIER WINNERS (Unlimited)

All Gemini models stay completely FREE:

  1. Gemini 2.5 Pro: Best quality (9.2/10) + FREE = ULTIMATE CHOICE
  2. Gemini Flash (Thinking): Best features (9.0/10) + FREE = BEST INNOVATION
  3. Gemini Flash (No Thinking): Best readability (8.2/10) + FREE = SIMPLEST CHOICE
  4. Gemini Flash Lite: Good enough (7.8/10) + FREE = BACKUP OPTION

Why Gemini wins:

  • Google’s free tier sets no token-based usage cap on these models, only request rate limits
  • Rate limit: 15 requests/minute (900/hour), far more than 2 meetings/day requires
  • Daily limit: 1M requests/day, effectively unlimited at this volume
  • ✅ 2 meetings/day = ~60 requests/month, far below the 1M requests/day limit
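
The headroom claim can be checked with a few lines of arithmetic. The sketch below takes the limits exactly as quoted in this note (they are not verified against Google's current documentation) and assumes one API request per meeting transcript; all names are illustrative.

```python
# Limits as quoted in this note; treat them as assumptions, not current policy.
RPM_LIMIT = 15          # requests per minute
RPD_LIMIT = 1_000_000   # requests per day
MEETINGS_PER_DAY = 2


def fits_free_tier(meetings_per_day: int, requests_per_meeting: int = 1) -> bool:
    """True if a day's requests stay within both quoted limits."""
    requests_per_day = meetings_per_day * requests_per_meeting
    # Conservative check: assume every request of the day lands in one minute.
    return requests_per_day <= RPM_LIMIT and requests_per_day <= RPD_LIMIT


hourly_ceiling = RPM_LIMIT * 60            # requests/hour allowed by the RPM limit
monthly_requests = MEETINGS_PER_DAY * 30   # one request per meeting, 30 days
print(hourly_ceiling, monthly_requests, fits_free_tier(MEETINGS_PER_DAY))
```

If a pipeline splits each transcript into several requests (chunking, retries, a separate summarization pass), raise `requests_per_meeting` accordingly and re-run the check.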

💰 Paid Options (Ollama Cloud)

If you need non-Gemini models:

  • DeepSeek-V3.1: $0.30/month (good quality, affordable)
  • Minimax-M2: $0.24/month (acceptable, but quality issues)
  • GLM-4.6/Qwen3: $0.10-$0.14/month (cheap but quality too poor)

📊 Revised Cost-Performance Rankings

Considering 2 meetings/day scenario:

🥇 TIER S (Free + Excellent):

  1. Gemini 2.5 Pro - FREE + 9.2/10 quality + timestamps
  2. Gemini Flash (Thinking) - FREE + 9.0/10 + innovative format + timestamps

🥈 TIER A (Free + Good):

  1. Gemini Flash (No Thinking) - FREE + 8.2/10 + best readability
  2. Gemini Flash Lite - FREE + 7.8/10 + acceptable quality

🥉 TIER B (Paid but Affordable):

  1. DeepSeek-V3.1 - $0.30/month + 8.5/10 (only if avoiding Google)

❌ TIER C (Not Recommended):

  • Minimax-M2, GLM-4.6, Qwen3-Coder (low quality despite low cost)

💡 Strategic Recommendations

For Your Use Case (2 meetings/day):

  1. Primary Choice: Gemini 2.5 Pro
    • FREE under the current tier
    • Highest quality (9.2/10)
    • Includes timestamps
    • Zero cost risk
  2. Alternative Choice: Gemini Flash (Thinking)
    • FREE under the current tier
    • Innovative format (9.0/10)
    • Best cost-performance ratio
    • Zero cost risk
  3. Budget Strategy: Stick with Gemini family
    • All Gemini models are FREE in current tier
    • Even if Google eventually adds limits, 2 meetings/day is minimal usage
    • Superior quality compared to paid alternatives
    • No need to consider Ollama Cloud models
  4. Risk Mitigation:
    • Monitor Google’s free tier policy changes
    • Current limits (15 RPM, 1M/day) are extremely generous
    • Even if converted to paid, costs would be minimal ($0.61-$2.44/month)
    • Ollama Cloud alternatives cost $0.10-$0.30/month but lower quality

🚨 Important Notes

Google Gemini Free Tier:

  • ✅ Explicitly permits commercial use
  • ✅ Full 1 million token context window
  • ✅ No daily token limits, only rate limits (15 RPM)
  • ✅ Daily request quotas reset at midnight Pacific Time
  • ⚠️ Policy subject to change (monitor Google AI Studio)

Ollama Cloud Considerations:

  • Ollama is primarily for local deployment (free if self-hosted)
  • Cloud pricing via third-party providers (Together AI, Replicate, etc.)
  • Costs shown are estimates based on typical provider pricing
  • If self-hosting: FREE but requires infrastructure

🎓 Final Verdict

For 2 × 60-minute meetings/day (~18,000 chars each):

ABSOLUTE WINNER: Gemini 2.5 Pro (FREE Tier)

  • ✅ Stays under free tier (unlimited)
  • ✅ Best quality (9.2/10)
  • ✅ Includes timestamps
  • ✅ Professional format
  • ✅ Zero cost (saves $2.44/month vs paid tier)
  • ✅ Zero risk

No reason to use paid alternatives when highest quality is free.

📋 Use Case Recommendations

| Scenario | First Choice | Second Choice | Acceptable | Avoid |
|---|---|---|---|---|
| Formal audit meetings | 2.5 Pro | Flash Think | - | All others |
| Technical decision meetings | Flash Think | 2.5 Pro | DeepSeek | Flash Lite and below |
| Internal technical discussions | Flash Think | DeepSeek | Flash No | GLM/Qwen3 |
| Quick internal sharing | Flash No | Flash Think | DeepSeek | GLM/Qwen3 |
| Management reports | 2.5 Pro | Flash Think | DeepSeek | All others |
| Simple meetings | Flash No | Minimax | Flash Lite | Qwen3 |
| Extreme budget | Flash No | Flash Lite | Minimax | - |
| Non-technical meetings | Flash No | Minimax | Flash Lite | Qwen3 |

🚨 Major Issues by Model

| Model | Main Issues | Severity |
|---|---|---|
| Gemini 2.5 Pro | Slightly verbose | 🟢 Minor |
| Flash (Think) | Table may be too complex | 🟢 Minor |
| DeepSeek | Missing timestamps | 🟡 Medium |
| Flash (No) | Missing timestamps, less detail | 🟡 Medium |
| Flash Lite | Excessive bold, 30% missing content | 🟡 Medium |
| Minimax-M2 | 30% missing content, priority errors | 🟠 Severe |
| GLM-4.6 | Too vague, 40% missing content | 🔴 Critical |
| Qwen3-Coder | Speaker errors, 50% missing, inconsistent | 🔴 Critical |

💎 Feature Comparison Matrix

| Feature | 2.5 Pro | Flash Think | DeepSeek | Flash No | Flash Lite | Minimax | GLM | Qwen3 |
|---|---|---|---|---|---|---|---|---|
| Timestamp markers | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Checkbox format | ❌ | ✅ | ❌ | 🟡 | ❌ | ❌ | ❌ | ❌ |
| Table format | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Date accuracy | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ❌ |
| Pending items section | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | 🟡 |
| Risk section | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | 🟡 | 🟡 |
| Historical incident records | ✅ | ✅ | 🟡 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Moderate bold usage | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |

🎓 Selection by Need

🏆 Highest Quality

  • Gemini 2.5 Pro (9.2)
  • Gemini Flash Latest Thinking (9.0)

💎 Best Cost-Performance

  • Gemini Flash Latest Thinking (9.0) ⭐⭐⭐⭐⭐
  • Gemini Flash Latest No Thinking (8.2) ⭐⭐⭐⭐⭐
  • DeepSeek-V3.1 (8.5) ⭐⭐⭐⭐

📖 Best Readability

  1. Gemini Flash Latest No Thinking (9.5/10)
  2. DeepSeek-V3.1 (9/10)
  3. Gemini 2.5 Pro (9/10)

🔍 Highest Completeness

  • Gemini 2.5 Pro (95%)
  • Gemini Flash Latest Thinking (95%)
  • DeepSeek-V3.1 (90%)

⏱️ Requires Timestamps

  • Gemini 2.5 Pro ✅
  • Gemini Flash Latest Thinking ✅
  • All others ❌

📝 Next Steps

Immediate Actions

  1. For critical meetings: Deploy Gemini 2.5 Pro
  2. For technical meetings: Deploy Gemini Flash Latest (Thinking)
  3. For budget optimization: Test DeepSeek-V3.1 vs Flash Latest (No Thinking)

Testing Recommendations

  • Run side-by-side comparison on actual meeting transcripts
  • Measure timestamp importance for your specific use case
  • Test Flash (Thinking) table format with end users
  • Validate content capture percentage requirements

Decision Criteria

  • If timestamps are critical: Choose only from top 2
  • If budget is constrained: Choose from top 4
  • If simplicity matters: Avoid Flash (Thinking) tables
  • If completeness matters: Stay in top 3
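
The decision criteria above amount to filtering the ranked list by requirement. A hypothetical helper that does this, using the overall scores, timestamp support, and free-tier status from the tables in this note (the `Model` class and `shortlist` function are illustrative, not part of any existing tool):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float       # overall score out of 10
    timestamps: bool   # emits timestamp markers
    free_tier: bool    # listed as free in this note


MODELS = [
    Model("Gemini 2.5 Pro", 9.2, True, True),
    Model("Gemini Flash Latest (Thinking)", 9.0, True, True),
    Model("DeepSeek-V3.1", 8.5, False, False),
    Model("Gemini Flash Latest (No Thinking)", 8.2, False, True),
    Model("Gemini Flash Lite", 7.8, False, True),
    Model("Minimax-M2", 7.0, False, False),
    Model("GLM-4.6", 5.5, False, False),
    Model("Qwen3-Coder", 4.0, False, False),
]


def shortlist(models, need_timestamps=False, free_only=False, min_score=0.0):
    """Return models meeting every requirement, best score first."""
    picks = [
        m for m in models
        if (m.timestamps or not need_timestamps)
        and (m.free_tier or not free_only)
        and m.score >= min_score
    ]
    return sorted(picks, key=lambda m: m.score, reverse=True)


for m in shortlist(MODELS, need_timestamps=True):
    print(m.name, m.score)
```

For example, `shortlist(MODELS, free_only=True)` returns the four Gemini models, and `shortlist(MODELS, min_score=8.5)` returns the top three by overall score.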

🏷️ Tags Analysis

Content Analysis:

  • Type: reference (Technical comparison data with actionable recommendations)
  • Topics:
    • AI - Core subject: AI model evaluation
    • productivity - Meeting minutes optimization
    • tools - Model selection for specific use cases
    • data-science - Quantitative performance analysis
  • Characteristics:
    • technical - Detailed metrics, performance analysis
    • actionable - Clear recommendations and decision framework
  • Priority: medium - Valuable for model selection decisions

Why These Tags: This is a technical reference document providing actionable insights for selecting AI models for meeting minute generation. The comprehensive scoring, use case mapping, and cost-performance analysis make it valuable for both immediate decision-making and future reference. Tagged with AI (primary domain), productivity (application area), tools (selection context), and data-science (analytical approach).

Suggested Bases Filters:

  • Find similar technical comparisons: type = reference AND tags contains "AI" AND tags contains "technical"
  • Find actionable AI insights: tags contains "AI" AND tags contains "actionable" AND status = inbox
  • Find by priority: priority = medium AND status = inbox AND tags contains "tools"
  • Find productivity + AI content: tags contains "AI" AND tags contains "productivity"
  • /semantic-search "AI model comparison evaluation"
  • /semantic-search "meeting minutes automation productivity"

Captured: 2025-11-13
Status: inbox (needs processing)
Next Action: Review comparison data and implement model selection for voicebot system