AI Model Performance Comparison for Voicebot Meeting Minutes
Core Analysis
This evaluation compares 8 AI models on generating meeting minutes from voicebot transcripts: Gemini 2.5 Pro, Gemini Flash Latest (with and without thinking), Gemini Flash Lite, DeepSeek-V3.1, Minimax-M2, GLM-4.6, and Qwen3-Coder. Each model is scored on 8 dimensions, plus a yes/no check for timestamp support.
Top 3 Recommendations
🥇 Gemini 2.5 Pro (9.2/10)
- Best for: Formal meetings, audits, high-stakes scenarios
- Strengths: Most professional and complete, 95% content capture, includes timestamps
- Weakness: Slightly verbose
- Cost: High, but justified for critical meetings
🥈 Gemini Flash Latest (Thinking) (9.0/10)
- Best for: Technical meetings, best cost-performance ratio
- Strengths: 95% content capture, innovative checkbox/table format, timestamps, extreme value
- Weakness: Table format may be too complex for some users
- Cost: Medium with exceptional ROI
🥉 DeepSeek-V3.1 (8.5/10)
- Best for: Simple meetings, general use
- Strengths: Good quality, reliable, universal application
- Weakness: No timestamps
- Cost: Medium-low with good quality
Complete Model Rankings
| Rank | Model | Score | Rating | Cost (60 meetings/mo) | Free Tier | Value | Summary |
|---|---|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 9.2/10 | ⭐⭐⭐ | FREE ($2.44*) | ✅ Unlimited | 🥇 | Best choice, professional and complete |
| 2 | Gemini Flash Latest (Thinking) | 9.0/10 | ⭐⭐⭐ | FREE ($0.61*) | ✅ Unlimited | 🥇 | Best cost-performance, innovative format |
| 3 | DeepSeek-V3.1 | 8.5/10 | ⭐⭐ | $0.30 | ❌ Paid | 🥈 | Good quality, no timestamps |
| 4 | Gemini Flash Latest (No Thinking) | 8.2/10 | ⭐⭐ | FREE ($0.61*) | ✅ Unlimited | 🥇 | Best readability, no timestamps |
| 5 | Gemini Flash Lite | 7.8/10 | ⭐ | FREE ($0.61*) | ✅ Unlimited | 🥈 | Acceptable, excessive bold |
| 6 | Minimax-M2 | 7.0/10 | ⭐ | $0.24 | ❌ Paid | 🥉 | Good format, missing content |
| 7 | GLM-4.6 | 5.5/10 | - | $0.14 | ❌ Paid | ❌ | Too vague |
| 8 | Qwen3-Coder | 4.0/10 | ❌ | $0.10 | ❌ Paid | ❌ | Not suitable for meeting minutes |
*Paid tier costs shown in parentheses for reference
8-Dimension Evaluation Matrix
| Model | Date Accuracy | Completeness | Readability | Context Depth | Format | Professionalism | Risk Recognition | Action Items | Timestamps |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 9/10 | 9.5/10 | 9/10 | 9.5/10 | 9/10 | 9/10 | 9/10 | 9/10 | ✅ Yes |
| Flash Latest (Think) | 9/10 | 9.5/10 | 8.5/10 | 9/10 | 9.5/10 | 9/10 | 9.5/10 | 9/10 | ✅ Yes |
| DeepSeek-V3.1 | 9/10 | 9/10 | 9/10 | 8.5/10 | 8.5/10 | 8.5/10 | 8.5/10 | 8.5/10 | ❌ No |
| Flash Latest (No) | 9/10 | 9/10 | 9.5/10 | 8/10 | 8.5/10 | 8/10 | 8.5/10 | 8/10 | ❌ No |
| Flash Lite | 8.5/10 | 8.5/10 | 6/10 | 7.5/10 | 8/10 | 7.5/10 | 8/10 | 8/10 | ❌ No |
| Minimax-M2 | 8/10 | 7/10 | 8/10 | 6/10 | 8.5/10 | 6/10 | 7/10 | 7/10 | ❌ No |
| GLM-4.6 | 6/10 | 6/10 | 7/10 | 5.5/10 | 6.5/10 | 5.5/10 | 6/10 | 6/10 | ❌ No |
| Qwen3-Coder | 3/10 | 5/10 | 5/10 | 4/10 | 7/10 | 4/10 | 4/10 | 4/10 | ❌ No |
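The document does not say how the overall scores were derived from the eight dimensions. As a rough sanity check, the sketch below assumes a plain unweighted average (an assumption, not the author's stated method) and compares it with the reported overall score. The function name `unweighted_means` is illustrative.

```python
from statistics import mean

# Scores copied from the evaluation matrix, in column order: date accuracy,
# completeness, readability, context depth, format, professionalism,
# risk recognition, action items. Timestamps are yes/no and are left out.
DIMENSION_SCORES = {
    "Gemini 2.5 Pro": [9, 9.5, 9, 9.5, 9, 9, 9, 9],
    "Flash Latest (Think)": [9, 9.5, 8.5, 9, 9.5, 9, 9.5, 9],
    "DeepSeek-V3.1": [9, 9, 9, 8.5, 8.5, 8.5, 8.5, 8.5],
    "Flash Latest (No)": [9, 9, 9.5, 8, 8.5, 8, 8.5, 8],
    "Flash Lite": [8.5, 8.5, 6, 7.5, 8, 7.5, 8, 8],
    "Minimax-M2": [8, 7, 8, 6, 8.5, 6, 7, 7],
    "GLM-4.6": [6, 6, 7, 5.5, 6.5, 5.5, 6, 6],
    "Qwen3-Coder": [3, 5, 5, 4, 7, 4, 4, 4],
}

# Overall scores as reported in the rankings table, in rank order.
REPORTED_OVERALL = {
    "Gemini 2.5 Pro": 9.2,
    "Flash Latest (Think)": 9.0,
    "DeepSeek-V3.1": 8.5,
    "Flash Latest (No)": 8.2,
    "Flash Lite": 7.8,
    "Minimax-M2": 7.0,
    "GLM-4.6": 5.5,
    "Qwen3-Coder": 4.0,
}


def unweighted_means(scores_by_model):
    """Return {model: plain average of its dimension scores}."""
    return {model: mean(scores) for model, scores in scores_by_model.items()}


if __name__ == "__main__":
    means = unweighted_means(DIMENSION_SCORES)
    for model, avg in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{model:22s} mean={avg:.2f} reported={REPORTED_OVERALL[model]:.1f}")
```

Under this assumption the averages keep the reported rank order, but the two top models tie on their dimension averages while their reported scores differ, so the published overall scores likely include weighting or judgment that is not described here.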
Key Content Capture Comparison
| Model | Site Fault Testing | Country Code Priority | SMS Dual-Carrier Risk | Huawei Stats | UAT Schedule | Historical Context |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | ✅ Complete + background | ✅ "Extremely critical" | ✅ With history | ✅ Yes | ✅ Detailed | ✅ All |
| Flash (Think) | ✅ Complete + background | ✅ Emphasized | ✅ With history | ✅ Yes | ✅ Detailed | ✅ All |
| DeepSeek | ✅ Complete | 🟡 Mentioned | ✅ Complete | ✅ Yes | 🟡 Brief | 🟡 Partial |
| Flash (No) | ✅ Complete + background | ✅ Emphasized | ✅ With history | ✅ Yes | ✅ Detailed | ✅ Most |
| Flash Lite | 🟡 No background | 🟡 Weak | 🟡 No history | 🟡 Yes | 🟡 Brief | ❌ Missing |
| Minimax-M2 | ❌ Missing | 🟡 Weak | 🟡 Partial | 🟡 Yes | 🟡 Brief | ❌ Missing |
| GLM-4.6 | ❌ Missing | 🟡 Weak | 🟡 Weak | 🟡 Yes | 🟡 Brief | ❌ Missing |
| Qwen3 | ❌ Missing | ❌ Missing | ❌ Missing | 🟡 Yes | ❌ Missing | ❌ Missing |
Legend: ✅ Complete, 🟡 Partial/Weak, ❌ Missing
Detailed Cost Analysis
Test Scenario Parameters
- Meeting duration: 60 minutes per meeting
- Transcript length: ~18,000 characters (~4,500 tokens input per meeting)
- Output length: ~3,500 tokens (comprehensive meeting minutes)
- Usage: 2 meetings per day (120 minutes total/day)
- Monthly volume: 60 meetings (2 per day over 30 days)
- Total monthly tokens: ~480,000 tokens (270K input + 210K output)
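The scenario parameters above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative, and the ratio of 4 characters per token is an assumption inferred from the 18,000 characters ≈ 4,500 tokens figure, not a tokenizer measurement.

```python
def estimate_tokens(chars, chars_per_token=4.0):
    """Rough token estimate from character count (ratio is an assumption)."""
    return round(chars / chars_per_token)


def monthly_token_volume(meetings_per_day=2, days=30,
                         input_tokens=4_500, output_tokens=3_500):
    """Return monthly meeting count and token totals for the test scenario."""
    meetings = meetings_per_day * days
    total_input = meetings * input_tokens
    total_output = meetings * output_tokens
    return {
        "meetings": meetings,
        "input": total_input,
        "output": total_output,
        "total": total_input + total_output,
    }


if __name__ == "__main__":
    print(estimate_tokens(18_000))
    print(monthly_token_volume())
```

Changing `meetings_per_day` or the per-meeting token counts shows how quickly the monthly totals move away from the 480K figure used throughout the cost tables.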
Cost Breakdown by Model
Gemini Models (Google AI API)
Gemini 2.5 Pro
- Free Tier: ✅ Unlimited (currently free for all input/output)
- Paid Tier (if needed):
  - Input: $1.25 per 1M tokens × 4,500 tokens = $0.005625 per meeting
  - Output: $10.00 per 1M tokens × 3,500 tokens = $0.035 per meeting
  - Per meeting: $0.040625
  - Daily (2 meetings): $0.08125
  - Monthly (60 meetings): $2.44
- Rate limits (Free tier): 15 RPM, 1M requests/day
- Verdict: ✅ 2 meetings/day stays free under the current free-tier limits
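The paid-tier arithmetic above follows one formula: tokens times price per million, summed over input and output. The sketch below applies it with the Gemini 2.5 Pro prices quoted in this section; `meeting_cost` is an illustrative helper, and the prices are the document's figures, not verified against current provider pricing.

```python
def meeting_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost in USD of one meeting, with prices given per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Gemini 2.5 Pro paid-tier prices as quoted in this document.
    per_meeting = meeting_cost(4_500, 3_500, 1.25, 10.00)
    print(f"per meeting: ${per_meeting:.6f}")
    print(f"daily (2):   ${per_meeting * 2:.5f}")
    print(f"monthly(60): ${per_meeting * 60:.2f}")
```

The same helper works for any model in this section that quotes separate input and output prices.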
Gemini Flash Latest (Thinking & No Thinking)
- Free Tier: ✅ Unlimited (currently free for all input/output)
- Paid Tier (if needed):
  - Input: $0.30 per 1M tokens × 4,500 tokens = $0.00135 per meeting
  - Output: $2.50 per 1M tokens × 3,500 tokens = $0.00875 per meeting
  - Per meeting: $0.0101
  - Daily (2 meetings): $0.0202
  - Monthly (60 meetings): $0.61
- Rate limits (Free tier): 15 RPM, 1M requests/day
- Verdict: ✅ 2 meetings/day stays free under the current free-tier limits
Gemini Flash Lite
- Same pricing as Flash Latest
- Verdict: ✅ 2 meetings/day stays free under the current free-tier limits
Ollama Cloud Models (All Others)
Note: Ollama is primarily a local tool. If using cloud hosting providers for these models:
DeepSeek-V3.1
- Typical Cloud Pricing: $0.27/1M input, $1.10/1M output (via third-party)
- Per meeting: (4,500 × $0.27/1M) + (3,500 × $1.10/1M) = $0.001215 + $0.00385 = $0.005065
- Daily (2 meetings): $0.0101
- Monthly (60 meetings): $0.30
- Verdict: ✅ Very affordable, ~$0.30/month
Minimax-M2
- Typical Cloud Pricing: ~$0.50/1M tokens (estimated, blended input/output)
- Per meeting: 8,000 tokens × $0.50/1M = $0.004
- Monthly (60 meetings): $0.24
- Verdict: ✅ Affordable but quality issues
GLM-4.6
- Typical Cloud Pricing: ~$0.30/1M tokens (estimated, blended input/output)
- Per meeting: 8,000 tokens × $0.30/1M = $0.0024
- Monthly (60 meetings): $0.14
- Verdict: ⚠️ Cheap but lowest quality
Qwen3-Coder
- Typical Cloud Pricing: ~$0.20/1M tokens (estimated, blended input/output)
- Per meeting: 8,000 tokens × $0.20/1M = $0.0016
- Monthly (60 meetings): $0.10
- Verdict: ❌ Cheap but unsuitable for meeting minutes
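Three of these estimates use a single blended rate while DeepSeek-V3.1 uses separate input and output prices, which makes them awkward to compare directly. One way to put them on the same footing, sketched below under the document's 4,500 input / 3,500 output token assumption, is to convert split prices into an effective blended rate. The helper names are illustrative.

```python
INPUT_TOKENS = 4_500
OUTPUT_TOKENS = 3_500
TOKENS_PER_MEETING = INPUT_TOKENS + OUTPUT_TOKENS  # 8,000 as used above

# Blended estimates (USD per 1M tokens) quoted in this document.
BLENDED_PRICE_PER_M = {"Minimax-M2": 0.50, "GLM-4.6": 0.30, "Qwen3-Coder": 0.20}


def monthly_cost_blended(price_per_m, meetings=60):
    """Monthly USD cost when one rate covers input and output tokens."""
    return TOKENS_PER_MEETING * price_per_m / 1_000_000 * meetings


def effective_blended_rate(input_price_per_m, output_price_per_m):
    """Token-weighted average price per 1M tokens for this workload."""
    weighted = INPUT_TOKENS * input_price_per_m + OUTPUT_TOKENS * output_price_per_m
    return weighted / TOKENS_PER_MEETING


if __name__ == "__main__":
    for model, price in BLENDED_PRICE_PER_M.items():
        print(f"{model}: ${monthly_cost_blended(price):.3f}/month")
    # DeepSeek-V3.1 split prices as quoted above.
    print(f"DeepSeek-V3.1 effective rate: ${effective_blended_rate(0.27, 1.10):.4f}/1M")
```

With this workload's input/output mix, DeepSeek-V3.1's quoted prices work out to roughly $0.63 per 1M tokens, slightly above the $0.50 blended estimate used for Minimax-M2.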
Cost-Performance Summary
| Model | Per Meeting | Daily (2x) | Monthly (60x) | Free Tier Status | Quality | Value Rank |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | FREE ($0.041*) | FREE ($0.08*) | FREE ($2.44*) | ✅ Unlimited Free | 9.2/10 | 🥇 #1 |
| Flash (Think) | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 9.0/10 | 🥇 #1 |
| Flash (No) | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 8.2/10 | 🥇 #1 |
| Flash Lite | FREE ($0.010*) | FREE ($0.02*) | FREE ($0.61*) | ✅ Unlimited Free | 7.8/10 | 🥈 #2 |
| DeepSeek-V3.1 | $0.0051 | $0.010 | $0.30 | ❌ Paid | 8.5/10 | 🥈 #2 |
| Minimax-M2 | $0.004 | $0.008 | $0.24 | ❌ Paid | 7.0/10 | 🥉 #3 |
| GLM-4.6 | $0.0024 | $0.005 | $0.14 | ❌ Paid | 5.5/10 | ❌ Poor |
| Qwen3-Coder | $0.0016 | $0.003 | $0.10 | ❌ Paid | 4.0/10 | ❌ Avoid |
*Paid tier costs shown in parentheses for reference
Critical Findings for 2 Meetings/Day
✅ FREE TIER WINNERS (Unlimited)
All Gemini models stay free at this volume:
- Gemini 2.5 Pro: Best quality (9.2/10) + FREE = ULTIMATE CHOICE
- Gemini Flash (Thinking): Best features (9.0/10) + FREE = BEST INNOVATION
- Gemini Flash (No Thinking): Best readability (8.2/10) + FREE = SIMPLEST CHOICE
- Gemini Flash Lite: Good enough (7.8/10) + FREE = BACKUP OPTION
Why Gemini wins:
- Google's free tier currently applies no token quota to these models, only request rate limits
- Rate limit: 15 requests/minute (900/hour), far more than 2 meetings/day requires
- Daily limit: 1M requests/day, essentially unlimited for this workload
- ✅ 2 meetings/day = ~60 requests/month, far below the 1M requests/day limit
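The headroom claim above can be checked numerically. The sketch below assumes one API request per meeting (consistent with the ~60 requests/month figure) and uses the 15 RPM and 1M requests/day limits exactly as quoted in this document; those limits are not verified against Google's current documentation, and `free_tier_headroom` is an illustrative name.

```python
def free_tier_headroom(meetings_per_day, requests_per_meeting=1,
                       rpm_limit=15, rpd_limit=1_000_000, days=30):
    """Compare expected request volume with the quoted free-tier limits."""
    daily_requests = meetings_per_day * requests_per_meeting
    return {
        "daily_requests": daily_requests,
        "monthly_requests": daily_requests * days,
        "max_requests_per_hour": rpm_limit * 60,
        # How many times the daily volume could grow before hitting the cap.
        "daily_headroom_factor": rpd_limit / daily_requests,
        "within_limits": daily_requests <= rpd_limit,
    }


if __name__ == "__main__":
    for key, value in free_tier_headroom(2).items():
        print(f"{key}: {value}")
```

If minutes are generated in several passes per meeting (for example chunked transcripts), raise `requests_per_meeting` to see how much of the margin remains.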
Paid Options (Ollama Cloud)
If you need non-Gemini models:
- DeepSeek-V3.1: $0.30/month (good quality, affordable)
- Minimax-M2: $0.24/month (acceptable, but quality issues)
- GLM-4.6 / Qwen3-Coder: $0.14 and $0.10/month respectively (cheap, but quality too poor)
Revised Cost-Performance Rankings
Considering the 2 meetings/day scenario:
🥇 TIER S (Free + Excellent):
- Gemini 2.5 Pro - FREE + 9.2/10 quality + timestamps
- Gemini Flash (Thinking) - FREE + 9.0/10 + innovative format + timestamps
🥈 TIER A (Free + Good):
- Gemini Flash (No Thinking) - FREE + 8.2/10 + best readability
- Gemini Flash Lite - FREE + 7.8/10 + acceptable quality
🥉 TIER B (Paid but Affordable):
- DeepSeek-V3.1 - $0.30/month + 8.5/10 (only if avoiding Google)
❌ TIER C (Not Recommended):
- Minimax-M2, GLM-4.6, Qwen3-Coder (low quality despite low cost)
Strategic Recommendations
For Your Use Case (2 meetings/day):
- Primary Choice: Gemini 2.5 Pro
  - Free under the current tier
  - Highest quality (9.2/10)
  - Includes timestamps
  - Minimal cost risk (at most $2.44/month if moved to the paid tier)
- Alternative Choice: Gemini Flash (Thinking)
  - Free under the current tier
  - Innovative format (9.0/10)
  - Best cost-performance ratio
  - Minimal cost risk (at most $0.61/month if moved to the paid tier)
- Budget Strategy: Stick with Gemini family
  - All Gemini models are free in the current tier
  - Even if Google eventually adds limits, 2 meetings/day is minimal usage
  - Superior quality compared to paid alternatives
  - No need to consider Ollama Cloud models
- Risk Mitigation:
  - Monitor Google's free tier policy changes
  - Current limits (15 RPM, 1M/day) are extremely generous
  - Even if converted to paid, costs would be minimal ($0.61-$2.44/month)
  - Ollama Cloud alternatives cost $0.10-$0.30/month but deliver lower quality
Important Notes
Google Gemini Free Tier:
- ✅ Explicitly permits commercial use
- ✅ Full 1 million token context window
- ✅ No daily token limits, only rate limits (15 RPM)
- ✅ Rate limits reset at midnight Pacific Time
- ⚠️ Policy subject to change (monitor Google AI Studio)
Ollama Cloud Considerations:
- Ollama is primarily for local deployment (free if self-hosted)
- Cloud pricing via third-party providers (Together AI, Replicate, etc.)
- Costs shown are estimates based on typical provider pricing
- If self-hosting: FREE but requires infrastructure
Final Verdict
For 2×60-minute meetings/day (~18,000 chars each):
ABSOLUTE WINNER: Gemini 2.5 Pro (FREE Tier)
- ✅ Stays within the free tier
- ✅ Best quality (9.2/10)
- ✅ Includes timestamps
- ✅ Professional format
- ✅ Zero cost (saves $2.44/month vs paid tier)
- ✅ Low risk (the paid-tier fallback is $2.44/month)
No reason to use paid alternatives when the highest-quality option is free.
Use Case Recommendations
| Scenario | First Choice | Second Choice | Acceptable | Avoid |
|---|---|---|---|---|
| Formal audit meetings | 2.5 Pro | Flash Think | - | All others |
| Technical decision meetings | Flash Think | 2.5 Pro | DeepSeek | Flash Lite and below |
| Internal technical discussions | Flash Think | DeepSeek | Flash No | GLM/Qwen3 |
| Quick internal sharing | Flash No | Flash Think | DeepSeek | GLM/Qwen3 |
| Management reports | 2.5 Pro | Flash Think | DeepSeek | All others |
| Simple meetings | Flash No | Minimax | Flash Lite | Qwen3 |
| Extreme budget | Flash No | Flash Lite | Minimax | - |
| Non-technical meetings | Flash No | Minimax | Flash Lite | Qwen3 |
Major Issues by Model
| Model | Main Issues | Severity |
|---|---|---|
| Gemini 2.5 Pro | Slightly verbose | 🟢 Minor |
| Flash (Think) | Table may be too complex | 🟢 Minor |
| DeepSeek | Missing timestamps | 🟡 Medium |
| Flash (No) | Missing timestamps, less detail | 🟡 Medium |
| Flash Lite | Excessive bold, 30% missing content | 🟡 Medium |
| Minimax-M2 | 30% missing content, priority errors | 🟠 Severe |
| GLM-4.6 | Too vague, 40% missing content | 🔴 Critical |
| Qwen3-Coder | Speaker errors, 50% missing, inconsistent | 🔴 Critical |
Feature Comparison Matrix
| Feature | 2.5 Pro | Flash Think | DeepSeek | Flash No | Flash Lite | Minimax | GLM | Qwen3 |
|---|---|---|---|---|---|---|---|---|
| Timestamp markers | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Checkbox format | β | β | β | π‘ | β | β | β | β |
| Table format | β | β | β | β | β | β | β | β |
| Date accuracy | β | β | β | β | β | β | π‘ | β |
| Pending items section | β | β | β | β | β | β | π‘ | π‘ |
| Risk section | β | β | β | β | β | π‘ | π‘ | π‘ |
| Historical incident records | β | β | π‘ | β | β | β | β | β |
| Moderate bold usage | β | β | β | β | β | β | β | β |
Selection by Need
Highest Quality
- Gemini 2.5 Pro (9.2)
- Gemini Flash Latest Thinking (9.0)
Best Cost-Performance
- Gemini Flash Latest Thinking (9.0) ⭐⭐⭐⭐⭐
- Gemini Flash Latest No Thinking (8.2) ⭐⭐⭐⭐⭐
- DeepSeek-V3.1 (8.5) ⭐⭐⭐⭐
Best Readability
- Gemini Flash Latest No Thinking (9.5/10)
- DeepSeek-V3.1 (9/10)
- Gemini 2.5 Pro (9/10)
Highest Completeness
- Gemini 2.5 Pro (95%)
- Gemini Flash Latest Thinking (95%)
- DeepSeek-V3.1 (90%)
Requires Timestamps
- Gemini 2.5 Pro ✅
- Gemini Flash Latest Thinking ✅
- All others ❌
Next Steps
Immediate Actions
- For critical meetings: Deploy Gemini 2.5 Pro
- For technical meetings: Deploy Gemini Flash Latest (Thinking)
- For budget optimization: Test DeepSeek-V3.1 vs Flash Latest (No Thinking)
Testing Recommendations
- Run side-by-side comparison on actual meeting transcripts
- Measure timestamp importance for your specific use case
- Test Flash (Thinking) table format with end users
- Validate content capture percentage requirements
Decision Criteria
- If timestamps are critical: Choose only from top 2
- If budget is constrained: Choose from top 4
- If simplicity matters: Avoid Flash (Thinking) tables
- If completeness matters: Stay in top 3
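The decision criteria above amount to successive filters over the ranked list. A minimal sketch of that logic follows; the `shortlist` function and its flag names are hypothetical, and the rank cutoffs are taken directly from the four rules.

```python
# Models in rank order, as listed in the complete rankings table.
RANKED_MODELS = [
    "Gemini 2.5 Pro",
    "Gemini Flash Latest (Thinking)",
    "DeepSeek-V3.1",
    "Gemini Flash Latest (No Thinking)",
    "Gemini Flash Lite",
    "Minimax-M2",
    "GLM-4.6",
    "Qwen3-Coder",
]

# Only this model is described as producing the checkbox/table layout.
TABLE_FORMAT_MODELS = {"Gemini Flash Latest (Thinking)"}


def shortlist(need_timestamps=False, need_completeness=False,
              budget_constrained=False, prefer_simple=False):
    """Apply the decision criteria and return the remaining models in rank order."""
    cutoff = len(RANKED_MODELS)
    if budget_constrained:
        cutoff = min(cutoff, 4)  # "choose from top 4"
    if need_completeness:
        cutoff = min(cutoff, 3)  # "stay in top 3"
    if need_timestamps:
        cutoff = min(cutoff, 2)  # "choose only from top 2"
    candidates = RANKED_MODELS[:cutoff]
    if prefer_simple:
        candidates = [m for m in candidates if m not in TABLE_FORMAT_MODELS]
    return candidates


if __name__ == "__main__":
    print(shortlist(need_timestamps=True, prefer_simple=True))
```

Combining timestamps with a preference for simple output leaves only Gemini 2.5 Pro, which matches the document's overall recommendation.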
Tags Analysis
Content Analysis:
- Type: `reference` (Technical comparison data with actionable recommendations)
- Topics:
  - `AI` - Core subject: AI model evaluation
  - `productivity` - Meeting minutes optimization
  - `tools` - Model selection for specific use cases
  - `data-science` - Quantitative performance analysis
- Characteristics:
  - `technical` - Detailed metrics, performance analysis
  - `actionable` - Clear recommendations and decision framework
- Priority: `medium` - Valuable for model selection decisions
Why These Tags: This is a technical reference document providing actionable insights for selecting AI models for meeting-minutes generation. The comprehensive scoring, use case mapping, and cost-performance analysis make it valuable for both immediate decision-making and future reference. Tagged with AI (primary domain), productivity (application area), tools (selection context), and data-science (analytical approach).
Suggested Bases Filters:
- Find similar technical comparisons: `type = reference AND tags contains "AI" AND tags contains "technical"`
- Find actionable AI insights: `tags contains "AI" AND tags contains "actionable" AND status = inbox`
- Find by priority: `priority = medium AND status = inbox AND tags contains "tools"`
- Find productivity + AI content: `tags contains "AI" AND tags contains "productivity"`
Related Searches
- `/semantic-search "AI model comparison evaluation"`
- `/semantic-search "meeting minutes automation productivity"`
Captured: 2025-11-13
Status: inbox (needs processing)
Next Action: Review comparison data and implement model selection for voicebot system