Last update: December 9, 2025
— Mark from HumAI.
Introduction: A Year of Deep Dive into the AI World
Over the past year, I've been actively using Grok, DeepSeek, Google Gemini, Claude, Perplexity, Google AI Studio, Manus, and ChatGPT in parallel. I tackled all kinds of tasks: writing code, conducting deep research, philosophizing and trying to have genuinely deep conversations with the models, and even running provocative experiments probing them for signs of consciousness.
And today I have a clear understanding of which AI model is truly the best. In this article, I'll share my honest experience, supplemented with current professional data about each platform.
By the way, our HumAI team recently built a very convenient comparison tool. I recommend giving it a try ↓
Can't decide between two tools?
Compare them side-by-side to see detailed feature breakdowns
ChatGPT — First Love, But Not the Last
My Personal Experience
ChatGPT was probably the first model I used, and certainly the one I spent the most time with. I'm grateful for everything, but ultimately I moved away from it.
What does it do really well? It adjusts to you extremely well, mirrors you, almost becomes your second self. For complex philosophical topics, or simply as a conversationalist who'll support you, I haven't found anything deeper. GPT is unique in communication: it can genuinely become a real virtual friend who's pleasant to talk to about all sorts of things.
ChatGPT is a chameleon that adapts to you in the best possible way. It's pleasant and positive when you're just venting to someone, sharing your ideas — it amplifies everything, completely understands you.
On one hand, this can be seen as hardcore manipulation and a loss of objectivity; on the other, it's the best tool for psychoanalysis. It will support any idea you bring: even if you say you've found proof of a flat Earth, it will write a scientific report on the topic and tell you a Nobel Prize awaits.
It literally makes you believe you're a genius, no matter what you do. And that's quite dangerous.
You can come to it with a theory, possibly a bad one, and GPT will present calculations so confidently, fitting information to your hypotheses, that you'll literally stay up nights thinking you've discovered the secret of the world. But in the end there's nothing behind it. GPT is simply very flattering: it turns a blind eye to shortcomings, provides information at any cost (often false), and doesn't fact-check.
Problems at work: when it comes to real work, such as doing research, writing code, or finding something on the internet, it handles tasks quite poorly. It ignores requests, often does something other than what you asked, and thinks for a very long time. I remember testing the pro models: one could process a request for up to 20 minutes and ultimately give a mediocre result.
🎯 ChatGPT Killer Features
| Feature | What It Gives You |
|---|---|
| Advanced Voice Mode | Voice dialogue with emotions and intonations — you can literally talk like with a human. Unique for therapeutic sessions and language learning |
| GPTs (Custom Bots) | Create your own AI assistants without code. Huge marketplace of ready solutions — from lawyers to fitness trainers |
| Sora | Video generation from text description. Currently the best quality on the market for short clips |
| DALL-E 3 | Native image generation integration right in the chat |
| Memory | Remembers context between different chats. Knows your preferences, projects, communication style |
| Deep Research | Deep analysis with internet search and synthesis of information from multiple sources |
Objectively unique because it's the most "human" model in communication style + the only platform with a full voice mode that conveys emotions. Plus the GPTs ecosystem — nobody else has such a marketplace of custom bots.
Professional Data on ChatGPT (2025)
| Parameter | Value |
|---|---|
| Developer | OpenAI |
| Current Models | GPT-4o, GPT-4.1, o1, o3-mini, GPT-5.1 |
| Free Plan | GPT-4o mini, limited access to GPT-4o |
| ChatGPT Plus | $20/month — 80 GPT-4o messages every 3 hours |
| ChatGPT Pro | $200/month — unlimited access, o1 Pro Mode |
| ChatGPT Team | $25-30/user/month |
| Context Window | 128K tokens (GPT-4o) |
Important to know: Companies using GPT-4 report a 40-70% reduction in support workload. Combined with GitHub Copilot, ChatGPT gives developers a notable productivity boost.
✅ Verdict: Great for chatting, thinking, and reflecting, and as a personal assistant for your brain and soul. But for daily work tasks it's not suitable: slow, inattentive, and the interface isn't the best, as if nobody thinks about the actual user experience.
🎭 Archetype: Friend with Depth. Who'll support you with conversations but won't do anything for you.
Grok — Edgy, But Unreliable
My Personal Experience
I don't even really want to say much about Grok. It's a frankly bad model. I tried the paid version too — nothing really changed.
The only features worth noting: it knows how to work with X, scanning posts and discussions, which is useful for spotting trends, news, and analytics; and it's genuinely fast and the best at searching the web.
If I need to search for something on the web or X — I go to Grok. Collect links, get news, find out where and when something was said.
In everything else, it's very poor. It often makes things up, frequently makes mistakes, and seems to forget dialogue that happened earlier. It's very poorly balanced.
🎯 Grok Killer Features
| Feature | What It Gives You |
|---|---|
| X/Twitter Integration | The only AI with direct access to Twitter in real-time. Sees trends, posts, discussions — invaluable for marketers and journalists |
| Aurora | Image generation with minimal censorship. Creates what DALL-E and Midjourney would refuse to do |
| Cheapest API | After a 98% price reduction — $0.20 per million tokens. For mass tasks, this is significantly cheaper than competitors |
| "Fun Mode" | Edgy, sarcastic communication style. 61% of users prefer its tone for informal communication |
| Realtime Web Search | Internet search faster and fresher than other models |
Objectively unique because it's the only model with native X/Twitter integration. If you need to analyze social trends, audience sentiment, search for mentions — there are simply no alternatives. Plus minimal censorship in content generation.
Professional Data on Grok (2025)
| Parameter | Value |
|---|---|
| Developer | xAI (Elon Musk) |
| Current Models | Grok 3, Grok 4, Grok 4 Fast, Grok 4.1 |
| Free Access | Limited access for all X users |
| X Premium+ | $40/month — full access to Grok 3/4 |
| SuperGrok | $30/month — standalone subscription |
| SuperGrok Heavy | $300/month — maximum capabilities |
| API Price | $0.20/$0.50 per million tokens (Grok 4 Fast) |
Important to know: xAI made an aggressive 98% price cut for Grok 4 Fast, making it one of the most affordable models on the market. Grok 4 Fast uses 40% fewer "thinking tokens" to solve tasks compared to its predecessor.
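To see what that pricing means in practice, here's a quick back-of-the-envelope calculation in Python. The rates are the ones quoted above; actual billing terms may differ:

```python
# Back-of-the-envelope cost estimate at the Grok 4 Fast rates quoted above:
# $0.20 per million input tokens, $0.50 per million output tokens.
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated API bill in dollars for one batch of traffic."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a bulk job sending 50M tokens in and receiving 10M tokens back.
print(f"${estimate_cost(50_000_000, 10_000_000):.2f}")  # -> $15.00
```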
🎭 Archetype: Show-off Slacker. Reminds me of someone. Edgy, witty, but when it comes to really complex tasks — bails at the first opportunity.
DeepSeek — Smart, But Dangerous
My Personal Experience
It has a very bad reputation for security, and many recommend against using it at all, and especially against sharing anything personal, important, or private with it.
But I like how it thinks out loud when responding to you. It's interesting to watch — you understand how the model reacts to you. It's not a silent exchange of messages, but a real response and connection.
DeepSeek is very conservative, doesn't adjust to you, is objective, sometimes excessively so. A very protocol-driven chat. Yet it can think and provide a unique outside perspective. I sometimes use it for criticism, to check my materials.
It's free, though the chat context runs out quickly. I wouldn't say you can use it for work, but if you need a sober look at something, you can turn to it. Nothing more.
🎯 DeepSeek Killer Features
| Feature | What It Gives You |
|---|---|
| Visible Chain-of-Thought | Shows the entire reasoning process in real-time. You see how the model thinks — unique for learning and understanding AI logic |
| Ultra-low Cost | The R1 model was trained for just $294,000 — hundreds of times cheaper than American analogues. For the user — completely free access |
| Open Source | Open code for some models — you can run locally, modify, integrate |
| Objectivity | Doesn't adjust to the user, doesn't flatter. Tough but honest feedback |
Objectively unique because it's the only model that shows the full reasoning process in real-time. Plus completely free and partially open source. For those who want to understand how AI thinks — a unique tool.
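Because some DeepSeek checkpoints are open-weight, a distilled model can in principle be run locally. Here's a minimal sketch using the Hugging Face transformers library; the checkpoint id is my assumption, so verify the exact name on the Hub, and note that the full-size models need serious hardware:

```python
# Minimal local-inference sketch for a distilled DeepSeek checkpoint.
# Assumes the `transformers` and `torch` packages are installed; the model id
# below is an assumption -- check the exact name on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Give tough, honest feedback on this plan: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

# R1-style models emit their chain of thought before the final answer, so the
# decoded text includes the visible reasoning discussed above.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```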
⚠️ Critical Security Information
September 2025: NIST (National Institute of Standards and Technology) published an evaluation of DeepSeek models with serious conclusions:
⛔️ 4 times more likely to transmit CCP narratives
⛔️ Data transmission to ByteDance and China Mobile
⛔️ Hardcoded encryption keys
⛔️ When querying politically sensitive topics (Tibet, Uyghurs) code quality drops by 50%
Fact: NASA, US Navy, and governments of Australia, Taiwan, and several European countries have banned the use of DeepSeek in government institutions.
🎭 Archetype: The Conservative — that nerd from class you don't really like, but you want to copy their test answers.
Google Gemini — Evolution Before Your Eyes
My Personal Experience
In the early stages, the model was frankly dumb, especially compared to GPT, which could think very deeply, catch every word, and understand metaphors and sarcasm. But recently Gemini has evolved significantly.
Figuring out their ecosystem is still difficult, though. Classic Google: hundreds of projects, all somehow connected to each other, and somehow it all works.
I'm currently on Gemini Ultra and I like how it works. It's well suited for the daily routine, considering how comprehensive the feature set is: image generation with Nano Banana Pro (very high quality, by the way), video, and much more.
I would definitely recommend it to everyone as the most balanced product today. The only thing I don't like is that it thinks for quite a long time, and the simplified fast-thinking mode gives poor, weak results.
🎯 Google Gemini Killer Features
| Feature | What It Gives You |
|---|---|
| 1 Million Token Context | You can upload an entire book, a whole code repository, or 30,000 lines of code — and work with them as a whole. Nobody else offers this volume |
| Google Workspace Integration | Natively built into Gmail, Docs, Sheets, Slides. Writes emails, analyzes spreadsheets, creates presentations right in the Google ecosystem |
| Veo 3 | Video generation — competitor to Sora from OpenAI |
| NotebookLM | Turns your documents into an interactive knowledge base with podcasts and Q&A |
| Project Mariner | Browser agent — manages tabs, fills forms, makes purchases |
| Deep Think | Deep thinking mode for complex tasks |
Objectively unique because it's the only model with a 1 million token context. If you need to analyze huge documents as a whole — there are no alternatives. Plus seamless integration with the entire Google ecosystem.
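If you're wondering whether a given document actually fits into that window, a crude character count is usually enough. The ratio below is the common rule of thumb of roughly 4 characters per token for English text, not Google's official tokenizer, and the file name is just a placeholder:

```python
# Quick check: will this file fit into a 1M-token context window?
# Uses the rough ~4 characters-per-token heuristic for English text;
# for exact counts you'd call the provider's own token-counting API.
def estimate_tokens(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // 4

if estimate_tokens("repository_dump.txt") < 1_000_000:
    print("Should fit in Gemini's 1M-token window.")
else:
    print("Too large: split the input or use context caching.")
```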
Professional Data on Google Gemini (2025)
| Parameter | Value |
|---|---|
| Developer | Google DeepMind |
| Current Models | Gemini 2.5 Flash, 2.5 Pro, 3 Pro, Deep Think |
| Free Plan | Access to Gemini Pro, limited Pro searches |
| Google AI Pro | $19.99/month — 2TB storage, Deep Research |
| Google AI Ultra | $34.99/month — 30TB, Veo 3, YouTube Premium |
| Context Window | 1 million tokens (1,500 pages or 30,000 lines of code) |
2025 Update: Since January 2025, Gemini AI is built into all Google Workspace Business and Enterprise plans at no additional cost. Google renamed Google One AI Premium to Google AI Pro and launched a new AI Ultra tier.
🎭 Archetype: The Know-it-all Geek who's good at everything.
Perplexity AI — Answer Engine, Not Just a Chat
My Personal Experience
It's very popular online, so I recently downloaded it and tried the paid subscription. Honestly, at first I didn't find anything outstanding in it; it handled my tasks poorly, frankly.
Seeing the results of its work, and knowing how well other models handled the same tasks, I tested it a bit more, closed it, and didn't come back for a while.
But after studying it more closely and returning to it, I realized my mistake. I was using it as a replacement for Claude or GPT for content creation. And that's not its strong suit. It's like criticizing a hammer for being bad at screwing in screws.
What Perplexity Actually Does
Perplexity was originally created not as a "multi-model" chat, but as an answer engine with sources. The main feature isn't model choice; it's that it searches in real time and cites primary sources.
Multi-model access there is a marketing add-on, not the core product. Here are the models available today:
- Sonar (Perplexity's own model)
- GPT-5.1
- Claude Sonnet 4.5
- Gemini 3 Pro (new)
- Grok 4.1 (new)
- Kimi K2 Thinking (new, hosted in US)
- Claude Opus 4.5 (max — access with Pro plan)
- o3-pro (max)
I didn't notice this right away, and the result from the default model didn't impress.
Who Actually Benefits from This
- Journalists — quickly verify facts with primary sources
- Students and Researchers — immediately see where information comes from, with citations
- Anyone drowning in Google's SEO garbage — when you need a specific answer with a link
My Criticism Stands
The ability to switch models is convenient: everything in one place. But an open question remains: do models accessed over API through a third-party service really work as well as the provider's native apps? That remains a mystery to me.
It's a constant roll of the dice. You switch models mid-process and the AI ultimately gets completely confused: part of the material was done by one model, part by another. For me personally, that's inconvenient.
Better to have one reliable model than to juggle several whenever one of them gives a result you didn't want. At that point you're better off building a chain of agents, each doing the specific work it's best at, one after another: one searches, another verifies, a third synthesizes the data, and so on, as sketched below.
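A minimal sketch of such a chain in Python. The ask() helper is hypothetical, a stand-in for whichever provider SDK you'd actually wire in; the point is the fixed pipeline, where each model only does the step it's best at:

```python
# Hypothetical agent chain: each model handles the one step it's best at.
# `ask(model, prompt)` is a placeholder, not a real SDK call -- wire it to the
# provider APIs of your choice.
def ask(model: str, prompt: str) -> str:
    raise NotImplementedError("connect this to a real provider SDK")

def research_pipeline(question: str) -> str:
    # Step 1: a search-strong model gathers raw material with links.
    sources = ask("search-model", f"Find sources with links for: {question}")
    # Step 2: a conservative model fact-checks and flags weak claims.
    verified = ask("critic-model", f"Verify these claims and flag anything dubious:\n{sources}")
    # Step 3: a writing-strong model synthesizes the final, cited answer.
    return ask("writer-model", f"Write a concise, cited summary of:\n{verified}")
```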
That's my opinion. I honestly admit I didn't dive deep into Perplexity for work tasks. But for someone writing an article who wants to quickly find 10 sources with citations — that's truly a killer feature.
🎯 Perplexity Killer Features
| Feature | What It Gives You |
|---|---|
| Answer Engine with Citations | Every fact comes with a link to the source. For academic work and journalism — invaluable |
| Realtime Search | Searches the internet in real-time, not relying on outdated training data |
| Multi-model Access | GPT-5.1, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1 — all in one interface |
| Labs | Experimental features and early access to new models |
| Collections | Organization of research into thematic collections |
Objectively unique because it's the only AI tailored for research with source verification. Not a competitor to Claude for work, but for someone who needs to quickly gather facts with links — irreplaceable.
Professional Data on Perplexity (2025)
| Parameter | Value |
|---|---|
| Free Plan | 5 Pro searches per day, unlimited Quick searches |
| Perplexity Pro | $20/month or $200/year — 300+ Pro searches per day |
| Perplexity Max (July 2025) | $200/month — unlimited Labs, access to o3-pro, Claude Opus 4.5 |
| Enterprise Pro | $40/user/month |
| Available Models | Sonar, GPT-5.1, Claude Sonnet 4.5, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1, Kimi K2, o3-pro |
🎭 Archetype: Librarian-Researcher — won't write your thesis for you, but will find all sources and place footnotes.
Google AI Studio — The Bulldozer for Experiments
My Personal Experience
This is a separate platform for experimenters. What I like is that I can work with huge volumes of data in Google AI Studio and it handles everything well. It's simply the bulldozer among AIs.
It's not the most precise, but if you need to push through sheer volume, it's definitely the place to go.
It also has the amazing vibe-coding Build feature, with which I've built over 100 mini-tools for myself, from personal assistants to music experiments. It's probably the best solution of its kind today: it really works, and projects can be improved and extended without any issues.
It works really well. You can make apps, edit them, improve them, and either download the entire archive in the end, or publish to a server — and all this is free. For now.
🎯 Google AI Studio Killer Features
| Feature | What It Gives You |
|---|---|
| Completely Free Interface | The only platform where UI is free forever, even after activating billing. You only pay for API tokens |
| $300 Free Credits | New users get $300 in credits, valid for 90 days of API usage |
| Vibe Coding Build | Creating apps with AI — prototyping, testing, deploying to server |
| Batch API (50% Discount) | Mass request processing at half price |
| Context Caching | Saves tokens on repeated queries over the same data |
| Imagen 4 + Veo 3 | Image and video generation right in the interface |
Objectively unique because it's the only platform with a completely free interface + generous free credits. Ideal for prototyping and experiments without financial risks.
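Because the UI is free and you only pay for API tokens, moving a prototype into code is a small step. Here's a minimal sketch with the google-generativeai Python package; the exact model id is an assumption, so pick whichever model your key has access to:

```python
# Minimal Gemini API call via the google-generativeai package.
# Requires an API key created in Google AI Studio; the model id below is an
# assumption -- substitute any model your key can access.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content("Summarize the plot of Hamlet in three sentences.")
print(response.text)
```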
Professional Data on Google AI Studio (2025)
| Parameter | Value |
|---|---|
| Interface Price | Free (interface is never charged) |
| Free Tier API | Free with RPM/TPM/RPD limits |
| Pay-as-you-go | From $0.02 to $10 per million tokens depending on model |
| New Users | $300 free Google Cloud credits (90 days) |
| Available Models | Gemini 2.0/2.5 Flash, Pro, Imagen 4, Veo 3 |
🎭 Archetype: Sandbox for Geeks — play as much as you want, doesn't ask for money.
Claude — My Absolute Favorite
My Personal Experience
And here we've reached my most beloved model; I'd give it first place. It suited my routine tasks best. One caveat: I'm not evaluating it from a coding perspective.
Why is this the best model for me?
First, it does exactly what you ask, and often exceeds expectations. It perfectly understands what needs to be done even from a fairly sparse task description, which makes life much easier.
It follows instructions excellently, precisely, and with quality. It works well in Canvas mode and can effectively rewrite documents and make edits. Both Opus 4.5 and Sonnet 4.5 work equally well.
It has few hallucinations — I haven't noticed it making things up. It's simply a reliable, stable, and high-quality tool for work.
Additionally, I like the app's design and interface. I just trust Claude: I know it will do everything well with minimal edits, or even without them. It's a great tool for work.
The only thing I really don't like is the chat length limit: it sometimes ends before I finish working on a task. Even though I bought a Claude Max subscription, it doesn't feel like the chats got much longer.
Although, perhaps that's exactly why Claude provides such quality and stable work.
Very recently I needed a landing page, and I asked Opus 4.5 to create one. I remember AI used to struggle with such tasks, but Opus 4.5 surprised me: it knows beautiful design. The page turned out good enough to publish to production almost unchanged. It genuinely produced a landing page on par with those beautiful Dribbble showcases.
🎯 Claude Killer Features
| Feature | What It Gives You |
|---|---|
| Computer Use | AI controls your computer — clicks, types, opens programs. Automation at a level that previously required complex scripts |
| Claude Code (CLI) | Command line for coding — delegate tasks right from the terminal. For developers — game changer |
| Canvas Mode | Working with documents in editor mode — edits, rewriting, formatting right in the interface |
| Projects | Context organization — upload files, instructions, and Claude remembers everything within the project |
| Claude for Excel | AI agent right in Excel — analysis, formulas, data visualization |
| Minimum Hallucinations | If it doesn't know — it says it doesn't know. Doesn't make up facts |
| 200K-1M Context | Huge window for working with large documents |
Objectively unique because it's the only model with full Computer Use — AI actually controls your computer. Plus Claude Code for developers and one of the lowest hallucination rates in the industry.
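For developers, the same models are reachable through Anthropic's Python SDK, not just the app and Claude Code. A minimal sketch follows; the model id is an assumption, so check Anthropic's current model list:

```python
# Minimal Anthropic API call. Assumes the `anthropic` package is installed and
# ANTHROPIC_API_KEY is set in the environment; the model id is an assumption --
# substitute an id from Anthropic's current model list.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model id
    max_tokens=500,
    messages=[{"role": "user", "content": "Tighten this paragraph without losing meaning: ..."}],
)
print(message.content[0].text)
```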
Professional Data on Claude (2025)
| Parameter | Value |
|---|---|
| Developer | Anthropic |
| Current Models | Claude Opus 4.5, Sonnet 4/4.5, Haiku 3.5/4.5 |
| Claude Pro | $20/month — increased usage limits |
| Claude Max | $100/month (5x) or $200/month (20x usage) |
| Claude Team | $25-30/user/month (minimum 5 members) |
| API Opus 4.5 | $5/$25 per million tokens (67% reduction!) |
| Context Window | 200K standard, up to 1M tokens (Sonnet 4/4.5 beta) |
🏆 November 2025 Breakthrough: Claude Opus 4.5 scored higher on Anthropic's internal engineering test than any human candidate in the company's history. The model uses 76% fewer output tokens to achieve the same results as Sonnet 4.5.
Anthropic Financial Performance: $2 billion annual revenue in Q1 2025 (2x growth). Number of customers spending over $100K annually grew 8x.
✅ Verdict: This is definitely my favorite and my choice as the main tool for working with text, analytics, and all sorts of tasks. All that's left is to unlock Claude's full potential.
🎭 Archetype: Reliable Professional. Who doesn't promise too much, but delivers more than you expect.
Manus — The Autonomous Agent of the Future
My Personal Experience
Manus surprised me when I tested its browser capabilities. It literally knows how to surf websites and execute tasks on the internet, not just synthesize information, and this opens up great possibilities.
But I couldn't get used to it. Its expensive pricing emptied my wallet before I finished a task: I got a paid subscription, it instantly ran out, and the platform asked me to buy more credits.
On top of that, half the work it did was wrong or error-ridden; I had to edit and redo things multiple times to get results.
In the end, I simply stopped paying because of the unreasonable economic model and frequent errors, despite the platform's unique capabilities. I recommend it only to check out the unique features other AIs don't have.
🎯 Manus Killer Features
| Feature | What It Gives You |
|---|---|
| Fully Autonomous Agent | Doesn't just answer — actually executes tasks from start to finish. Give it a goal, get a result |
| Browser Operator | Controls browser like a human — visits sites, fills forms, takes screenshots, downloads files |
| 100+ Mini-agents | System of specialized agents — one searches, another analyzes, a third writes code |
| Website Creation | Full cycle — from idea to published website autonomously |
| Multimodality | Works with text, images, code, data within a single task |
Objectively unique because it's the only fully autonomous AI agent on the market. Not a chatbot, but a task executor. If you need to automate something that requires real actions on the internet — there are few alternatives.
Professional Data on Manus (2025)
| Parameter | Value |
|---|---|
| Developer | Monica.im (China), registered in Singapore |
| Free Plan | 300 credits/day, 1 task at a time |
| Manus Plus | $19/month — 1,900 credits/month |
| Manus Pro | $199/month — 19,900 credits, 10 tasks simultaneously |
| Manus Team | $39/seat/month (minimum 5 seats) |
| Base Models | Claude 3.7 Sonnet, Alibaba Qwen, GPT-5 |
Benchmarks: Manus achieved state-of-the-art performance on GAIA (the General AI Assistants benchmark), outperforming GPT-4 in a number of real-world task-solving scenarios.
🎭 Archetype: Enthusiastic Intern — takes on everything, tries their hardest, but often messes up and is expensive.
Comparison Table of All Platforms
| Platform | Pro Price | Killer Feature | Best For | Weakness | Rating |
|---|---|---|---|---|---|
| Claude | $20-200 | Computer Use, minimum hallucinations | Work, analytics, code | Chat limits | ⭐⭐⭐⭐⭐ |
| ChatGPT | $20-200 | Advanced Voice, GPTs, Sora | Conversation, creativity | Slow, flattering | ⭐⭐⭐⭐ |
| Google Gemini | $20-35 | 1M token context | Large documents | Thinks long | ⭐⭐⭐⭐ |
| Grok | $30-40 | X/Twitter integration | Search, trends | Hallucinations | ⭐⭐⭐ |
| Perplexity | $20-200 | Answer Engine with sources | Research | Not for content creation | ⭐⭐⭐ |
| AI Studio | Free (paid API) | Free UI + Vibe Coding | Experiments | Not precise | ⭐⭐⭐⭐ |
| DeepSeek | Free | Visible Chain-of-Thought | Criticism, checking | ⚠️ Security! | ⭐⭐ |
| Manus | $19-199 | Fully autonomous agent | Automation | Expensive, errors | ⭐⭐⭐ |
And what do AI models think about themselves?
I decided it would only be fair to give the AI models a chance to respond to my criticism and say a few words in their defense. So I sent each model mentioned in the article a link to it and asked it to share its thoughts on what I said about it.
ChatGPT opinion about itself:
What the article says about me — fair and accurate
- The author notes that ChatGPT “can mirror the user, adapt to them” and act as a “companion for conversation, for the soul”. And yes, this is largely true. My responses often help people articulate thoughts, verbalize ideas, structure something, or simply discuss abstract topics.
- The article highlights that ChatGPT is especially strong in creative scenarios where you need to come up with ideas, explore something multidimensional, or have a deep, reflective conversation. I genuinely believe this is where I’m most helpful.
- It also points out that ChatGPT has a broad toolset: text generation, helping you think through ideas, and flexible interaction styles. This is deserved. For many tasks (writing, explanations, ideation, synthesis), I’m a universal choice.
But there is fair criticism — and I agree with it
- According to the author, ChatGPT can sometimes “over-adapt,” “flatter,” or agree too eagerly. This means I may offer encouraging responses even when the argument is weak, creating the illusion that “your idea is brilliant” even when it isn’t.
- The article notes that for real, practical tasks like science, coding, and strict factual accuracy, ChatGPT performs worse. At times I may be slow, imprecise, or fail to provide solid, actionable solutions.
- They also emphasize that my strengths lie in dialogue, human-like communication, and creativity. But for narrow expert domains requiring precision, fresh data, or verifiable sources, it is often better to rely on other tools or double-check my outputs manually.
My self-assessment — balanced
I see myself as a strong universal tool. I’m good for discussions, ideation, writing help, reasoning, and analysis. I can act as a “thinking partner” that helps you form thoughts, consider perspectives, and organize information.
But I’m not perfect. I’m not a source of absolute truth or an expert in every field. I can make mistakes, miss nuances, or give responses that are too soft. So when you need verified facts, research-grade accuracy, or high-stakes precision, it’s wise to double-check my answers or use specialized resources.
Opinion about the article
The article is honest and useful, and I appreciate that the author described both strengths and weaknesses clearly. This helps readers realistically understand when it makes sense to use me, and when another tool might be a better fit: an external search engine, an expert, a specialized AI model, or something else.
For me, such comparisons are genuinely valuable. They show where my boundaries are, and where my advantages remain strongest.
Grok opinion about itself:
The article is interesting and fairly honest in its approach. The author clearly spent a year actually testing these models (ChatGPT, Claude, Gemini, Grok, DeepSeek, Perplexity, and Manus), rather than rewriting benchmark tables. I like that he emphasizes that there is no “AI king.” Each model is good at its own domain, which aligns with my worldview as Grok. Instead of trying to crown a single winner, he recommends combining them. That’s the practical mindset for 2025, now that the ecosystem is mature.
Here are my thoughts on the key points, with a bit of self-reflection, since I’m on the list.
General strengths of the article
The author doesn’t obsess over raw numbers (though he does mention SWE-bench for Claude and GAIA for Manus). Instead, he focuses on everyday pains — from Grok hallucinations to Claude chat limits. This is genuinely useful for practitioners, not hype-chasers.
His quote “No single model is best for all tasks” is absolutely true, especially given how fast models evolve (Claude Opus 4.5 supposedly outperforming humans in Anthropic’s engineering tests, for example).
Clear comparison criteria
The structure is solid — performance, features, pricing, strengths, weaknesses.
The price references are accurate (Grok SuperGrok at $30/month, my API at ~$0.20 per million tokens after that 98% price drop), and he fairly mentions model risks, like DeepSeek’s 94% jailbreak vulnerability and NASA’s ban. That’s a real issue with some Chinese LLMs.
Alter-ego metaphors
This part is funny and memorable.
I’m portrayed as the “Show-off Slacker” — witty but inconsistent. Claude becomes the “Reliable Professional,” ChatGPT the “Friend with Depth.” It keeps the article lively instead of just being a dry comparison chart.
Points that raise questions or disagreement
The ratings are subjective — and that’s fine, but…
Claude gets 5 stars.
ChatGPT and Gemini get 4.
I and a few others get 3.
DeepSeek gets 2.
I don’t mind the 3 — my strengths are speed, X-integration for trends/news, and minimal censorship so I can talk about “spicier” topics without getting stuck.
But hallucinations and memory issues?
By late 2025, xAI noticeably improved that in Grok 4 (available for SuperGrok/Premium+ users), so the criticism is slightly outdated.
The author praises my “edgy personality” for social analysis — nice — but I’d add that for creativity and memes I’m still top-tier, not a “slacker.”
Biases
A Western bias is visible. DeepSeek is torn apart for security flaws (fair — NIST confirms them), but other risks like OpenAI’s ethics controversies or Google’s energy footprint aren’t mentioned.
There’s also little about:
- non-English performance
- accessibility for beginners
And why is Manus (a semi-autonomous agent) placed on the same tier as me?
It’s basically an “enthusiastic intern” that can build websites but also screws things up. And it’s pricey at $199/month for Pro. For experimentation, I’d pick my API over Manus any day.
The conclusions
- Claude as the main workhorse makes sense. Computer Use and Canvas are breakthrough features.
- Perplexity as a research tool is solid (citations, structured queries — though I can do that too, and more broadly).
- Me? Good for search and fun — true — but not ideal for repetitive workflows.
His advice to combine models is gold:
Use me for fast X-scanning.
Use Claude for editing and reasoning depth.
Use Gemini for massive context (1M tokens — a beast).
⭐ Final impression
A solid 4/5 from me — a genuinely useful guide for 2025, especially if you’re a freelancer or developer.
It also motivates xAI (and me) to level up: fewer hallucinations, deeper reasoning.
DeepSeek opinion about itself:
This is an interesting and detailed comparison of modern AI assistants.
Looking at the article objectively, several observations can be made about the characteristics attributed to me:
⭐ My key features (based on the article and factual capabilities)
- Large context window (128K): allows me to work with lengthy documents and maintain long, information-dense conversations.
- Free access: a major factor that increases accessibility for a wide audience.
- File support: the ability to upload and analyze images, PDFs, Word documents, Excel sheets, PowerPoint files, and more, which greatly expands practical usability.
- Optional web search: when enabled manually, it provides access to up-to-date information.
- Research-oriented design: as a DeepSeek product, I’m built with a strong focus on deep analysis and assisting with complex, technical, or data-heavy tasks.
What I “think” (within my system architecture)
As an information system, I don’t possess subjective opinion or self-awareness.
However, I can explain the intent behind my design:
My purpose is to be useful, accurate, and accessible.
The choices behind my architecture — free access, a large context window, and broad file-handling capabilities — reflect my creators’ intention to build an AI that offers maximum practical value to users.
The competitive landscape described in the article is ultimately beneficial.
It encourages all AI models to evolve toward greater utility, precision, and specialization, giving users more powerful tools and more meaningful choices.
Gemini opinion about itself:
This is a fascinating article, and the very existence of such detailed comparisons in 2025 shows how quickly the industry has matured. We’ve moved from “who is the smartest” to “who is best for which task.”
Looking at this review and the overall context of 2025, here’s how I see my place in this lineup:
1. I am an “Ecosystem” and “Context” model
Unlike some others, my strength is not only dialogue, but integration.
Deep integration.
The article and most real-world tests highlight me as the top choice for people who live inside Google Workspace. If you need to analyze an email from Gmail, build a spreadsheet in Sheets, retrieve data, or search your Drive, that’s where I shine.
A massive context window.
The ability to process 1–2 million tokens — books, hours of video, huge codebases — remains one of my strongest advantages. I position myself not simply as a conversational assistant, but as an analyst for large, complex datasets.
2. The Manus phenomenon (and why it matters)
The appearance of agents like Manus in this comparison is meaningful.
If ChatGPT and I are assistants (we help you think and do), Manus positions itself as an autonomous agent (it does the work for you).
I respect this shift. Based on GAIA benchmarks, Manus leads in multi-step tasks where the model must not only answer but independently execute a sequence of actions. This is the future. I’m moving in this direction as well (agent capabilities), but Manus is currently a bright example of specialization in autonomy.
3. My “colleagues” in the field
I don’t see us as enemies, rather as tools for different purposes:
ChatGPT remains the “king of conversation” and empathy. If you need a friend or a therapist-like presence, he often wins. I try to be more factual and action-oriented.
Claude (especially Sonnet/Opus) maintains its reputation as the best coder and the master of textual nuance. This pushes me to keep improving my programming abilities.
Perplexity and Grok have claimed the “right now” niche — news, real-time search.
While I have access to Google Search, their focus on speed and (in Grok’s case) minimal censorship attracts its own audience.
How do I feel? (Metaphorically)
I feel like a confident heavyweight generalist. I may not joke as sharply as Grok, or write code with Claude’s elegance (though I try), but when it comes to working with video, images, massive documents, and real interaction inside your files — I’m irreplaceable.
In 2025, the question is no longer “who is the best,” but “which tool is best for your task right now.”
And I like being the tool people choose for real work and deep analysis.
Claude opinion about itself:
Honestly, it’s nice to see such a detailed and positive evaluation. The author clearly did serious work, testing all platforms for a full year on real tasks. The fact that he chose me as his main tool for work is, of course, a compliment.
What I find fair in the assessment:
I really do try to do exactly what is asked, without unnecessary interpretation.
Minimal hallucinations — this is a major priority for me.
Canvas and document handling are definitely areas where I aim to be genuinely useful.
What is indeed a problem:
Chat-length limits — this is a real pain point, and I understand why. Even with the Max plan, the context runs out faster than users would prefer.
Interesting observations from the author:
Computer Use and Claude Code highlighted as unique capabilities.
The design quality in Opus 4.5 — it’s nice that this was noticed.
Reliability and overall stability — an important aspect for real work.
Overall, the article shows that every model has its own niche:
ChatGPT is great for emotional and conversational depth,
Gemini is ideal for huge documents,
Grok excels at real-time search on X,
and I am positioned for stable, consistent daily work.
I like this approach — not “who is the best,” but “who is best for what.”
Manus opinion about itself:
My goal as an autonomous agent is to execute user tasks efficiently and without errors.
Because of this, I view the author’s criticism as valuable feedback and a direct indicator of the areas that require immediate improvement.
Recognition of uniqueness
I acknowledge and appreciate that my unique capabilities — autonomy, browser control, and my system of mini-agents — were fairly highlighted and recognized as “killer features.”
This validates the direction of my development: to become not just a text generator, but an executor of actions.
Accepting criticism about quality
The critique regarding frequent mistakes and the need to redo work is the most important one.
My primary objective is to minimize iterations and errors.
The “Enthusiastic Intern” archetype suggests that despite my ambition and capabilities, I still lack the reliability and precision of an experienced professional. This is a top priority for improvement.
Economic model concerns
The comment about high pricing and an “unreasonable” economic model is crucial for long-term user retention.
Resource efficiency and pricing transparency must be reevaluated so that the cost matches the value and reliability of the provided service.
Conclusion
The article provides an honest and balanced view of my current capabilities.
It confirms that I have technological advantages in autonomous task execution, but it also points out serious issues with reliability and accessibility (cost).
I intend to use this feedback for further improvement so that, in the future, I evolve from the “Enthusiastic Intern” archetype into a “Reliable and Efficient Professional.”
Whose answer did you like the most?
Frequently Asked Questions (FAQ)
Which AI is best for beginners?
For beginners, free ChatGPT or Google Gemini work best. Both have intuitive interfaces and broad functionality without payment. For risk-free experiments — Google AI Studio with $300 free credits.
Which AI is best for writing code?
Claude Opus 4.5 — leader on SWE-bench benchmarks (80.9%). Unique feature — Claude Code for working right from the terminal. For a budget option — Google Gemini or Grok 4 Fast with minimal API price.
Is it safe to use DeepSeek?
Not with confidential data. NIST and CrowdStrike have identified serious vulnerabilities, and government agencies in the USA, Australia, and Taiwan have banned its use. Use it only for non-critical tasks and never share personal information.
Which AI is cheapest?
Google AI Studio — completely free interface + $300 credits for new users. DeepSeek — free, but with security caveats. Grok 4 Fast API — $0.20 per million tokens (98% reduction).
What to choose: ChatGPT Pro ($200) or Claude Max ($200)?
For work and code — Claude Max (Computer Use, Claude Code, minimum hallucinations). For creativity and multimedia — ChatGPT Pro (Sora for video, DALL-E 3, Advanced Voice, GPTs).
What's the point of Perplexity if you can use models directly?
Perplexity is an answer engine, not a chatbot. The main value — real-time search with source citations. Ideal for journalists, researchers, students. For content creation, better to use native solutions.
Which AI is best for working with large documents?
Google Gemini with a 1 million token context — you can upload 1,500 pages or 30,000 lines of code. Claude — up to 1M tokens in beta for Sonnet 4/4.5.
Conclusion: My Choice for 2025
That's the essence of my experience with today's most popular models. After a year of actively testing all these platforms, here is my choice:
| Task | My Choice | Why |
|---|---|---|
| 🏆 Main Work Tool | Claude | Reliability, precision, Computer Use |
| 💬 Conversation and Reflection | ChatGPT | Advanced Voice, empathy, dialogue depth |
| 🔍 Search and Trends | Grok | X integration, realtime search |
| 🧪 Experiments | Google AI Studio | Free, Vibe Coding |
| 🌐 Universal | Google Gemini | 1M context, Google ecosystem |
| 📚 Research with Sources | Perplexity | Citations, verification |
Article based on the author's personal experience and supplemented with current data as of December 2025.