And why "workflow redesign" isn't consultant-speak; it's the scientific answer
Just days ago, Google Research published a paper that should fundamentally change how businesses think about AI adoption.
The paper, "Agentic AI and the Next Intelligence Explosion" by James Evans, Benjamin Bratton, and Blaise Agüera y Arcas, confirms what we've been seeing in the field: AI doesn't work the way most companies are trying to use it.
Here's what they found, and why it matters for your business.
What Google Discovered Inside Reasoning Models
Google's research team studied frontier reasoning models like DeepSeek-R1 and QwQ-32B. They discovered something remarkable: these models don't improve by "thinking longer." They improve by simulating internal debates.
Inside the model's chain-of-thought reasoning, distinct cognitive perspectives argue, question, verify, and reconcile with each other. The researchers call this a "society of thought."
The kicker: No one trained the models to do this. When reinforcement learning rewarded models purely for accuracy, they spontaneously developed multi-agent conversational reasoning.
The models rediscovered through optimization alone what centuries of philosophy have suggested: robust reasoning is fundamentally social.
Why This Explains the 95% Failure Rate
MIT found that 95% of enterprise AI pilots fail to deliver measurable profit impact. McKinsey found two-thirds of companies stuck in "pilot mode."
The Google paper reveals why.
Most companies treat AI as a monolithic tool:
- "Let's add ChatGPT to our workflow"
- "Let's build a proprietary AI model"
- "Let's hire a few AI specialists"
This is what the Google researchers call the "singularity" mindset: the assumption that AI value comes from a single, powerful system.
But intelligence has never worked that way.
Human language created what researchers call the "cultural ratchet": knowledge accumulating across generations. Writing, law, and bureaucracy externalized social intelligence into institutions.
AI is the next step in that progression. But most businesses are trying to use it like a calculator when it actually functions like a team.
The Task-Level Mismatch, Explained by Science
In our previous post, we described the "task-level mismatch".
The Google paper explains why this happens.
Their finding: Intelligence emerges from interaction between distributed perspectives. Not from a single mind (or a single AI tool) doing everything.
When you bolt AI onto an existing workflow:
- AI handles fragments (email drafts, data entry)
- Humans still do most tasks (judgment, relationship building, coordination)
- The collaboration is poorly designed (no clear handoffs, no defined roles)
Result: You get the "society of thought" effect, but chaotic and uncoordinated. Like a team meeting where everyone talks over each other and nobody takes notes.
What "Human-AI Centaurs" Actually Means
The Google researchers introduce a term that's going to reshape how we talk about AI in business: human-AI centaurs.
Not one human + one AI assistant.
Many configurations:
- One human directing multiple AI agents (each specialized for different tasks)
- One AI serving multiple humans (coordinating team workflows)
- Many humans + many AIs in shifting ensembles (what we call "task-level workflow redesign")
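One way to picture these configurations is as a routing layer. Here's a minimal sketch of the first pattern, one human directing several specialized agents; the agent names and dispatch logic are hypothetical illustrations, not a real orchestration API:

```python
# Hypothetical sketch: one human directing multiple specialized AI agents.
# The specialist names and routing rule are illustrative only.

SPECIALISTS = {
    "research": "gathers prospect and market data",
    "drafting": "writes first-pass emails and proposals",
    "scheduling": "proposes and confirms meeting times",
}

def route(task_type: str) -> str:
    """Return which specialist agent a task goes to, or escalate to the human."""
    if task_type in SPECIALISTS:
        return f"agent:{task_type}"
    return "human"  # judgment calls stay with the person directing the ensemble

print(route("research"))     # agent:research
print(route("negotiation"))  # human
```

The point of the sketch: the human's role shifts from doing every task to deciding what gets routed where, and the undefined cases default to human judgment rather than falling through the cracks.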
The paper even name-drops OpenClaw (the platform we use for multi-agent orchestration) as an "embryonic glimpse" of this future.
Translation for business leaders:
Your sales team isn't supposed to use ChatGPT. Your sales team is supposed to become a human-AI ensemble where:
- AI handles prospect research, initial outreach, follow-up sequences, meeting scheduling, proposal generation
- Humans handle discovery calls, objection handling, relationship building, deal structuring, strategic account planning
That's not "automation." It's orchestration.
Why Workflow Redesign Is the Scientific Answer
The Google paper argues that scalable AI requires institutional alignment, not just better tools.
What does "institutional alignment" mean in practice?
It means designing your workflows like institutions:
- Roles and responsibilities (What does the AI do? What do humans do?)
- Handoff protocols (When does AI pass work to humans? When do humans delegate to AI?)
- Feedback loops (How do we measure what's working? How do we improve?)
This is exactly what our Task-Level Workflow Redesign Framework delivers:
Step 1: Task Audit
Map every task your team does (actual daily work, not job descriptions).
Step 2: AI Capability Mapping
Categorize which tasks AI can handle today, which require workflow changes, and which still need human judgment.
Step 3: Workflow Redesign
Rebuild processes so AI handles the full executable task set and humans focus on judgment, trust-building, and strategy.
Step 4: Measure & Iterate
Track revenue per employee, not time saved. Iterate based on what delivers P&L impact.
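The four steps above can be condensed into a small audit loop. This is a hedged sketch; the category labels, sample tasks, and numbers are illustrative, not a prescribed taxonomy:

```python
# Hypothetical sketch of Steps 1-4: audit tasks, map AI capability,
# redesign ownership, and track revenue per employee.

task_audit = {  # Step 1 + 2: actual daily work, each mapped to a capability category
    "prospect_research": "ai_ready",
    "follow_up_sequences": "ai_ready",
    "discovery_calls": "human_judgment",
    "deal_structuring": "human_judgment",
    "proposal_generation": "needs_workflow_change",
}

def redesign(audit: dict) -> dict:
    """Step 3: assign each task to the lane its category implies."""
    lanes = {"ai": [], "human": [], "redesign": []}
    for task, category in audit.items():
        if category == "ai_ready":
            lanes["ai"].append(task)
        elif category == "human_judgment":
            lanes["human"].append(task)
        else:
            lanes["redesign"].append(task)
    return lanes

def revenue_per_employee(revenue: float, headcount: int) -> float:
    """Step 4: the P&L metric to iterate against (not hours saved)."""
    return revenue / headcount

lanes = redesign(task_audit)
print(lanes["ai"])                            # ['prospect_research', 'follow_up_sequences']
print(revenue_per_employee(2_400_000, 12))    # 200000.0
```

The "needs_workflow_change" lane is the one most audits skip: tasks AI could handle if the surrounding process were rebuilt, which is where Step 3 earns its name.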
The Google research validates every step of this framework. It's not consultant-speak. It's applied social science.
What This Means for the Next 12 Months
The Google paper ends with a warning: The intelligence explosion is already here.
Not in a single godlike AI, but in:
- The "society of thought" inside every reasoning model
- The centaur workflows reshaping every knowledge profession
- The recursive agent ecosystems beginning to fork and collaborate at scale
Companies that understand this in 2026 will have a 2-3 year advantage over competitors still treating AI as a productivity tool.
Here's what separates the winners from the 95%:
The 95% ask: "How can AI make this task faster?"
The 5% ask: "How do we rebuild this process so AI handles the full task set?"
The 95% buy: More AI tools
The 5% design: Human-AI workflows
The 95% hire: AI specialists
The 5% train: Existing teams to work in centaur configurations
The 95% measure: Time saved
The 5% measure: Revenue per employee
The Bottom Line
Google Research just published 6 pages of academic validation for what we've been teaching Central New York businesses:
AI isn't a tool you add to your workflow. It's a team member you design workflows around.
The 95% of AI pilots that fail? They're failing because they're treating a collaborative intelligence like a productivity app.
The 5% that succeed? They're redesigning workflows to enable human-AI ensembles where both humans and AI do what they're each uniquely good at.