Enterprise-wide Voice AI Implementation
Led enterprise-wide Voice AI deployment at CloudTrucks, scaling AI agents to handle approximately 30,000 monthly calls across sales, customer support, operations, and safety functions. Owned end-to-end strategy, implementation, team leadership, and business outcomes, delivering $3.1M in annualized incremental revenue and cost savings.

The Challenge
CloudTrucks faced a critical customer experience and operational scalability crisis. The existing AI agent was unhelpful and frustrating, generating excessive escalations to overwhelmed human representatives who delivered inconsistent quality. Callback times and SLA adherence fell far below targets, resulting in poor NPS scores, customer frustration, and ultimately churn.
Before This Project
The AI functioned as a glorified IVR: it took down some information and promised that a human agent would follow up, regardless of the issue. There was no acknowledgement of repeat callbacks, and customers had to repeat themselves once they were transferred to a live agent. Instead of solving problems, the voice AI added friction at every step.
Key Problems
- ❌ Unsustainable Cost Scaling - Call volumes scaled proportionally with the customer base, but scaling the human contact center at the required pace was prohibitively expensive and operationally complex
- ❌ Below-Target Customer Outcomes - The company needed multiple management layers to run the contact center, yet outcomes (NPS, CSAT, and customer performance scores) remained well below targets
- ❌ No Evaluation Framework - No systematic evaluation framework existed for either human reps or AI agents
- ❌ Limited Diagnostic Visibility - Leadership could see that customer experience was subpar, but couldn't pinpoint which agents, scenarios, or interactions needed improvement
What Made This Challenging
- ⚠️ Scale & Complexity: Customers call in with complex, interconnected issues spanning a wide array of potential problems and requests
- ⚠️ Customer Clarity: Customers sometimes didn't know exactly what they were calling about, so we had to triage the issue and also clarify which teams needed to engage and which SOPs applied
- ⚠️ Quality & Trust: Customers were already accustomed to an ineffective agent, so the new AI had to be markedly better to rebuild trust and earn the chance to resolve customer intent before escalating to live agents
The Approach
Role & Strategy
As program owner and team lead, I was responsible for the complete strategy, evaluation frameworks, architectural decisions, prompt approvals, business outcomes, and C-suite reporting. I led a hybrid team of ~86 (6 humans managing 80+ AI agents) handling thousands of calls daily.
My core strategy was to use AI extensively to improve our AI, building a systematic, repeatable workflow that could scale from one scenario to dozens per week.
AI-Native Workflow Methodology
When we ran this workflow manually, it took two weeks to land a single scenario. To accelerate it, I built a series of interconnected Gemini Gems that formed a cascade workflow:
- Historical Call Pattern Analysis - Run 50-100 historical calls through a Gem to identify consistent customer questions, challenges, and requests
- SME Interviews - Conduct subject-matter expert interviews with transcription to identify the current business logic
- Existing Business Logic Definition - Combine SME interviews with SOP documentation (Notion docs, knowledge base, handbooks) to define business logic using a specialized Gem
- Gap Analysis & Iteration - Simulate historical calls against business logic to identify gaps, then iterate with SMEs to create comprehensive, complete logic
- Visual Approval Flow - Convert business logic into flowcharts and visual representations for final stakeholder approval
- Prompt Generation - Automatically convert approved business logic into appropriate prompting framework with expected tool calls
- Evaluation Framework - Build global and scenario-specific LLM-as-a-judge evaluations and deploy to production (a minimal judge sketch appears after this list)
- Sandbox Testing - Run live simulations based on historical calls with human oversight in sandbox environment
- Production Deployment and Monitoring - Deploy to production and monitor evaluation metrics on live calls with automated improvement recommendations for poor performing calls
- Continuous Iteration - Rapidly iterate over 1-2 weeks based on the live call feedback loop to optimize each scenario until it reaches a steady state of consistent performance
After our team adopted this AI-native workflow, we went from 1 scenario in 2 weeks to 20 scenarios in 1 week. We systematically tackled the 60+ distinct customer scenarios, each of which would branch in multiple directions and daisy-chain nonlinearly in chaotic workflows.
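To make the evaluation and monitoring steps concrete, here is a minimal LLM-as-a-judge sketch in Python. It is illustrative only: it assumes the google-generativeai SDK, and the rubric, model name, and review threshold are placeholders rather than the production evaluation framework.

```python
# Minimal LLM-as-a-judge sketch: score one call transcript against a scenario
# rubric and flag weak calls for the iteration loop. The rubric, model name,
# and threshold below are illustrative assumptions.
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
judge = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},  # JSON-only output
)

RUBRIC = (
    "Score the call transcript 1-5 on each criterion and return JSON with keys: "
    "intent_identified, business_logic_followed, correct_tool_calls, "
    "resolution_or_clean_handoff, plus a short 'notes' string."
)

def evaluate_call(transcript: str, scenario_logic: str) -> dict:
    """Run one transcript through the judge model and return its scores."""
    prompt = (
        f"{RUBRIC}\n\nScenario business logic:\n{scenario_logic}"
        f"\n\nTranscript:\n{transcript}"
    )
    return json.loads(judge.generate_content(prompt).text)

def needs_review(verdict: dict, threshold: int = 3) -> bool:
    """Any criterion below threshold routes the call into the improvement queue."""
    return any(int(v) < threshold for k, v in verdict.items() if k != "notes")
```

The same judge can run over sandbox simulations and live production calls alike; low-scoring calls are what feed the 1-2 week iteration loop described above.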
Key Phases
- First scenario deployed manually (2 weeks, 3 people full-time) and evaluation framework established
- AI workflow methodology proven (5 scenarios in Week 3)
- Scaled deployment (9 scenarios Week 4, 15 scenarios Week 5)
- Steady state achieved (10-20 scenarios per week, 15-20× faster than manual approach)
- Full vertical rollouts (Drivers + Broker verticals, all sales scenarios revamped)
- Reporting architecture and agentic tool library established
Technology Stack
- AI Development Tools: Gemini (predominant), Gemini Gems for workflow automation, Claude Code for the engineering team's agentic tool development
- Voice AI Stack: Whisper (speech-to-text), ElevenLabs (text-to-speech), Gemini Flash (logic & supervisor agents); a single-turn pipeline sketch follows this list
- Voice AI Vendor: Operator
- Integrations: Salesforce, Zendesk, Slack, internal console backend
- Data Infrastructure: Airflow DAG to BigQuery data warehouse for call and outcome data; a minimal DAG sketch follows as well
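To show how the stack fits together, below is a single-turn sketch of the speech-to-text, logic, and text-to-speech loop. Package choices, the model name, the system prompt, and the voice ID are illustrative assumptions; the production system runs on the Operator platform with streaming audio, tool calls, and supervisor agents rather than this batch-style loop.

```python
# Single-turn STT -> logic -> TTS sketch. Model names, the system prompt, and
# the voice_id are placeholders, not CloudTrucks' production configuration.
import os

import requests
import whisper                       # open-source Whisper for speech-to-text
import google.generativeai as genai  # Gemini for the logic agent

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
stt = whisper.load_model("base")
agent = genai.GenerativeModel(
    "gemini-1.5-flash",
    system_instruction=(
        "You are a trucking support agent. Triage the caller's request and "
        "respond with the next thing to say."
    ),
)

def handle_turn(audio_path: str, voice_id: str) -> bytes:
    """Transcribe one caller utterance, decide a reply, and synthesize audio."""
    utterance = stt.transcribe(audio_path)["text"]
    reply = agent.generate_content(utterance).text
    # ElevenLabs text-to-speech REST call (voice_id is a placeholder).
    tts = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": reply},
        timeout=30,
    )
    tts.raise_for_status()
    return tts.content  # audio bytes to play back to the caller
```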
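The call-and-outcome data path can be sketched similarly: a daily Airflow DAG that appends exported call records into BigQuery. The bucket path, table name, and export format here are hypothetical.

```python
# Hypothetical daily Airflow DAG (Airflow 2.4+) loading exported call and
# evaluation records from a staging bucket into BigQuery. All names are
# placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def load_call_outcomes(**context):
    """Append the previous day's call + evaluation records to the warehouse."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition="WRITE_APPEND",
        autodetect=True,
    )
    uri = f"gs://voice-ai-exports/calls/{context['ds']}/*.json"  # assumed staging path
    client.load_table_from_uri(
        uri, "analytics.voice_ai_call_outcomes", job_config=job_config
    ).result()


with DAG(
    dag_id="voice_ai_call_outcomes",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_call_outcomes", python_callable=load_call_outcomes)
```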
Results & Impact
Financial Impact
~$2.2M in annualized incremental revenue + cost savings.
- Sales: 20× faster speed-to-lead, 34% increase in SQL (sales-qualified lead) conversion rate
- Operations: 18× growth in team efficiency, 60% reduction in human headcount while improving outcomes, 25% improvement in SLA adherence
- Quality: On-call resolution improved from 0% to 14%, along with better data integrity across systems
Qualitative Impact
This project marked the first time anyone in the organization led an AI-native team using standardized AI workflows to accelerate and improve outcomes at scale. Leadership gained a new understanding of what's possible with a relatively small team when augmented by AI-native methodologies.
The explicit evaluation framework for all calls, whether handled by AI or human agents, was completely novel for the company. It created a foundation for coaching, quality assurance, and continuous improvement that didn't exist before.
Most significantly, the project demonstrated the transformational scope, depth, and quality possible when orienting a team to AI-native workflows. The speed and impact achieved by the team fundamentally changed how company leadership thinks about doing business and deploying AI at scale.
Unexpected Wins
- 💡 SOP Gap Discovery: Simulation and analysis work revealed areas where standard operating procedures were misaligned, missing, or failed to address common customer requests
- 💡 Human Agent Evaluation Framework: Creating evaluations for AI agents prompted development of parallel evaluation frameworks for human representatives, enabling consistent coaching and performance assessment
- 💡 Improved Data Integrity: AI agent consistency in distilling conversations and porting key information to systems improved customer intake data quality
- 💡 Enhanced Human Agent Performance: When AI agents handled initial triage and information gathering, human representatives became faster and more effective during escalations, creating a powerful hybrid model where AI and humans complemented each other