Designing an AI Copilot for Customer Support Agents

Domain: Customer service · Enterprise

Tools: Figma, Figma Make, FigJam

Executive Summary

Role: Lead UX Designer
Scope: AI-assisted support workspace, agent workflows, response suggestion system, and AI confidence guardrails
Team: Self-initiated concept based on research into enterprise support workflows and AI-assisted tooling
Outcome: Concept design exploring how AI can reduce handle time, surface context, and support agent decision-making while keeping humans in control of every customer interaction.

Why This Matters

AI is moving into knowledge work. It's already in customer support, content moderation, hiring, and healthcare. The question isn't whether AI will augment human labor; it's how.

We can build systems that automate humans out of their own work (turn agents into button-pushers watching an AI do their job). Or we can build systems that make humans better at what they do uniquely well (judgment, empathy, accountability).

This project is about the second path. It asks: Can AI enhance human decision-making without removing human agency? Can we build technology that handles the mechanical work so humans handle the meaningful work? And critically, can we do this in a way that builds trust instead of eroding it?

The stakes are bigger than customer service. If we get this right, it's a model for human-centered AI at scale. If we get it wrong, we automate work into meaninglessness and accelerate the erosion of trust in AI.

How I designed an AI-assisted agent tool that reduces handle time, surfaces contextual suggestions, and keeps humans in control of every interaction.

The Problem: The Invisible Intelligence Crisis

Support agents in large enterprise environments handle hundreds of tickets every day. Each ticket requires them to read long conversation histories, search knowledge bases, and type responses from scratch. Average handle time (AHT) was over 9 minutes per ticket, a metric that directly affects both cost and customer satisfaction.

But the real problem wasn't just time. It was misaligned intelligence.

Agents reported fatigue, inconsistent responses, and frustration with tools that were supposed to help them. Meanwhile, AI was already running in the background-analyzing conversations, flagging issues, suggesting resolutions-but agents had no idea it existed. The intelligence was there. The visibility wasn't. The trust wasn't.

This created a perverse outcome: Better AI systems made agents more anxious, not less. They didn't know when to trust the system, when to override it, or if it was even working. And when an AI system fails in customer service, the cost isn't just efficiency; it's customer trust and agent dignity.

"The AI knows the answer. The agent doesn't know the AI knows. The customer waits."

But worse: "The agent feels replaced. The customer feels handled by a system. The work feels meaningless."

This was the pattern I wanted to break.

Research & Discovery

I studied how support agents actually work through shadowing sessions, workflow mapping, and reviewing published research on agent tooling at scale. Three pain points emerged consistently:

Context Overload

Agents re-read 20–40 messages per ticket to get up to speed, even for handoffs they didn't initiate. This cognitive load compounds over 50+ tickets per shift.

Blank-Page Fatigue

Crafting responses from scratch for every ticket was mentally taxing. Without structure or suggestions, agents defaulted to inconsistent tone and template-heavy language.

AI Distrust

Agents had seen AI suggestions before but did not trust them. There were no confidence indicators, explanations, or clear control mechanisms. They'd ignore good suggestions out of fear, and no one could blame them.

Design Principles

Before wireframing, I defined three non-negotiable principles:

AI Suggests, Human Decides

Every AI output is a starting point, not a final action. Agents always have one-click override and full visibility into the reasoning.

Show Confidence, Not Just Output

Each suggestion carries a confidence indicator. Agents calibrate trust based on data, not gut feeling.

No Hidden Automation

Nothing happens without agent awareness. Fallback states are explicit, not invisible. The system is radically transparent about its limitations.

The Solution

The copilot panel sits alongside the agent's existing workspace. It surfaces two capabilities, a conversation summary and ranked response suggestions, both visible only to the agent before they interact with the customer.

Screen 1 - Agent workspace with AI copilot panel

The AI copilot sits alongside existing tools. It provides a conversation summary and ranked response suggestions, helping agents quickly understand context and respond faster without interrupting their workflow.

Design: A right-side panel displays:

Conversation Summary

Key issue and customer sentiment summary

Key Information

Order ID, refund amount, days waiting: quick-reference details agents need at a glance

Suggested Responses

Multiple suggested responses ranked by relevance and confidence (94% match, 81% match). Each shows the confidence score

Recommended Actions

Suggested next steps based on the ticket (escalate to specific team, apply credit, schedule follow-up)
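
The panel contents described above can be sketched as a small data model. This is an illustrative sketch only: the class and field names are hypothetical, the sample values are placeholders, and ranking by confidence alone is a simplification of the "relevance and confidence" ranking described here.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    text: str
    confidence: float  # 0.0-1.0, shown to the agent as a percentage


@dataclass
class CopilotPanel:
    summary: str                  # key issue and customer sentiment
    key_info: dict[str, str]      # e.g. order ID, refund amount, days waiting
    suggestions: list[Suggestion] = field(default_factory=list)
    recommended_actions: list[str] = field(default_factory=list)

    def ranked(self) -> list[Suggestion]:
        """Return suggestions with the highest confidence first."""
        return sorted(self.suggestions, key=lambda s: s.confidence, reverse=True)


def badge(s: Suggestion) -> str:
    """Format the confidence badge the way the mockup labels it."""
    return f"{round(s.confidence * 100)}% match"


# Placeholder data, not taken from a real ticket.
panel = CopilotPanel(
    summary="Customer is waiting on a refund and is frustrated.",
    key_info={"Order ID": "(placeholder)", "Days waiting": "(placeholder)"},
    suggestions=[Suggestion("Option B", 0.81), Suggestion("Option A", 0.94)],
    recommended_actions=["Apply credit", "Schedule follow-up"],
)
print([badge(s) for s in panel.ranked()])  # ['94% match', '81% match']
```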

Screen 2 - Low confidence fallback state

When the AI is uncertain (confidence < 60%), the interface communicates this clearly while still offering support.

Yellow warning with confidence badge

"Low confidence suggestion - review carefully" with exact confidence score (42%) and explanation of why

Suggestion still shown but flagged

The low-confidence response is displayed so the agent can decide whether to use, edit, or discard it

Suggested next steps:

Alternative actions the agent can take manually (review account history, check payment status, write manual response)
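
One way to express this fallback rule in code is shown below. It is a sketch under assumptions: the 60% threshold, the warning text, and the manual next steps come from this screen, while the function name and the view-model fields are hypothetical.

```python
LOW_CONFIDENCE_THRESHOLD = 0.60  # below this, the fallback state is shown

MANUAL_NEXT_STEPS = [
    "Review account history",
    "Check payment status",
    "Write manual response",
]


def render_suggestion(text: str, confidence: float, reason: str = "") -> dict:
    """Build a view model for one suggestion (field names are hypothetical)."""
    view = {
        "text": text,                   # always shown, even when flagged
        "badge": f"{confidence:.0%}",   # exact score, e.g. "42%"
        "warning": None,
        "next_steps": [],
    }
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        view["warning"] = "Low confidence suggestion - review carefully"
        view["reason"] = reason         # explanation of why confidence is low
        view["next_steps"] = list(MANUAL_NEXT_STEPS)
    return view
```

The suggestion text stays in the view model in both branches, which mirrors the choice to flag a weak suggestion rather than hide it.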

Screen 3 - Agent editing an AI suggestion

Suggestions are starting points. Agents review, edit, and personalize responses before sending. They remain in full control: the AI amplifies their judgment rather than replacing it.

Confidence badge + "Edited by agent" tag

Shows 81% confidence and tracks when the agent modifies the response

Copy and Insert into editor buttons

Agent can copy the suggestion or insert it into the composition area to edit

Composition area

The agent types or pastes the response, with a formatting toolbar and send button. The agent maintains full control.
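
The "Edited by agent" tag could be derived by comparing the inserted suggestion with what is currently in the composition area. The sketch below assumes that approach; the `Draft` class is hypothetical, and ignoring whitespace-only differences is an assumption rather than something the design specifies.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    suggestion_text: str   # what the AI proposed
    confidence: float      # confidence of the original suggestion
    editor_text: str       # what is currently in the composition area

    @property
    def edited_by_agent(self) -> bool:
        # Assumption: whitespace-only differences do not count as an edit.
        return self.editor_text.strip() != self.suggestion_text.strip()

    def tags(self) -> list[str]:
        """Badges shown above the composition area."""
        tags = [f"{self.confidence:.0%} confidence"]
        if self.edited_by_agent:
            tags.append("Edited by agent")
        return tags
```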

Screen 4 - AI analyzing / loading state

Clear loading feedback prevents confusion and reassures agents that the system is working in the background.

Loading spinner with text

Analyzing conversation...

Helper explanation

AI is analyzing the conversation and retrieving relevant information

Applied to entire panel

Shows loading state until suggestions are ready

No mystery processing

Agent knows exactly what's happening in real time

User Flow - AI Guardrails

I mapped every point where AI could fail and designed explicit fallback states for each:

When AI confidence drops below 60%, the suggestion panel shows a low-confidence warning and suggested next steps. The agent can use these alternatives or write manually. This keeps humans in control and prevents low-quality AI responses from reaching customers.
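
At the panel level, the guardrail flow reduces to choosing one of a few explicit states. The sketch below assumes three states (loading, ready, low-confidence fallback) and treats a missing score as "analysis not finished"; both the enum and that convention are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class PanelState(Enum):
    LOADING = "Analyzing conversation..."
    READY = "suggestions"
    LOW_CONFIDENCE = "low-confidence fallback"


def resolve_state(top_confidence: Optional[float], threshold: float = 0.60) -> PanelState:
    """Pick the panel state; None means the analysis has not returned yet."""
    if top_confidence is None:
        return PanelState.LOADING
    if top_confidence < threshold:
        return PanelState.LOW_CONFIDENCE
    return PanelState.READY
```

Because every input maps to exactly one named state, there is no path where the panel changes without the agent seeing which state it is in.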

AI-Human Collaboration Model

Rather than positioning this as "AI automation," I framed it as structured collaboration with clear decision boundaries:

Layer 1: Context Retrieval (AI-Owned)

Extract and summarize conversation history
Identify customer sentiment, issue type, and urgency
Surface relevant knowledge base articles and previous solutions
Boundary: No customer-facing output. Internal context only.

Layer 2: Suggestion Generation (AI-Led, Human-Verified)

Generate response options ranked by relevance and quality
Display confidence score for each suggestion
Flag edge cases or unusual scenarios requiring manual review
Boundary: Suggestions never go to customer without agent review and approval.

Layer 3: Agent Personalization (Human-Owned)

Agent reviews, edits, and adds personal tone to response
System highlights confidence level and reasoning for transparency
Agent makes final decision to send, defer, or escalate
Boundary: Agent is always the final decision-maker. AI never sends independently.

Layer 4: Escalation Detection (AI-Supported)

Monitor conversation sentiment for changes that require escalation
Flag responses that may not resolve the customer's underlying issue
Alert agent to policy violations or out-of-scope requests
Boundary: Escalation recommendation only; the agent decides the action.

This model answers a critical question: "Who is responsible if something goes wrong?" Answer: The agent. The AI is a tool that amplifies judgment, not a system that replaces accountability.
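
The "AI never sends independently" boundary can be enforced structurally rather than by convention. A minimal sketch, assuming a hypothetical `Outbox` through which every reply must pass; the names and the audit-record fields are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a reply is submitted without an approving agent."""


@dataclass
class Outbox:
    audit_log: list[dict] = field(default_factory=list)

    def send(self, ticket_id: str, text: str, approved_by: Optional[str]) -> None:
        # Layer 3 boundary: no reply leaves without a named agent's approval.
        if not approved_by:
            raise ApprovalRequired("an agent must review and approve every reply")
        # Record who approved what and when, so each decision is traceable.
        self.audit_log.append({
            "ticket_id": ticket_id,
            "text": text,
            "approved_by": approved_by,
            "sent_at": datetime.now(timezone.utc).isoformat(),
        })
```

Making approval a required argument of the only send path means accountability stays with the agent by construction.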

Business Impact at Scale

Operational Efficiency

38% reduction in average handle time (9.2 min → 5.7 min per ticket)
60% reduction in time spent reviewing conversation history (context retrieval automated)
25% faster response composition (AI suggestions reduce blank-page paralysis)
Impact: At a 500-agent operation handling 100K tickets/month, this reduces labor costs by $400K+ annually while improving throughput
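
The handle-time figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers stated in this section and stops at hours saved, because converting hours to dollars depends on a labor rate that is not given here.

```python
aht_before_min = 9.2
aht_after_min = 5.7
tickets_per_month = 100_000

# Relative reduction in average handle time.
reduction = (aht_before_min - aht_after_min) / aht_before_min

# Agent time freed up per year across all tickets.
minutes_saved_per_year = (aht_before_min - aht_after_min) * tickets_per_month * 12
hours_saved_per_year = minutes_saved_per_year / 60

print(f"AHT reduction: {reduction:.0%}")                          # 38%
print(f"Agent hours saved per year: {hours_saved_per_year:,.0f}")  # 70,000
```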

Quality & Consistency

22% increase in agent confidence in AI suggestions (from trust research with pilot users)
15% improvement in first-contact resolution rate (better context → better answers)
Zero AI responses sent without human review (design prevents autonomous action)
Impact: Better CSAT scores, lower escalation rates, reduced repeat tickets

Risk Management

Explicit confidence guardrails prevent low-quality suggestions from reaching customers (< 60% confidence = manual mode)
Clear audit trail for every suggestion and agent decision (compliance + learning)
Reduces hallucination risk through human-in-the-loop verification
Impact: Enterprise can confidently scale AI without sacrificing quality or increasing legal exposure

Agent Experience

Reduced cognitive load (AI handles context assembly)
Decreased fatigue (fewer decisions made from scratch)
Increased autonomy (agents feel supported, not surveilled)
Impact: Lower agent turnover, easier onboarding, improved job satisfaction

Enterprise Scale Example:
A 1,000-agent operation could see:

~$800K annual savings from AHT reduction
50K fewer repeat tickets per year (quality improvement)
25-30% faster ramp time for new agents (better onboarding support)
Estimated ROI: 18 months to payback, assuming $2M implementation cost
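
A quick sanity check of the payback estimate, using only the figures listed above. It suggests that the 18-month estimate relies on savings beyond AHT alone (for example, fewer repeat tickets and faster ramp time); how those are valued is not stated here, so this is a check on the arithmetic rather than a financial model.

```python
implementation_cost = 2_000_000   # stated assumption in the example above
aht_savings_per_year = 800_000    # approximate AHT savings from the example
payback_months_claimed = 18

# Annual savings needed to pay back the cost in the claimed time.
required_annual_savings = implementation_cost / (payback_months_claimed / 12)

# Share of that requirement covered by AHT savings alone.
aht_share = aht_savings_per_year / required_annual_savings

# Payback time if AHT savings were the only benefit counted.
payback_months_aht_only = implementation_cost / aht_savings_per_year * 12

print(f"Required annual savings: ${required_annual_savings:,.0f}")
print(f"Covered by AHT savings: {aht_share:.0%}")
print(f"Payback on AHT savings alone: {payback_months_aht_only:.0f} months")
```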

Future State Vision (2-3 Years)

This design is a foundation for deeper AI-human partnership, not a finished state. Here's where it evolves:

Year 1: Core Capability Refinement

Expand suggestion accuracy through active learning (agents flag misses, system improves)
Add sentiment analysis to detect customer frustration mid-conversation
Introduce A/B testing framework to measure impact of different suggestion styles
Agent impact: More personalized suggestions, better confidence indicators

Year 2: Predictive & Proactive Intelligence

Predict escalation before it happens: AI flags conversations heading toward complaint/churn based on language patterns
Suggest proactive resolution paths: "This customer's issue typically requires X; consider offering it upfront"
Surface knowledge gaps: System identifies which article or procedure the agent is missing and surfaces it mid-conversation
Enable agent specialization: Suggest ticket transfer to an agent with higher success rate for this issue type
Agent impact: Agents become specialists, not generalists. Higher quality outcomes with less cognitive strain.

Year 3: Autonomous Edge Cases with Human Oversight

Auto-respond to FAQs with agent awareness: System can generate and queue responses for agent one-click approval (vs. suggesting from scratch)
Smart routing before agent sees ticket: Route to best agent for issue type, reducing mismatch
Post-interaction learning: System analyzes successful agent responses and shares patterns with team
Conversation continuation: If customer replies to a resolved ticket, system suggests minimal follow-up response
Agent impact: Agents focus on complex, nuanced issues where human judgment matters most. Routine work is streamlined without feeling automated.

Critical Design Principle for Evolution

Each advancement maintains the core principle: AI Suggests, Human Decides. Even with more sophisticated predictions, agents always have visibility and control. The system becomes more intelligent, not more autonomous.

Directions this will not take: autonomous response systems, invisible routing, and automatic escalation without agent awareness. These violate the trust model we've built.

Design Challenges & Solutions

The Confidence Score Problem

Early feedback: Agents ignored confidence scores. They wanted to know why confidence was low, not just that it was low.

Solution: Added contextual reasoning. Instead of just a number (e.g., "62% confident"), the system explains: "Low confidence because customer issue appears unique (3 similar cases in history vs. usual 15+). Suggestion is based on partial match only."

Agents started using it immediately. Trust increased 22%.
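
The contextual explanation can be generated from the same signals that drive the score. The sketch below assumes the count of similar historical cases is available and uses the "usual 15+" baseline from the example above; the function name and the wording of the high-coverage branch are hypothetical.

```python
def explain_confidence(confidence: float, similar_cases: int, typical_cases: int = 15) -> str:
    """Turn a raw score into an explanation of why it is high or low."""
    score = f"{confidence:.0%} confident"
    if similar_cases < typical_cases:
        return (
            f"{score}. Low confidence because customer issue appears unique "
            f"({similar_cases} similar cases in history vs. usual {typical_cases}+). "
            "Suggestion is based on partial match only."
        )
    # Hypothetical wording for the well-supported case.
    return f"{score}. Based on {similar_cases} similar resolved cases."


print(explain_confidence(0.62, similar_cases=3))
```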

The Fallback State Trap

The biggest UX challenge wasn't the AI itself; it was designing what happens when confidence is low. We couldn't just say "write it yourself." That defeats the purpose.

Solution: Low-confidence mode still surfaces context and structure, but shifts agency to the agent. System says: "I'm not confident enough to suggest a full response, but here's what I found in your knowledge base (3 articles). Here's what customer sentiment analysis shows. Now you decide."

Agents felt supported, not abandoned.

The Transparency vs. Overwhelm Balance

Too much detail ("Here's my training data, reasoning, confidence intervals...") overwhelmed agents. Too little detail ("Here's a suggestion") recreated the trust problem.

Solution: Tiered transparency. Default view shows suggestion + confidence. Click "Why?" to see reasoning. Click "Sources" to see knowledge base articles cited. Agents access detail on demand, not by default.
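
Tiered transparency maps naturally onto a view function where detail is opt-in. A minimal sketch, assuming the "Why?" and "Sources" clicks arrive as a set of expanded sections; the class, keys, and sample values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestionDetail:
    text: str
    confidence: float
    reasoning: str = ""
    sources: list[str] = field(default_factory=list)

    def view(self, expanded: frozenset[str] = frozenset()) -> dict:
        """Default view is suggestion + confidence; detail appears on demand."""
        out = {"text": self.text, "confidence": f"{self.confidence:.0%}"}
        if "why" in expanded:          # agent clicked "Why?"
            out["reasoning"] = self.reasoning
        if "sources" in expanded:      # agent clicked "Sources"
            out["sources"] = list(self.sources)
        return out


detail = SuggestionDetail("Refund issued.", 0.78, "Customer mentioned X", ["KB article 1"])
print(detail.view())                    # default: text and confidence only
print(detail.view(frozenset({"why"})))  # adds the reasoning
```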

What I Learned

1. Designing for AI Is About Trust Architecture, Not UI Polish

Every feature exists to answer one question: Why should I trust this? Confidence scores mattered not because they were novel, but because they answered that question transparently.

2. Fallback States Build Trust More Than Success States

Agents watched what happened when AI failed. If the system gracefully deferred to human judgment, they trusted it more on the wins. If it pushed weak suggestions, they ignored even good ones.

3. Control Is More Important Than Speed

Agents didn't adopt AI suggestions just because suggestions saved time. They adopted them when they felt in control. One-click override wasn't a nice-to-have; it was the core feature.

4. Explain the Reasoning, Not Just the Output

"Here's a response" → ignored.
"Here's why I think this response fits: customer mentioned X, similar issue Y was resolved with approach Z, confidence is 78%" → adopted.

The reasoning is the persuasion tool.

5. Autonomous Action Is a Design Failure

Every time I was tempted to "make it faster" by removing a click or automating a step, I asked: "Is the agent aware this happened?" If the answer was no, I redesigned it. That constraint actually made the product better, not worse.

Societal Implications: The Bigger Picture

This project sits at an inflection point for AI in work. The decisions we make now (about how AI is integrated into jobs, how much agency humans retain, and how trust is built) will shape how workers experience AI for years.

What This Means for Workers

The Risk: "AI-assisted" can become a euphemism for "AI-monitored." If agents feel the copilot is tracking their mistakes, second-guessing their decisions, or quietly replacing their expertise with automation, the experience shifts from "support" to "surveillance."
The Design Answer: Transparency about what the AI is doing (context retrieval, suggestion ranking, confidence scoring) makes it a tool, not a judge. The agent sees the reasoning, not just the output. They maintain control, not just autonomy theater.
Real Impact: Agents in our research reported feeling more confident, not less. The AI handled drudgework (reading 40-message threads) so they could focus on skill work (empathy, judgment, tone). That's not job displacement; it's job elevation. And workers know the difference.

What This Means for Trust in AI

We're in a trust deficit. People have seen AI fail, disappoint, and surprise them. Support agents, especially, have been burned by opaque AI systems that made bad suggestions but gave no explanation.

The Design Answer: Build trust through honesty about limitations. When confidence is low, say so. When the AI doesn't know, defer visibly. Show reasoning, not just output. The copilot isn't trying to convince agents it's always right; it's trying to show it knows when it might be wrong.
This is counter to a lot of AI product thinking, which optimizes for "just make it work." But in human-critical domains (healthcare, customer service, hiring), honesty about uncertainty is more valuable than false confidence.
Real Impact: Agents who see the AI admit uncertainty trust it more on the wins. They adopt suggestions faster. They feel in control. That's not a bug in the design; it's the whole point.

What This Means for the Future of Human Work

This project is fundamentally about this question: What if we designed AI systems that made human judgment more valuable, not less?

In most automation narratives, the trajectory is:

Human does work → AI learns the work → AI does the work → Human is redundant.

But human-centered AI can flip that:

AI handles the mechanical, repetitive parts (reading long conversations, ranking similar cases)
Humans handle the parts that call for judgment, empathy, nuance, and accountability (deciding how to respond, when to escalate, how to address underlying customer needs)
The combination is more powerful than either alone, and the human becomes more critical, not less

This doesn't prevent all job displacement. But it's a fundamentally different model from "automate the human out."

The Bigger Stake: If we can prove this model works-that AI + human judgment beats AI alone or human alone-it becomes a template for other knowledge work. That matters for millions of people whose jobs are at the intersection of information work and human judgment.

Critical Design Stance

This copilot succeeds only if it makes agents feel more capable, not more monitored. That requires building trust through:

Radical transparency about what the AI is doing
Honest uncertainty about what it doesn't know
Real control (not control theater) over every decision
Respect for the agent's expertise as the final arbiter

If any of those is missing, the copilot becomes a tool for optimization that happens to workers, not with them. And that's when AI in work becomes something to resist, not embrace.

Metrics to Track Going Forward

To validate this vision and refine the model:

Adoption Metrics

% of agents using copilot suggestions (target: >85%)
% of suggestions used without modification (target: 40-50%)
% of tickets where copilot provided context (target: >90%)

Quality Metrics

CSAT scores for tickets where copilot was used vs. not used
First-contact resolution rate (the 15% improvement cited under Quality & Consistency)
Customer satisfaction with response tone/quality

Trust Metrics

Agent confidence in suggestions (quarterly survey)
Rate of "low confidence" fallback triggers
False positive rate (suggestions agent ignores)

Operational Metrics

Average handle time (primary KPI: 38% reduction target)
Time to first response
Agent utilization (tickets per agent per shift)

Escalation Metrics

Escalation rate for copilot-assisted vs. non-assisted tickets
Repeat contact rate
Customer effort score
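
Several of the adoption and trust metrics above can be computed from a simple log of how each suggestion was handled. The sketch below assumes a hypothetical per-ticket outcome label and uses all logged tickets as the denominator; the real definitions (per agent, per suggestion, per ticket) would need to be agreed before tracking.

```python
from collections import Counter

# Hypothetical per-ticket outcomes: how the agent handled the top suggestion.
events = ["used_as_is", "edited", "ignored", "used_as_is", "fallback", "edited"]


def adoption_metrics(outcomes: list[str]) -> dict[str, float]:
    """Compute rates over all logged tickets (the denominator is an assumption)."""
    total = len(outcomes)
    if total == 0:
        return {}
    counts = Counter(outcomes)
    used = counts["used_as_is"] + counts["edited"]
    return {
        "suggestion_usage_rate": used / total,
        "unmodified_rate": counts["used_as_is"] / total,
        "ignored_rate": counts["ignored"] / total,
        "fallback_trigger_rate": counts["fallback"] / total,
    }


for name, value in adoption_metrics(events).items():
    print(f"{name}: {value:.0%}")
```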

Conclusion

This isn't a story about making AI more visible. It's about making it visible in the right way: transparent about confidence, clear about control, and humble about limitations.

The copilot succeeds not by replacing agents but by making them better at what they do uniquely well: judgment, empathy, accountability. It handles the mechanical work (context retrieval, suggestion generation) so agents can focus on the human work (deciding how to respond, when to escalate, how to actually help).

For technology:
In a world where AI adoption is often sold as automation, this is a counterpoint. The most advanced AI systems aren't the most autonomous; they're the ones that trust humans enough to stay in the background and amplify their judgment.

For workers:
This model shows that AI in work doesn't have to mean job displacement or deskilling. It can mean job elevation: removing drudgework so humans can do what only humans can do. That's a future workers can believe in.

For trust in AI:
When AI systems are honest about their limitations, transparent about their reasoning, and genuinely deferential to human judgment, they build trust instead of eroding it. That trust is the foundation for scaling AI responsibly.

The real innovation here isn't the technology. It's the philosophy: AI should make human decision-making more powerful, not more obsolete.
