Interactive Voice Assistant

If you’ve spent some time on this site, you may have noticed the voice assistant in the bottom-right corner. It’s not a chatbot. It’s a conversational voice concierge that can help you explore the site, find articles, navigate between pages, and get in touch with me, all through natural speech.

But this article isn’t just about how that voice assistant was built. It’s about what voice assistants like this one can actually do, and it’s designed to let you try those capabilities yourself as you read.

Voice agents are showing up everywhere: customer support, healthcare, fast food, travel. What stands out is how natural these interactions feel. They handle interruptions, maintain context, and respond in ways that increasingly feel human. But the real value isn’t just conversation. It’s awareness.

A well-built voice assistant doesn’t just talk to you. It knows where you are, what’s on the screen, and what you’ve done. That turns a voice interface from a novelty into something genuinely useful.

Let’s explore three capabilities that make this possible. To follow along, activate the voice assistant by clicking the pill in the bottom-right corner of the screen.

Page Awareness

The most basic form of context is knowing what page the user is on. When you ask this site’s voice assistant a question, it checks your current location before responding. It knows whether you’re on the home page, reading an article, or working through an interactive lab.

This matters because it changes how the assistant responds. On the home page, it gives you an overview. On the passkeys demo, it tracks your progress and offers step-by-step guidance. Here, it knows you’re reading about voice capabilities.

Try it now:

1. Enable the voice assistant.
2. Ask what page you're on.

The assistant will tell you exactly where you are. Simple, but foundational. Every other capability builds on this awareness.
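Under the hood, page awareness can be as simple as mapping the current route to a spoken description. Here is a minimal sketch of that idea; the routes, descriptions, and function name are illustrative assumptions, not the site's actual identifiers:

```typescript
// Hypothetical route-to-description map (illustrative paths, not the
// site's real ones). In the browser, the current path would come from
// window.location.pathname.
const pageDescriptions: Record<string, string> = {
  "/": "the home page",
  "/articles/voice-assistant": "an article about voice capabilities",
  "/labs/passkeys": "the interactive passkeys lab",
};

// Returns a phrase the assistant can fold into its spoken answer.
function describePage(path: string): string {
  return pageDescriptions[path] ?? "an unrecognized page";
}
```

A lookup with a fallback keeps the assistant from guessing: if the path isn't recognized, it says so instead of inventing an answer.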

Configuration Awareness

Now let’s go further. Imagine a dashboard with settings, toggles, and controls. A voice assistant that understands the page can also understand the state of interactive elements: whether a setting is enabled or disabled, what options are selected, what’s changed since the last time it checked.

This is what makes voice useful in complex interfaces. Think of a security console, an admin panel, or a deployment workflow. Instead of hunting through menus, a user can simply ask and get an immediate, accurate answer.

Try it now:

(Notifications toggle, currently Disabled)

1. Ask for the position of the toggle.
2. Flip the toggle, then ask again.
3. Ask the assistant to switch the toggle.

The voice assistant reads the live state of that toggle every time you ask, and it can change it too. Ask it to turn the toggle on or off. The assistant sees the current state, decides whether an action is needed, and confirms the result.

In real-world applications, this pattern extends to any interactive element: dropdown selections, radio buttons, checkbox groups, slider positions. The assistant doesn’t need to watch every change in real time. It checks the current state when asked, which is exactly how a human helper would work — they look at the screen when you ask them a question.
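The check-then-act pattern described above can be sketched in a few lines. This is an illustrative model, not the site's actual code; the variable and function names are assumptions:

```typescript
// State synced from the UI component whenever the user flips the toggle.
let notificationsEnabled = false;

// Read the live state on demand, exactly when the user asks.
function readToggle(): boolean {
  return notificationsEnabled;
}

// Check the current state first, then act only if needed, and
// confirm the result in words the assistant can speak.
function setToggle(desired: boolean): string {
  if (notificationsEnabled === desired) {
    return "Already " + (desired ? "on" : "off") + "; no action needed.";
  }
  notificationsEnabled = desired;
  return "Switched notifications " + (desired ? "on" : "off") + ".";
}
```

Because the assistant reads state at question time rather than tracking every change, the same pattern scales to dropdowns, checkboxes, and sliders without any event plumbing.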

Free-Text Awareness

Toggles and dropdowns have a fixed set of states. Text input is different. When a user types into a form field, the value is arbitrary. A voice assistant that can read and relay free-text input demonstrates a deeper level of screen awareness.

This is useful for guided workflows where a user fills out forms and wants confirmation that they entered the right thing, or for accessibility scenarios where a user wants to hear back what they typed.

Try it now:

1. Type your name in the field above.
2. Ask the assistant what you typed.

The assistant reads back exactly what you typed. Change it, ask again, and it reflects the update. The input is intentionally simple here, a name field, but the pattern works for any text-based form element: search queries, configuration values, addresses, policy names.
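The read-back behavior can be modeled as a form-state object that components write into on every keystroke. A minimal sketch, with assumed names (`onInput`, `readBack`, the `formState` object are illustrative, not the site's real identifiers):

```typescript
// Shared form state; each input component syncs its value here on change.
const formState: Record<string, string> = {};

// Called by the input's change handler.
function onInput(field: string, value: string): void {
  formState[field] = value;
}

// What the assistant speaks when asked "what did I type?"
function readBack(field: string): string {
  const value = formState[field];
  return value ? `You typed "${value}".` : "That field is empty.";
}
```

Reading the value only when asked means arbitrary text is handled the same way as fixed-state controls: the assistant relays whatever is there right now.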

Why This Matters

These three capabilities — page awareness, configuration awareness, and text-input awareness — are the building blocks for voice-assisted interfaces that are actually useful in professional settings.

Consider a security operations center where an analyst sees an alert and asks the voice assistant what triggered it. Or an identity management console where an admin asks whether MFA enforcement is enabled. Or a deployment workflow where someone is setting up a new service and wants guidance on what to fill in next.

In each case, the assistant doesn’t just chat. It sees the screen, understands the current state, and provides relevant answers. That’s the difference between a voice interface and a voice assistant.

How It Was Built

The voice assistant on this site is powered by ElevenLabs Conversational AI. The conversational logic, system prompt, voice settings, and tool definitions all live on the ElevenLabs platform. The client handles audio input and output and executes tool calls.

The key mechanism behind screen awareness is a client tool called get_current_page. Each time the assistant needs context, it calls this tool, which reads the current page URL and any interactive state exposed by the page’s components. The response gives the assistant a structured snapshot of what the user is looking at.

Interactive elements on the page, like the toggle and text field above, sync their state to a shared JavaScript object. The tool reads from that object when called. No server is involved. No data is stored. The state lives in-browser for the duration of the session.
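The mechanism described above can be sketched as a shared object plus a snapshot function. The shape, field names, and function names here are assumptions for illustration; the real tool definitions live on the ElevenLabs platform, and the actual implementation would read `window.location.pathname` (here the path is a parameter so the sketch runs anywhere):

```typescript
// Structured snapshot the client tool returns to the assistant.
interface PageSnapshot {
  path: string;
  state: Record<string, unknown>;
}

// Shared in-browser object that interactive components write into
// whenever their state changes. Nothing leaves the browser.
const sharedState: Record<string, unknown> = {};

function registerState(key: string, value: unknown): void {
  sharedState[key] = value;
}

// Handler for the get_current_page client tool: assembles a fresh
// snapshot each time the assistant asks for context.
function getCurrentPage(path: string): PageSnapshot {
  return { path, state: { ...sharedState } };
}
```

Copying the state into the snapshot (`{ ...sharedState }`) gives the assistant a consistent picture of the moment it asked, even if components keep updating afterward.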

Tuning the experience involved balancing conversational fluidity with accuracy. The assistant needs to check state before answering, not guess from a previous call. It needs to describe what it sees naturally, not read raw data. And it needs to know when the question is about the page and when it’s a general query that doesn’t require screen context.

Getting in Touch

If you’re interested in adding voice capabilities to your own website or product, I’d love to hear from you. Try it right now:

1. Ask to get in touch with Jason.
2. Say goodbye to end the session.

Whether you’re exploring voice for customer support, internal tools, or public-facing websites, the patterns in this article are a starting point. The technology is ready. The interesting work is in designing how it fits into the experience you want to create.

And if you’ve made it this far, say “open sesame” to the voice assistant and see what happens.