Sesame's quest to cross the "street" (uncanny valley) of conversational voice

I really apologize for the poooor attempt at word play - but I had to find a way to get y’alls attention, right?

I recently came across Sesame, an AI startup founded by Oculus co-founder Brendan Iribe along with Ankit Kumar and Ryan Brown, which is making significant strides with its voice assistants, ‘Maya’ and ‘Miles.’ Unlike typical AI assistants, both offer a more engaging, human-like interaction. Users who have talked to Maya and Miles note that the assistants can role-play in user-created scenarios, which makes the conversations feel more natural.

At the heart of Sesame’s innovation is the concept of “voice presence,” which aims to make spoken interactions with AI feel real, understood, and valued. This involves several key components:

  • Emotional Intelligence: The ability to read and respond to emotional contexts, allowing the assistant to detect nuances in a user’s voice and tailor its responses accordingly.
  • Conversational Dynamics: Incorporating natural timing, pauses, interruptions, and emphasis to mimic human speech patterns, thereby enhancing the fluidity of interactions.
  • Contextual Awareness: Adjusting tone and style to match the situation, ensuring that the assistant’s responses are appropriate and relevant.
  • Consistent Personality: Maintaining a coherent, reliable, and appropriate presence throughout interactions, fostering user trust and comfort.

To achieve these objectives, Sesame has developed the Conversational Speech Model (CSM), an end-to-end multimodal learning framework that utilizes transformers. Unlike traditional text-to-speech models that generate spoken output directly from text but lack contextual awareness, CSM leverages the history of the conversation to produce more natural and coherent speech. This approach addresses the “one-to-many” problem in speech generation, where multiple valid ways exist to express a sentence, by considering additional context such as tone, rhythm, and conversational history.
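To make the difference concrete, here is a minimal sketch of what "leveraging the history of the conversation" could look like at the input level. This is not Sesame's code or API: the `Turn` class, `build_context`, `context_free_input`, and the `max_turns` parameter are all hypothetical names invented for illustration, and the idea of flattening prior text and audio into one sequence is an assumption about how a context-aware model might be fed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Segment = Tuple[str, str, object]  # (kind, speaker, payload)


@dataclass
class Turn:
    """One prior conversational turn (hypothetical structure)."""
    speaker: str                      # e.g. "user" or "assistant"
    text: str                         # transcript of what was said
    audio_tokens: List[int] = field(default_factory=list)  # stand-in for encoded audio


def context_free_input(next_text: str, speaker: str) -> List[Segment]:
    """What a context-free TTS system sees: only the text to be spoken."""
    return [("text", speaker, next_text)]


def build_context(history: List[Turn], next_text: str, speaker: str,
                  max_turns: int = 4) -> List[Segment]:
    """Flatten the most recent turns plus the new text into one sequence.

    Assumption: a context-aware model would condition on this whole
    sequence, so tone and rhythm from earlier turns can inform the output.
    """
    recent = history[-max_turns:] if max_turns > 0 else []
    segments: List[Segment] = []
    for turn in recent:
        segments.append(("text", turn.speaker, turn.text))
        segments.append(("audio", turn.speaker, turn.audio_tokens))
    # The text to be spoken next goes last; its audio is what gets generated.
    segments.append(("text", speaker, next_text))
    return segments


history = [
    Turn("user", "I had a rough day.", [11, 12]),
    Turn("assistant", "I'm sorry to hear that.", [21, 22]),
    Turn("user", "Can you cheer me up?", [31, 32]),
]
with_context = build_context(history, "Of course!", "assistant", max_turns=2)
without_context = context_free_input("Of course!", "assistant")
```

With `max_turns=2`, the context-aware input carries the last two turns (text and audio) ahead of the new sentence, while the context-free input contains the sentence alone. That extra material is what would let a model choose among the many valid ways of saying "Of course!".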

In addition to software advancements, Sesame is developing AI glasses designed for all-day wear, providing high-quality audio and seamless access to the voice assistant. The glasses are still in the prototype stage, but they aim to create an immersive experience where the AI companion can observe the world alongside the user, offering timely and context-aware assistance.

Reactions to Sesame’s technology have been strong. Users have reported experiences ranging from fascination to unease due to the assistant’s lifelike conversational abilities. I had a conversation with the AI myself, and it was so realistic that it left me unsettled. Sean Hollister, on the other hand, described his own experience with Maya like this:

“But speaking to “Maya,” one of two voices from a new startup headed by the man who built Oculus VR and sold it to Facebook, is the first time I’ve been left wanting more. Like I could just talk to it, or at least play a genuinely fun game of testing its limits, like I did with Bing before Microsoft decided to tame down its unhinged persona.” from his article on The Verge

By prioritizing emotional intelligence, contextual awareness, and natural conversational dynamics, Sesame is not merely setting new standards but also redefining communication itself. As these advancements continue, we will likely see a transformative era where AI enriches human experiences in ways we have yet to fully imagine.

Sesame’s AI assistants, Maya and Miles, signal a shift toward AI companionship that feels genuinely interactive and emotionally responsive. The emphasis on voice presence and conversational fluidity makes them stand out from typical voice assistants.

While this is an exciting leap for digital assistants, it also raises questions about human attachment to AI. If these assistants can mimic human interactions so well, will people begin to emotionally depend on them in ways that blur the lines between real and artificial relationships? I’m talking about potentially ushering in a generation of anti-social children.

Maya and Miles’ ability to convey emotion, maintain personality, and adapt their tone based on context is a huge leap from traditional robotic voice assistants, which often sound mechanical and detached. If AI continues in this direction, we could see more intuitive customer service interactions, personalized digital tutors, and even AI-driven therapy companions. But on the flip side, does making AI speech too natural risk creating unrealistic expectations of AI capabilities?

Maya and Miles aren’t just useful for practical applications; they also represent a game-changing innovation in AI-driven role-playing and storytelling. The ability to engage in dynamic conversations, remember past interactions, and respond with emotional nuance could take interactive storytelling in games and virtual worlds to a whole new level. Imagine AI-powered NPCs that react organically to player choices, making every interaction unique.

personally i’m excited for what’s to come with this technology. Sesame’s work is nothing short of groundbreaking, and i’m going to be keeping tabs on them from today on

this tech shows how close we are to AI interactions that feel genuinely human - and technology-wise this is reeeeaaaally groundbreaking.

i’m still inclined to think of certain ethical concerns: Should AI be designed to mimic human emotion so convincingly? Could it lead to manipulation or deception if users forget they’re speaking to an algorithm?

Maya and Miles are genuinely amazing

this tech is making me MORE excited for the glasses that they’re teasing - i’ve seen Meta’s Raybans posted on this forum, and while they’re definitely also impressive, i’m even more intrigued by Sesame’s spin on it. i wonder what these glasses will give us that other glasses-based tech hasn’t?

I Didn’t Expect an AI to Comfort Me, But Then This Happened
by u/snehens in artificial

tl;dr - OP was feeling overwhelmed, talked to Maya on Sesame for 10 minutes, and came away feeling comforted, as if they had been talking to a friend