I really apologize for the poooor attempt at wordplay, but I had to find a way to get y’all’s attention, right?
I recently came across Sesame, an AI startup founded by Oculus co-founder Brendan Iribe along with Ankit Kumar and Ryan Brown. It is making significant strides with its voice assistants, ‘Maya’ and ‘Miles.’ Unlike typical AI assistants, both provide a more engaging, human-like interaction. Users who have interacted with Maya and Miles have noted their ability to role-play in user-created scenarios, which makes the conversations feel more natural.
At the heart of Sesame’s innovation is the concept of “voice presence,” which aims to make spoken interactions with AI feel real, understood, and valued. This involves several key components:
- Emotional Intelligence: The ability to read and respond to emotional contexts, allowing the assistant to detect nuances in a user’s voice and tailor its responses accordingly.
- Conversational Dynamics: Incorporating natural timing, pauses, interruptions, and emphasis to mimic human speech patterns, thereby enhancing the fluidity of interactions.
- Contextual Awareness: Adjusting tone and style to match the situation, ensuring that the assistant’s responses are appropriate and relevant.
- Consistent Personality: Maintaining a coherent, reliable, and appropriate presence throughout interactions, fostering user trust and comfort.
To achieve these objectives, Sesame has developed the Conversational Speech Model (CSM), an end-to-end multimodal learning framework that utilizes transformers. Unlike traditional text-to-speech models that generate spoken output directly from text but lack contextual awareness, CSM leverages the history of the conversation to produce more natural and coherent speech. This approach addresses the “one-to-many” problem in speech generation, where multiple valid ways exist to express a sentence, by considering additional context such as tone, rhythm, and conversational history.
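To make the “one-to-many” idea a bit more concrete, here is a tiny toy sketch in Python. It is not Sesame’s code or API: the `Turn` class, the `tone` label, and both functions are hypothetical names I made up for illustration, and no audio is generated. It only shows that a context-free system receives identical input for the same sentence, while a context-aware one receives different input depending on what was said before.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One earlier utterance. `tone` is a hypothetical stand-in for audio cues."""
    speaker: str
    text: str
    tone: str


def plain_tts_input(text: str) -> dict:
    # A context-free system sees only the words to be spoken.
    return {"text": text}


def contextual_input(history: list[Turn], text: str, max_turns: int = 3) -> dict:
    # A context-aware system also receives the most recent turns of the conversation.
    recent = history[-max_turns:]
    return {"text": text, "context": [(t.speaker, t.text, t.tone) for t in recent]}


reply = "That's great."
happy = [Turn("user", "I got the job!", "excited")]
sad = [Turn("user", "My flight got cancelled again.", "frustrated")]

# Same words, no history: the two requests are indistinguishable.
same_plain = plain_tts_input(reply) == plain_tts_input(reply)
# Same words, different history: the requests differ, so the delivery can too.
differs_with_context = contextual_input(happy, reply) != contextual_input(sad, reply)
print(same_plain, differs_with_context)  # prints: True True
```

In this toy version, “That’s great.” after good news and after a cancelled flight look exactly the same to the plain function, which is why a system with no history has no basis for choosing between a cheerful and a sympathetic (or sarcastic) delivery.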
In addition to software advancements, Sesame is developing AI glasses designed for all-day wear, providing high-quality audio and seamless access to the voice assistant. Although still in the prototype stage, the glasses aim to create an immersive experience where the AI companion can observe the world alongside the user, offering timely and context-aware assistance.
Sesame’s technology has drawn strong reactions. Users have reported experiences ranging from fascination to unease due to the assistant’s lifelike conversational abilities. I had an interaction with the AI myself, and it was so realistic that it left me unsettled, which says a lot about the assistant’s conversational skills. Sean Hollister, on the other hand, came away from his own experience with Maya wanting more:
“But speaking to ‘Maya,’ one of two voices from a new startup headed by the man who built Oculus VR and sold it to Facebook, is the first time I’ve been left wanting more. Like I could just talk to it, or at least play a genuinely fun game of testing its limits, like I did with Bing before Microsoft decided to tame down its unhinged persona.” (Sean Hollister, from his article on The Verge)
By prioritizing emotional intelligence, contextual awareness, and natural conversational dynamics, Sesame is not merely setting new standards but also redefining communication itself. As these advancements continue, we will likely see a transformative era where AI enriches human experiences in ways we have yet to fully imagine.

