Editor's note: The following is a guest article from Anant Jhingran, CTO of Apigee at Google, and Michael Endler, a content and editorial strategist at Google Cloud.
Voice interfaces are one of today's hottest technology trends. Some major platforms can understand verbal commands at least 95% of the time, putting their comprehension skills on par with those of humans, and they're getting better at fulfilling the requests asked of them.
Literally thousands of capabilities have been added to these voice-enabled platforms in just the last year. Use cases range from setting appointments, checking the weather and conducting transactions to real-time language translation and richer, more efficient healthcare experiences.
With time, voice interfaces will likely play a role in virtually every business, manifesting in the internal applications and tools that enterprises use to get things done and in the experiences that enterprises offer externally to customers.
But this probable future presents a significant problem: Almost no businesses have skills or experience designing voice interfaces.
Of course, a few years ago, almost no one had any experience designing mobile apps. Some businesses crashed and burned as users rapidly adopted a new interaction model — but after some growing pains, most companies have figured out mobile.
They may have outsourced development for a while as they hired more developers and built up digital capabilities, but over the last decade, many enterprises have succeeded in tipping the balance of their technical talent from people skilled with legacy system maintenance to developers skilled in building mobile apps.
Should industry expect a similar trajectory for voice? Not necessarily.
The transition from desktop to mobile was made somewhat easier because the two interfaces share a number of similarities. Instead of clicking a digital object with a mouse, the user taps that object on a touchscreen. Instead of scrolling down a webpage with keyboard arrows, the user swipes. Icons, text, pictures, and other web elements are central to both the desktop and mobile interaction models.
The same cannot be said for voice interfaces. Voice is a different interaction model that lends itself to dramatically different form factors.
There may be no screen to look at and nothing to touch, whether via screen, mouse or physical buttons. Some voice commands may be easy for users to intuitively adopt, regardless of the device or platform, whereas others may require that users learn how to interact with a specific experience.
Some manifestations of voice may be integrated with other interaction models, allowing intermodal experiences that combine vision, voice and touch.
But others may present themselves almost as disembodied voices, like when Star Trek's Captain Picard says "Computer" somewhere on the Enterprise and an anthropomorphized, omnipresent voice simply responds.
Can businesses navigate this disruption by sitting tight and waiting it out? Probably not.
Rather, they should take a two-pronged and proactive approach:
Rely on others
It's important to recognize that voice technologies will likely follow a maturity curve similar to other technologies. HTML, for example, began as the ability to list bullets, text and tables, and if one were to look at the source code for an early website, the code might be fairly simple to interpret.
Source code for modern webpages looks very different. The code isn't authored top-to-bottom by a human being but rather assembled through tools that encapsulate specific functionality and result in code that is easier for developers to use but also much more complex to the human eye.
Voice is likely to evolve similarly, with services assembled from application programming interfaces, or APIs, rather than developed from the ground up.
Indeed, many of the few companies with experience creating voice interfaces provide access to their platforms and services via productized APIs.
These APIs abstract the complexity of these voice services behind a consistent interface, making it easier for developers to leverage underlying systems and functionality even if the developers lack experience with voice.
Relying exclusively on this approach is probably not a great long-term plan, as doing so confines a business to someone else's model, but it is definitely useful for plugging holes, complementing in-house functionality and transitioning to voice now while in-house development capabilities catch up.
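For readers who want to see what "a consistent interface" can look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's actual API: the names `SpeechRecognizer`, `VendorRecognizer`, `InHouseRecognizer` and `handle_request` are all hypothetical, and the vendor adapter is a stub that returns a canned transcript instead of calling a real service.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechRecognizer(Protocol):
    """The single interface that application code depends on (hypothetical)."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class VendorRecognizer:
    """Hypothetical adapter around a third-party voice API.

    A real adapter would send the audio to the vendor's service here.
    This stub returns a canned transcript so the sketch runs offline.
    """

    canned_transcript: str = "What is the weather today"

    def transcribe(self, audio: bytes) -> str:
        return self.canned_transcript


@dataclass
class InHouseRecognizer:
    """Hypothetical placeholder for a capability the business builds later."""

    canned_transcript: str = "what is the weather today"

    def transcribe(self, audio: bytes) -> str:
        return self.canned_transcript


def handle_request(recognizer: SpeechRecognizer, audio: bytes) -> str:
    """Map a spoken request to an intent using only the shared interface."""
    text = recognizer.transcribe(audio).lower()
    if "weather" in text:
        return "intent:check_weather"
    if "appointment" in text:
        return "intent:set_appointment"
    return "intent:unknown"


# The calling code is identical whichever implementation is plugged in.
print(handle_request(VendorRecognizer(), b""))
print(handle_request(InHouseRecognizer(), b""))
```

Because `handle_request` knows only about the interface, a business could start with the vendor-backed adapter and later swap in its own implementation without rewriting the application logic.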
Invest in the right people and technologies
An organization doesn't need to build everything itself. But just as today's successful businesses have had to increase investments in web developers, future successful businesses will have to increase investments in people skilled with new technologies, including not only voice platforms, but also the underlying machine learning efforts that make voice possible.
Outsourcing everything may work for a while, but in an environment in which all businesses are expected to be de facto technology businesses, internal resources matter.
With surveys indicating around half of Americans use voice interfaces, it's clear that now is the time for businesses to get ahead of the curve. New services and digital ecosystems mean enterprises don't have to take on this challenge alone as they ramp up internal capabilities.