Well, Speech Heads, this was a long time coming, but here it is! OUR SPINVOX REVIEW!
Some ground rules before we begin:
Pussyfootin’ Provisos and Liability Claims
Trial Version
The SpinVox service I was using was a trial version. It basically worked through call forwarding, sending all my missed calls to SpinVox for transcription and using a third-party aggregator to send them back to me as texts/emails. If I wanted to listen to the messages, I had to call SpinVox directly. When you use SpinVox natively, the service works through a carrier’s existing channels. So for the most part you don’t feel its presence as much as I did.
SpinVox v. Nuance
Doubtless, this review will draw some comparisons to our Nuance VM2T review. The devious among you will be trying to piece together which one I think is better. Sorry to disappoint, but I won’t be handing down any definitive pronouncements on that count. I’m afraid those betting stubs you bought are going to be worthless.
For the purposes of our blog product reviews, we’ve very purposefully eschewed any kind of numerical rating system, in part because such a system just isn’t tenable long term for the blog. Technology will change and, moreover, numbers can be used to suggest comparisons between features that we never intended to compare. For a lot of the same reasons, when we do the Speech Industry Awards (SpeechTek 2009, HOLLA!), we have our judges rate vendors overall rather than wading through the thickly overgrown forest of individual products.
Furthermore, for the purposes of this Thrilla in Manila: VM2T v. SpinVox, the two are comparable only up to a point. As with SpinVox, when I test-drove the Nuance service it was just a demo version, and it didn’t actually deliver texts to my phone. I don’t really have a great feel for the ins and outs of delivery times or the niceties of either interface. Given that, the only point of comparison one could feasibly draw between the two is the recognition accuracy of their respective engines.
The Equipment
Another disclosure: the mobile phone used throughout this review was a Samsung SCH-U540. My phone, pictured here, can be described as pretty much the featurest of feature phones. This phone is so feature, and this is not a joke, that when I was talking to a vendor about usability tests on mobile devices, they told me they test on a “full gamut of phones.” At the high end they tested their software on some souped-up BlackBerry, probably capable of fielding a line drive while processing six terabytes of cancer research, and at the low end they used a phone that the guy called “as nothing as you can probably get.” The low-end phone in question? My exact phone.
So bear that in mind when you read through.
And now…
The Review:
Delivery and Some First Results
One of the immediately cool things I noticed about the service is that when I got a message it appeared as a text from the number that called me. Other than “spoken through SpinVox” appearing at the end of each message, they looked exactly like a text from the caller. That makes it really easy to call or text somebody back without having to dig up their number or dial anything. And hey, if the number is in your address book, you’ll know who it’s from before you even crack the message open. Beats cycling through a bunch of messages you have no need to hear, no? Special added bonus, you can also set up your SpinVox account to send a copy to your email!
There were, however, some issues I encountered in my use of the service. One was the delay in delivering messages. The time from utterance to text varied anywhere from a couple of minutes to as many as fifteen. Some messages were also dropped, while others were delivered out of chronological order. I don’t know how much of this to attribute to my own carrier’s network versus SpinVox, but I can provide an example of where things went wrong.
The very first message I got after setting things up was from me: a recreation of Thomas Edison’s famous first words into his first recording device. The message was transcribed flawlessly and reached my phone within four minutes of my leaving it. It read:
“Mary had a little lamb. His fleece was white as snow. And everywhere that Mary, that lamb was sure to go. Ha ha ha ha ha.” – spoken through SpinVox
Pretty great start for little SpinVox.
Things went a little awry from there, though. My brother Adam B., excited to try the service, left me a series of messages, one right after the other. Some of them bordered on the ridiculous and were made up on the fly, prescribing all sorts of terrible advice, while others were read from texts like macaroni recipes or key passages from the SpeechTek 2009 Advance Program.
All of the messages were uttered in his cubicle behind me, so I heard them organically before I had access to them as transcribed text or recorded sound. I began getting the messages a few minutes later. I was getting them, however, out of order, which may have been a function of the way the service works.
Not all messages take the same path through SpinVox. If a message has a low enough confidence score or some kind of specially tagged wrinkle, it gets sent to an agent to handle manually. This might account for the fact that messages aren’t received in the order they are sent, something to keep in mind if you get a lot of messages one right after the other and need to know the order they were sent in.
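To make the ordering effect concrete, here is a minimal Python sketch of a two-path pipeline like the one described above. Everything numeric in it is a hypothetical assumption for illustration: the confidence threshold, the machine-only turnaround, and the agent turnaround are made up, not SpinVox figures. It only shows how a slower manual path can reorder a burst of messages.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; SpinVox's real thresholds
# and turnaround times are not known from this review.
CONFIDENCE_THRESHOLD = 0.80
AUTO_DELAY_MIN = 2      # minutes for a machine-only transcription (assumed)
AGENT_DELAY_MIN = 12    # minutes when a human agent steps in (assumed)


@dataclass
class Voicemail:
    label: str
    sent_at: int        # minutes after the first call
    confidence: float   # recognizer's confidence score, 0.0 to 1.0


def route(vm: Voicemail) -> str:
    """Pick the path a message takes through the assumed pipeline."""
    return "auto" if vm.confidence >= CONFIDENCE_THRESHOLD else "agent"


def delivered_at(vm: Voicemail) -> int:
    """Arrival time = send time + the delay of the chosen path."""
    delay = AUTO_DELAY_MIN if route(vm) == "auto" else AGENT_DELAY_MIN
    return vm.sent_at + delay


def delivery_order(messages: list[Voicemail]) -> list[str]:
    # sorted() is stable, so messages arriving at the same minute
    # keep their original sending order.
    return [vm.label for vm in sorted(messages, key=delivered_at)]


if __name__ == "__main__":
    burst = [
        Voicemail("first", sent_at=0, confidence=0.55),  # e.g. read from a recipe
        Voicemail("second", sent_at=1, confidence=0.92),
        Voicemail("third", sent_at=2, confidence=0.88),
    ]
    print(delivery_order(burst))  # ['second', 'third', 'first']
```

With these made-up numbers, the low-confidence first message detours through an agent and lands last, which is the kind of shuffle I saw with Adam’s burst of messages.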
Stranger still, one of Adam’s messages, a recitation of one of the Vagina Monologues, was lost entirely. It was never delivered, and I couldn’t even find it among the recorded voicemail audio files that SpinVox keeps. This particularly miffed Adam, who was especially eager to see how SpinVox would handle Ensler’s magnum opus and kept coming to my desk to ask, “Is it in there yet? Huh? Huh? Huh?”
It never got there, which may very well be the fault of SpinVox’s third-party aggregator and not SpinVox’s in the slightest.
Unfortunately, I began my trial of the service at the same time that I was looking for an apartment.
Any of you Speech Heads out there who have looked for real estate in New York know what a grueling process that can be. The apartment you see today could be rented out from under your feet within an hour of your seeing it. If you see something you want, you have got to move fast. Combine that with the fact that you can’t get cell phone reception in the subways, and you need a hardcore reliable message service. The trial SpinVox was just a little too slow and unreliable for me to chance it. I got a message from one landlord some hours after it was relevant. Consequently, I had to turn it off and go with my carrier’s service. After I got my new apartment squared away, I reinstated the trial and have used it since.
The couple of weeks I had it turned off are the reason this review is coming a little late, actually.
Accuracy
Accuracy-wise, SpinVox nails most messages down pretty well.
Representatives from the company will be the first to tell you that the service is meant to transcribe natural messages, not messages read from a script. On the phone with SpinVox’s representative, I got some concerned comments after we said in an earlier post that we’d be putting the service through the “Shakespearean wringer.” The SpinVox engine was built with voicemail in mind, I was told, not 100% accuracy. The goal was to send text messages that make the meaning of a voicemail clear, not to provide some kind of transcript.
In this way, their engine seems to differ greatly from Nuance’s VM2T, which is built on Dragon, an engine explicitly designed to provide a transcript.
Given my experience with the product, I would say SpinVox’s assessment is definitely fair. The engine showed lapses on the “unnatural,” read-aloud messages that Adam left me. A recipe for macaroni and cheese produced the following conjoined SMS messages, for instance:
“Derek if you want to know how to make the best of mac and cheese it’s really very simple in taste to music. Pre heat the oven up to 350 degrees Fahrenheit, boil water a big pot for the pasta. All of the ingredients as long as pasta can easily go in a blender liquid and powdered. It’s by far the easiest way and the only Lee(?) I do it. Pasta is cooked, drained and put it in a baking pan for the cheese and sauce of the pasta. Bake until half of the pasta looks slightly brown and crispy bout 15 min.” – spoken through SpinVox
For more “natural” messages, however, the accuracy was much higher. Nearly all of the messages my mother left me came through clearly: things like “Hi Eric, I was just calling to see how you’re doing. Bye. Mommy,” or this one about a sublet I was looking at for my sister: “Hi Eric. Gimme a call cos I wanna see if you can still go with Eric up with [sister's name omitted] to look at a place over the weekend and we have to find you know a time that would be good for you so that we can make arrangements for her to go see it.”
There are a couple of lapses in those messages, more so in the second, with its strange grammatical construction “if you can still go with Eric with,” but SpinVox has more or less done exactly what the service promises. It has turned voicemail into text that is understandable enough to forgo listening to the recording. I got similar results with voicemails from friends, too.
Actually, truth be told, other than checking on the messages for the purposes of this write-up, I never once felt the need to go back and listen to a voicemail. I was pretty content with the transcriptions I got. Particularly one from my father that read: “What the fack(?) kind of voicemail is that.” – spoken through SpinVox
Apparently, SpinVox doesn’t know its swear words, which I find charmingly childlike.
Another cool little thing you’ll notice about SpinVox is that it abbreviates in places. “Because” becomes “cos” and “want to” becomes “wanna.” There are other examples like that. SpinVox does this to cope with the tight character limit of the SMS medium (160 characters per message).
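Here is a rough Python sketch of that kind of text shrinking. It is an illustration under assumptions, not SpinVox’s actual method: the abbreviation table holds only the two substitutions mentioned above (the real list is unknown), and the splitter simply breaks the result at word boundaries into chunks of at most 160 characters using the standard library’s `textwrap`.

```python
import re
import textwrap

SMS_LIMIT = 160  # characters per single SMS, per the figure above

# Only the two substitutions mentioned in this review; SpinVox's
# real abbreviation list is unknown.
ABBREVIATIONS = {
    "because": "cos",
    "want to": "wanna",
}


def abbreviate(text: str) -> str:
    """Swap whole-word phrases for their short forms, ignoring case."""
    for phrase, short in ABBREVIATIONS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, short, text, flags=re.IGNORECASE)
    return text


def to_sms_segments(text: str, limit: int = SMS_LIMIT) -> list[str]:
    """Abbreviate, then break on word boundaries into chunks of at most `limit` chars."""
    return textwrap.wrap(abbreviate(text), width=limit)


if __name__ == "__main__":
    print(abbreviate("Call me because I want to talk"))
    # Call me cos I wanna talk
```

One caveat on the simplification: when a real multipart SMS is stitched together on the handset, each segment gives up a few characters to a header (153 usable characters in the standard GSM alphabet), so actual segments run a bit shorter than 160.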
One kind of annoying thing the service tends to do is send text messages that contain just a couple of random characters (example pictured above). These generally come after another message. They wouldn’t be so bad were it not for the sheer awfulness of my feature phone, which holds only 50 texts at a time, a limit that voicemails already eat into quickly. I’m sure you Speech Heads out there with smartphones that hold 10,000 texts just sort of laugh this kind of thing off.
SpinVox v. Nuance?
I promised I wouldn’t draw direct comparisons, but I’ll draw a few anyway. I’m tempted to say “So sue me,” but given that we’re dealing with major corporations, saying that is like slapping a hungry bear across the face with an uncooked steak: an invitation to misfortune.
There are a couple of trade-offs worth discussing. Because Nuance is built on Dragon, an engine originally built for dictation and command-and-control, it fares far better in mobile dictation, that is, messages read from printed texts. SpinVox, as I’ve said, makes no pretensions to being able to do that. It’s focused on voicemail, and as such does some things that seem better suited to texting, like abbreviating words for the SMS form (it does the same thing for emails, incidentally).
Final Verdict
As with the service in my last review, SpinVox is a solid product. I don’t know that I get enough voicemail to justify giving up the old-fashioned way, but if you do, it certainly seems to make a lot of sense.

Eric B. —
May 29, 2009 @ 12:10 pm
Thanks for the Nuance VM2T v SpinVox reviews. Will you be reviewing Yap, Vlingo and Promptu’s Shoutout for the iPhone as well (i.e., when the latter is available)? My understanding is that these three do not have human oversight, thereby reflecting more accurately the performance of ASR built for SMS, email, and voicemail (i.e., speech to text (STT)). How about Phonetag? I don’t know if the latter has human agents involved.
Comment by Inci — May 29, 2009 @ 10:13 pm
Are you REALLY sure that SpinVox is ASR-based?
I wonder why no one has put a “powered by” label on it to gain more visibility, having developed such a great engine…
Comment by philippe — May 30, 2009 @ 2:43 am
Inci: Phonetag reached out to me, as you can see in the comments of my earlier post. However, as of yet I’ve made no plans to review any further services. It’s a long and tedious process, and I think I’m looking forward to a break, just getting my voicemail the old-fashioned way. I may take one of them up later, though.
Philippe: I understand your concern about SpinVox not actually having ASR. I’ve seen that before. I think a lot of that sentiment comes from a patent that the company filed in 2006 that seems to lay out a transcription service manned by breathing human transcriptionists. It’s still pending. You can find it here:
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220060223502%22.PGNR.&OS=DN/20060223502&RS=DN/20060223502
However, a series of patents filed later, in 2007-2008 (which seem to be revisions of each other, though I’m not an attorney), suggests that SpinVox is using ASR first and then having human transcriptionists make corrections.
Here’s the most recent filing:
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=Spinvox.AS.&OS=AN/Spinvox&RS=AN/Spinvox
Likewise, SpinVox has developed an API:
http://www.speechtechmag.com/Articles/News/Industry-News/SpinVox-Launches-Open-Access-to-Its-Speech-Platform-with-Web-API-53274.aspx
Seems strange that they would do that without some underlying recognition technology.
Plus, they entered into a partnership with IBM in April to do STT as a managed service on IBM hardware:
http://www.speechtechmag.com/Articles/News/Industry-News/SpinVox-Launches-Open-Access-to-Its-Speech-Platform-with-Web-API-53274.aspx
That would be a foolhardy move on IBM’s part if SpinVox were just a bunch of minimum-wage slaves sweatin’ out the transcriptions in some darkened room in a distant land, as some have seemed to suggest.
Apart from the company’s own explanation ( http://www.spinvox.com/how_it_works.html ), I just think there’s far more outside, verifiable evidence that they ARE using ASR at some level than there is that they aren’t using it at all.
All that said, from what I’ve seen in the trial run, the engine is very much centered on voicemail-to-text. I don’t know if it’s strong enough to have applications outside of that. Certainly from a dictation standpoint, Dragon seemed to outperform it. SpinVox, for its part, may be developing, or may even have developed, a different engine for purposes outside of voicemail. Who the heck knows?
I appreciate the skepticism, though.
Comment by Eric B. — May 30, 2009 @ 9:00 am
Phonetag reached out to me, as you can see in the comments of my earlier post.
Eric, could you please point me to your earlier post on Phonetag?
Thanks,
Inci
Comment by Inci — June 1, 2009 @ 5:09 pm
I didn’t have a post on Phone Tag. They just contacted me to see if I was interested in reviewing them.
http://www.speechtechblog.com/2009/04/14/spinvox-review-a-coming-to-stb#comments
I’ll probably get around to it sooner or later.
Comment by Eric B. — June 2, 2009 @ 12:47 pm
Don’t worry about reviewing us directly, as you’ll probably end up being exposed to us elsewhere: we already power a number of the existing voicemail-to-text providers with our automation. Since our founding, we’ve evolved to become a platform provider rather than a direct-to-consumer offering (we can provide functionality similar to Google Search or Google Voice as a white-label solution). Some of the same R&D staff who developed Dragon and ViaVoice are working their magic here. Cheers!
Comment by Igor Jablokov — June 3, 2009 @ 8:48 pm