System designed for use by providers of speech-based services in e-commerce, e-learning and e-banking, among others (image: human voice spectrogram / Dvortygirl, Mysid / Wikimedia Commons)

Startup develops computational resources for speech technologies
2018-09-19

Brazilian startup SpeechTera invests in four different products: speech corpora, acoustic models, pronunciation models, and grapheme-to-phoneme converters.

By Suzel Tunes  |  FAPESP Research for Innovation – When Vanessa Marquiafável Serrani began her undergraduate degree course in language and literature at the Federal University of São Carlos (UFSCar), Brazil, in 2000, her professional future seemed clear-cut: she would be an English teacher. However, her career path changed during her undergraduate studies. While conducting a scientific initiation project, she was introduced to the Interinstitutional Center for Computational Linguistics (NILC-USP) at the University of São Paulo in São Carlos and ultimately became an entrepreneur instead of remaining in academia.

Serrani is now a partner in her own firm, SpeechTera Desenvolvimento de Programas para Computadores Ltda., and with support from FAPESP’s Innovative Research in Small Business Program (PIPE), she is developing a project to create computational resources for speech technologies in Brazilian Portuguese.

The project completed its PIPE Phase 1 feasibility test in 2016 and is now in Phase 2, which involves development proper and is scheduled for completion in 2019, when SpeechTera expects to bring to market computational resources essential to the production of speech synthesis and recognition systems. These technologies can be applied in various ways, she explains, from the creation of voice commands for electronic devices and pronunciation training in language learning to automatic translation, therapy for people with speech pathologies, and digital inclusion of the visually impaired or otherwise disabled, among others.

Personalized voices can be created for individuals with speech disorders. “Voice is a crucial part of a person’s identity,” Serrani says. However, current speech synthesis systems are developed abroad and are expensive, so technology firms tend to offer only a limited range of synthetic voices, which can lead to user dissatisfaction or rejection.

The development of a native technology not only cuts costs but can also offer new possibilities for customized male, female and child voices. “Acoustic traits can be extracted from small speech samples to construct a personalized synthetic voice for individuals with motor disabilities who can only speak a few words or syllables,” Serrani says.

She explains that SpeechTera is focusing on a business-to-business approach, targeting as customers other firms that develop speech-based services in e-commerce, e-learning and e-banking, as well as hospitals, clinics and health centers.

SpeechTera is investing in four different products: speech corpora, acoustic models, pronunciation models, and grapheme-to-phoneme converters. According to Serrani, a speech corpus (plural corpora) is a database of speech audio files and text transcriptions for use by synthesizers. “We collect voice samples from people aged 18-65 with a variety of profiles and Brazilian accents,” she says. “The greater the variation, the better our speech recognition tools will perform.”
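As a rough illustration of what one entry in such a database might hold, the Python sketch below pairs an audio file with its transcription and some speaker metadata. The field names, file paths and accent labels are hypothetical and are not taken from SpeechTera's corpus; only the 18-65 age range comes from the article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    audio_path: str   # hypothetical location of the recorded audio file
    transcript: str   # text transcription of what was said
    speaker_age: int  # age of the speaker in years
    accent: str       # free-form accent label (an assumption for this sketch)


def in_target_age_range(utterance, low=18, high=65):
    """Check whether the speaker falls in the 18-65 range quoted in the article."""
    return low <= utterance.speaker_age <= high


# Made-up entries, for illustration only.
corpus = [
    Utterance("rec/0001.wav", "bom dia", 34, "accent_a"),
    Utterance("rec/0002.wav", "boa noite", 61, "accent_b"),
    Utterance("rec/0003.wav", "bom dia", 22, "accent_a"),
]

# Counting recordings per accent label gives a crude view of corpus variety.
recordings_per_accent = Counter(u.accent for u in corpus)
```

Counting entries per accent label, as in the last line, is one simple way to check how much variation a corpus actually contains.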

Acoustic models correlate audio signals with the phonemes of the language. Pronunciation models are phonetic dictionaries – lists of words and their pronunciations in computer-readable phonetic script.

“These dictionaries are transcribed in 13 different Brazilian accents, which we chose from among the wide variety in use in Brazil,” Serrani says.
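A phonetic dictionary with accent variants can be pictured as a nested lookup table. The Python sketch below is a minimal example under stated assumptions: the two accent labels, the SAMPA-like ASCII symbols and the entries themselves are illustrative and are not taken from SpeechTera's dictionaries. The example relies on the fact that some Brazilian accents palatalize "t" and "d" before "i" while others do not.

```python
# Toy pronunciation model: word -> accent label -> phoneme sequence.
# Entries and accent labels are illustrative assumptions, not SpeechTera data.
PRONUNCIATIONS = {
    "tia": {
        "palatalizing": ["tS", "i", "a"],
        "non_palatalizing": ["t", "i", "a"],
    },
    "dia": {
        "palatalizing": ["dZ", "i", "a"],
        "non_palatalizing": ["d", "i", "a"],
    },
}


def lookup(word, accent):
    """Return the phoneme list for a word in a given accent, or None if absent."""
    return PRONUNCIATIONS.get(word.lower(), {}).get(accent)
```

Returning None for a missing word or accent leaves the caller free to fall back on another resource, such as a grapheme-to-phoneme converter.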

The grapheme-to-phoneme converter is an algorithm that converts text input in conventional written format to a string of phonetic symbols for processing by a computer.
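The Python sketch below shows the general idea with a toy rule-based converter that scans a word from left to right and always tries the longest matching spelling rule first. It is not SpeechTera's algorithm: the rule table covers only three Portuguese digraphs, the output symbols are SAMPA-like placeholders, and letters without a rule are passed through unchanged. A real converter would also need to handle context, stress and accent variation.

```python
# Toy grapheme-to-phoneme converter using greedy longest-match rules.
# The rule table is a tiny illustrative assumption, not a model of Portuguese.
RULES = {
    "ch": "S",  # as in "chave"
    "nh": "J",  # as in "ninho"
    "lh": "L",  # as in "filho"
}
MAX_RULE_LEN = max(len(grapheme) for grapheme in RULES)


def g2p(word):
    """Convert a written word into a list of phonetic symbols."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(MAX_RULE_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phonemes.append(word[i])
            i += 1
    return phonemes
```

With these rules, g2p("ninho") yields ["n", "i", "J", "o"]: the digraph "nh" is consumed as a single symbol rather than as two separate letters.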

These four products can be marketed together or separately, Serrani says.

Voice collection

“This is a relatively new field and very lacking in research. Discovering this angle motivated me a great deal to work with voice technology,” Serrani recalls, adding that NILC, which is part of the University of São Paulo’s Mathematics and Computer Science Institute (ICMC) in São Carlos, has the largest computational linguistics research group in Brazil. It is a multidisciplinary team that includes linguists and computer scientists.

NILC’s researchers developed the spell checker used in the Brazilian version of Microsoft Word, under the aegis of a larger project developed in 1997-98 with investment from Itautec, a private-sector IT firm, and from FAPESP through its Research Partnership for Technological Innovation Program (PITE). In 2000, Microsoft acquired the rights to the tool developed by NILC and incorporated it into the Office suite.

Serrani first came into contact with computational linguistics during her undergraduate course. Then, in the interval between her master’s degree, completed in 2007, and the start of her PhD research in 2011, she seized an opportunity to work on a PIPE project with electrical engineer Luis Felipe Uebel, who was developing an internet browser with speech recognition and synthesis. “I produced a phonetic dictionary for the project. It entailed asking people on the São Carlos campus of the University of São Paulo for permission to record their voices,” she recalls.

The assignment enabled Serrani to acquire experience that proved highly useful when she decided to start SpeechTera in April 2015. Collecting voices, for example, is now far easier: instead of going up to people and asking them to record, the firm has developed a smartphone app that enables collaborators to send speech samples remotely.

“All I need to do is send a link, which the person uses to upload a recording,” she says. “We’ve collected speech samples from 400 people. Each recorded 100 short phrases and was paid a fee equivalent to approximately six or seven dollars. This investment was less than the expense of hiring professionals plus travel if we recorded them personally. We also saved a lot of time.”
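The scale of the collection follows directly from the figures Serrani quotes. The short Python calculation below works it out, assuming the fee was paid once per participant and taking six and seven dollars as the lower and upper bounds.

```python
# Figures quoted in the article.
speakers = 400
phrases_per_speaker = 100
fee_low_usd, fee_high_usd = 6, 7  # "approximately six or seven dollars" each

# Assumption: one fee per participant, regardless of the number of phrases.
total_recordings = speakers * phrases_per_speaker
total_fees_low_usd = speakers * fee_low_usd
total_fees_high_usd = speakers * fee_high_usd

print(total_recordings)                          # 40000
print(total_fees_low_usd, total_fees_high_usd)   # 2400 2800
```

On those assumptions, the firm gathered about 40,000 recorded phrases for roughly 2,400 to 2,800 dollars in participant fees.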

SpeechTera has a multidisciplinary development team of linguists, electrical engineers and computer scientists. It is very much a startup in that it has no offices, and its six professionals work from home (in the cities of Araras, Hortolândia, São Carlos and Araraquara). Moreover, its website has yet to go live. It focuses entirely on product development and has no revenue aside from the project funding from FAPESP.

However, even before adopting a publicity strategy (a marketing project is currently in the works), the firm has been contacted by two large corporations that are interested in acquiring resources for the development of speech technologies. It therefore has highly positive expectations. “We’re sticking to the timetable and objectives initially proposed for the project, thanks to the excellent multidisciplinary team we’ve succeeded in building since we embarked on this journey,” Serrani says.

Company: SpeechTera Desenvolvimento de Programas para Computadores
Tel: +55 19 97142-4872
Contact: speechtera@gmail.com and marquiafavel@gmail.com

 

Agência FAPESP licenses its news under Creative Commons (CC-BY-NC-ND) so that it can be republished free of charge and in a simple way by other digital or print outlets. Agência FAPESP must be credited as the source of the republished content, and the name of the reporter (if any) must be attributed. These rules are detailed in the FAPESP Digital Republishing Policy.