Speech Recognition

In computer technology, speech recognition refers to the recognition of human speech by computers for the performance of speaker-initiated, computer-generated functions (for example, transcribing speech to text; data entry; operating electronic and mechanical devices; automated processing of telephone calls). It is a main element of so-called natural language processing through computer speech technology. Speech derives from sounds created by the human articulatory system, including the lungs, vocal cords, and tongue. Through exposure to speech patterns in infancy and childhood, humans learn to recognize the same words and phrases despite differences between speakers in pitch, tone, emphasis, and intonation; this remarkable ability rests on the brain's cognitive capacity. At the time of writing (2008), speech recognition technology can reproduce this ability only to a limited degree, but it is already useful in many applications.

The Challenge of Speech Recognition
Writing systems are ancient, going back as far as the Sumerians six thousand years ago. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition, however, had to await the development of the computer, because of the many problems the recognition of speech presents.

First, speech is not simply spoken text, in the same way that Miles Davis playing So What can hardly be captured by a note-for-note rendition as sheet music. What humans understand as discrete words, phrases, or sentences with clear boundaries is actually delivered as a continuous stream of sounds: Iwenttothestoreyesterday, rather than I went to the store yesterday. Words can also blend together: Whaddayawa? represents What do you want?

Second, there is no one-to-one correlation between sounds and letters. In English there are slightly more than five vowel letters (a, e, i, o, u, and sometimes y and w) but more than twenty distinct vowel sounds, though the exact count depends on the speaker's accent. The reverse problem also occurs, where more than one letter can represent a given sound: the letter c can sound like the letter k, as in cake, or like the letter s, as in citrus.

In addition, people who speak the same language do not use the same sounds; that is, languages vary in their phonology, or patterns of sound organization. There are different accents: the word "water" may be pronounced watter, wadder, woader, wattah, and so on. Each person also has a distinctive pitch, with men typically speaking at the lowest pitch and women and children at higher ones (though there is wide variation and overlap within each group). Pronunciation is further colored by adjacent sounds, by the speed at which a person talks, and by the speaker's health; consider how pronunciation changes when someone has a cold.

Lastly, not all sounds consist of meaningful speech. Ordinary speech is filled with interjections that carry no meaning in themselves but break up discourse and convey subtle information about the speaker's feelings or intentions: oh, like, you know, well. There are also sounds that are part of speech but are not considered words: er, um, uh. Coughing, sneezing, laughing, sobbing, and even hiccupping can be part of what is spoken. And the environment adds its own noise, which makes speech recognition difficult even in otherwise favorable conditions.

[Figure: waveform of "I went to the store yesterday"]
[Figure: spectrogram of "I went to the store yesterday"]

History of Speech Recognition

Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers. As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, nicknamed "Audrey". Audrey attained an accuracy of 97 to 99 percent if the speaker was male, paused 350 milliseconds between words, and limited his vocabulary to the digits one through nine plus "oh", and if the machine could be adjusted to the speaker's speech patterns. If the recognizer could not be adjusted, accuracy fell as low as 60 percent.

Audrey worked by recognizing phonemes, individual sounds considered distinct from one another. The phonemes were matched against reference models produced by training the recognizer. Over the next two decades, researchers spent large amounts of time and money trying to improve on this concept, with little success. Computer hardware improved by leaps and bounds, speech synthesis improved steadily, and Noam Chomsky's idea of generative grammar suggested that language could be analyzed programmatically, yet none of this seemed to advance speech recognition. The generative work of Chomsky and Halle also led mainstream linguistics to abandon the concept of the phoneme, in favor of breaking a language's sound patterns into smaller, more discrete features.

In 1969, John R. Pierce wrote a forthright letter to the Journal of the Acoustical Society of America, where much of the research on speech recognition was published. Pierce was one of the pioneers of satellite communications and an executive vice president at Bell Labs, which was a leader in speech recognition research. Pierce said everyone involved was wasting time and money:

It would be too simple to say that work in speech recognition is carried out simply because one can get money for it... The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn't attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamor.

Pierce's 1969 letter marked the end of official speech recognition research at Bell Labs for nearly a decade. The defense research agency ARPA, however, chose to persevere. In 1971 it sponsored a research initiative to develop a recognizer that could handle at least 1,000 words and understand connected speech, that is, speech without clear pauses between words. The recognizer was allowed to assume a low-background-noise environment, and it did not need to work in real time.

By 1976, three contractors had developed six systems. The most successful, developed by Carnegie Mellon University, was called Harpy. Harpy was slow: a four-second sentence took more than five minutes to process. It also still required speakers to "train" it by speaking sentences to build up a reference model. Nonetheless, it did recognize a thousand-word vocabulary, and it did support connected speech.

Research continued along several paths, but Harpy became the model for future success. It used hidden Markov models and statistical modeling to extract meaning from speech. In essence, speech was broken into overlapping small chunks of sound, and probabilistic models inferred the most likely words or parts of words in each chunk. The procedure is computationally intensive, but it has proven to be the most successful approach.

Research continued throughout the 1970s and 1980s. By the 1980s, most researchers were using hidden Markov models, which underlie all contemporary speech recognizers. In the late 1980s and the 1990s, DARPA (the renamed ARPA) funded several initiatives. The first was similar to the earlier challenge, again a 1,000-word vocabulary, but this time with a rigorous performance standard; it produced systems that brought the word error rate down somewhat from about 10%. The remaining initiatives concentrated on improving algorithms and computational efficiency.

In 2001, Microsoft released a speech recognition system that worked with Office XP. It neatly encapsulated both how far the technology had come in fifty years and what its limitations still were. The system had to be trained to a specific user's voice, using the works of great authors that were provided, such as Edgar Allan Poe's The Fall of the House of Usher and Bill Gates' The Road Ahead. Even after training, the system was fragile enough that it came with a warning: "If you change the room in which you use Microsoft Speech Recognition and accuracy drops, set up the microphone again." On the other hand, the system did work in real time, and it did recognize connected speech.
15、t;b> 語音識別的今天</b></p><p><b> 技術(shù)</b></p><p> 當今的語音識別技術(shù)著力于通過共振和光譜分析來對我們的聲音產(chǎn)生的聲波進行數(shù)學分析。計算機系統(tǒng)第一次通過數(shù)字模擬轉(zhuǎn)換器記錄了經(jīng)過麥克風傳來的聲波。那種當我們說一個詞的時候所產(chǎn)生的模擬的或者持續(xù)的聲波被分割成了一些時間碎片,然后這些碎片按照它們的振幅水平被度
16、量,振幅是指從一個說話者口中產(chǎn)生的空氣壓力。為了測量振幅水平并且將聲波轉(zhuǎn)換成為數(shù)字格式,現(xiàn)在的語音識別研究普遍采用了奈奎斯特—香農(nóng)定理。</p><p><b> 奈奎斯特—香農(nóng)定理</b></p><p> 奈奎斯特—香農(nóng)定理是在1928年研究發(fā)現(xiàn)的,該定理表明一個給定的模擬頻率能夠由一個是原始模擬頻率兩倍的數(shù)字頻率重建出來。奈奎斯特證明了該規(guī)律的真實性,因為一
17、個聲波頻率必須由于壓縮和疏散各取樣一次。例如,一個20kHz的音頻信號能準確地被表示為一個44.1kHz的數(shù)字信號樣本。</p><p><b> 工作原理</b></p><p> 語音識別系統(tǒng)通常使用統(tǒng)計模型來解釋方言,口音,背景噪音和發(fā)音的不同。這些模型已經(jīng)發(fā)展到這種程度,在一個安靜的環(huán)境中準確率可以達到90℅以上。然而每一個公司都有它們自己關(guān)于輸入處理的專
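To make the digitization just described concrete, here is a minimal Python sketch, not drawn from any system mentioned in this article: it samples a signal at a rate satisfying the Nyquist criterion and slices it into the short time fragments that are then measured by amplitude. The 25 ms frame and 10 ms hop sizes are assumed, typical values rather than anything the article specifies.

```python
import numpy as np

# Nyquist criterion: to capture frequencies up to f_max, sample above 2 * f_max.
f_max = 20_000          # highest frequency of interest, in Hz
sample_rate = 44_100    # standard CD rate, comfortably above 2 * f_max
assert sample_rate > 2 * f_max

def digitize(signal_fn, duration_s, rate=sample_rate):
    """Sample a continuous signal (a function of time) at a fixed rate."""
    t = np.arange(0, duration_s, 1.0 / rate)
    return signal_fn(t)

def frame(samples, frame_ms=25, hop_ms=10, rate=sample_rate):
    """Slice digitized speech into short, overlapping time fragments."""
    frame_len = int(rate * frame_ms / 1000)
    hop_len = int(rate * hop_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]

# Example: a 440 Hz tone sampled for one second, then framed.
tone = digitize(lambda t: np.sin(2 * np.pi * 440 * t), duration_s=1.0)
fragments = frame(tone)
amplitudes = [np.abs(f).max() for f in fragments]  # per-fragment amplitude level
```

At 44.1 kHz even the highest audible frequencies are sampled at least twice per cycle, once per compression and once per rarefaction, which is exactly the condition the theorem requires.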
How It Works

Speech recognition programs commonly use statistical models to account for variation in dialect, accent, background noise, and pronunciation. These models have progressed to the point where accuracy of over 90% can be achieved in a quiet environment. While every company has its own proprietary technology for processing spoken input, there are four common approaches to how speech is recognized.

1. Template-based: This model uses a database of speech patterns built into the program. After voice input is received, recognition works by matching the input against the database, using dynamic programming algorithms (see the first sketch following this list). The downfall of this type of speech recognition is that the model is not flexible enough to understand voice patterns unlike those in its database.
2. Knowledge-based: Knowledge-based speech recognition analyzes spectrograms of the speech to gather data and create rules that return values corresponding to the commands or words the user said. It does not make use of linguistic or phonetic knowledge about speech.

3. Stochastic: Stochastic speech recognition is the most common today. Stochastic methods use probability models to capture the uncertainty of the spoken input. The most popular probability model is the HMM (hidden Markov model), whose decision rule is shown below (and illustrated in the second sketch following this list):

W* = argmax_W p(W) p(Yt|W)

Here Yt is the observed acoustic data, p(W) is the a priori probability of a particular word string, p(Yt|W) is the probability of the observed acoustic data given the acoustic models, and W is the hypothesized word string. The HMM has proven successful at analyzing spoken input because the algorithm takes into account a language model, an acoustic model of how humans speak, and a lexicon of known words.

4. Connectionist: In connectionist speech recognition, knowledge about the spoken input is gained by analyzing the input and storing it in a variety of network architectures, from simple multi-layer perceptrons to time-delay neural nets to recurrent neural nets.

As stated above, programs that use stochastic models to analyze spoken language are the most common today and have proven the most successful. Two brief sketches of these ideas follow.
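The first sketch illustrates the dynamic-programming matching behind the template-based approach (item 1). The one-dimensional "feature traces" and the two stored templates are invented for illustration; real systems compare sequences of spectral feature vectors, but the warping recurrence is the same idea.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    Classic dynamic-programming recurrence: each cell holds the cheapest
    cost of aligning prefixes a[:i] and b[:j], allowing stretch/compress.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch the input
                                 cost[i, j - 1],      # stretch the template
                                 cost[i - 1, j - 1])  # step both forward
    return cost[n, m]

# Toy "templates": 1-D feature traces for two stored words.
templates = {"yes": np.array([0.1, 0.9, 0.8, 0.1]),
             "no":  np.array([0.9, 0.2, 0.1, 0.1])}

spoken = np.array([0.1, 0.85, 0.9, 0.75, 0.1])   # unknown input, spoken slower
best = min(templates, key=lambda w: dtw_distance(spoken, templates[w]))
print(best)  # -> "yes": the closest template under the DTW alignment
```

The second sketch illustrates the stochastic decision rule W* = argmax_W p(W) p(Yt|W) from item 3. All probabilities here are invented for illustration; a real recognizer would compute them from trained acoustic and language models over a large lexicon.

```python
import math

# Toy prior p(W) from a language model and acoustic likelihoods p(Yt | W)
# for two candidate word strings. All numbers are invented for illustration.
language_model = {            # p(W): how plausible the word string is a priori
    "recognize speech": 0.60,
    "wreck a nice beach": 0.05,
}
acoustic_likelihood = {       # p(Yt | W): how well W explains the audio
    "recognize speech": 0.20,
    "wreck a nice beach": 0.30,
}

def map_decode(candidates):
    """Pick W* = argmax_W p(W) * p(Yt | W), working in log space."""
    def score(w):
        return math.log(language_model[w]) + math.log(acoustic_likelihood[w])
    return max(candidates, key=score)

print(map_decode(language_model))  # -> "recognize speech"
# The acoustics alone slightly favor "wreck a nice beach", but the
# language-model prior tips the decision the other way.
```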
Recognizing Commands

The most important goal of current speech recognition software is to recognize commands, which increases the functionality of speech software. Software such as Microsoft Sync, built into many new vehicles, supposedly lets users operate all of the car's electronic accessories hands-free. The software is adaptive: it asks the user a series of questions and uses the pronunciation of commonly used words to derive speech constants, which are then factored into the speech recognition algorithms to improve future recognition. Technology critics believe the field has come a long way since the 1990s but will not replace hand controls any time soon.

Dictation

Second to command recognition is dictation. Today's market sees value in dictation software for transcribing medical records and student papers, and as a more productive way to get one's thoughts down in writing. In addition, many companies see value in dictation for translation: users could have their words translated for written letters, or translated so that they can speak them back to another party in that party's native language. Software for these purposes already exists in today's market.

Errors in Interpreting the Spoken Word

As speech recognition programs process spoken words, their success rate rests on their ability to minimize errors. The scales on which this is measured are Single Word Error Rate (SWER) and Command Success Rate (CSR). A single word error is simply the misunderstanding of one word in a spoken sentence. While SWERs occur in command recognition systems, they are most common in dictation software. Command Success Rate is defined by the accurate interpretation of the command: a command statement may not be interpreted with complete accuracy, yet the recognition system can use mathematical models to infer the command the user intended. A sketch of a word-level error count appears below.
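Word-level error counting of the kind behind SWER is typically done with an edit-distance computation. The following sketch is a standard dynamic-programming formulation, not any vendor's actual metric; it counts substitutions, insertions, and deletions between a reference sentence and a recognizer's output.

```python
def word_errors(reference, hypothesis):
    """Count word-level errors (substitutions, insertions, deletions)
    between a reference sentence and recognizer output, via edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # deleting all reference words
    for j in range(m + 1):
        d[0][j] = j                      # inserting all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[n][m]

ref = "i went to the store yesterday"
hyp = "i went to this store yesterday"
errors = word_errors(ref, hyp)
print(errors, errors / len(ref.split()))  # 1 error out of 6 words, about 0.17
```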
25、;</p><p><b> 主要的語音技術(shù)公司</b></p><p> 隨著語音技術(shù)產(chǎn)業(yè)的發(fā)展,更多的公司帶著他們新的產(chǎn)品和理念進入這一領(lǐng)域。下面是一些語音識別技術(shù)領(lǐng)域領(lǐng)軍公司名單(并非全部)NICE Systems(NASDAQ:NICE and Tel Aviv:Nice),該公司成立于1986年,總部設(shè)在以色列,它專長于數(shù)字記錄和歸檔技術(shù)。他們在2007
26、年收入5.23億美元。欲了解更多信息,請訪問http://www.nice.com</p><p> Verint系統(tǒng)公司(OTC:VRNT),總部設(shè)在紐約的梅爾維爾,創(chuàng)立于1994年把自己定位為“勞動力優(yōu)化智能解決方案,IP視頻,通訊截取和公共安全設(shè)備的領(lǐng)先供應商。詳細信息,請訪問http://verint.com</p><p> Nuance公司(納斯達克股票代碼:NUAN)總部
27、設(shè)在伯靈頓,開發(fā)商業(yè)和客戶服務使用語音和圖像技術(shù)。欲了解更多信息,請訪問http://www.nuance.com</p><p> Vlingo,總部設(shè)在劍橋,開發(fā)與無線/移動技術(shù)對接的語音識別技術(shù)。 Vlingo最近與雅虎聯(lián)手合作,為雅虎的移動搜索服務—一鍵通功能提供語音識別技術(shù)。欲了解更多信息,請訪問http://vlingo.com</p><p> 在語音技術(shù)領(lǐng)域的其他主要公
Other major companies involved in speech technologies include Unisys, ChaCha, SpeechCycle, Sensory, Microsoft's Tellme, Klausner Technologies, and many more.

Patent Infringement Lawsuits

Given the highly competitive nature of both the business and the technology, it is not surprising that there have been numerous patent infringement lawsuits among speech companies. Each element involved in developing a speech recognition device can be claimed as a separate technology and patented as such. Using a technology that has been patented by another company or individual, even if it was developed independently, exposes a company to monetary damages and often to injunctions against future use of the technology. The politics and business of the speech industry are closely tied to the technology's development, so the political and legal obstacles that may hinder the industry's progress must be recognized. Some patent infringement suits are described below; it should be noted that many such suits have been filed and many have gone to court.

The Future of Speech Recognition

Future Trends and Applications
30、t;b> 醫(yī)療行業(yè)</b></p><p> 醫(yī)療行業(yè)有多年來一直在宣傳電子病歷(EMR)。不幸的是,產(chǎn)業(yè)遲遲不能夠滿足EMRs,一些公司斷定原因是由于數(shù)據(jù)的輸入。沒有足夠的人員將大量的病人信息輸入成為電子格式,因此,紙質(zhì)記錄依然盛行。一家叫Nuance(也出現(xiàn)在其他領(lǐng)域,軟件開發(fā)者稱為龍指令)相信他們可以找到一市場將他們的語音識別軟件出售那些更喜歡聲音而非手寫輸入病人信息的醫(yī)生。</
31、p><p><b> 軍事</b></p><p> 國防工業(yè)研究語音識別軟件試圖將其應用復雜化而非更有效率和親切。為了使駕駛員更快速、方便地進入需要的數(shù)據(jù)庫,語音識別技術(shù)是目前正在飛機駕駛員座位下面的顯示器上進行試驗。</p><p> 軍方指揮中心同樣正在嘗試利用語音識別技術(shù)在危急關(guān)頭用快速和簡易的方式進入他們掌握的大量資料庫。另外,軍方
32、也為了照顧病員涉足EMR。軍方宣布,正在努力利用語音識別軟件把數(shù)據(jù)轉(zhuǎn)換成為病人的記錄。</p><p> 摘自:http://en.citizendium.org/wiki/Speech_Recognition</p><p><b> 附:英文原文</b></p><p> Speech Recognition</p>&
In computer technology, Speech Recognition refers to the recognition of human speech by computers for the performance of speaker-initiated computer-generated functions (e.g., transcribing speech to text; data entry; operating electronic and mechanical devices; automated processing of telephone calls) — a main element of so-called natural language processing through computer speech technology. Speech derives from sounds created by the human articulatory system, including the lungs, vocal cords, and tongue.

The Challenge of Speech Recognition

Writing systems are ancient, going back as far as the Sumerians of 6,000 years ago. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition had to await the development of the computer, however, due to multifarious problems with the recognition of speech.

First, speech is not simply spoken text--in the same way that Miles Davis playing So What can hardly be captured by a note-for-note rendition as sheet music. What humans understand as discrete words, phrases or sentences with clear boundaries are actually delivered as a continuous stream of sounds: Iwenttothestoreyesterday, rather than I went to the store yesterday. Words can also blend, with Whaddayawa? representing What do you want?

Second, there is no one-to-one correlation between the sounds and letters. In English, there are slightly more than five vowel letters--a, e, i, o, u, and sometimes y and w. There are more than twenty different vowel sounds, though, and the exact count can vary depending on the accent of the speaker. The reverse problem also occurs, where more than one letter can represent a given sound. The letter c can have the same sound as the letter k, as in cake, or as the letter s, as in citrus.

In addition, people who speak the same language do not use the same sounds, i.e. languages vary in their phonology, or patterns of sound organization. There are different accents--the word 'water' could be pronounced watter, wadder, woader, wattah, and so on. Each person has a distinctive pitch when they speak--men typically having the lowest pitch, women and children have a higher pitch (though there is wide variation and overlap within each group.) Pronunciation is also colored by adjacent sounds, the speed at which the speaker talks, and the speaker's health; consider how pronunciation changes when a person has a cold.
Lastly, consider that not all sounds consist of meaningful speech. Regular speech is filled with interjections that do not have meaning in themselves, but serve to break up discourse and convey subtle information about the speaker's feelings or intentions: Oh, like, you know, well. There are also sounds that are a part of speech that are not considered words: er, um, uh. Coughing, sneezing, laughing, sobbing, and even hiccupping can be a part of what is spoken. And the environment adds its own noise, so that speech is difficult to recognize even in a noisy place.

History of Speech Recognition

Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers.
As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, or "Audrey". Audrey attained an accuracy of 97 to 99 percent if the speaker was male, and if the speaker paused 350 milliseconds between words, and if the speaker limited his vocabulary to the digits from one to nine, plus "oh", and if the machine could be adjusted to the speaker's speech patterns; if the recognizer could not be adjusted, accuracy fell as low as 60 percent.

Audrey worked by recognizing phonemes, or individual sounds that were considered distinct from each other. The phonemes were correlated to reference models of phonemes that were generated by training the recognizer. Over the next two decades, researchers spent large amounts of time and money trying to improve upon this concept, with little success. Computer hardware improved by leaps and bounds, speech synthesis improved steadily, and Noam Chomsky's idea of generative grammar suggested that language could be analyzed programmatically; none of this, however, seemed to advance speech recognition. Chomsky and Halle's generative work also led mainstream linguistics to abandon the concept of the phoneme, in favor of breaking a language's sound patterns into smaller, more discrete features.
In 1969, John R. Pierce wrote a forthright letter to the Journal of the Acoustical Society of America, where much of the research on speech recognition was published. Pierce was one of the pioneers in satellite communications, and an executive vice president at Bell Labs, which was a leader in speech recognition research. Pierce said everyone involved was wasting time and money.

It would be too simple to say that work in speech recognition is carried out simply because one can get money for it. . . . The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn't attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamor.

Pierce's 1969 letter marked the end of official research at Bell Labs for nearly a decade.
The defense research agency ARPA, however, chose to persevere. In 1971 they sponsored a research initiative to develop a speech recognizer that could handle at least 1,000 words and understand connected speech, i.e., speech without clear pauses between each word. The recognizer could assume a low-background-noise environment, and it did not need to work in real time.

By 1976, three contractors had developed six systems. The most successful system, developed by Carnegie Mellon University, was called Harpy. Harpy was slow—a four-second sentence would have taken more than five minutes to process. It also still required speakers to 'train' it by speaking sentences to build up a reference model. Nonetheless, it did recognize a thousand-word vocabulary, and it did support connected speech.

Research continued on several paths, but Harpy was the model for future success. It used hidden Markov models and statistical modeling to extract meaning from speech. In essence, speech was broken up into overlapping small chunks of sound, and probabilistic models inferred the most likely words or parts of words in each chunk, and then the same model was applied again to the aggregate of the overlapping chunks. The procedure is computationally intensive, but it has proven to be the most successful approach.
Throughout the 1970s and 1980s research continued. By the 1980s, most researchers were using hidden Markov models, which are behind all contemporary speech recognizers. In the latter part of the 1980s and in the 1990s, DARPA (the renamed ARPA) funded several initiatives. The first initiative was similar to the previous challenge: the requirement was still a one-thousand word vocabulary, but this time a rigorous performance standard was devised. This initiative produced systems that lowered the word error rate somewhat from about 10%; the remaining initiatives concentrated on improving algorithms and on computational efficiency.
In 2001, Microsoft released a speech recognition system that worked with Office XP. It neatly encapsulated how far the technology had come in fifty years, and what the limitations still were. The system had to be trained to a specific user's voice, using the works of great authors that were provided, such as Edgar Allan Poe's Fall of the House of Usher, and Bill Gates' The Road Ahead. Even after training, the system was fragile enough that a warning was provided: "If you change the room in which you use Microsoft Speech Recognition and accuracy drops, set up the microphone again." On the other hand, the system did work in real time, and it did recognize connected speech.
Speech Recognition Today

Technology

Current voice recognition technologies work on the ability to mathematically analyze the sound waves formed by our voices through resonance and spectrum analysis. Computer systems first record the sound waves spoken into a microphone through an analog-to-digital converter. The analog or continuous sound wave that we produce when we say a word is sliced up into small time fragments. These fragments are then measured based on their amplitude levels, the level of compression of air released from a person's mouth. To measure the amplitude levels and convert the sound wave to digital format, today's speech recognition makes use of the Nyquist-Shannon theorem.

Nyquist-Shannon Theorem

The Nyquist-Shannon theorem was developed in 1928 to show that a given analog frequency is most accurately recreated by a digital frequency that is twice the original analog frequency. Nyquist proved this was true because an audible frequency must be sampled once for compression and once for rarefaction. For example, a 20 kHz audio signal can be accurately represented as a digital sample at 44.1 kHz.
How It Works

Commonly, speech recognition programs use statistical models to account for variations in dialect, accent, background noise, and pronunciation. These models have progressed to such an extent that in a quiet environment accuracy of over 90% can be achieved. While every company has its own proprietary technology for the way a spoken input is processed, there exist four common themes about how speech is recognized.
1. Template-Based: This model uses a database of speech patterns built into the program. After receiving voice input into the system, recognition occurs by matching the input to the database. To do this the program uses Dynamic Programming algorithms. The downfall of this type of speech recognition is the inability of the recognition model to be flexible enough to understand voice patterns unlike those in the database.

2. Knowledge-Based: Knowledge-based speech recognition analyzes the spectrograms of the speech to gather data and create rules that return values equaling what commands or words the user said. Knowledge-based recognition does not make use of linguistic or phonetic knowledge about speech.

3. Stochastic: Stochastic speech recognition is the most common today. Stochastic methods of voice analysis make use of probability models to model the uncertainty of the spoken input. The most popular probability model is the HMM (Hidden Markov Model), whose decision rule is shown below:

W* = argmax_W p(W) p(Yt|W)

Yt is the observed acoustic data, p(W) is the a-priori probability of a particular word string, p(Yt|W) is the probability of the observed acoustic data given the acoustic models, and W is the hypothesised word string. When analyzing the spoken input the HMM has proven to be successful because the algorithm takes into account a language model, an acoustic model of how humans speak, and a lexicon of known words.

4. Connectionist: With Connectionist speech recognition, knowledge about a spoken input is gained by analyzing the input and storing it in a variety of ways, from simple multi-layer perceptrons to time delay neural nets to recurrent neural nets.
As stated above, programs that utilize stochastic models to analyze spoken language are most common today and have proven to be the most successful.

Recognizing Commands

The most important goal of current speech recognition software is to recognize commands. This increases the functionality of speech software. Software such as Microsoft Sync is built into many new vehicles, supposedly allowing users to access all of the car's electronic accessories, hands-free. This software is adaptive. It asks the user a series of questions and utilizes the pronunciation of commonly used words to derive speech constants. These constants are then factored into the speech recognition algorithms, allowing better recognition in the future. Critics believe the technology has come a long way since the 1990s, but say it will not replace hand controls any time soon.

Dictation

Second to command recognition is dictation. Today's market sees value in dictation software, as discussed below, in transcription of medical records, or papers for students, and as a more productive way to get one's thoughts down as the written word. In addition, many companies see value in dictation for the process of translation, in that users could have their words translated for written letters, or translated so the user could then say the words back to another party in their native language. Software for this purpose already exists in today's market.

Errors in Interpreting the Spoken Word

As speech recognition programs process your spoken words, their success rate is based on their ability to minimize errors. The scale on which they can do this is called Single Word Error Rate (SWER) and Command Success Rate (CSR). A Single Word Error is, simply put, a misunderstanding of one word in a spoken sentence. While SWERs can be found in Command Recognition Programs, they are most commonly found in dictation software. Command Success Rate is defined by the accurate interpretation of the command: a command statement may not be interpreted with complete accuracy, but the system can use mathematical models to infer the command the user intended.

Business
Major Speech Technology Companies

As the speech technology industry grows, more companies emerge into this field, bringing with them new products and ideas. Some of the leaders in voice recognition technologies (but by no means all of them) are listed below.

NICE Systems (NASDAQ: NICE and Tel Aviv: NICE), headquartered in Israel and founded in 1986, specializes in digital recording and archiving technologies. In 2007 they made $523 million in revenue. For more information visit http://www.nice.com.

Verint Systems Inc. (OTC: VRNT), headquartered in Melville, New York and founded in 1994, self-defines as "A leading provider of actionable intelligence solutions for workforce optimization, IP video, communications interception, and public safety."[9] For more information visit http://verint.com.

Nuance (NASDAQ: NUAN), headquartered in Burlington, develops speech and image technologies for business and customer service uses. For more information visit http://www.nuance.com/.

Vlingo, headquartered in Cambridge, MA, develops speech recognition technology that interfaces with wireless/mobile technologies. Vlingo has recently teamed up with Yahoo!, providing the speech recognition technology for Yahoo!'s mobile search service, oneSearch. For more information visit http://vlingo.com.

Other major companies involved in Speech Technologies include: Unisys, ChaCha, SpeechCycle, Sensory, Microsoft's Tellme, Klausner Technologies and many more.
Patent Infringement Lawsuits

Given the highly competitive nature of both business and technology, it is not surprising that there have been numerous patent infringement lawsuits brought by various speech companies. Each element involved in developing a speech recognition device can be claimed as a separate technology, and hence patented as such. Use of a technology, even if it is independently developed, that is patented by another company or individual is liable to monetary compensation and often results in injunctions preventing future use of the technology.

The Future of Speech Recognition

Future Trends & Applications

The Medical Industry

For years the medical industry has been touting electronic medical records (EMR). Unfortunately the industry has been slow to adopt EMRs, and some companies are betting that the reason is data entry. There aren't enough people to enter the multitude of current patient data into electronic format, and because of that the paper record prevails. A company called Nuance (also featured in other areas here, and developer of the software called Dragon Dictate) is betting that they can find a market for their speech recognition software among doctors who would rather dictate patient information than write it.