Pseudo voice: just an idea for a radically different approach to digital voice over limited channels.
One of the reasons I started this blog and gave this website its name is that I want to share an idea I have for HAM radio (and other radio communication). The starting point is that on HF, ‘voice’ or ‘phone’ needs a relatively large bandwidth compared to many digimodes. Sometimes you just need the excellent SNR handling and/or narrow bandwidth of digital modes to get the message through. Some digimodes can do this. Voice cannot.
There are experiments to encode voice and send it digitally (like FreeDV), but that is still a bandwidth-hungry mode compared to, say, PSK31 or even MT63-500.
Here my idea of ‘pseudo-voice’ steps in. Pseudo voice is (no surprise!) not real voice; in essence it is a series of code-labelled words. That’s why it has the potential to be used in narrow-bandwidth channels. The basic idea is that you just transmit short codes that represent whole words or even whole sentences. How? First let a computer algorithm (AI??) convert speech to ‘words’ and then label each word (or standard sentence) with a unique code that is as short as possible (a so-called ‘label’). Then transmit only the sequence of labels. After the labels are received, they can be converted back to words/sentences (and speech) by using a special lookup table: a dictionary. This concept only works if both sides use the same ‘label dictionary’, one that associates the same labels with the same words.
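A minimal sketch of this label round trip in Python, assuming the speech has already been turned into text by some speech-to-text step. The dictionary contents, the label format and the names `LABELS`, `encode` and `decode` are made up for illustration; none of this is an existing standard.

```python
# Hypothetical shared dictionary: both stations must load the same table.
LABELS = {
    "hello": "a1",
    "need": "b7",
    "water": "c3",
    "over": "zz",
}
# Reverse table used on the receiving side (label -> word).
WORDS = {label: word for word, label in LABELS.items()}

UNKNOWN = "??"  # placeholder label for words outside the dictionary (assumption)


def encode(text: str) -> list[str]:
    """Turn recognised text into a sequence of labels."""
    return [LABELS.get(word, UNKNOWN) for word in text.lower().split()]


def decode(labels: list[str]) -> str:
    """Turn received labels back into words."""
    return " ".join(WORDS.get(label, "[?]") for label in labels)


sent = encode("Hello need water over")
print(sent)          # the labels that would go over the air
print(decode(sent))  # the text handed to a text-to-speech step
```

Words that are not in the dictionary are not lost silently here; they show up as a placeholder, so the listener at least knows something was dropped.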
For each type of communication you can develop a separate tailor-made dictionary, like a dictionary for EMCOM, a RAG CHEWING dictionary, et cetera. It’s all about standardisation.
The basic idea behind pseudo voice and its use of dedicated dictionaries is that a meaningful basic conversation in a normal language only needs a vocabulary of about a thousand words (the so-called ‘language level A1’). If the conversation is limited to just a few topics (like EMCOM), even fewer words are needed. Higher language levels need more words and more complicated grammar, and therefore larger dictionaries. By the way: normal conversational speech at level A1 runs at about 120 words per minute, and some of these words form standard sentences that can also have their own label.
Pseudo voice could work like ‘real-time’ voice if we use a digimode that does 120 labels per minute. If a label consists of max 5 characters, we would need (net) max 600 characters per minute = max 10 characters per second. Which narrow-band digimodes can do that? A lot!
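The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up, the 36-character alphabet (A-Z plus 0-9) is an assumption, and fixed-length labels are assumed so that no separator characters are needed.

```python
def min_label_length(dictionary_size: int, alphabet_size: int) -> int:
    """Smallest fixed label length that gives every entry a unique label."""
    length = 1
    while alphabet_size ** length < dictionary_size:
        length += 1
    return length


def chars_per_second(labels_per_minute: int, label_length: int) -> float:
    """Net character rate needed to keep up with the speaker."""
    return labels_per_minute * label_length / 60


# Worst case from the text: 120 labels per minute, 5 characters each.
print(chars_per_second(120, 5))  # 10.0

# A 1000-word dictionary with a 36-character alphabet (assumption).
n = min_label_length(1000, 36)
print(n, chars_per_second(120, n))  # 2 4.0
```

Under these assumptions a 1000-word dictionary fits in two-character labels, which would bring the required net rate down from 10 to 4 characters per second.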
More, older and less mature musings:
So: what if each word (and each regularly used sentence) has its own short and unique digital ‘label’? And this label is, obviously, shorter than the word itself. You could pair words and labels in a specially written dictionary that consists of carefully selected words and sentences. Let’s say a dictionary ‘HAMQSO’, or a dictionary ‘EMCOM’. Then you could use a speech-to-label engine (STL-e, a variant of the speech-to-text engine) to code speech, and vice versa a label-to-speech engine (LTS-e). Speech at 120 wpm could then be converted to a data stream of ‘labels’ at a much lower rate.
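Giving whole sentences their own label means the encoder has to prefer a sentence label over the labels of its separate words. One possible way to do that, shown here purely as an illustration, is a greedy longest-match lookup. The greedy strategy, the dictionary entries and the name `encode_phrases` are my assumptions, not part of the original idea.

```python
# Hypothetical dictionary holding both single words and standard sentences.
PHRASE_LABELS = {
    ("how", "are", "you"): "q1",
    ("thank", "you"): "q2",
    ("you",): "w1",
    ("how",): "w2",
    ("are",): "w3",
    ("thank",): "w4",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_LABELS)


def encode_phrases(text: str) -> list[str]:
    """Label the text, preferring the longest known phrase at each position."""
    words = text.lower().split()
    labels = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in PHRASE_LABELS:
                labels.append(PHRASE_LABELS[key])
                i += size
                break
        else:
            labels.append("??")  # word not in the dictionary
            i += 1
    return labels


print(encode_phrases("How are you"))    # one label for the whole sentence
print(encode_phrases("thank you how"))  # a phrase label followed by a word label
```

The more of a conversation that is covered by standard sentences, the fewer labels have to be sent per minute.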
For use in radio we could, instead of sending real voice, use an STL engine loaded with a certain dictionary and send a series of short labels (characters) using a narrow digimode, say MT63-500 (speed: 5 cps). Received labels can be converted back to words (or sentences) and speech using a ‘label-to-speech’ engine loaded with the same dictionary. Hence ‘pseudo voice’.
NB: The sound quality of the pseudo speech is independent of noise in the analog radio channel (as long as the labels are decoded properly): it depends solely on the quality of the sound clips in the LTS engine. It could be anything from low-quality computer voices to hi-fi speech clips. There are many possibilities to optimise the system to one’s needs. One could even have QSOs between different languages, as long as the labels refer to the same content dictionary, each side in its own language ;-). You could use the voice you like (male/female) on the receiving end.
One could use pseudo voice in circumstances where normal HF voice communication is difficult or even impossible (like in situations with low power, compromise antennas, bad SNR, bad propagation, QRM, etc.) and where dedicated digital modes still work OK.
In short: in essence, the voice itself is not digitized and then transmitted. Instead, a dedicated dictionary labels words (or whole standard sentences) and only the corresponding short codes are transmitted. And vice versa on the receiving side.
Purpose-built dictionaries (aimed at specialized communications) could be developed: for HAM, emergency, tactical engagements, etc.
Only sending labels (i.e. codes shortened as far as possible) would greatly reduce the HF bandwidth needed (albeit at the cost of a limited dictionary). It doesn’t produce a real conversation, but it comes close (pseudo 😉 ). And it is better than nothing!
That’s why I called this webpage ‘pseudo-voice’.
By the way, I may have coined the idea in this blog, but that doesn’t mean I am building or developing it. I am not an engineer or programmer by any means. So I am not able to bring this idea to life in the real world. But it would be nice to hear from you what you think about this idea and even how to bring it forward in the HAM-community. Please post your comments!
Interesting idea. Consider perhaps sending the words over a digital text mode, including already established short forms and abbreviations (such as those used in CW) for further “compression”. PSK31 is about 60Hz BW, RTTY a bit more, but still somewhere between about 85Hz up to 1kHz max (depending on the data rate chosen). Now you have your potential dictionary, and you could use speech-to-text transcription on the input side and text-to-speech on the output side, should you want to maintain the pseudo voice.
For that matter, the speech-to-text translator doesn’t have to create full proper text from speech; it could instead encode the phonemes in such a way as to help the output side recreate the speech in the best way.
Or, with translation services, perhaps even make a Babel fish… 🙂
Hi
An interesting idea.
Your assumptions appear largely based on English mother-tongue speakers, probably with your country’s accent. Consider the gap between Highland Scottish English and Caribbean English. There would be more problems with speakers of other mother tongues who learnt English as adults.
Consider adapting some existing speech-to-text software to be the input (I/P) to your software, to further reduce the data size of the speech for transmission.
Would you have the transmitted speech info presented to the receiving person as audio or as text on a screen? The latter would allow the human intelligence to correctly interpret errors.
(BCompSc and coursework masters in Ling. Did some basic work with Appen LTD but failed to get in as a developer. This was 15-20 years ago.)
Hi, Pseudovoice is envisaged to work with labelled words in conjunction with a dedicated dictionary. The dictionary could have variants in different languages, like all dictionaries. Speaking (i.e. on the transmitting side) can be done in the language of your choosing, with the dictionary in the language of your choosing. The dictionary just labels the words (or complete standardised phrases). Only the relevant label (a short code) is transmitted.
On the receiving end, short codes (labels) are received and entered in a dictionary of exactly the same group, i.e. a dictionary that consists of exactly the same labels and associates those labels with the same words. But the words themselves may be in a completely different language (the language the receiving person likes). Example: the speaker says ‘hello’. The speaker knows he is speaking UK English and has selected the dictionary of the group EMCOM/UK-English. This EMCOM dictionary has, say, the 500 relevant words for a basic conversation at the EMCOM group level. The word ‘hello’ is in the dictionary and is therefore recognized, and gets its associated label (for example 1#j). The transmitter sends 1#j. 1#j is received and entered in a dictionary of the same EMCOM group, say EMCOM/Francais. The listener has chosen this dictionary because he likes to communicate in French. The dictionary associates the received code 1#j with the word ‘bonjour’ and voices ‘bonjour’ to the listener.
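The ‘hello’ to ‘bonjour’ example can be sketched in Python like this. Only the pair hello / 1#j / bonjour comes from the example above; the other entries, the variable names and the helper functions are invented for illustration.

```python
# Two hypothetical dictionaries of the same group (EMCOM):
# identical labels, but the words are in different languages.
EMCOM_UK_ENGLISH = {"1#j": "hello", "2#k": "water", "3#m": "thanks"}
EMCOM_FRANCAIS = {"1#j": "bonjour", "2#k": "eau", "3#m": "merci"}


def to_labels(words: list[str], dictionary: dict[str, str]) -> list[str]:
    """Transmitting side: look up the label for each spoken word."""
    reverse = {word: label for label, word in dictionary.items()}
    # A word outside the dictionary raises KeyError in this sketch.
    return [reverse[word] for word in words]


def to_words(labels: list[str], dictionary: dict[str, str]) -> list[str]:
    """Receiving side: look up the word for each received label."""
    return [dictionary[label] for label in labels]


on_air = to_labels(["hello", "water"], EMCOM_UK_ENGLISH)
print(on_air)                            # ['1#j', '2#k']
print(to_words(on_air, EMCOM_FRANCAIS))  # ['bonjour', 'eau']
```

Nothing language-specific goes over the air: the two stations only have to agree on the group (EMCOM) and on the label set.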
It’s all about standardising labels/codes and dictionaries. The dictionaries themselves can be written in the languages you like and should contain the same words connected to the same labels. For just basic communication (limited vocabulary) this would suffice for a limited but meaningful conversation. Even grammar could in principle be a part of the system, if it would help a meaningful conversation.
Last but not least: of course you could also enter and display text as part of pseudovoice, but then it’s not voice anymore 😉
kind regards
Robert
Hullo Robert (I’m in Australia, where in the world are you)?
I think you are underestimating the lack of one to one meaning of words between languages. This is a constant issue for language teachers (I’m now a retired English language teacher). The other big thing is verb position. English has Subject Verb Object but many languages use Subject Object Verb. For example:
My car is at the Ford mechanic. English
My car at the mechanic Ford is. Kurdish.
I’m not discouraging you but am attempting to see that you can define the problem correctly. Old adage: the sooner you start coding the longer it takes.
I live in the Netherlands. On grammar and the different meaning of words: yes, language is a complex thing. That’s why I just aim at dictionaries with only about 1000 words. Language level A1. Just basic communication. KIS!
The ‘dictionaries’ themselves should be prepared by specialists in their field: they know what terms are used and useful for communication inside their community. The same goes if anyone would translate the group’s dictionary into another language.