ChatGPT is, ostensibly, a chatbot that relies on a large language model (LLM) built around an artificial neural network to generate text. But that underlying capability has proven to be extremely versatile, and people have been able to use ChatGPT for everything from writing code to solving puzzles. It can even act as a voice assistant, like a sophisticated version of Siri or Alexa. Its abilities are nearly limitless if you have the skill to teach it how to handle the desired tasks. Zoltan took advantage of that to turn a vintage rotary phone into a home assistant powered by ChatGPT.
This is an update to a project that Zoltan completed years ago, in which he put a basic voice assistant into an old rotary phone. When he picked up the handset, he could speak a command and the assistant would respond, if it understood what was asked of it. But that assistant wasn’t very good at understanding or interpreting speech. If the spoken command didn’t fit the expected format or wasn’t enunciated well, the assistant wouldn’t be able to respond. Most of us have experienced similar problems with Siri and other voice assistants. ChatGPT, however, is very good at interpreting natural language, so Zoltan upgraded his rotary phone voice assistant to use it.
The only hardware in the phone itself is a Grandstream HT801, which is a device intended to convert analog telephones into VoIP (Voice over Internet Protocol) devices. In this case, it turns audio picked up by the handset into a digital audio stream and vice versa. That audio feeds to a Raspberry Pi single-board computer located elsewhere in Zoltan’s home. The Raspberry Pi then handles all of the communication with the various services this project requires.
First, it sends audio to OpenAI’s Whisper service, which provides speech recognition. The text generated by Whisper then goes to ChatGPT for interpretation. If it is something like a question, ChatGPT will return an answer in the form of text. That text is then passed to AWS Polly, which handles the text-to-speech functionality. Finally, that audio goes back through the Grandstream HT801 to the handset’s speaker.
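The question-and-answer flow can be pictured as three stages chained together. The following is a minimal sketch, not Zoltan’s actual code: `VoicePipeline` and its three callables are hypothetical names, and the stub stages stand in for real calls to Whisper, ChatGPT, and AWS Polly so the example runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains speech-to-text, a chat model, and text-to-speech.

    All three callables are hypothetical stand-ins. In a real build they
    would wrap requests to Whisper, ChatGPT, and AWS Polly respectively.
    """
    transcribe: Callable[[bytes], str]   # handset audio -> recognized text
    answer: Callable[[str], str]         # recognized text -> reply text
    synthesize: Callable[[str], bytes]   # reply text -> audio for the speaker

    def handle(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.answer(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs offline. They treat "audio" as UTF-8 text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_answer(text: str) -> str:
    return f"You asked: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_answer, fake_synthesize)
audio_out = pipeline.handle(b"what time is it")
```

Because each stage is passed in as a plain function, any one of them could be swapped for a different service without touching the other two.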
If, however, ChatGPT interprets the text as a command, such as “turn on the lights,” it will call a Python function. It’s up to Zoltan what to do with those functions. He can, for instance, play a song through the Spotify API (Application Programming Interface). In theory, Zoltan can control anything that has an accessible API. But that does require that he set up the functions himself; ChatGPT can’t make those connections.
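One way to picture that command handling is a small registry of Python functions plus a dispatcher. This is a sketch under stated assumptions rather than the project’s real code: the function names `set_lights` and `play_song`, the `COMMANDS` registry, and the shape of the model reply (a dictionary holding either `text` or a `function` name with JSON `arguments`) are all invented for illustration.

```python
import json
from typing import Any, Callable, Dict

# Registry of functions the assistant may call (hypothetical names).
COMMANDS: Dict[str, Callable[..., str]] = {}


def command(func: Callable[..., str]) -> Callable[..., str]:
    """Decorator that registers a function under its own name."""
    COMMANDS[func.__name__] = func
    return func


@command
def set_lights(state: str) -> str:
    # Placeholder: a real version would call a smart-home API here.
    return f"Lights turned {state}"


@command
def play_song(title: str) -> str:
    # Placeholder: a real version would call the Spotify API here.
    return f"Playing {title}"


def dispatch(model_reply: Dict[str, Any]) -> str:
    """Return the text to speak, running a registered function if requested.

    The reply shape used here is an assumption made for this sketch,
    not a documented API format.
    """
    name = model_reply.get("function")
    if name is None:
        # Plain answer: nothing to execute, just speak the text.
        return model_reply.get("text", "")
    handler = COMMANDS.get(name)
    if handler is None:
        # The model asked for something that was never set up.
        return f"Sorry, I don't know how to {name}."
    args = json.loads(model_reply.get("arguments") or "{}")
    return handler(**args)
```

For example, `dispatch({"function": "set_lights", "arguments": '{"state": "on"}'})` runs the registered `set_lights` function, while a request for an unregistered name falls through to the apology. Only functions that were registered by hand can run, which matches the limitation described above.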
And, of course, there is the issue of accuracy when it comes to factual information. ChatGPT is well known for its “hallucinations” (read: lies). It can also be out of date if an answer comes from training data gathered before an event happened. As Zoltan demonstrates, it was unaware of the death of Queen Elizabeth II because that occurred after its last training data update.
Still, it is a fun project and it ultimately has more potential than your typical consumer voice assistant.