Voice-based personal assistant in Python

Hi there, hope you are doing well. We can build our very own voice-based assistant using a fundamental programming concept, loops, and the Python programming language.

Our aim is to build a simple assistant that carries out a few simple tasks when we give it voice commands. We can add more complex tasks later, as and when we need the assistant to do more.

The flow of the app is very simple: first we take voice input from the user, process it, look for pre-defined trigger words, and then fire up the matching task for the assistant to perform. The flow of the app is shown in the following outline diagram:

Diving into the details, let's start with the packages used in our application (on an Ubuntu machine):

espeak - sudo apt-get install espeak
SpeechRecognition - pip install SpeechRecognition

The espeak package converts text input to speech output. The SpeechRecognition package, as the name suggests, converts speech to text. Reading from the microphone with sr.Microphone additionally requires the PyAudio package (pip install PyAudio).
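To make the text-to-speech step concrete, here is a minimal sketch of a small wrapper around the espeak command. The helper names build_espeak_command and speak are my own (they are not part of espeak or of the program in this post), and the sketch assumes espeak is installed and on the PATH. Passing the arguments as a list means the spoken text never goes through the shell, so apostrophes in the text cannot break the command.

```python
import shutil
import subprocess


def build_espeak_command(text, speed=60):
    """Return the argument list for one espeak call (hypothetical helper).

    -s is espeak's speed option, given in words per minute.
    """
    return ["espeak", "-s", str(speed), text]


def speak(text, speed=60):
    """Speak `text` with espeak if it is installed, otherwise just print it."""
    if shutil.which("espeak") is None:
        # Assumption: falling back to print is acceptable when espeak is missing.
        print("[espeak not found] " + text)
        return False
    subprocess.run(build_espeak_command(text, speed), check=False)
    return True
```

Calling speak("Hello, I'm your personal assistant") would then say the sentence out loud on a machine that has espeak installed.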

The idea is simple: we use a basic programming concept, loops, and more specifically while loops. We need two while loops, where the first one activates the assistant and the second one executes our commands.

At home and in advertisements we see personal assistants like Alexa, Siri, Cortana and Google Assistant, which wake up when they hear a trigger word such as "okay Google" or "Alexa!". We can replicate the same idea with the first while loop. It is an infinite loop that keeps listening to the user. To keep our use case simple, we won't tie activation to one unique voice; instead anyone can trigger the assistant and get their simple tasks done. We also need to fix a trigger word that activates the assistant. I personally used just "OKAY" as my trigger word to keep things simple.

So the first while loop picks up the user's voice command and activates the assistant. Inside it we can either nest another while loop or call a method that contains an infinite loop, and that inner loop runs the tasks that match the trigger words.
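The two-loop idea described above can be tried out without a microphone. The sketch below is only an illustration under my own assumptions: run_assistant, listen and speak are hypothetical names, and listen is any function that returns the next recognized phrase as text, so a scripted list of phrases can stand in for real speech.

```python
def run_assistant(listen, speak,
                  wake_words=("okay", "ok"),
                  exit_words=("bye", "end"),
                  done_words=("thank you",)):
    """Outer loop waits for a wake word, inner loop handles commands.

    `listen` returns the next phrase as text, `speak` outputs a reply.
    Returns the list of commands that were handled.
    """
    handled = []
    while True:                          # first loop: wait to be triggered
        heard = listen()
        if heard in exit_words:          # ends the whole program
            speak("have a good day")
            break
        if heard not in wake_words:      # ignore anything else
            continue
        speak("please give your command")
        while True:                      # second loop: execute commands
            command = listen()
            if command in done_words:    # back to the first loop
                speak("thanks for using")
                break
            handled.append(command)
            speak(command)               # placeholder for a real task
    return handled


# Scripted "voice" input in place of a microphone.
phrases = iter(["hello", "okay", "mail", "time", "thank you", "bye"])
replies = []
commands = run_assistant(lambda: next(phrases), replies.append)
print(commands)
```

Swapping the scripted listen for a function that records from the microphone and calls the speech recognizer turns this into the real assistant loop.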

We can draw a parallel with an everyday device, say Google Assistant. Once we trigger the assistant, it offers a list of things it can do, like reading out what is on the screen or telling a joke. If we look closely, we are in a loop: the assistant presents its options, performs the task we pick, and then presents the options again so we can pick the next task. Once we are done, we say thank you or exit the app. The second while loop, nested inside the main loop, covers these cases.

The program enters the second while loop when the user triggers the assistant. This inner loop holds all the functionality and stays active until the termination condition is met, for example when the user says a trigger word such as goodbye or bye. Until then, voice input is taken from the user inside the inner loop: the user speaks, the speech recognition module returns the text, and the assistant performs the action that matches the trigger word it finds. For simplicity, words like mail and time can be used as trigger words, for example to open Gmail in a web browser (assuming the user is already logged in) or to speak the current time, and the list can go on and on.
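One possible way to map trigger words such as mail and time to tasks is a plain dictionary. This is just a sketch, not the approach used in the program in this post: TASKS, find_task and tell_time are hypothetical names, and it assumes the recognizer returns plain text without punctuation.

```python
import datetime
import webbrowser


def open_mail():
    """Open Gmail in the default browser (assumes the user is logged in)."""
    webbrowser.open("https://mail.google.com")
    return "Opening the mail"


def tell_time():
    """Return the current time as a sentence that can be spoken."""
    return datetime.datetime.now().strftime("The time is %H:%M")


# Trigger word -> task to run. Add more entries to teach the assistant more.
TASKS = {"mail": open_mail, "time": tell_time}


def find_task(text, tasks=TASKS):
    """Return the first task whose trigger word occurs in the text, or None."""
    for word in text.lower().split():
        if word in tasks:
            return tasks[word]
    return None
```

With this in place, find_task("what is the time now") returns tell_time, and a sentence without any known trigger word returns None, so the assistant can ask the user to repeat the command.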

Let's see the implementation in Python, which is mostly self-explanatory.

'''
    This is a simple voice-based assistant where we can use the basic
    concepts of programming for building something fun.
'''

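# Import the required packages
import os                          # runs the espeak command line tool
import webbrowser                  # opens Gmail in the default browser

import speech_recognition as sr    # speech to text


def open_mail():
    """Open Gmail in the default web browser."""
    url = "https://mail.google.com"
    webbrowser.open(url)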

# Main part of our program


s_r = sr.Recognizer()  # create a speech recognition object.

# Initial greeting from the personal assistant.
os.system("espeak 'Hello BOSS. I am your personal assistant, how can I help you' -s 60")

while True:     # first while loop: keeps listening until the assistant is
                # triggered ("okay") or the program is ended ("bye")

    # Listen for input from the user to trigger the assistant.
    # device_index and sample_rate depend on your microphone.
    with sr.Microphone(device_index=0,
                       sample_rate=48000,
                       chunk_size=1024) as source1:
        s_r.adjust_for_ambient_noise(source1)
        audio = s_r.listen(source1)
        print("Heard your command, please wait while I process it")

    try:
        trigger_word = s_r.recognize_google(audio).lower()
        print(trigger_word)

        # Termination condition for leaving the first loop and ending the program.
        # "buy" is accepted as well, since "bye" may be transcribed as "buy".
        if trigger_word in ("buy", "bye", "end"):
            os.system("espeak 'Any time at your service' -s 60")
            os.system("espeak 'have a good day' -s 60")
            break

        # The nested loop, which has the functionality for performing different tasks.

        while trigger_word == "okay" or trigger_word == "ok":

            os.system("espeak 'triggered Voias, please give your command.' -s 60")

            # Listen for the command that selects the task.
            ir = sr.Recognizer()
            with sr.Microphone(device_index=0,
                               sample_rate=48000, chunk_size=2048) as source:
                ir.adjust_for_ambient_noise(source)
                audio = ir.listen(source)
                print("Listened, please wait I'm processing it......")

            res = ir.recognize_google(audio).lower()

            # No apostrophe here: it would end the single-quoted shell string.
            os.system("espeak 'Listened, please wait, I am processing it' -s 60")

            if res == "mail":
                os.system("espeak 'Opening the mail BOSS'")
                open_mail()

            if res == "thank you":
                os.system("espeak 'Thanks for using'")
                break

            # Echo the command back. Apostrophes are removed so that they do
            # not break the single-quoted shell argument.
            echo_cmd = "espeak '" + res.replace("'", "") + "' -s 60"
            os.system(echo_cmd)

    # Handle any exceptions if they occur.
    except sr.RequestError:
        # The speech recognition service could not be reached.
        os.system("espeak 'request error' -s 80")

    except sr.UnknownValueError as e:
        # The speech could not be understood.
        print(e)
        os.system("espeak 'unable to understand'")

This is my first ever blog on Hashnode. Thank you for taking the time to read it. Any feedback, suggestions, questions or clarifications are welcome. Thanks once again. Have a great day!
