Images Scientific Instruments SR-06,SR-07 Speech Recognition Kit User Guide

September 21, 2024
Images Scientific Instruments

SR-06/SR-07
Speech Recognition Kit
Construction Manual & User Guide

Images SI Inc.
109 Woods of Arden Road
Staten Island NY 10312
718.966.3694 voice
718.966.3695 fax
http://www.imagesco.com

Introduction:

The speech recognition kit is a complete, easy-to-build, programmable speech recognition circuit. It is programmable in the sense that you train the words (or vocal utterances) you want the circuit to recognize. This kit allows you to experiment with many facets of speech recognition technology.

Features of the kit include:

  • Self-contained stand-alone speech recognition circuit
  • User programmable
  • 40 or 20 word vocabulary
  • Multi-lingual
  • Non-volatile memory back up
  • Easily interfaced to control external circuits & appliances

Speech recognition will become the method of choice for controlling appliances, toys, tools and computers. At its most basic level, speech controlled appliances and tools allow the user to perform parallel tasks (i.e. hands and eyes are busy elsewhere) while working with the tool or appliance.
The heart of the circuit is the HM2007 speech recognition IC.
The IC can recognize either 40 words, each up to 0.96 seconds long, or 20 words, each up to 1.92 seconds long.

Applications

There are several areas for application of voice recognition technology.

  • Speech controlled appliances and toys
  • Speech assisted computer games
  • Speech assisted virtual reality
  • Telephone assistance systems
  • Voice recognition security
  • Speech to speech translation

Circuit Construction

The schematic for the SR-07 Speech Recognition Circuit is shown in figure 1. The SR-07 utilizes three separate printed circuit boards (pcb). The components are mounted on the top side of each pc board, which carries white silk screen component drawings, and are soldered on the opposite side. After soldering a component to the board, clip off any excess wire.

Chip Installation

When installing integrated circuit (IC) chips, begin by identifying the top of the chip. The top of the chip has a marker, often a half circle cutout. Sometimes it is a small mark identifying pin 1 on the IC. In both cases the mark shows the top of the IC chip. Orient the top of each IC chip with the white silk screen drawing of the component on the top of the pc board (usually a half circle cutout) or on the parts placement drawings, and install the IC into its socket.
Display Board:
We start construction with the display board, see figure 2.
Mount and solder the 16 (220 ohm) resistors, color bands red, red, brown, and gold or silver. Next solder two 14-pin sockets for the LED display ICs (U8 and U9). Install the LED displays into the sockets; the dots on the display chips face the bottom of the pc board. Next mount and solder the two 16-pin sockets for the 4511 ICs (U4 and U5), making sure to orient the sockets properly. Install the 4511 ICs in their appropriate IC sockets, making sure to orient the ICs properly. Just below U4 there are three solder pads in a row. Solder a jumper wire from the center pad to the right pad marked with a “C”. Finish the display board by mounting and soldering the 10-pin female header to the PC board.
(Photo: Finished Display Board)

Main Circuit Board:

The pc board layout of the main board is shown in figure 3. Begin construction of this board by mounting and soldering the three IC sockets. The HM2007 PLCC uses a 52-pin square socket identified on the pcb as U1. The 8K static RAM uses a 28-pin socket identified as U2. The 74LS373 uses a 20-pin socket identified as U3. Next mount and solder resistor R1, which has a nominal value of 100K (color bands brown, black, yellow and gold). Solder resistor R2, which has a nominal value of 6.8K (color bands blue, grey, red and gold). Then solder resistor R3, which has a nominal value of 22K (color bands red, red, orange and gold). Next solder resistor R4, which has a nominal value of 330 ohms (color bands orange, orange, brown and gold).
Mount and solder diodes D1 and D2. Make sure the black band faces the correct direction as shown on the drawings. Next mount and solder the 3.57 MHz crystal, identified as XTAL on the parts placement drawing. Mount and solder the red LED next. The short lead of the LED should be aligned with the flat side of the silkscreen circle marked LED. Mount and solder capacitors C1 to C7. Capacitors C2 and C3 are small 22 pF capacitors. C5, C6 and C7 are 0.1 uF capacitors, C1 is a 47 to 100 uF capacitor and C4 is a 0.0047 uF capacitor. Please note that C1 is identified as a 47 uF capacitor, but any value between 47 and 100 uF may be substituted in the kit.
Mount and solder the 7805 voltage regulator and the on-off slide switch. Next mount and solder the microphone jack, the button battery holder and the 9-volt battery cap. Keep the wires on the 9-volt battery cap short, about 1.5” long. Mount and solder the 10-pin right angle header in the upper left hand corner of the board identified as R1. Mount and solder the 7-pin right angle header in the lower left corner of the board. Mount and solder a 2-pin header in the WD location next to R4. Install the integrated circuits in their appropriate IC sockets, making sure to orient the ICs properly.
(Photo: SR-07 Main Circuit Board)
Keypad:
The keypad is constructed using 12 normally open momentary contact switches. Place each switch in its mounting position and bend the leads inward to secure the switch to the PCB for soldering. After mounting and soldering the 12 keypad switches to the top of the keypad PCB, connect the 7-pin female header to the bottom of the keypad PCB.
Non-Volatile Memory Back-up
The PC mounted coin battery holder holds a 2032 coin battery, which supplies backup for the SRAM. This allows the word patterns to be retained in memory when the main circuit is turned off.
Selecting Vocabulary Size and Word Length
The default vocabulary and word configuration for the circuit is 40 words, each with a length of 0.96 seconds. If you wish to change this to 20 words, each with a length of 1.92 seconds, place a jumper on the two-pin WD header. If you do not need the 40 word vocabulary, it is suggested that you configure the circuit for the 20 word vocabulary, as this configuration usually provides better recognition accuracy.
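As a quick check on the two settings, the short Python sketch below multiplies word count by word length for each configuration. It only restates the figures quoted above; the names MODES and total_speech_seconds are illustrative and are not part of the kit.

```python
# Word count and per-word length for the two WD jumper settings,
# taken from the figures quoted in this guide.
MODES = {
    "WD jumper off": (40, 0.96),
    "WD jumper on": (20, 1.92),
}


def total_speech_seconds(words, seconds_per_word):
    """Total speech time covered by one configuration, in seconds."""
    return round(words * seconds_per_word, 2)


for name, (words, length) in MODES.items():
    total = total_speech_seconds(words, length)
    print(f"{name}: {words} words x {length} s = {total} s")
```

Both settings work out to the same total, so the jumper trades vocabulary size for word length.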
Using The Speech Recognition Circuit
The keypad and digital display are used to communicate with and program the HM2007 chip. Plug the digital display into the 10-pin header on the main circuit board. Plug the keypad into the 7-pin header on the main circuit board. Plug the headset microphone into the microphone jack. Adjust the microphone so that it is positioned about 1” away from your mouth.
Keypad Use:
The keypad is made up of 12 normally open momentary contact switches.

The CLR key stands for Clear and the TRN key stands for Train.
When the circuit is turned on, “00” is on the digital display, the red LED (READY) is lit and the circuit waits for a command.
(Photo: Finished SR-07 Circuit)

Training Words for Recognition

To Train:
Press “1” on the keypad (the display will show “01” and the LED will turn off), then press the TRN key (the LED will turn on) to place the circuit in training mode for word one.
Say the target word clearly into the headset microphone. The circuit signals acceptance of the voice input by blinking the LED off then on. The word (or utterance) is now identified as the “01” word. If the LED did not flash, start over by pressing “1” and then the TRN key.
You may continue training new words in the circuit. Press “2” then TRN to train the second word, and so on. The circuit will accept and recognize up to 40 words (numbers 1 through 40). It is not necessary to train all word spaces. If you only require 10 target words, that’s all you need to train.

Testing Recognition:

Repeat a trained word into the microphone. The number of the word should be displayed on the digital display. For instance, if the word “directory” was trained as word number 25, saying the word “directory” into the microphone will cause the number 25 to be displayed.
Error Codes:
The chip provides the following error codes.
55 = word too long
66 = word too short
77 = no match
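If the two-digit result is read by software rather than by eye, the codes above have to be separated from ordinary word numbers. The Python sketch below shows one way to do that. The function name interpret_display and the "unknown" category (used for readings such as the idle “00”) are assumptions made for illustration.

```python
# Error codes listed in this guide.
ERROR_CODES = {55: "word too long", 66: "word too short", 77: "no match"}


def interpret_display(value, vocabulary_size=40):
    """Classify a two-digit display reading as a word, an error, or neither."""
    if value in ERROR_CODES:
        return ("error", ERROR_CODES[value])
    if 1 <= value <= vocabulary_size:
        return ("word", value)
    # Anything else (for example the idle reading 00) is not a word number.
    return ("unknown", value)


print(interpret_display(25))
print(interpret_display(77))
```

Pass vocabulary_size=20 when the WD jumper is installed.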
Clearing Memory
To erase all words in memory, press “99” and then “CLR”. The display will show “19”; this is not an error. The numbers will quickly scroll by on the digital display as the memory is erased.
Changing & Erasing Words
Trained words can easily be changed by overwriting the original word. For instance, suppose word six was the word “Capital” and you want to change it to the word “State”. Simply retrain the word space by pressing “6” then the TRN key and saying the word “State” into the microphone.
To erase a word without replacing it with another word, press the word number (in this case six) then press the CLR key. Word six is now erased.
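The keypad procedures for training, erasing and clearing can be summarized as key sequences. The Python sketch below generates them. It assumes that a two-digit word number is entered one digit at a time (as “99” is), and the function names are hypothetical.

```python
def key_presses(word_number, action):
    """Return the keypad presses to train or clear one word space."""
    if action not in ("train", "clear"):
        raise ValueError("action must be 'train' or 'clear'")
    if not 1 <= word_number <= 40:
        raise ValueError("word spaces are numbered 1 through 40")
    final_key = "TRN" if action == "train" else "CLR"
    # Assumption: multi-digit numbers are keyed in digit by digit.
    return list(str(word_number)) + [final_key]


def clear_all_presses():
    """Return the keypad presses that erase every trained word."""
    return ["9", "9", "CLR"]


print(key_presses(6, "train"))
print(key_presses(6, "clear"))
print(clear_all_presses())
```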

Simulated Independent Recognition

The speech recognition system is speaker dependent, meaning that the voice that trained the system has the highest recognition accuracy. But you can simulate independent speech recognition.
To make the recognition system simulate speaker independence, use more than one word space for each target word. Set the SR-07 for a 40 word vocabulary and use four word spaces per target word. This gives four different enunciations of each target word, which simulates speaker independence.
The four word spaces are chosen to minimize software and hardware interfaces into the circuit. To accomplish this, the four word spaces are chosen so that they all have the same Least Significant Digit (LSD). The words can then be recognized by decoding only the least significant digit (number) on the digital display.
Using this procedure, the word spaces 01, 11, 21 and 31 are allocated to the first target word. The Most Significant Digit (MSD) is dropped by the interfacing circuits. By decoding only the LSD number, in this case the 1 of “X1” (where X is any number), we can recognize the target word.
We continue to do this for the remaining word spaces. For instance, the second target word will use the word spaces 02, 12, 22 and 32. We continue in this manner until all the words are programmed.
If you are experimenting with speaker independence, use different people when training a target word. This will enable the system to recognize different voices, inflections and enunciations of the target word. The more system resources that are allocated for independent recognition, the more robust the circuit will become. If you are experimenting with designing the most robust and accurate system possible, train target words using one voice with different inflections and enunciations of the target word.
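The word space allocation described above can be sketched in a few lines of Python. The function names are illustrative. The group for digit 0 (word spaces 10, 20, 30 and 40) is inferred from the 1 through 40 numbering and is not spelled out in this guide.

```python
def word_spaces_for_digit(digit, vocabulary_size=40):
    """List the word spaces whose least significant digit equals `digit`."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0 through 9")
    return [n for n in range(1, vocabulary_size + 1) if n % 10 == digit]


def target_from_word_space(word_number):
    """Drop the most significant digit, keeping only the LSD."""
    return word_number % 10


print(word_spaces_for_digit(1))
print(word_spaces_for_digit(2))
print(target_from_word_space(31))
```

Training each word space in a group with a different speaker or inflection means any of the four matches maps back to the same target word.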

Rhyming words

Rhyming words are words that sound alike. For instance the words cat, bat, sat and fat sound alike. Because of their like sounding nature they can confuse the speech recognition circuit.
When choosing target words for your system do not use rhyming words.

The Voice With Stress & Excitement

Stress and excitement alter one’s voice. This affects the accuracy of the circuit’s recognition. For instance, assume you are sitting at your workbench and you program target words like fire, left, right, forward, etc., into the circuit. Then you use the circuit to control a flight simulator game, Doom or Duke Nukem. When you’re playing the game you’ll likely be yelling “FIRE!… Fire!…FIRE!!…LEFT…go RIGHT!”. In the heat of the action your voice will sound much different than when you were sitting down relaxed and programming the circuit. To achieve higher word recognition accuracy, mimic the excitement in your voice when programming the circuit.
These factors should be kept in mind to achieve the highest accuracy possible from the circuit. This becomes increasingly important when the speech recognition circuit is taken out of the lab and put to work in the outside world.

Interfacing The Circuit To The Outside World

The circuit design for interfacing the speech recognition system to the outside world controls ten switches. This design idea fits with the robust speech recognition system discussed previously.
While the effective vocabulary drops from forty to ten words, we gain a more robust and accurate system.

Error Codes

The decoding circuit must distinguish word numbers from error codes. It must therefore recognize error codes 55, 66 and 77 and not confuse them with word spaces 5, 6 and 7. This is accomplished using an OR gate and a NAND gate connected to the MSB. Whenever an error code is generated, the word number is ignored.
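The Python sketch below models one gate arrangement that fits this description. It is an assumption, not a transcription of the schematic: the two low-order bits of the most significant digit are ORed together, and the result is NANDed with the weight-4 bit. The output goes low only when the most significant digit is 5, 6 or 7.

```python
def msd_bits(value):
    """Return the weight-4, weight-2 and weight-1 bits of the tens digit."""
    msd = (value // 10) % 10
    return (msd >> 2) & 1, (msd >> 1) & 1, msd & 1


def valid_word_signal(value):
    """Return 1 for an ordinary word number, 0 for error codes 55, 66, 77."""
    bit4, bit2, bit1 = msd_bits(value)
    or_out = bit2 | bit1              # OR gate on the two low bits
    nand_out = 1 - (bit4 & or_out)    # NAND gate with the weight-4 bit
    return nand_out


for reading in (7, 40, 55, 66, 77):
    print(reading, valid_word_signal(reading))
```

Word 40 has a most significant digit of 4, so it is still passed as a valid word in this arrangement.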

Interface Circuit

Figure 5 is the schematic for the interface circuit. The circuit connects to the 10-pin right angle interface header on the circuit board. This header is also used for the digital display board.
The 4028 has ten output lines. Whatever number is displayed on the LSD, the corresponding line number off the 4028 will be brought high. The high signal from the 4028 can be connected to an NPN transistor to control a DC load as shown in box A, or control an AC or DC load using a simple relay as shown in box B.
The disadvantage to using a simple set up like this is that only one switch out of 10 may be turned on at any given time. This doesn’t make for a very good system, so a solution must be found that allows one to turn on or off any line without changing the status of any other line.
This can be implemented by inserting a flip-flop, shown in box C, between the 4028 and the NPN transistor. The 4013 IC contains two flip-flops; only one is shown in the box C drawing. The flip-flop acts like simple memory. When the input line is brought high, its output line goes high, turning on the NPN transistor. When the input line is brought low, the output line stays high. When the flip-flop receives a second high signal on its input line, it brings the output low.
OK, let’s see how this works in the real world. Let’s assume you have the power to a printer connected through the speech recognition circuit and a 4013 controlled switch or relay. The target word is “printer”. When you want to turn the printer on, you use the command word “printer”. The circuit recognizes the word and applies power to the printer. At this point you can also turn on (or off) any other circuit connected to the speech board, because when the high signal goes low the 4013 keeps its output high. When you are ready to turn off the power to the printer, all you have to do is repeat the command “printer”. The second time the line is brought high, the output of the 4013 is brought low.
The same command is used to turn the unit on and off. Any of the other lines can be turned on or off without affecting the status of the other output lines.
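The toggling behaviour described above can be modelled in software. The Python sketch below is a simplified simulation, not the kit’s circuit. It assumes each 4013 flip-flop toggles on a rising input edge and that each 4028 line returns low between recognitions. The class and function names are illustrative.

```python
class ToggleLatch:
    """Models one flip-flop that changes state on each low-to-high input."""

    def __init__(self):
        self.output = 0
        self._last_input = 0

    def drive(self, level):
        # Toggle only on a rising edge; a falling edge leaves the output alone.
        if level == 1 and self._last_input == 0:
            self.output ^= 1
        self._last_input = level
        return self.output


def decode_4028(digit):
    """One-of-ten decoder: only the line matching `digit` is high."""
    return [1 if line == digit else 0 for line in range(10)]


class SwitchBank:
    """Ten decoder lines, each followed by its own toggle latch."""

    def __init__(self):
        self.latches = [ToggleLatch() for _ in range(10)]

    def recognize(self, word_number):
        """Apply the LSD of a recognized word and return all ten outputs."""
        for latch, level in zip(self.latches, decode_4028(word_number % 10)):
            latch.drive(level)
        # Assumption: the selected line goes low again before the next word.
        for latch in self.latches:
            latch.drive(0)
        return [latch.output for latch in self.latches]


bank = SwitchBank()
print(bank.recognize(1))   # first command: line 1 switches on
print(bank.recognize(12))  # line 2 switches on, line 1 is unchanged
print(bank.recognize(31))  # same LSD as the first command: line 1 switches off
```

Error codes should be filtered out before calling recognize, since their least significant digits would otherwise toggle lines 5, 6 and 7.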

Voice Security System

This circuit isn’t designed for a voice security system in a commercial application, but that should not prevent anyone from experimenting with it for that purpose. A common approach is to use three or four keywords that must be spoken and recognized in sequence in order to open a lock or allow entry.
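For experimenters, the keyword sequence idea can be sketched as a small state machine. The Python below is a hypothetical illustration: the class name SequenceLock and the example word numbers are made up, and it takes recognized word numbers as they would appear on the display.

```python
class SequenceLock:
    """Opens only when the keywords are recognized in the stored order."""

    def __init__(self, sequence):
        if not sequence:
            raise ValueError("sequence must contain at least one word number")
        self.sequence = list(sequence)
        self.position = 0

    def hear(self, word_number):
        """Feed one recognized number; return True when the lock opens."""
        if word_number == self.sequence[self.position]:
            self.position += 1
            if self.position == len(self.sequence):
                self.position = 0
                return True
        else:
            # A wrong word restarts the sequence. If it happens to be the
            # first keyword, count it as the start of a new attempt.
            self.position = 1 if word_number == self.sequence[0] else 0
        return False


lock = SequenceLock([3, 7, 12])
print([lock.hear(n) for n in (3, 7, 12)])
print([lock.hear(n) for n in (3, 5, 7, 12)])
```

An error code such as 77 (no match) simply counts as a wrong word and restarts the sequence.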

CPU Mode

The HM2007 speech recognition chip has a CPU mode to be used when connected to a host computer system or microcontroller. Interfacing the HM2007 to a host computer requires writing driver software as well as designing and building the hardware interface to the computer data bus.

Aural Interfaces

It has been found that mixing visual and aural information is not effective. Products that require visual confirmation of an aural command grossly reduce efficiency. To create an effective aural user interface (AUI), products need to understand (recognize) commands given in an unstructured and efficient manner, the way in which people typically communicate verbally.

Learning To Listen

The ability to listen to one person speak among several at a party is beyond the capabilities of today’s speech recognition systems. Speech recognition systems cannot (as of yet) separate and filter out what should be considered extraneous noise.
Speech recognition is not understanding speech. Understanding the meaning of words is a higher intellectual function. The fact that a circuit can respond to a vocal command doesn’t mean it understands the command spoken. In the future, voice recognition systems may have the ability to distinguish nuances of speech and meanings of words, to “Do what I mean, not what I say!”

Speaker Dependent / Speaker Independent

Speech recognition is divided into two broad processing categories: speaker dependent and speaker independent.
Speaker dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers.

(Figure: Interface Circuit)

Speaker independent systems are trained to respond to a word regardless of who speaks. Therefore the system must respond to a large variety of speech patterns, inflections and enunciations of the target word. The command word count is usually lower than that of a speaker dependent system; however, high accuracy can still be maintained within processing limits. Industrial applications more often require speaker independent voice recognition systems.

Recognition Style

In addition to the speaker dependent/independent classification, speech recognition also contends with the style of speech it can recognize. There are three styles of speech: isolated, connected and continuous.
Isolated. Words are spoken separately or isolated. This is the most common speech recognition system available today. The user must pause between each word or command spoken.
Connected. This is a halfway point between isolated word and continuous speech recognition. It permits users to speak multiple words. The HM2007 can be set up to identify words or phrases 1.92 seconds in length. This reduces the word recognition dictionary number to 20.
Continuous. This is the natural conversational speech we use in everyday life. It is extremely difficult for a recognizer to sift through the sound as the words tend to merge together. For instance, “Hi, how are you doing?” to a computer sounds like “Hi,.howyadoin”. Continuous speech recognition systems are on the market and are under continual development.

More On The HM2007 Chip

The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip contains an analog front end, voice analysis, regulation, and system control functions. The chip may be used in stand-alone or CPU-connected mode.
Features:

  • Single chip voice recognition CMOS LSI
  • Speaker dependent
  • External RAM support
  • Maximum 40 word recognition (0.96 second)
  • Maximum word length 1.92 seconds (20 word)
  • Microphone support
  • Manual and CPU modes available
  • Response time less than 300 milliseconds
  • 5V power supply

More information on the HM2007 chip is available in the HM2007 data booklet (DS-HM2007).

Parts List:

Placement    Item                                      Quantity
-            PCB                                       3 pieces
Keypad       Push-button Switches                      12
U1           HM2007 PLCC                               1
-            52-pin socket                             1
-            7805 Voltage Regulator                    1
U3           74LS373                                   1
-            20-pin socket                             1
U2           SRAM 8K X 8                               1
-            28-pin socket                             1
U4, U5       4511                                      2
-            16-pin socket                             2
-            220 ohm 1/8W Resistors                    16
U8, U9       7-Segment Displays                        2
-            14-pin socket                             2
X1           XTAL 3.57 MHz                             1
S1           Toggle Switch                             1
BT1          9V Battery Snap                           1
BT2          Coin Battery Holder                       1
R1           100K 1/4W Resistor                        1
R2           6.8K 1/4W Resistor                        1
R3           22K 1/4W Resistor                         1
R4           330 ohm 1/4W Resistor                     1
C1           100 uF Capacitor                          1
C2, C3       22 pF Capacitors                          2
C4           0.0047 uF Capacitor                       1
C5, C6, C7   0.01 uF Capacitors                        3
D1, D2       1N914 diodes                              2
D3           Red LED                                   1
P1           9V Battery Holder                         1
-            PC mount microphone jack                  1
P5           2-position header                         1
-            Jumper                                    1
-            Headset Microphone                        1
-            3V Coin Battery                           1
-            2/56 Hex nuts, Screws & Lock washers      2 each
-            7-pin headers (male and female)           1 each
-            10-pin headers (male and female)          1 each
