Machine-Aided Spoken Language Evaluation: The Test Delivery Module

Brian Teaman, Osaka Jogakuin College

Steve McCarty, Osaka Jogakuin College

Takeshi Tamura, Kobe Institute of Computing

The Machine-Aided Spoken Language Evaluation (MASLE) system is being developed as three modules: a test delivery system, a rater jukebox for human rating, and an automatic speech recognition component for computer-based rating. This presentation will outline the test delivery module and demonstrate a working prototype.

The goal of the test delivery module is to provide a flexible interface for delivering prompts and collecting audio recordings of the test-takers' spoken language over a LAN or the Internet. The system will present prompts and then record the language spoken by the learner. Four parts are needed to do this:

1. A web-based program that presents the prompts

2. An input database

3. Recording software

4. An output database

The program (1) will be written in a variety of HTML/XHTML, with PHP (v. 4.3) as the control language. It will run in a web browser, access the database, display a prompt, and then record the speech.

The input database (2) will contain the following:

2.1. Display text, including test prompts as well as other meta prompts that guide the user through the test.

2.2. Supplemental media of any sort needed for the test, including audio, graphics or video.

2.3. The name and location of the audio file to be saved.

With this database, the program will know what to display as a prompt for each item (whether it be text, audio, some visual object or some combination of the three). It will also know what to name the audio file that will be recorded for the output database.
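As a rough illustration of one input-database entry, the sketch below models the three kinds of information listed in 2.1 to 2.3. It is written in Python only for brevity; the system itself is described as using PHP, and every name here (`InputItem`, `prompt_parts`, the field names and file names) is a hypothetical stand-in, not part of MASLE.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputItem:
    """One entry of the input database (all field names are assumptions)."""
    item_id: str
    display_text: str                                      # 2.1: prompt or meta prompt
    media_files: List[str] = field(default_factory=list)   # 2.2: audio, graphics, video
    audio_out_path: str = ""                               # 2.3: where the recording is saved


def prompt_parts(item: InputItem) -> List[str]:
    """List what the program would present for this item, in order."""
    parts = []
    if item.display_text:
        parts.append("text: " + item.display_text)
    for path in item.media_files:
        parts.append("media: " + path)
    return parts


# Hypothetical example item combining text and a picture.
item = InputItem(
    item_id="item01",
    display_text="Describe the picture.",
    media_files=["picture01.jpg"],
    audio_out_path="recordings/item01.wav",
)
```

Calling `prompt_parts(item)` returns the text and media to present, while `item.audio_out_path` gives the name and location under which the recording would be saved.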

The core module of the test is the recording software (3). For this, JavaSonics ListenUp will be used. At present, the user must control the software manually: starting the recording, stopping the recording and uploading the file. Ideally, these steps will be automated as much as possible.

The output database (4) will store information such as the following:

4.1. The speaker ID

4.2. The item ID

4.3. The audio data

This output data will be used later with other proposed modules.
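One way to picture the output structure in 4.1 to 4.3 is a single table keyed by speaker and item. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the abstract does not say which database engine is used, and the table name, column names and helper functions are assumptions.

```python
import sqlite3


def create_output_db(conn: sqlite3.Connection) -> None:
    """Create a table with the three fields listed for the output database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        " speaker_id TEXT NOT NULL,"   # 4.1
        " item_id TEXT NOT NULL,"      # 4.2
        " audio BLOB NOT NULL,"        # 4.3
        " PRIMARY KEY (speaker_id, item_id))"
    )


def store_response(conn, speaker_id: str, item_id: str, audio: bytes) -> None:
    """Save one recorded response."""
    conn.execute(
        "INSERT INTO responses (speaker_id, item_id, audio) VALUES (?, ?, ?)",
        (speaker_id, item_id, audio),
    )
    conn.commit()


def load_response(conn, speaker_id: str, item_id: str):
    """Return the stored audio bytes, or None if no such response exists."""
    row = conn.execute(
        "SELECT audio FROM responses WHERE speaker_id = ? AND item_id = ?",
        (speaker_id, item_id),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_output_db(conn)
# Placeholder bytes stand in for real audio data.
store_response(conn, "S001", "item01", b"\x00\x01\x02")
```

Keying each row by speaker ID and item ID would let a later module, such as the rater jukebox, look up a particular response directly.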

A session will then run as follows: After some preliminary screens that display instructions and familiarize the learner with the test, the program will display the first test prompt and instruct the user to perform the task so that the user's speech can be recorded and uploaded over the Internet. After the item is stored, the program moves on to the next item, which is recorded in the same way. This repeats as many times as necessary to finish recording the speech for the current test.
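The session flow described above amounts to a simple loop over the test items. The following Python sketch shows that loop under stated assumptions: `record` and `store` are hypothetical stand-ins for the recording software and the output database, and the stub versions below only simulate them so the example can run without audio hardware.

```python
from typing import Callable, Dict, List, Tuple


def run_session(
    speaker_id: str,
    items: List[Tuple[str, str]],                 # (item_id, prompt text)
    record: Callable[[str], bytes],               # shows a prompt, returns audio
    store: Callable[[str, str, bytes], None],     # saves one response
) -> List[str]:
    """Present each prompt in order, record the response and store it."""
    stored = []
    for item_id, prompt in items:
        audio = record(prompt)
        store(speaker_id, item_id, audio)
        stored.append(item_id)
    return stored


# Stubs for illustration only: the "recording" is just the prompt as bytes.
def fake_record(prompt: str) -> bytes:
    return prompt.encode("utf-8")


output: Dict[Tuple[str, str], bytes] = {}


def fake_store(speaker_id: str, item_id: str, audio: bytes) -> None:
    output[(speaker_id, item_id)] = audio


items = [
    ("item01", "Introduce yourself."),
    ("item02", "Describe your hometown."),
]
done = run_session("S001", items, fake_record, fake_store)
```

Passing the recorder and the storage step in as functions keeps the loop independent of the particular recording software, which matters if the manual start, stop and upload steps are automated later.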