One of the most important resources needed to develop speech
recognition and synthesis in a given language is a comprehensive
annotated speech corpus in that language.
At present there are two corpora being built. One is made up of a
number of Maltese texts based on newspapers, news items and books.
This is available from Mike Rosner's home page at the Department of
Computer Science and AI: Maltese Text Corpora.
The other corpus is a speech corpus made up of a speech, sentence,
and phoneme database. At present the phoneme part is still under
construction.
Eventually it is hoped that the annotation will mark where, in the
continuous spoken sentences, each phoneme starts and finishes. This
research is now taking place in final-year projects with my students.
The corpus is taken from news items read on Radio Malta. I am grateful
to them for making the tapes available to us.
Each item is defined as the sentence in words, the same sentence in
phonemes, and the speech as a .wav file suitable for standard sound
cards on PCs.
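
The item layout described above can be sketched in Python as follows. This is only an illustrative sketch, not the format actually used by the corpus: the names CorpusItem, wav_duration_seconds and boundaries_are_valid are hypothetical, and the optional per-phoneme start and finish times stand in for the alignment annotation that is still under construction.

```python
import wave
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CorpusItem:
    """One corpus entry (hypothetical layout, not the corpus's real format)."""
    words: str            # the sentence in ordinary spelling
    phonemes: List[str]   # the same sentence as a sequence of phoneme symbols
    wav_path: str         # path to the recorded speech as a .wav file
    # Assumed future annotation: (start, finish) in seconds for each phoneme.
    boundaries: Optional[List[Tuple[float, float]]] = None


def wav_duration_seconds(path: str) -> float:
    """Return the length of a .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def boundaries_are_valid(item: CorpusItem, duration: float) -> bool:
    """Check that phoneme boundaries are ordered and fit inside the recording."""
    if item.boundaries is None:
        return True  # nothing annotated yet, so nothing to check
    if len(item.boundaries) != len(item.phonemes):
        return False  # one (start, finish) pair is expected per phoneme
    previous_finish = 0.0
    for start, finish in item.boundaries:
        if not (previous_finish <= start <= finish <= duration):
            return False  # overlapping, reversed, or outside the recording
        previous_finish = finish
    return True
```

A check of this kind could be run over every item once the phoneme part is complete, comparing each item's boundaries against wav_duration_seconds(item.wav_path).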
Click here if you want to access the speech
corpus.