Course description: Introductory speech processing course surveying speech analysis, speech recognition and speech synthesis. Students will develop familiarity with speech processing tools (PRAAT, HTK, Festival).
Prerequisites: For CS graduate students, CSE 526 or permission of instructor. For graduate students outside CS, permission of instructor.
CSE542 is open to graduate students (students in Computer Science, Psychology, Linguistics and Electrical Engineering are particularly welcome) and to advanced undergraduates (with permission of the instructor). Knowledge of C, Perl or Java, and of phonetics and prosody, will be of value in this course.
Speech is one of our primary modalities for communicating information and for social interaction. Most of us are surrounded by it from birth, and our bodies and brains contain specialized components for processing it.
The field of speech processing is growing rapidly. It brings together electrical engineers, computer scientists, linguists, and psychologists. Research is under way at many companies and universities around the world, and is presented at conferences such as ICASSP, ICSLP, Eurospeech, SPECOM and SpeechTEK.
The first goal of this course is to introduce you to the three core areas of speech processing: speech analysis, speech recognition and text-to-speech synthesis. We finish by looking at how these can be combined in speech-driven applications.
The second goal of this course is to improve your abilities as a research scientist; that is, to strengthen your ability to think creatively, plan and conduct experiments, and make use of the literature. Throughout this course, you will be encouraged to bring your own ideas to bear on a range of problems, most of which you will get to choose.
This course may include students from a variety of academic backgrounds. Build on your strengths, and be willing to share them with your fellow students.
CSE542 covers four main topics, each over a 3-week period: speech analysis, speech recognition, text-to-speech synthesis, and speech-driven applications.
The textbook for this course is: X. Huang, A. Acero, H. Hon and R. Reddy. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice-Hall, 2001. This textbook is used as a resource, not a course guide, so don't let its size intimidate you.
The textbook may be supplemented with 1-3 papers per section of the course, and by guest speakers.
The other resources for this course are:
This page contains links to other books and websites you may find interesting.
Evaluation is based on four projects, one per section of the course, to be completed in small groups (18% of the final grade each); a final project presentation, in which each student presents one of the projects s/he participated in during the semester (18% of the final grade); and occasional surprise quizzes in class or on Blackboard (10% of the final grade).
You are encouraged to ask questions and seek help in class and during office hours. Questions you might ask include:
If you have a physical, emotional or medical disability that may impact your ability to complete the course work or that requires extra time on examinations, please contact the Disabled Student Services office in the ECC Building (phone: 633-6748/9 TTY). DSS will review your concerns and determine, with you, what accommodations are necessary and appropriate. All information and documentation of disability is confidential.
As a student at Stony Brook, you have agreed to follow the university's rules regarding academic honesty and appropriate conduct. You should read both the academic honesty information and procedures and the student code of conduct, which can be found in the student handbook.
Any academic dishonesty will be reported to the academic judiciary.