凹みTips

Miscellaneous tips on C++, JavaScript, Unity, gadgets, and more.

Embedding Open JTalk in your own code

Introduction

Open JTalk is available as a ready-made program, but it was not obvious to me how to embed it in my own code. After some trial and error I managed to do it, so here is how.

Environment

Compiling Open JTalk from source

Download

First, visit the official Open JTalk site and download the Open JTalk source code. As of 2012/04/14, the current version was 1.05.

$ wget http://downloads.sourceforge.net/open-jtalk/open_jtalk-1.05.tar.gz

However, this alone is not enough to compile it: according to libopenjtalk [ja.nishimotz.com], Open JTalk is positioned as a sample application of the HTS Engine API. So download the HTS Engine API as well; its current version was 1.06.

$ wget http://downloads.sourceforge.net/hts-engine/hts_engine_API-1.06.tar.gz

Compile

First, compile the HTS Engine API.

$ tar xzf hts_engine_API-1.06.tar.gz
$ cd hts_engine_API-1.06
$ ./configure
$ make

This populates bin and lib (lib contains libHTSEngine.a, which is linked later).
Next, compile Open JTalk.

$ cd ..
$ tar xzf open_jtalk-1.05.tar.gz
$ cd open_jtalk-1.05
$ ./configure \
--with-hts-engine-header-path=/home/hecomi/tmp/hts_engine_API-1.06/include \
--with-hts-engine-library-path=/home/hecomi/tmp/hts_engine_API-1.06/lib \
--with-charset=utf-8
$ make

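(2012/04/15: added the utf-8 setting)
Change the "/home/hecomi/tmp/" part to match your environment.
This produces bin/open_jtalk, which will speak if you run it as described in "Open JTalk で MMDAgent のメイちゃんに喋らせてみた - 凹みTips" (my earlier post on making MMDAgent's Mei speak with Open JTalk).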

Embedding Open JTalk

How do you embed it?

So it compiles, but then the question is: how do you actually embed it? My guess was that finding out how the open_jtalk executable is built would answer that, so I looked at open_jtalk-1.05/bin/Makefile. Following it through, it appears to compile open_jtalk.c like this:

gcc open_jtalk.c \
-o open_jtalk \
-I ../text2mecab \
-I ../mecab/src \
-I ../mecab2njd \
-I ../njd \
-I ../njd_set_pronunciation \
-I ../njd_set_digit \
-I ../njd_set_accent_phrase \
-I ../njd_set_accent_type \
-I ../njd_set_unvoiced_vowel \
-I ../njd_set_long_vowel \
-I ../njd2jpcommon \
-I ../jpcommon \
-I /home/hecomi/tmp/hts_engine_API-1.06/include \
../text2mecab/libtext2mecab.a \
../mecab/src/libmecab.a \
../mecab2njd/libmecab2njd.a \
../njd/libnjd.a \
../njd_set_pronunciation/libnjd_set_pronunciation.a \
../njd_set_digit/libnjd_set_digit.a \
../njd_set_accent_phrase/libnjd_set_accent_phrase.a \
../njd_set_accent_type/libnjd_set_accent_type.a \
../njd_set_unvoiced_vowel/libnjd_set_unvoiced_vowel.a \
../njd_set_long_vowel/libnjd_set_long_vowel.a \
../njd2jpcommon/libnjd2jpcommon.a \
../jpcommon/libjpcommon.a \
/home/hecomi/tmp/hts_engine_API-1.06/lib/libHTSEngine.a -lstdc++

Running this inside bin does compile. So the approach is: adapt open_jtalk.c to your own needs and compile it the same way, and Open JTalk is embedded.

Directory layout

Let's write a sample that builds a class able to speak through Open JTalk and then uses it.
Use the following directory layout:

work
 ├── Makefile
 ├── text_to_speech.hpp
 ├── text_to_speech.cpp
 ├── main.cpp
 └── openjtalk/
     ├── hts_engine_API-1.06/
     ├── open_jtalk-1.05/
     ├── open_jtalk_dic_utf_8-1.05/ (added)
     ├── mei_normal/ (added)
     └── hts_voice_nitech_jp_atr503_m001-1.04/ (can be used instead of mei_normal)

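Download the Open JTalk dictionary from the official Open JTalk site and put it in place.
For the voice, the one from the official site (hts_voice_nitech_jp_atr503_m001-1.04) also works, but this time I tuned the parameters for Mei's voice (mei_normal), which I used in "Open JTalk で MMDAgent のメイちゃんに喋らせてみた - 凹みTips". You could also change the class so that these parameters are passed in as arguments.
The code for each file looks like this. It is long, sorry.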

text_to_speech.hpp
/* ----------------------------------------------------------------- */
/*           The Japanese TTS System "Open JTalk"                    */
/*           developed by HTS Working Group                          */
/*           http://open-jtalk.sourceforge.net/                      */
/* ----------------------------------------------------------------- */
/*                                                                   */
/*  Copyright (c) 2008-2011  Nagoya Institute of Technology          */
/*                           Department of Computer Science          */
/*                                                                   */
/* All rights reserved.                                              */
/*                                                                   */
/* Redistribution and use in source and binary forms, with or        */
/* without modification, are permitted provided that the following   */
/* conditions are met:                                               */
/*                                                                   */
/* - Redistributions of source code must retain the above copyright  */
/*   notice, this list of conditions and the following disclaimer.   */
/* - Redistributions in binary form must reproduce the above         */
/*   copyright notice, this list of conditions and the following     */
/*   disclaimer in the documentation and/or other materials provided */
/*   with the distribution.                                          */
/* - Neither the name of the HTS working group nor the names of its  */
/*   contributors may be used to endorse or promote products derived */
/*   from this software without specific prior written permission.   */
/*                                                                   */
/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND            */
/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,       */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF          */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE          */
/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */
/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,          */
/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED   */
/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,     */
/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */
/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,   */
/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY    */
/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE           */
/* POSSIBILITY OF SUCH DAMAGE.                                       */
/* ----------------------------------------------------------------- */

#ifndef INCLUDE_OPENJTALK_HPP
#define INCLUDE_OPENJTALK_HPP

#include <string>
#include <vector>

/* openjtalk header */
#include "mecab.h"
#include "njd.h"
#include "jpcommon.h"
#include "HTS_engine.h"

/* openal header */
#include <AL/alut.h>

/**
 * Class that performs text-to-speech with Open JTalk
 */
class TextToSpeech
{
public:
	/**
	 * Set up the parameters passed to Open JTalk
	 * @param[in] voice_dir	directory containing the voice (model) files
	 * @param[in] dic_dir	directory containing the dictionary
	 */
	TextToSpeech(const std::string& voice_dir, const std::string& dic_dir);

	/**
	 * Destructor
	 */
	~TextToSpeech();

	/**
	 * Speak the given text
	 * @param[in] str	text to speak
	 */
	void talk(const std::string& str);

	/**
	 * Stop speaking
	 */
	void stop();

private:
	/**
	 * Structure bundling the state Open JTalk needs
	 */
	struct OpenJTalk {
		Mecab mecab;
		NJD njd;
		JPCommon jpcommon;
		HTS_Engine engine;
	} open_jtalk_;

	//! Initialize Open JTalk with the given parameters (adapted from open_jtalk.c)
	void initialize(
		int sampling_rate, int fperiod, double alpha, int stage, double beta, int audio_buff_size,
		double uv_threshold, HTS_Boolean use_log_gain, double gv_weight_mgc,
		double gv_weight_lf0, double gv_weight_lpf);

	//! Load the required files (adapted from open_jtalk.c)
	void load(
		char *dn_mecab, char *fn_ms_dur, char *fn_ts_dur,
		char *fn_ms_mgc, char *fn_ts_mgc, char **fn_ws_mgc, int num_ws_mgc,
		char *fn_ms_lf0, char *fn_ts_lf0, char **fn_ws_lf0, int num_ws_lf0,
		char *fn_ms_lpf, char *fn_ts_lpf, char **fn_ws_lpf, int num_ws_lpf,
		char *fn_ms_gvm, char *fn_ts_gvm, char *fn_ms_gvl, char *fn_ts_gvl,
		char *fn_ms_gvf, char *fn_ts_gvf, char *fn_gv_switch);

	//! Synthesize the given text and write it to a wav file
	void synthesis(char *txt, FILE * wavfp);

	/**
	 * Prepare the parameters and run synthesis
	 * @param[in] sentence	text to speak
	 */
	void make_wav(const std::string& sentence);

	//! Play the wav file
	void play_wav();

	//! Delete the wav file
	void remove_wav() const;

	//! Output wav file name
	const std::string wav_filename_;

	//! Source of the wav being played
	ALuint wav_src_;
};

#endif // INCLUDE_OPENJTALK_HPP
text_to_speech.cpp
/* ----------------------------------------------------------------- */
/*           The Japanese TTS System "Open JTalk"                    */
/*           developed by HTS Working Group                          */
/*           http://open-jtalk.sourceforge.net/                      */
/* ----------------------------------------------------------------- */
/*                                                                   */
/*  Copyright (c) 2008-2011  Nagoya Institute of Technology          */
/*                           Department of Computer Science          */
/*                                                                   */
/* All rights reserved.                                              */
/*                                                                   */
/* Redistribution and use in source and binary forms, with or        */
/* without modification, are permitted provided that the following   */
/* conditions are met:                                               */
/*                                                                   */
/* - Redistributions of source code must retain the above copyright  */
/*   notice, this list of conditions and the following disclaimer.   */
/* - Redistributions in binary form must reproduce the above         */
/*   copyright notice, this list of conditions and the following     */
/*   disclaimer in the documentation and/or other materials provided */
/*   with the distribution.                                          */
/* - Neither the name of the HTS working group nor the names of its  */
/*   contributors may be used to endorse or promote products derived */
/*   from this software without specific prior written permission.   */
/*                                                                   */
/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND            */
/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,       */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF          */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE          */
/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */
/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,          */
/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED   */
/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,     */
/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */
/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,   */
/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY    */
/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE           */
/* POSSIBILITY OF SUCH DAMAGE.                                       */
/* ----------------------------------------------------------------- */

#include "text_to_speech.hpp"

#include "text2mecab.h"
#include "mecab2njd.h"
#include "njd_set_pronunciation.h"
#include "njd_set_digit.h"
#include "njd_set_accent_phrase.h"
#include "njd_set_accent_type.h"
#include "njd_set_unvoiced_vowel.h"
#include "njd_set_long_vowel.h"
#include "njd2jpcommon.h"

#include <cstdio>
#include <iostream>

//! buffer size for text2mecab
const size_t MAXBUFLEN = 1024;

TextToSpeech::TextToSpeech(const std::string& voice_dir, const std::string& dic_dir)
	: wav_filename_("__tmp__.wav")
{
	// dictionary
	std::string s_dn_mecab = dic_dir;
	char *dn_mecab = &s_dn_mecab[0];

	// models
	std::string
		s_fn_ms_dur = voice_dir + "/dur.pdf",
		s_fn_ms_mgc = voice_dir + "/mgc.pdf",
		s_fn_ms_lf0 = voice_dir + "/lf0.pdf";
	char
		*fn_ms_dur = &s_fn_ms_dur[0],
		*fn_ms_mgc = &s_fn_ms_mgc[0],
		*fn_ms_lf0 = &s_fn_ms_lf0[0],
		*fn_ms_lpf = NULL;

	// trees
	std::string
		s_fn_ts_dur = voice_dir + "/tree-dur.inf",
		s_fn_ts_mgc = voice_dir + "/tree-mgc.inf",
		s_fn_ts_lf0 = voice_dir + "/tree-lf0.inf";
	char
		*fn_ts_dur = &s_fn_ts_dur[0],
		*fn_ts_mgc = &s_fn_ts_mgc[0],
		*fn_ts_lf0 = &s_fn_ts_lf0[0],
		*fn_ts_lpf = NULL;

	// windows
	int num_ws_mgc = 3, num_ws_lf0 = 3, num_ws_lpf = 0;
	std::string
		s_fn_ws_mgc0 = voice_dir + "/mgc.win1",
		s_fn_ws_mgc1 = voice_dir + "/mgc.win2",
		s_fn_ws_mgc2 = voice_dir + "/mgc.win3",
		s_fn_ws_lf00 = voice_dir + "/lf0.win1",
		s_fn_ws_lf01 = voice_dir + "/lf0.win2",
		s_fn_ws_lf02 = voice_dir + "/lf0.win3";
	char *fn_ws_mgc[] = {&s_fn_ws_mgc0[0], &s_fn_ws_mgc1[0], &s_fn_ws_mgc2[0]};
	char *fn_ws_lf0[] = {&s_fn_ws_lf00[0], &s_fn_ws_lf01[0], &s_fn_ws_lf02[0]};
	char **fn_ws_lpf = NULL;

	// global variance
	std::string
		s_fn_ms_gvm = voice_dir + "/gv-mgc.pdf",
		s_fn_ms_gvf = voice_dir + "/gv-lf0.pdf";
	char
		*fn_ms_gvm = &s_fn_ms_gvm[0],
		*fn_ms_gvf = &s_fn_ms_gvf[0],
		*fn_ms_gvl = NULL;

	// global variance trees
	std::string
		s_fn_ts_gvm = voice_dir + "/tree-gv-mgc.inf",
		s_fn_ts_gvf = voice_dir + "/tree-gv-lf0.inf";
	char
		*fn_ts_gvm = &s_fn_ts_gvm[0],
		*fn_ts_gvf = &s_fn_ts_gvf[0],
		*fn_ts_gvl = NULL;

	// file names of global variance switch
	std::string s_fn_gv_switch = voice_dir + "/gv-switch.inf";
	char* fn_gv_switch = &s_fn_gv_switch[0];

	// global parameter
	int    sampling_rate   = 48000;
	int    fperiod         = 240;
	double alpha           = 0.5;
	int    stage           = 0;
	double beta            = 0.8;
	int    audio_buff_size = 48000;
	double uv_threshold    = 0.5;
	double gv_weight_mgc   = 1.0;
	double gv_weight_lf0   = 1.0;
	double gv_weight_lpf   = 1.0;
	HTS_Boolean use_log_gain = FALSE;

	// initialize and load
	initialize(sampling_rate, fperiod, alpha, stage, beta,
		audio_buff_size, uv_threshold, use_log_gain, gv_weight_mgc,
		gv_weight_lf0, gv_weight_lpf);
	load(dn_mecab, fn_ms_dur, fn_ts_dur, fn_ms_mgc, fn_ts_mgc,
		fn_ws_mgc, num_ws_mgc, fn_ms_lf0, fn_ts_lf0, fn_ws_lf0, num_ws_lf0,
		fn_ms_lpf, fn_ts_lpf, fn_ws_lpf, num_ws_lpf, fn_ms_gvm, fn_ts_gvm,
		fn_ms_gvl, fn_ts_gvl, fn_ms_gvf, fn_ts_gvf, fn_gv_switch);
}

TextToSpeech::~TextToSpeech()
{
	Mecab_clear(&open_jtalk_.mecab);
	NJD_clear(&open_jtalk_.njd);
	JPCommon_clear(&open_jtalk_.jpcommon);
	HTS_Engine_clear(&open_jtalk_.engine);
}

void TextToSpeech::initialize(
	int sampling_rate, int fperiod, double alpha, int stage, double beta, int audio_buff_size,
	double uv_threshold, HTS_Boolean use_log_gain, double gv_weight_mgc,
	double gv_weight_lf0, double gv_weight_lpf)
{
	Mecab_initialize(&open_jtalk_.mecab);
	NJD_initialize(&open_jtalk_.njd);
	JPCommon_initialize(&open_jtalk_.jpcommon);
	HTS_Engine_initialize(&open_jtalk_.engine, 2);
	HTS_Engine_set_sampling_rate(&open_jtalk_.engine, sampling_rate);
	HTS_Engine_set_fperiod(&open_jtalk_.engine, fperiod);
	HTS_Engine_set_alpha(&open_jtalk_.engine, alpha);
	HTS_Engine_set_gamma(&open_jtalk_.engine, stage);
	HTS_Engine_set_log_gain(&open_jtalk_.engine, use_log_gain);
	HTS_Engine_set_beta(&open_jtalk_.engine, beta);
	HTS_Engine_set_audio_buff_size(&open_jtalk_.engine, audio_buff_size);
	HTS_Engine_set_msd_threshold(&open_jtalk_.engine, 1, uv_threshold);
	HTS_Engine_set_gv_weight(&open_jtalk_.engine, 0, gv_weight_mgc);
	HTS_Engine_set_gv_weight(&open_jtalk_.engine, 1, gv_weight_lf0);
}

void TextToSpeech::load(
	char *dn_mecab, char *fn_ms_dur, char *fn_ts_dur,
	char *fn_ms_mgc, char *fn_ts_mgc, char **fn_ws_mgc, int num_ws_mgc,
	char *fn_ms_lf0, char *fn_ts_lf0, char **fn_ws_lf0, int num_ws_lf0,
	char *fn_ms_lpf, char *fn_ts_lpf, char **fn_ws_lpf, int num_ws_lpf,
	char *fn_ms_gvm, char *fn_ts_gvm, char *fn_ms_gvl, char *fn_ts_gvl,
	char *fn_ms_gvf, char *fn_ts_gvf, char *fn_gv_switch)
{
	Mecab_load(&open_jtalk_.mecab, dn_mecab);
	HTS_Engine_load_duration_from_fn(&open_jtalk_.engine, &fn_ms_dur, &fn_ts_dur, 1);
	HTS_Engine_load_parameter_from_fn(&open_jtalk_.engine, &fn_ms_mgc, &fn_ts_mgc, fn_ws_mgc, 0, FALSE, num_ws_mgc, 1);
	HTS_Engine_load_parameter_from_fn(&open_jtalk_.engine, &fn_ms_lf0, &fn_ts_lf0, fn_ws_lf0, 1, TRUE, num_ws_lf0, 1);
	if (HTS_Engine_get_nstream(&open_jtalk_.engine) == 3)
		HTS_Engine_load_parameter_from_fn(&open_jtalk_.engine, &fn_ms_lpf, &fn_ts_lpf, fn_ws_lpf, 2, FALSE, num_ws_lpf, 1);
	if (fn_ms_gvm != NULL) {
		if (fn_ts_gvm != NULL)
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvm, &fn_ts_gvm, 0, 1);
		else
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvm, NULL, 0, 1);
	}
	if (fn_ms_gvl != NULL) {
		if (fn_ts_gvl != NULL)
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvl, &fn_ts_gvl, 1, 1);
		else
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvl, NULL, 1, 1);
	}
	if (HTS_Engine_get_nstream(&open_jtalk_.engine) == 3 && fn_ms_gvf != NULL) {
		if (fn_ts_gvf != NULL)
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvf, &fn_ts_gvf, 2, 1);
		else
			HTS_Engine_load_gv_from_fn(&open_jtalk_.engine, &fn_ms_gvf, NULL, 2, 1);
	}
	if (fn_gv_switch != NULL)
		HTS_Engine_load_gv_switch_from_fn(&open_jtalk_.engine, fn_gv_switch);
}

void TextToSpeech::synthesis(char *txt, FILE * wavfp)
{
	char buff[MAXBUFLEN];

	text2mecab(buff, txt);
	Mecab_analysis(&open_jtalk_.mecab, buff);
	mecab2njd(&open_jtalk_.njd, Mecab_get_feature(&open_jtalk_.mecab), Mecab_get_size(&open_jtalk_.mecab));
	njd_set_pronunciation(&open_jtalk_.njd);
	njd_set_digit(&open_jtalk_.njd);
	njd_set_accent_phrase(&open_jtalk_.njd);
	njd_set_accent_type(&open_jtalk_.njd);
	njd_set_unvoiced_vowel(&open_jtalk_.njd);
	njd_set_long_vowel(&open_jtalk_.njd);
	njd2jpcommon(&open_jtalk_.jpcommon, &open_jtalk_.njd);
	JPCommon_make_label(&open_jtalk_.jpcommon);
	if (JPCommon_get_label_size(&open_jtalk_.jpcommon) > 2) {
		HTS_Engine_load_label_from_string_list(
			&open_jtalk_.engine,
			JPCommon_get_label_feature(&open_jtalk_.jpcommon),
			JPCommon_get_label_size(&open_jtalk_.jpcommon)
		);
		HTS_Engine_create_sstream(&open_jtalk_.engine);
		HTS_Engine_create_pstream(&open_jtalk_.engine);
		HTS_Engine_create_gstream(&open_jtalk_.engine);
		if (wavfp != NULL)
			HTS_Engine_save_riff(&open_jtalk_.engine, wavfp);
		HTS_Engine_refresh(&open_jtalk_.engine);
	}
	JPCommon_refresh(&open_jtalk_.jpcommon);
	NJD_refresh(&open_jtalk_.njd);
	Mecab_refresh(&open_jtalk_.mecab);
}

void TextToSpeech::make_wav(const std::string& sentence)
{
	// Text to speak (synthesis() takes a mutable char*)
	std::string s_talk_str = sentence;
	char *talk_str = &s_talk_str[0];

	// Open the output wav file
	FILE *wavfp = fopen(wav_filename_.c_str(), "wb");
	if (wavfp == NULL) {
		fprintf(stderr, "ERROR: TextToSpeech::make_wav(): Cannot open %s.\n", wav_filename_.c_str());
		return;
	}

	// Synthesize the speech into the wav file
	synthesis(talk_str, wavfp);

	// Close the wav file
	fclose(wavfp);
}

void TextToSpeech::play_wav()
{
	// Initialize ALUT (both arguments may be NULL)
	alutInit(NULL, NULL);

	// Prepare the buffer and the source
	ALuint buf;
	ALenum state;
	buf = alutCreateBufferFromFile(wav_filename_.c_str());
	if (buf == AL_NONE) {
		fprintf(stderr, "ERROR: %s\n", alutGetErrorString(alutGetError()));
		alutExit();
		return;
	}
	alGenSources(1, &wav_src_);
	alSourcei(wav_src_, AL_BUFFER, buf);

	// Play and wait until playback finishes
	alSourcePlay(wav_src_);
	alGetSourcei(wav_src_, AL_SOURCE_STATE, &state);
	while (state == AL_PLAYING) {
		alutSleep(0.01f);  // avoid spinning the CPU while waiting
		alGetSourcei(wav_src_, AL_SOURCE_STATE, &state);
	}

	// Clean up
	alDeleteSources(1, &wav_src_);
	alDeleteBuffers(1, &buf);
	alutExit();
}

void TextToSpeech::stop()
{
	alSourceStop(wav_src_);
}

void TextToSpeech::remove_wav() const
{
	remove( wav_filename_.c_str() );
}

void TextToSpeech::talk(const std::string& str)
{
	std::cout << str << std::endl;
	make_wav(str);
	play_wav();
	remove_wav();
}
main.cpp
#include "text_to_speech.hpp"

int main(int argc, char const* argv[])
{
	TextToSpeech tts(
		"openjtalk/mei_normal",
		"openjtalk/open_jtalk_dic_utf_8-1.05"
	);
	tts.talk("ハローワールド!");
	return 0;
}
Makefile
OJT = ./openjtalk/open_jtalk-1.05
HTS = ./openjtalk/hts_engine_API-1.06

LOPENJTALK = $(OJT)/text2mecab/libtext2mecab.a $(OJT)/mecab/src/libmecab.a \
             $(OJT)/mecab2njd/libmecab2njd.a $(OJT)/njd/libnjd.a \
             $(OJT)/njd_set_pronunciation/libnjd_set_pronunciation.a \
             $(OJT)/njd_set_digit/libnjd_set_digit.a \
             $(OJT)/njd_set_accent_phrase/libnjd_set_accent_phrase.a \
             $(OJT)/njd_set_accent_type/libnjd_set_accent_type.a \
             $(OJT)/njd_set_unvoiced_vowel/libnjd_set_unvoiced_vowel.a \
             $(OJT)/njd_set_long_vowel/libnjd_set_long_vowel.a \
             $(OJT)/njd2jpcommon/libnjd2jpcommon.a $(OJT)/jpcommon/libjpcommon.a \
             $(HTS)/lib/libHTSEngine.a
IOPENJTALK = -DHAVE_CONFIG_H -I$(OJT) -I$(OJT)/mecab -I$(OJT)/mecab/src \
             -I$(OJT)/text2mecab -I$(OJT)/mecab2njd -I$(OJT)/njd \
             -I$(OJT)/njd_set_pronunciation -I$(OJT)/njd_set_digit \
             -I$(OJT)/njd_set_accent_phrase -I$(OJT)/njd_set_accent_type \
             -I$(OJT)/njd_set_unvoiced_vowel -I$(OJT)/njd_set_long_vowel \
             -I$(OJT)/njd2jpcommon -I$(OJT)/jpcommon -I$(HTS)/include \
             -finput-charset=UTF-8 -fexec-charset=UTF-8

CXX      = g++-4.6
CXXFLAGS = -g -O2 -Wall -std=c++0x
LDFLAGS  = -lalut -lopenal

SOURCES  = $(shell ls *.cpp)
OBJECTS  = $(SOURCES:.cpp=.o)
TARGET   = tts

CLEAN    = $(RM) *.o *.bak *~ core TAGS

############################################################

all: $(TARGET)

$(TARGET): $(OBJECTS)
	$(CXX) $(CXXFLAGS) -o $(TARGET) $(OBJECTS) $(LOPENJTALK) $(LDFLAGS)

text_to_speech.o: text_to_speech.cpp text_to_speech.hpp
	$(CXX) $(CXXFLAGS) -c $< $(IOPENJTALK)

main.o: main.cpp text_to_speech.hpp
	$(CXX) $(CXXFLAGS) -c $< $(IOPENJTALK)

clean:
	$(CLEAN)

distclean:
	$(CLEAN)
	$(RM) $(TARGET)

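The wav playback part is taken from "Open JTalk で作った wav を OpenAL で再生する - 凹みTips" (playing a wav created by Open JTalk with OpenAL). Compile and run it, and it cheerfully says "Hello World!" (ハローワールド!).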

$ make
$ ./tts
ハローワールド!

From here, adding member functions that change the speaking speed or switch the voice data would let you change the pitch and voice at run time.

Conclusion

The Mei voice data bundled with MMDAgent is split by emotion, so using it should make it possible to express changing emotions.
Also, is there a way to play the speech directly without creating a wav file?