Is it possible to configure Julius in a way such that only phoneme recognition happens without tokenizing into words? #168

Open
j-j-kam opened this issue Jun 20, 2021 · 4 comments

Comments

@j-j-kam

j-j-kam commented Jun 20, 2021

I'm working on an application and need to simply recognize a stream of sounds, even if they are not tokenized into words. I expect to get a list of all recognized phonemes. Is it possible with Julius?

@colbec

colbec commented Jun 20, 2021

I think you will need the output from Julius running in server mode, that is, started with the -module option. See the help page, in particular -module and -outcode (phone sequence).
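
To illustrate the server-mode suggestion, here is a minimal sketch of pulling the phone labels out of one module-mode message. It assumes the message consists of `<WHYPO .../>` lines carrying a `PHONE="..."` attribute and ends with a line holding a single `.`; the `SAMPLE` text is made up for illustration and is not captured Julius output, and `phones_from_module_output` is a hypothetical helper name. Reading the bytes from the server socket is left out on purpose. A regular expression is used instead of an XML parser because word labels such as `<s>` can appear inside attribute values.

```python
import re

# Matches attribute pairs such as PHONE="a b c" inside a <WHYPO .../> line.
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def phones_from_module_output(text):
    """Return the phone labels found in WHYPO lines, in order of appearance."""
    phones = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("<WHYPO"):
            continue  # skip RECOGOUT/SHYPO wrappers and the "." terminator
        attrs = dict(_ATTR.findall(line))
        # A word may carry several phones separated by spaces.
        phones.extend(attrs.get("PHONE", "").split())
    return phones


# Assumed message shape, written by hand for this example.
SAMPLE = '''<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2888.3">
    <WHYPO WORD="<s>" CLASSID="<s>" PHONE="sil" CM="1.000"/>
    <WHYPO WORD="T" CLASSID="T" PHONE="T" CM="0.812"/>
    <WHYPO WORD="AA" CLASSID="AA" PHONE="AA" CM="0.640"/>
    <WHYPO WORD="</s>" CLASSID="</s>" PHONE="sil" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
.
'''

if __name__ == "__main__":
    print(phones_from_module_output(SAMPLE))
```

If the attribute names in your Julius build differ, only the `"PHONE"` key and the `<WHYPO` prefix need to change.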

@Estalhun

Yes, you can do it; I did it with HTK and Julius. Picture a word dictionary in which every "word" is a single phoneme, such as a vowel.

@j-j-kam
Author

j-j-kam commented Jun 20, 2021

@Estalhun sounds promising. thanks. can you expand just a little bit?

@Estalhun

Estalhun commented Jun 21, 2021

Something like this:

Sample "sentences" for training (Hungarian):

<s> T A N UU S II T V AA NY O K N A K T A R G O N<4> C A K E Z E L OEE !BREATH T A R T A L O M T AA R A KK A L !BREATH T A K SZ I S O F OEE R T E H N O L OO G I J<4> A T E H N O L OO G I A J<4> I !BREATH T E L E P H E JJ E L </s>
<s> T E R V E Z OEE A SZ T A L T OO L !BREATH T U L A J D O N UU T AA FF E L UE GY E L E T T EE R II T EE S M E N<4> T E S !BREATH T II Z M I LL I OO J<4> I T !BREATH T OE B R OEE L </s>
<s> T OE R T EE N EE S E J<4> I N E K !BREATH T UU L M E N OEE E N !BREATH V A D<2> NY U G A T O N !BREATH V A D AA SZ A TT OO L !BREATH V E H E SS E N E K !BREATH V E R S E NY<1> B E N !BREATH V E V OEE K I G !BREATH V I LL A M O S M UEE V E K V I L AA G M EE R E T UEE !BREATH V AA L A SZ I D E J E !BREATH V AA L A SZ I D OEE T </s>
<s> !BREATH V AA LL A L K O Z OO J<4> I V EE TT EE K H U SZ O N E GGY E D I K !BREATH Z AA SZ L OO S H A J OO J A K EE N<4> T AA T J AA R J A AA TT OE R EE S EE L E T UU T !BREATH EE R D E KK OE R B E OE K O SZ I SZ T EE M A OE K O SZ I SZ T EE M AA B A N OE N<3> M A G AA N A K !BREATH OE N<4> T OE R V EE NY UEE !BREATH </s>
<s> UE TY F EE L SZ O L G AA L A T O S !BREATH UE GY I N<4> T EE Z EE SS E L UE T E MM E L </s>

Dictionary:
!ANIMALS	[!ANIMALS]		!ANIMALS
!BREATH		[!BREATH]		!BREATH
!COUGH		[!COUGH]		!COUGH
!HNOISE		[!HNOISE]		!HNOISE
!MUSIC		[!MUSIC]		!MUSIC
!NOISE		[!NOISE]		!NOISE
!OOO		[!OOO]			!OOO
!PHONE		[!PHONE]		!PHONE
</s>		[]		sil
<s>		[]		sil
SENT-END        []      sil
SENT-START      []    sil
A		[A]		A
AA		[AA]		AA
E		[E]		E
EE		[EE]		EE
I		[I]		I
II		[II]		II
O		[O]		O
OO		[OO]		OO
OE		[OE]		OE
OEE		[OEE]		OEE
U		[U]		U
UU		[UU]		UU
UE		[UE]		UE
UEE		[UEE]		UEE
B		[B]		B
BB		[BB]		BB
P		[P]		P
PP		[PP]		PP
D		[D]		D
DD		[DD]		DD
D<2>		[D<2>]		D<2>
D<1>		[D<1>]		D<1>
T		[T]		T
TT		[TT]		TT
GY<1>		[GY<1>]		GY<1>
GY		[GY]		GY
GGY		[GGY]		GGY
T<2>		[T<2>]		T<2>
TY		[TY]		TY
TTY		[TTY]		TTY
G		[G]		G
GG		[GG]		GG
T<4>		[T<4>]		T<4>
K		[K]		K
KK		[KK]		KK
V		[V]		V
VV		[VV]		VV
F		[F]		F
FF		[FF]		FF
Z		[Z]		Z
ZZ		[ZZ]		ZZ
SZ		[SZ]		SZ
SSZ		[SSZ]		SSZ
SZS		[SZS]		SZS
ZS		[ZS]		ZS
ZZS		[ZZS]		ZZS
S		[S]		S
SS		[SS]		SS
H		[H]		H
HH		[HH]		HH
H<1>		[H<1>]		H<1>
J<1>		[J<1>]		J<1>
J<3>		[J<3>]		J<3>
J<4>		[J<4>]		J<4>
DZ		[DZ]		DZ
DDZ		[DDZ]		DDZ
C		[C]		C
CC		[CC]		CC
C<1>		[C<1>]		C<1>
DZS		[DZS]		DZS
DDZS		[DDZS]		DDZS
T<3>		[T<3>]		T<3>
CS		[CS]		CS
CCS		[CCS]		CCS
L		[L]		L
LL		[LL]		LL
L<1>		[L<1>]		L<1>
J		[J]		J
JJ		[JJ]		JJ
R		[R]		R
RR		[RR]		RR
R<3>		[R<3>]		R<3>
RR<3>		[RR<3>]		RR<3>
M		[M]		M
MM		[MM]		MM
M<4>		[M<4>]		M<4>
M<5>		[M<5>]		M<5>
N<3>		[N<3>]		N<3>
N		[N]		N
NN		[NN]		NN
N<4>		[N<4>]		N<4>
NY<1>		[NY<1>]		NY<1>
NY<3>		[NY<3>]		NY<3>
N<5>		[N<5>]		N<5>
NY		[NY]		NY
NNY		[NNY]		NNY

But phoneme-based (monophone) recognition, without triphones and a language model (LM), is absolutely the worst ever.
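
The one-phoneme-per-word dictionary above can be generated rather than typed by hand. The following is a minimal sketch, assuming the three-column, tab-separated layout shown in the dictionary (word, bracketed output string, model name) and assuming that `<s>` and `</s>` map to a `sil` model with empty output, as in the listing. The function names `token_inventory` and `dictionary_lines` are hypothetical; they are not part of Julius or HTK.

```python
# Sentence boundary markers that map to the silence model with empty output.
SILENCE_WORDS = {"<s>", "</s>"}


def token_inventory(sentences):
    """Collect the distinct tokens of the training sentences, in first-seen order."""
    seen = {}
    for sentence in sentences:
        for token in sentence.split():
            seen.setdefault(token, None)  # dict keeps insertion order
    return list(seen)


def dictionary_lines(tokens, silence_model="sil"):
    """Build one dictionary line per token: each phoneme is its own word."""
    lines = []
    for token in tokens:
        if token in SILENCE_WORDS:
            lines.append(f"{token}\t\t[]\t\t{silence_model}")
        else:
            lines.append(f"{token}\t\t[{token}]\t\t{token}")
    return lines


if __name__ == "__main__":
    sentences = ["<s> T A N UU S II T !BREATH T A K SZ I </s>"]
    for line in dictionary_lines(token_inventory(sentences)):
        print(line)
```

Building the dictionary from the training text also guarantees that every token used in the sentences, including noise markers such as `!BREATH`, has an entry.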
