spoken-digit-recognition

Classifying English spoken digits with a Hidden Markov Model

Classifier

HMM - Hidden Markov Model

Feature Extractor

MFCC - Mel-frequency cepstral coefficients

Accuracy

0.98 (98%)

General Steps

For the curious who want to implement it themselves:

Download the dataset from Kaggle
Extract features from each recording with MFCC
Train the HMM states with hmmlearn
Predict the test data
Evaluate the model

Hands-on code

For the lazy programmer who wants to understand the whole project at a glance.

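Parse the data and extract features like this (20% of the files are held out for testing):

import os

def build_dataset(sound_path='spoken_digit/'):
    """Split the recordings into train/test sets and group training features by digit."""
    files = sorted(os.listdir(sound_path))
    x_train, y_train = [], []
    x_test, y_test = [], []
    data = dict()  # digit label -> list of feature arrays for that digit

    for i, f in enumerate(files):
        # feature_extractor (not shown here) returns the MFCC features of one recording
        feature = feature_extractor(sound_path=os.path.join(sound_path, f))
        label = int(f[0])  # the file name starts with the spoken digit
        if i % 5 == 0:
            # every fifth file goes to the test set
            x_test.append(feature)
            y_test.append(label)
        else:
            x_train.append(feature)
            y_train.append(label)
            data.setdefault(label, []).append(feature)
    return x_train, y_train, x_test, y_test, data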

x_train, y_train, x_test, y_test, data = build_dataset()
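The split rule can be tried without any audio files. The sketch below applies the same every-fifth-file rule to made-up file names and groups the training names by their leading digit. The `digit_speaker_k.wav` naming pattern and the helper `split_by_index` are assumptions for illustration, not part of the project.

```python
from collections import defaultdict

def split_by_index(file_names, test_every=5):
    """Send every `test_every`-th file to the test set, group the rest by digit."""
    train_by_digit = defaultdict(list)
    test = []
    for i, name in enumerate(sorted(file_names)):
        if i % test_every == 0:
            test.append(name)
        else:
            # the leading character of the file name is taken as the label
            train_by_digit[int(name[0])].append(name)
    return dict(train_by_digit), test

# hypothetical names: 10 digits x 5 recordings each
names = [f"{digit}_speaker_{k}.wav" for digit in range(10) for k in range(5)]
train_by_digit, test = split_by_index(names)
print(len(test), sum(len(v) for v in train_by_digit.values()))  # 10 40
```

Because the file list is sorted and every digit has the same number of recordings here, each digit loses exactly one file to the test set. With uneven class sizes the per-digit share would vary.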

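Pass the grouped data to the HMM trainer:

import numpy as np
from hmmlearn import hmm

def train_model(data):
    """Fit one GMM-HMM per digit and return the models keyed by label."""
    learned_hmm = dict()
    for label in data.keys():
        model = hmm.GMMHMM(n_components=14)
        # stack all recordings of this digit into one (n_frames, 13) array
        feature = np.vstack(data[label])
        # tell hmmlearn where each recording ends
        lengths = [len(seq) for seq in data[label]]
        learned_hmm[label] = model.fit(feature, lengths)
    return learned_hmm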
    
learned_hmm = train_model(data)
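One detail worth knowing when training on several recordings at once: hmmlearn's `fit` takes the stacked frames plus an optional `lengths` list giving the number of frames in each recording, so sequence boundaries are not lost. The pure-Python sketch below only shows how the two pieces line up. `stack_sequences` is a hypothetical helper, and the tiny 2-dimensional frames stand in for real 13-dimensional MFCC frames.

```python
def stack_sequences(sequences):
    """Concatenate frame sequences and record how many frames each one has."""
    frames = [frame for seq in sequences for frame in seq]
    lengths = [len(seq) for seq in sequences]
    return frames, lengths

# three toy "recordings" with 2, 3 and 1 frames
recordings = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
    [[2.0, 2.1]],
]
frames, lengths = stack_sequences(recordings)
print(len(frames), lengths)  # 6 [2, 3, 1]
```

With hmmlearn, the result would be passed on as `model.fit(np.array(frames), lengths)`.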

Save the trained HMMs with pickle to speed up later test runs (comment these lines out after the first run):

import pickle

with open("learned.pkl", "wb") as file:
    pickle.dump(learned_hmm, file)

As you can probably guess, the next step reads them back from the pickle:

with open("learned.pkl", "rb") as file:
    learned_hmm = pickle.load(file)
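Instead of commenting lines in and out, the save and load steps can be folded into one helper that trains only when no pickle exists yet. This is a sketch, not the project's code: `load_or_train` and `fake_train` are hypothetical names, and a plain dict stands in for the trained models.

```python
import os
import pickle
import tempfile

def load_or_train(path, train):
    """Load cached models from `path`, or call `train()` and cache the result."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    models = train()
    with open(path, "wb") as f:
        pickle.dump(models, f)
    return models

# demo with a stand-in for the trained HMMs
calls = []

def fake_train():
    calls.append(1)  # count how often training actually runs
    return {0: "model for zero", 1: "model for one"}

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "learned.pkl")
    first = load_or_train(cache, fake_train)   # trains and writes the pickle
    second = load_or_train(cache, fake_train)  # reads the pickle, no training

print(len(calls), first == second)  # 1 True
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.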

Prediction:

def prediction(test_data, trained):
    """Return the predicted digit for each sample (accepts a list of samples or a single one)."""
    # wrap a single sample so both cases share one code path
    samples = test_data if isinstance(test_data, list) else [test_data]
    predict_label = []
    for test in samples:
        # score() gives the log-likelihood of the sample under each digit's HMM
        best = max(trained, key=lambda label: trained[label].score(test))
        predict_label.append(int(best))
    return predict_label
    

y_pred = prediction(x_test, learned_hmm)
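The decision rule behind `prediction` is: score the sample under every digit's model and pick the label with the highest log-likelihood. The sketch below shows that rule with toy stand-ins. `ToyModel` is a hypothetical class that mimics only the `score` method (a 1-D unit-variance Gaussian log-likelihood summed over frames); it is not an HMM.

```python
import math

class ToyModel:
    """Stand-in exposing score(): log-likelihood under a unit-variance Gaussian."""

    def __init__(self, mean):
        self.mean = mean

    def score(self, frames):
        return sum(-0.5 * (x - self.mean) ** 2 - 0.5 * math.log(2 * math.pi)
                   for x in frames)

def classify(sample, models):
    """Return the label whose model gives the sample the highest score."""
    return max(models, key=lambda label: models[label].score(sample))

models = {0: ToyModel(0.0), 1: ToyModel(5.0), 2: ToyModel(10.0)}
print(classify([4.6, 5.2, 5.1], models))  # 1
print(classify([9.0, 11.0], models))      # 2
```

Returning the dictionary key itself, rather than a position in a list of scores, keeps the result correct even if the models are not stored in digit order.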

The best part is evaluating the model:

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def report(y_test, y_pred, show_cm=True):
    """Print the confusion matrix, per-class metrics and accuracy; optionally plot the matrix."""
    rule = ("-" * 58 + "\n") * 2  # two separator lines between sections
    cm = confusion_matrix(y_test, y_pred)
    print("confusion_matrix:\n\n", cm)
    print(rule)
    print("classification_report:\n\n", classification_report(y_test, y_pred))
    print(rule)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(rule)
    if show_cm:
        # plot_confusion_matrix is a plotting helper not shown in this README
        plot_confusion_matrix(cm, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

report(y_test, y_pred, show_cm=True)

If show_cm is True, you also get a nice plot of the confusion matrix.
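To make the report easier to read, here is what its two headline numbers mean, computed by hand on a made-up example (the labels below are invented, not results from this project). Rows of the confusion matrix are true classes, columns are predicted classes, and accuracy is the share of samples on the diagonal. The helper names `confusion` and `accuracy` are hypothetical.

```python
def confusion(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# invented labels for a 3-class toy problem
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]
cm = confusion(y_true, y_pred, 3)
print(cm)            # [[2, 0, 0], [0, 2, 1], [0, 0, 3]]
print(accuracy(cm))  # 0.875
```

The single off-diagonal entry, `cm[1][2]`, is the one sample of class 1 that was mistaken for class 2.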

Special thanks to my dear friend samadvalipour

Ralireza/spoken-digit-recognition
