Shazam: An Industrial-Strength Audio Search Algorithm

Adapted from the following repositories, so credit goes to their authors:
https://github.com/bmoquist/Shazam
https://github.com/miishke/PyDataNYC2015

Init

```
cd Shazam-An-Industrial-Strength-Audio-Search-Algorithm-
. path.sh  # load the settings defined in path.sh into the current shell
# Make sure your ./data/*/wav.scp files are tab-separated
```
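One plausible reason for the tab requirement is that splitting each line on the tab alone keeps wav paths that contain spaces intact. Below is a minimal Python sketch of such a reader, assuming each line has the Kaldi-style form <id><TAB><wav path>; the function name read_wav_scp is hypothetical and not part of ./utils.

```python
def read_wav_scp(lines):
    """Parse wav.scp lines, assumed to look like '<id><TAB><wav path>'.

    Splitting on the first tab only (not on arbitrary whitespace) keeps
    wav paths that contain spaces intact.
    """
    table = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if "\t" not in line:
            raise ValueError(f"line {lineno} is not tab-separated: {line!r}")
        utt_id, wav_path = line.split("\t", 1)
        table[utt_id] = wav_path
    return table


# Usage with hypothetical entries:
scp = ["song_001\t/wav/database/My Song.wav\n", "song_002\t/wav/database/b.wav\n"]
print(read_wav_scp(scp))
# {'song_001': '/wav/database/My Song.wav', 'song_002': '/wav/database/b.wav'}
```

A space-separated line is rejected with a ValueError rather than silently mis-split.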

The matching thresholds are set in ./utils/conf.py:
```
class Shazam_conf():
    ...
        self.threshold_short = 10   # For comparing short sequences (~10 s)
        self.threshold_long = 300   # For comparing long sequences (~3 min)
        self.th = self.threshold_short
```
By default, we assume we are matching short audio sequences against the database. If, however, your queries are full-length songs, set self.th=self.threshold_long to get fewer false negatives.

Database: Hashing & Setting Up the Look-Up Table (LUT)

Initial setup of the database hashes

```
## Hash the database and store the hashes in a dictionary
python -u ./utils/Hashing.py -DB_type $DB_type &> $logdir/Hashing.log
```
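The paper this project is named after (Wang, "An Industrial-Strength Audio Search Algorithm") hashes pairs of spectrogram peaks: each anchor peak is paired with a few later peaks, and the triple (anchor frequency, target frequency, time delta) becomes the key, stored together with the anchor time. The sketch below illustrates that idea with a plain dict as the look-up table. It assumes peaks have already been extracted as (time, frequency) integer pairs; FAN_OUT, MAX_DT, landmark_hashes and build_lut are hypothetical names and values, not necessarily what ./utils/Hashing.py does.

```python
from collections import defaultdict

FAN_OUT = 5   # hypothetical: max number of target peaks paired with each anchor
MAX_DT = 200  # hypothetical: max time gap (in frames) between anchor and target


def landmark_hashes(peaks, fan_out=FAN_OUT, max_dt=MAX_DT):
    """Yield ((f_anchor, f_target, dt), anchor_time) for pairs of peaks.

    peaks: iterable of (time, freq) spectrogram peaks.
    """
    peaks = sorted(peaks)  # sort by time, then frequency
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # only pair with strictly later peaks
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are even farther
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_lut(songs):
    """songs: {song_id: [(time, freq), ...]} -> {hash: [(song_id, anchor_time), ...]}"""
    lut = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in landmark_hashes(peaks):
            lut[h].append((song_id, t))
    return dict(lut)


# Usage with three toy peaks -> three anchor/target pairs.
lut = build_lut({"songA": [(0, 10), (3, 20), (7, 15)]})
print(lut)
# {(10, 20, 3): [('songA', 0)], (10, 15, 7): [('songA', 0)], (20, 15, 4): [('songA', 3)]}
```

Pairing peaks rather than hashing single peaks makes each key far more specific, which keeps the lists under each hash short.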

Updating database hashes

```
## To add new data to the current database
python -u ./utils/Hashing_update.py \
    -DB_type $DB_type \
    -add "./data/database/wav_add.scp" \
    &> $logdir/Hashing_update.add.log

## To remove existing data from the current database
python -u ./utils/Hashing_update.py \
    -DB_type $DB_type \
    -delete "./data/database/wav_delete.scp" \
    &> $logdir/Hashing_update.delete.log
```
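With a dict-based look-up table, adding songs means appending their (song_id, time) entries under each hash, and deleting means filtering those entries out again. A minimal sketch, assuming a hypothetical layout of hash -> list of (song_id, time) and per-song hashes that have already been computed; add_to_lut and delete_from_lut are illustrative names, not the interface of ./utils/Hashing_update.py.

```python
def add_to_lut(lut, song_hashes):
    """Append entries for new songs.

    song_hashes: {song_id: [(hash, time), ...]} computed for the songs to add.
    """
    for song_id, pairs in song_hashes.items():
        for h, t in pairs:
            lut.setdefault(h, []).append((song_id, t))


def delete_from_lut(lut, song_ids):
    """Remove every entry for song_ids; drop hashes that end up empty."""
    song_ids = set(song_ids)
    for h in list(lut):  # copy the keys, since entries may be deleted
        kept = [(s, t) for s, t in lut[h] if s not in song_ids]
        if kept:
            lut[h] = kept
        else:
            del lut[h]


lut = {}
add_to_lut(lut, {"a": [("h1", 0), ("h2", 5)], "b": [("h1", 7)]})
print(lut)  # {'h1': [('a', 0), ('b', 7)], 'h2': [('a', 5)]}
delete_from_lut(lut, ["a"])
print(lut)  # {'h1': [('b', 7)]}
```

Deletion here scans the whole table; keeping a reverse index from song_id to its hashes would avoid that scan at the cost of extra memory.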

Queries: Hashing & Matching

Data (splitting the queries into $nj jobs for parallel processing)

```
split --verbose -a $num_a -d --numeric-suffixes=1 -n l/$nj $qry_wavscp $splitdir/wav.scp.
# Sanity check: the line counts of the chunks should add up to the total
wc -l $qry_wavscp
wc -l $splitdir/wav.scp.*
```
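The Matching loop iterates over job ids from seq -w $nj, which zero-pads every id to the width of $nj, while split -a $num_a pads the chunk suffixes to exactly $num_a digits. Presumably the two must agree for each job to find its $splitdir/wav.scp.$jid, which means $num_a should equal the number of digits in $nj. A small Python check of that assumption (the helper names are hypothetical):

```python
def seq_w_ids(nj):
    """Ids printed by `seq -w nj`: 1..nj, zero-padded to the width of nj."""
    width = len(str(nj))
    return [str(j).zfill(width) for j in range(1, nj + 1)]


def split_suffixes(nj, num_a):
    """Suffixes from `split -a num_a --numeric-suffixes=1 -n l/nj`."""
    return [str(j).zfill(num_a) for j in range(1, nj + 1)]


def check_num_a(nj, num_a):
    """Raise if the job ids and the split file suffixes would disagree."""
    if seq_w_ids(nj) != split_suffixes(nj, num_a):
        raise ValueError(f"num_a={num_a} should be {len(str(nj))} for nj={nj}")


check_num_a(10, 2)  # ids 01..10 match wav.scp.01 .. wav.scp.10
check_num_a(8, 1)   # ids 1..8 match wav.scp.1 .. wav.scp.8
# check_num_a(8, 2) would raise: seq -w 8 prints 1..8, split writes 01..08
```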

Matching (Queries)

```
# Set cont=True in path.sh to resume after a run that stopped prematurely
for jid in $(seq -w $nj); do
    python -u ./utils/Matching.py \
        -DB_type $DB_type \
        -Qry_type $Qry_type \
        -jid $jid \
        -cont $cont \
        &>> $logdir/Matching.$jid.log &
done
wait
```
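Each job hashes its queries and looks them up in the database table. Below is a minimal sketch of the classic Shazam-style decision rule, assuming the look-up table maps each hash to a list of (song_id, time) entries: a true match shows many query hashes that agree on a single time offset between query and song. The function names and the min_aligned cutoff are hypothetical; this is not claimed to be how ./utils/Matching.py applies self.th.

```python
from collections import Counter


def aligned_scores(query_hashes, lut):
    """Score each candidate song by its largest group of time-consistent hits.

    query_hashes: list of (hash, query_time)
    lut: dict mapping hash -> list of (song_id, db_time)
    """
    offsets = Counter()
    for h, q_time in query_hashes:
        for song_id, db_time in lut.get(h, ()):
            offsets[(song_id, db_time - q_time)] += 1
    scores = {}
    for (song_id, _offset), count in offsets.items():
        scores[song_id] = max(scores.get(song_id, 0), count)
    return scores


def match(query_hashes, lut, min_aligned):
    """Return the songs whose aligned score reaches min_aligned."""
    scores = aligned_scores(query_hashes, lut)
    return sorted(song for song, score in scores.items() if score >= min_aligned)


# Usage with a toy table: songA's three hits all share offset 100.
lut = {"h1": [("songA", 100)], "h2": [("songA", 105), ("songB", 3)], "h3": [("songA", 110)]}
query = [("h1", 0), ("h2", 5), ("h3", 10)]
print(aligned_scores(query, lut))      # {'songA': 3, 'songB': 1}
print(match(query, lut, min_aligned=2))  # ['songA']
```

Counting hits per (song, offset) rather than per song is what separates a real match from many scattered coincidental hash collisions.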

```
# Optionally, inspect some of the matches found so far
grep " matched " $logdir/Matching.*
```

Combine

```
# Combine the outputs of all split jobs
python -u ./utils/Combine.py \
    -DB_type $DB_type \
    -Qry_type $Qry_type \
    -nj $nj \
    -zfill $num_a \
    &> $logdir/Combine.log
wc -l $qry2db $db2qry
```
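The on-disk format of $qry2db and $db2qry is not described here. Assuming each job yields a mapping from query id to its matched database ids, combining amounts to merging the per-job mappings into qry2db and inverting that into db2qry. A minimal sketch (combine is a hypothetical name, not the interface of ./utils/Combine.py):

```python
def combine(job_outputs):
    """Merge per-job {query_id: [db_id, ...]} dicts, then invert the result.

    Returns (qry2db, db2qry), where db2qry maps each database id to the
    queries that matched it.
    """
    qry2db = {}
    for part in job_outputs:
        for query_id, db_ids in part.items():
            qry2db.setdefault(query_id, []).extend(db_ids)
    db2qry = {}
    for query_id, db_ids in qry2db.items():
        for db_id in db_ids:
            db2qry.setdefault(db_id, []).append(query_id)
    return qry2db, db2qry


# Usage with two toy job outputs; q2 matched nothing.
qry2db, db2qry = combine([{"q1": ["s1"], "q2": []}, {"q3": ["s1", "s2"]}])
print(qry2db)  # {'q1': ['s1'], 'q2': [], 'q3': ['s1', 's2']}
print(db2qry)  # {'s1': ['q1', 'q3'], 's2': ['q3']}
```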

GetStats

```
# Compute num_matches for a statistical overview of how good the matches are
python -u ./utils/GetStats.py \
    -DB_type $DB_type \
    -Qry_type $Qry_type \
    &> $logdir/GetStats.log
```
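The exact output of GetStats.py is not described here. As an illustration only, assuming match results are held as a mapping from query id to a list of matched database ids, a natural overview counts how many queries matched zero, one, or several database entries. match_summary is a hypothetical helper:

```python
from collections import Counter


def match_summary(qry2db):
    """Summarize how many database matches each query received."""
    hist = Counter(len(db_ids) for db_ids in qry2db.values())
    total = len(qry2db)
    return {
        "queries": total,
        "no_match": hist[0],
        "unique_match": hist[1],
        "multi_match": total - hist[0] - hist[1],
        "histogram": dict(sorted(hist.items())),  # num_matches -> query count
    }


print(match_summary({"q1": ["s1"], "q2": [], "q3": ["s1", "s2"]}))
# {'queries': 3, 'no_match': 1, 'unique_match': 1, 'multi_match': 1, 'histogram': {0: 1, 1: 1, 2: 1}}
```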

Repository: leonardltk/Shazam-An-Industrial-Strength-Audio-Search-Algorithm-
