Icefall-based, Transducer ASR Model #2339

Open · wants to merge 6 commits into base: dev_1.18.0

Conversation

HSTEHSTEHSTE

This is a preliminary pull request containing the ART adaptation for Icefall-based ASR models.

@HSTEHSTEHSTE HSTEHSTEHSTE marked this pull request as draft November 28, 2023 16:25
@beat-buesser beat-buesser self-requested a review November 30, 2023 15:38
@beat-buesser beat-buesser self-assigned this Nov 30, 2023
@beat-buesser beat-buesser added the enhancement New feature or request label Nov 30, 2023
@beat-buesser (Collaborator)

Hi @HSTEHSTEHSTE, thank you for your pull request! Could you please target one of the dev branches, either dev_1.17.0 or dev_1.18.0?

@beat-buesser beat-buesser added this to the ART 1.18.0 milestone Nov 30, 2023
@github-advanced-security (bot) left a comment

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.


| Repository link: https://github.com/k2-fsa/icefall/tree/master
"""
import ast

Check notice (Code scanning / CodeQL): Unused import. Import of 'ast' is not used.
| Repository link: https://github.com/k2-fsa/icefall/tree/master
"""
import ast
from argparse import Namespace

Check notice (Code scanning / CodeQL): Unused import. Import of 'Namespace' is not used.

import numpy as np

from art import config

Check notice (Code scanning / CodeQL): Unused import. Import of 'config' is not used.
features, _, _ = self.transform_model_input(x=masked_adv_input[sample_index])
x_lens = torch.tensor([features.shape[1]]).to(torch.int32).to(self.device)
y = k2.RaggedTensor(original_output[sample_index])
loss = self.transducer_model(x=features, x_lens=x_lens, y=y)

Check notice (Code scanning / CodeQL): Unused local variable. Variable loss is not used.
raise ValueError("This estimator does not support `postprocessing_defences`.")

# Set cpu/gpu device
self._device = torch.device("cpu")

Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class. Assignment overwrites attribute _device, which was previously defined in superclass PyTorchEstimator.
isnan = torch.isnan(x[i])
nisnan = torch.sum(isnan).item()
if nisnan > 0:
logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))

Check warning (Code scanning / CodeQL): Unused argument in a formatting call. Too many arguments for string format: the format "input isnan={}/{} {}" requires only 3, but 4 are provided.
import ast
from argparse import Namespace
import logging
from typing import Dict, List, Optional, Tuple, TYPE_CHECKING, Union

Check notice (Code scanning / CodeQL): Unused import. Import of 'Dict' is not used.
from art import config
from art.estimators.pytorch import PyTorchEstimator
from art.estimators.speech_recognition.speech_recognizer import SpeechRecognizerMixin, PytorchSpeechRecognizerMixin
from art.utils import get_file

Check notice (Code scanning / CodeQL): Unused import. Import of 'get_file' is not used.
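
Taken together, the unused-import notices suggest the module header can be trimmed to the names that actually appear in the quoted snippets. A sketch of the trimmed header, assuming nothing else in the file references the flagged imports:

import logging
from typing import List, Optional, Tuple, TYPE_CHECKING, Union

import numpy as np

from art.estimators.pytorch import PyTorchEstimator
from art.estimators.speech_recognition.speech_recognizer import SpeechRecognizerMixin, PytorchSpeechRecognizerMixin
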
vtln_warp: float = 1.0

params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})

Check failure (Code scanning / CodeQL): Modification of parameter with default. This expression mutates a default value.

params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
params["frame_shift"] *= 1000.0

Check failure (Code scanning / CodeQL): Modification of parameter with default. This expression mutates a default value.
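
Both failures flag the same pattern: a dict that CodeQL traces back to a mutable default is updated in place. A minimal sketch of the usual remedy, building a fresh dict on every call (the function name build_fbank_params and the overrides argument are illustrative, not the PR's actual API):

import copy
from dataclasses import asdict

def build_fbank_params(overrides=None):
    # Construct a new dict per call so no shared default is ever mutated
    params = asdict(FbankConfig())  # FbankConfig as defined in this PR
    params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
    if overrides:
        params.update(copy.deepcopy(overrides))  # never mutate the caller's dict
    params["frame_shift"] *= 1000.0  # torchaudio/Kaldi expects milliseconds
    return params
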
for line in lexicon_file:
if len(line.strip()) > 0: # and '<UNK>' not in line and '<s>' not in line and '</s>' not in line:
vocab_size += 1
except:

Check notice (Code scanning / CodeQL): Empty except. 'except' clause does nothing but pass and there is no explanatory comment.
if len(line.strip()) > 0:
word2id[line.split()[0]] = id
id += 1
except:

Check notice (Code scanning / CodeQL): Empty except. 'except' clause does nothing but pass and there is no explanatory comment.
for line in lexicon_file:
if len(line.strip()) > 0: # and '<UNK>' not in line and '<s>' not in line and '</s>' not in line:
vocab_size += 1
except:

Check notice (Code scanning / CodeQL): Except block handles 'BaseException'. Except block directly handles BaseException.
if len(line.strip()) > 0:
word2id[line.split()[0]] = id
id += 1
except:

Check notice (Code scanning / CodeQL): Except block handles 'BaseException'. Except block directly handles BaseException.
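
All four notices point at the same two bare except: clauses. A sketch of the conventional fix for the vocabulary-counting loop, narrowing the exception and logging it instead of silently passing (lexicon_path is a hypothetical name for whatever path the PR actually opens):

import logging

vocab_size = 0
try:
    with open(lexicon_path, encoding="utf-8") as lexicon_file:
        for line in lexicon_file:
            if len(line.strip()) > 0:
                vocab_size += 1
except OSError as err:
    # Narrow the exception type and explain why it is tolerated
    logging.warning("Could not read lexicon file %s: %s", lexicon_path, err)
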
# Set cpu/gpu device
self._device = torch.device("cpu")
if torch.cuda.is_available():
self._device = torch.device("cuda", 0)

Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class. Assignment overwrites attribute _device, which was previously defined in superclass PyTorchEstimator.
@HSTEHSTEHSTE HSTEHSTEHSTE changed the title from "Todo: Attacks against Icefall-based, Transducer ASR Models" to "Icefall-based, Transducer ASR Model" on Jan 31, 2024
@f4str (Collaborator) left a comment

This looks good, thanks for implementing the new ASR model. I left a few comments on things that should be addressed.

Comment on lines 113 to 116
# Set cpu/gpu device
self._device = torch.device("cpu")
if torch.cuda.is_available():
self._device = torch.device("cuda", 0)
Collaborator

This is already defined in the PyTorchEstimator superclass and is redundant, hence the CodeQL warning. This segment should be removed.
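
For illustration, a minimal sketch of what that removal leaves behind; BaseEstimator stands in for ART's PyTorchEstimator, which already resolves the device once in its own __init__:

import torch

class BaseEstimator:
    """Stand-in for PyTorchEstimator: resolves the device exactly once."""
    def __init__(self):
        use_cuda = torch.cuda.is_available()
        self._device = torch.device("cuda", 0) if use_cuda else torch.device("cpu")

    @property
    def device(self) -> torch.device:
        return self._device

class IcefallEstimator(BaseEstimator):
    def __init__(self):
        super().__init__()
        # No self._device assignment here; downstream code reads self.device,
        # e.g. torch.tensor([...], dtype=torch.int32, device=self.device)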

isnan = torch.isnan(x[i])
nisnan = torch.sum(isnan).item()
if nisnan > 0:
logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))
Collaborator

Suggested change
logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))
logging.info(f"input isnan={nisnan}/{x[i][isnan]} {torch.max(torch.abs(x[i]))}")

Use f-strings to avoid confusion with string formatting

Comment on lines 220 to 245
@dataclass
class FbankConfig:
# Spectrogram-related part
dither: float = 0.0
window_type: str = "povey"
# Note that frame_length and frame_shift will be converted to milliseconds before torchaudio/Kaldi sees them
frame_length: float = 0.025
frame_shift: float = 0.01
remove_dc_offset: bool = True
round_to_power_of_two: bool = True
energy_floor: float = 1e-10
min_duration: float = 0.0
preemphasis_coefficient: float = 0.97
raw_energy: bool = True

# Fbank-related part
low_freq: float = 20.0
high_freq: float = -400.0
num_mel_bins: int = 40
use_energy: bool = False
vtln_low: float = 100.0
vtln_high: float = -500.0
vtln_warp: float = 1.0

params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
Collaborator

The dataclass seems a bit extraneous. What's the issue with just using a dictionary?
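
A sketch of that dictionary alternative, folding the dataclass defaults and the update() overrides into one literal (values copied from the snippet above; only the frame_shift conversion appears in the diff, so frame_length handling is left as in the original):

def make_fbank_params() -> dict:
    """Same defaults as FbankConfig, as a plain dict, with the overrides applied."""
    params = {
        # Spectrogram-related part
        "dither": 0.0,
        "window_type": "povey",
        "frame_length": 0.025,
        "frame_shift": 0.01,
        "remove_dc_offset": True,
        "round_to_power_of_two": True,
        "energy_floor": 1e-10,
        "min_duration": 0.0,
        "preemphasis_coefficient": 0.97,
        "raw_energy": True,
        # Fbank-related part
        "low_freq": 20.0,
        "high_freq": -400.0,
        "use_energy": False,
        "vtln_low": 100.0,
        "vtln_high": -500.0,
        "vtln_warp": 1.0,
        # Overrides previously applied via params.update(...)
        "sample_frequency": 16000,
        "snip_edges": False,
        "num_mel_bins": 23,
    }
    params["frame_shift"] *= 1000.0  # ms conversion, as in the original snippet
    return params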

features, _, _ = self.transform_model_input(x=masked_adv_input[sample_index])
x_lens = torch.tensor([features.shape[1]]).to(torch.int32).to(self.device)
y = k2.RaggedTensor(original_output[sample_index])
loss = self.transducer_model(x=features, x_lens=x_lens, y=y)
Collaborator

Should the loss not also be returned?

Author

I initially assumed so too, but apparently not (if the other estimator implementations are anything to go by).

Comment on lines 40 to 55
sudo apt-get update
sudo apt-get -y -q install ffmpeg libavcodec-extra
python -m pip install --upgrade pip setuptools wheel
pip3 install -r requirements_test.txt
apt-get update \
&& apt-get install -y \
libgl1-mesa-glx \
libx11-xcb1 \
git \
gcc \
mono-mcs \
libavcodec-extra \
ffmpeg \
curl \
libsndfile-dev \
libsndfile1 \
Collaborator

Is there a reason why this is done as two apt install steps? Can't they be combined into one?
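
For reference, one way the two steps could collapse into a single update/install pair (package list copied from the snippet above; this assumes both steps run in the same sudo-capable environment, which the mixed sudo/non-sudo lines leave ambiguous):

sudo apt-get update && sudo apt-get install -y -q \
    ffmpeg \
    libavcodec-extra \
    libgl1-mesa-glx \
    libx11-xcb1 \
    git \
    gcc \
    mono-mcs \
    curl \
    libsndfile-dev \
    libsndfile1
python -m pip install --upgrade pip setuptools wheel
pip3 install -r requirements_test.txt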

Comment on lines 267 to 269
indices = torch.LongTensor(indices)
num_frames = torch.IntTensor([num_frames[idx] for idx in indices])
start_frames = torch.zeros(len(x), dtype=torch.int)
Collaborator

Suggested change
indices = torch.LongTensor(indices)
num_frames = torch.IntTensor([num_frames[idx] for idx in indices])
start_frames = torch.zeros(len(x), dtype=torch.int)
indices = torch.tensor(indices, dtype=torch.int64, device=self._device)
num_frames = torch.tensor([num_frames[idx] for idx in indices], dtype=torch.int32, device=self._device)
start_frames = torch.zeros(len(x), dtype=torch.int32, device=self._device)

Let's use the modern torch tensor instantiation on the proper device for optimal performance.

return self._input_shape # type: ignore

@property
def model(self):
Collaborator

Suggested change
def model(self):
def model(self) -> "torch.nn.Module":

Missing return type

contrib/lingvo-patched-decoder.py (review comment outdated and resolved)
Labels: enhancement (New feature or request)
Projects: ART 1.18.0 (awaiting triage)
Linked issues: none yet
3 participants