- system consists of multiple captioned query engines
- splits query into multiple sub-queries
- sub-queries are routed to query engine with most similar caption
- responses to sub-queries are later fused to answer query
- multiple captioned query engines allow a finer-grained distinction of information than just using metadata
- generated faulty versions of text with typical fault patterns
- deletion/transposition/insertion/substitution typos ("und" -> "nd", "udn", "undd", "umd")
- keyboard layout based ("n" is closer to "m" so it is more likely to be a substitute)
- phonetics based ("Maier" -> "Meier")
- ocr based ("OvvISB`" -> "0w!58'")
- trained a sentence classifier (original text vs. typo text), used word-level gradcam to deduce word-level predictions
- highly related to k-means, but with supervised guidance
- given: few labeled data points, many unlabeled data points (from approx. same distribution)
- perform clustering, align clusters to respect class labels, and predict according to the majority class per cluster
- objective function = mean distance to cluster center + $\alpha \cdot$ class impurity per cluster
1. initiate cluster centers
2. unsupervised: assign data points to the closest cluster center
3. supervised: move a labeled data point to another cluster, if that minimizes the objective function
4. cluster_center = mean(cluster)
- repeat 2. - 4. until no improvement in the objective function is achieved (see the sketch below)
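A minimal numpy sketch of the procedure above, assuming Euclidean distance, integer class labels, and an impurity term that counts labeled points disagreeing with their cluster's majority class (all helper names are hypothetical):

```python
import numpy as np

def objective(X, labels_known, assign, centers, alpha):
    """mean distance to cluster center + alpha * class impurity per cluster."""
    dists = np.linalg.norm(X - centers[assign], axis=1)
    impurity = 0.0
    for c in range(len(centers)):
        y_c = [y for i, y in labels_known.items() if assign[i] == c]
        if y_c:  # fraction of labeled points disagreeing with the cluster's majority class
            impurity += 1.0 - np.bincount(y_c).max() / len(y_c)
    return dists.mean() + alpha * impurity

def semi_supervised_kmeans(X, labels_known, k, alpha=1.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # 1. initiate cluster centers
    best = np.inf
    for _ in range(n_iter):
        # 2. unsupervised: assign data points to the closest cluster center
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # 3. supervised: move a labeled point to another cluster if that lowers the objective
        for i in labels_known:
            for c in range(k):
                candidate = assign.copy()
                candidate[i] = c
                if objective(X, labels_known, candidate, centers, alpha) < \
                        objective(X, labels_known, assign, centers, alpha):
                    assign = candidate
        # 4. cluster_center = mean(cluster); keep old center if a cluster runs empty
        centers = np.stack([X[assign == c].mean(axis=0) if (assign == c).any() else centers[c]
                            for c in range(k)])
        score = objective(X, labels_known, assign, centers, alpha)
        if score >= best:   # repeat 2.-4. until no improvement in the objective
            break
        best = score
    return assign, centers
```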
- bandwagon effect: LLM favors following (fictional) majority vote presented in prompt
- improved tweet classification by adding guidance from XGBoost to exploit bandwagon effect
- XGBoost trained on embedded tweets
- also tested: setting a fixed fictional guidance to always favor positive/negative class doesn't lead to better recall/precision
- optuna hyperparameter optimization
- optimizes a trial score (validation loss, validation accuracy, ...) over $n$ trials, each trial being a run with a certain hyperparameter set
- narrows the hyperparameter search space based on past trial scores (focuses on regions that lead to better scores)
- aborts unpromising trials via early stopping
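A minimal optuna sketch along these lines; the search space and the `train_one_epoch_and_validate` helper are placeholders:

```python
import optuna

def objective(trial):
    # sample a hyperparameter set for this trial
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)

    val_loss = None
    for epoch in range(20):
        val_loss = train_one_epoch_and_validate(lr, n_layers, epoch)  # placeholder
        trial.report(val_loss, step=epoch)    # report the intermediate trial score
        if trial.should_prune():              # abort unpromising trials early
            raise optuna.TrialPruned()
    return val_loss                           # trial score to be minimized

study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)        # sampler narrows the search space over trials
print(study.best_params)
```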
- implementation of learning rate range test (lrrt)
- stable algorithm for determining learning rates (and other hyperparameters) along a range of training batches
- naive comparison between initial and last batch loss can fail to detect best lr due to variance in the batch losses
- define a set of lr candidates
- train from the same checkpoint on few batches with each lr candidate
- fit a line through the batch losses for each lr candidate
- return the lr candidate with the steepest negative line slope
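A numpy sketch of the line-fit idea, assuming a hypothetical `run_batches(lr, n_batches)` helper that trains from the same checkpoint and returns the per-batch losses:

```python
import numpy as np

def lr_range_test(lr_candidates, run_batches, n_batches=50):
    """Return the lr candidate whose batch-loss curve has the steepest negative slope."""
    slopes = {}
    for lr in lr_candidates:
        losses = run_batches(lr, n_batches)          # train from the same checkpoint
        steps = np.arange(len(losses))
        slope, _ = np.polyfit(steps, losses, deg=1)  # fit a line through the batch losses
        slopes[lr] = slope
    return min(slopes, key=slopes.get)               # steepest negative slope wins

# usage: best_lr = lr_range_test([1e-4, 1e-3, 1e-2, 1e-1], run_batches)
```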
- implementation of a multihead resnet
- classification head classifies cotton plants (healthy, powdery mildew, aphids, army worm, bacterial blight, target spot)
- embedding head creates a 2d latent space using the TripletMarginLoss on triplets of data points:
- $a$ = anchor (embedding of a data point)
- $p$ = positive (embedding of a data point of the same class as $a$)
- $n$ = negative (embedding of a data point of a different class than $a$)
- $L(a, p, n) = \max(d(a, p) - d(a, n) + \alpha, 0)$ (with $d$ being a distance, $\alpha$ a desired margin)
- learns to fulfill $d(a, p) + \alpha < d(a, n)$
- hard triplet mining
- find $p$ so that $p$ is the most different embedding to $a$ of the same class
- find $n$ so that $n$ is the most similar embedding to $a$ of a different class
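A PyTorch sketch of this objective plus a simple batch-hard mining step (margin, distance, and tensor shapes are assumptions):

```python
import torch

triplet_loss = torch.nn.TripletMarginLoss(margin=1.0, p=2)  # alpha = 1.0, d = L2 distance

def batch_hard_triplets(embeddings, labels):
    """For each anchor: hardest positive (most distant, same class) and
    hardest negative (closest, different class) within the batch."""
    dist = torch.cdist(embeddings, embeddings, p=2)       # pairwise distances
    same = labels[:, None] == labels[None, :]
    pos_dist = dist.masked_fill(~same, float("-inf"))     # consider only same-class pairs
    neg_dist = dist.masked_fill(same, float("inf"))       # consider only different-class pairs
    p_idx = pos_dist.argmax(dim=1)                        # hardest positive per anchor
    n_idx = neg_dist.argmin(dim=1)                        # hardest negative per anchor
    return embeddings, embeddings[p_idx], embeddings[n_idx]

# usage inside a training step:
# a, p, n = batch_hard_triplets(embedding_head_output, batch_labels)
# loss = triplet_loss(a, p, n)
```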
- implementation of a MIMO (Multi-Input Multi-Output) Ensemble
- implicit ensemble that learns independent subnetworks within one neural network
- exploits network capacity
- M ensemble predictions with a single forward pass
- slightly higher time and space complexity (less than 1%), but can converge to independent subnetworks with decorrelated errors/high disagreement
- M ensemble predictions allow uncertainty measure
- MIMO paper: https://openreview.net/pdf?id=OGg9XnKxFAH
- cifar10 preprocessing for MIMO ensembles
- my presentation slides about the MIMO paper
- my seminar paper reviewing the MIMO paper
- implementation of a monte carlo dropout CNN on MNIST
- drops out certain activations not only during training but also during inference
- multiple forward passes create ensemble predictions that can be averaged to increase the generalization ability
- mc dropout paper: https://arxiv.org/pdf/1506.02142.pdf
- performed 10 runs to compare monte carlo ensembles of different size with a normal dropout baseline
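A PyTorch sketch of the inference step: dropout layers stay active and several stochastic forward passes are averaged (model and number of passes are placeholders):

```python
import torch

def mc_dropout_predict(model, x, n_passes=20):
    """Monte Carlo dropout: average softmax outputs over several stochastic passes."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()                                   # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_probs = probs.mean(dim=0)                      # ensemble prediction
    uncertainty = probs.std(dim=0)                      # disagreement across passes
    return mean_probs, uncertainty
```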
- masked language modeling (mlm) with bert
- texts are split into tokens ((sub-) words)
- each token is masked with a certain probability (usually 15%)
- model "fills the gaps" with tokens (simple classification to check if predicted token is correct)
- rotation detection with rezero-cnn
- images are rotated by 0, 90, 180, 270 degrees, model predicts respective class (4 class classification)
- detecting that a truck is rotated by 90 degrees demands basic knowledge about the concept "truck"
- carlini wagner attack (targeted attack)
- target class $t$: flamingo
- change $x$ using gradient descent so that the target probability is at least $\kappa$ bigger than the second biggest probability
- makes $x$ and $x_0$ more similar to each other, if the softmax output is of the desired form
- carlini wagner criterion: $\max(-\kappa, \underset{j\neq t}{\max}(p_j)-p_t) + ||x-x_0||^2_2$
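A PyTorch sketch of minimizing this criterion with $p$ taken as softmax probabilities; step size, iteration count, and the omission of box constraints are simplifying assumptions:

```python
import torch

def carlini_wagner_attack(model, x0, target, kappa=0.1, steps=200, lr=0.01):
    """Minimize max(-kappa, max_{j != t} p_j - p_t) + ||x - x0||_2^2 via gradient descent."""
    x = x0.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        p = torch.softmax(model(x), dim=-1).squeeze(0)
        mask = torch.ones_like(p, dtype=torch.bool)
        mask[target] = False                                   # exclude the target class
        margin = torch.clamp(p[mask].max() - p[target], min=-kappa)
        loss = margin + ((x - x0) ** 2).sum()                  # similarity term ||x - x0||^2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return x.detach()
```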
- fast gradient sign method (untargeted attack)
- goal: create $x_{fgsm}$ that is close to $x$ and leads to misclassification
- $x_{fgsm}=x - sign(\frac{\partial f(x)_{y}}{\partial x}) \cdot \epsilon$
- $sign(\frac{\partial f(x)_{y}}{\partial x})$: direction in which the score for class $y$ increases
- strong perturbations can make $x_{fgsm}$ OOD and can lead to an even higher class score, because the gradient is only a local approximation
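A PyTorch sketch of the update above for a single image `x` with true label `y` (epsilon is an assumption):

```python
import torch

def fgsm(model, x, y, epsilon=0.03):
    """x_fgsm = x - epsilon * sign(d f(x)_y / d x): step against the true-class score."""
    x = x.clone().requires_grad_(True)
    score_y = model(x)[0, y]                # score of the true class y
    score_y.backward()
    x_fgsm = x - epsilon * x.grad.sign()    # move in the direction that decreases the class score
    return x_fgsm.detach()
```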
- detecting litter objects on forest floor
- created data set
- took photos of the forest floor
- most photos contain at least one litter object (plastic, metal, paper, glass)
- annotated litter objects with bounding boxes (corner coordinates)
- photos contain benign confounders, i.e. natural objects that are easily confused with litter (reflecting puddles, colorful blossoms and berries, ...)
- annotated data is available on https://www.kaggle.com/datasets/milankalkenings/litter-on-forest-floor-object-detection
- fine tuned Faster R-CNN (pretrained on COCO)
- semi supervised training with cross entropy and unsupervised support
- unsupervised support: loss functions that can be calculated on unlabeled data points
- stability loss: $\lambda \, d(f(x), f(x_{aug}))$ favors similar softmax outputs for $n$ augmented versions of the same data point
- risk: the trivial solution is to always predict the same vector
- also called consistency regularization, because it biases the model towards similar softmax outputs and thus a bigger training error
- mutual exclusivity loss: favors low-entropy softmax outputs
- leads to decision boundary through low-density regions in feature space
- prevents trivial solution for stability loss
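A PyTorch sketch of the two unsupervised support terms; choosing mean squared error as the distance $d$ is an assumption:

```python
import torch
import torch.nn.functional as F

def stability_loss(model, x, x_aug, lam=1.0):
    """lambda * d(f(x), f(x_aug)): favors similar softmax outputs for augmented versions."""
    p = torch.softmax(model(x), dim=-1)
    p_aug = torch.softmax(model(x_aug), dim=-1)
    return lam * F.mse_loss(p, p_aug)

def mutual_exclusivity_loss(model, x):
    """Favors low-entropy softmax outputs, preventing the trivial constant prediction."""
    p = torch.softmax(model(x), dim=-1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=-1)
    return entropy.mean()

# semi-supervised objective on a batch with labeled (x_l, y_l) and unlabeled x_u, x_u_aug:
# loss = F.cross_entropy(model(x_l), y_l) \
#        + stability_loss(model, x_u, x_u_aug) \
#        + mutual_exclusivity_loss(model, x_u)
```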
- implementation of a (small) unet architecture
- U-Nets have two main components:
- down: spatial resolution $\downarrow$, channel resolution $\uparrow$. Creates a dense input representation
- up: spatial resolution $\uparrow$, channel resolution $\downarrow$. Output is often of the same (spatial) resolution as the down-input.
- skip-connections (concatenation) between up and down blocks of the same resolution improve gradient flow to early layers
- pretraining of the down part with image classification using a classification head
- fine tuning on image segmentation data in two stages:
- adjusting upwards part with frozen pretrained downwards part
- end-to-end fine tuning of the downwards part and the upwards part
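A minimal PyTorch sketch of such a small U-Net: one down block, one up block, and a skip connection via concatenation (channel sizes and depth are assumptions, not the repo's exact architecture):

```python
import torch
import torch.nn as nn

class SmallUNet(nn.Module):
    def __init__(self, in_channels=3, n_classes=2):
        super().__init__()
        # down: spatial resolution decreases, channel resolution increases
        self.down = nn.Sequential(nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # up: spatial resolution increases, channel resolution decreases
        self.upsample = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.up = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())  # 64 = 32 skip + 32 up
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)                  # per-pixel class logits

    def forward(self, x):
        d = self.down(x)                       # same spatial size, 32 channels
        b = self.bottleneck(self.pool(d))      # half spatial size, 64 channels
        u = self.upsample(b)                   # back to input spatial size, 32 channels
        u = self.up(torch.cat([d, u], dim=1))  # skip connection via concatenation
        return self.head(u)                    # same spatial resolution as the input
```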
- given: few labeled data points, many unlabeled data points (from approx. same distribution)
- iteratively add semi-supervised labels to the unlabeled data points (see the sketch below)
1. train the model on the labeled training set
2. predict labels of the unlabeled data points
3. add the data point(s) with the most confident prediction(s) to the labeled training set
- repeat 1. to 3. until no improvement is achieved on validation data
- possible in the transductive setting (treat test data points as unlabeled training data points)
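A sketch of this loop with an sklearn-style classifier, assuming integer class labels and a simple validation-based stopping check:

```python
import numpy as np

def self_training(model, X_lab, y_lab, X_unlab, X_val, y_val, per_round=10):
    """Iteratively move the most confidently predicted unlabeled points into the training set."""
    best_val = -np.inf
    while len(X_unlab) > 0:
        model.fit(X_lab, y_lab)                      # 1. train on the labeled training set
        val = model.score(X_val, y_val)
        if val <= best_val:                          # stop when validation stops improving
            break
        best_val = val
        probs = model.predict_proba(X_unlab)         # 2. predict the unlabeled data points
        conf = probs.max(axis=1)
        top = np.argsort(conf)[-per_round:]          # 3. most confident predictions
        X_lab = np.concatenate([X_lab, X_unlab[top]])
        y_lab = np.concatenate([y_lab, probs[top].argmax(axis=1)])
        X_unlab = np.delete(X_unlab, top, axis=0)
    return model
```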
- training an autoencoder on MNIST and CIFAR100
- if autoencoder is trained to reconstruct instances of data set X, it is likely to achieve good results on reconstructing instances of data set Y, if X and Y are similar enough.
- pretraining an encoder within an autoencoder, and later using it as a feature extractor in a classifier, can speed up the training process, because the encoder already learned how to extract general features in the given data
- a well trained autoencoder can be used to generate new data points that still contain the data signal, but add further noise (similar to data augmentation)
- training a variational autoencoder on MNIST
- learns the reparameterization $encoding=\mu + \sigma \epsilon$, with $\epsilon \sim \mathcal{N}(0,1)$, $\mu=linear(encoding_{raw})$, $\sigma=\exp(linear(encoding_{raw}))$
- visualization of PCA-reduced latent space
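A PyTorch sketch of this reparameterization step (layer dimensions are assumptions):

```python
import torch
import torch.nn as nn

class Reparameterization(nn.Module):
    def __init__(self, raw_dim=128, latent_dim=2):
        super().__init__()
        self.mu_layer = nn.Linear(raw_dim, latent_dim)
        self.log_sigma_layer = nn.Linear(raw_dim, latent_dim)

    def forward(self, encoding_raw):
        mu = self.mu_layer(encoding_raw)                        # mu = linear(encoding_raw)
        sigma = torch.exp(self.log_sigma_layer(encoding_raw))   # sigma = exp(linear(encoding_raw))
        epsilon = torch.randn_like(sigma)                       # epsilon ~ N(0, 1)
        return mu + sigma * epsilon                             # encoding = mu + sigma * epsilon
```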
- fake news detection (feature engineering, random forest feature importance & selection)
- feature engineering
- number of words (title and body)
- number of exclamation marks (title and body)
- number of question marks (title and body)
- lexical diversity (title and body)
- $\frac{\text{number of title words}}{\text{number of title words} + \text{number of body words}}$
- random forest feature importance and respective feature selection
- good results can already be achieved by using only 1 feature
- further sklearn mechanics used (stacking ensemble, gridsearch)
- co2 emission time series forecasting for rwanda based on chem-sensor data at varying locations
- discretization and one-hot encoding of numerical data
- catboost feature importance and respective feature selection
- optuna hyperparameter tuning
- reptile meta learning:
1. sample a small batch of tasks from the task set
2. for each task, train a copy of the meta model on the sampled task for a few iterations
3. update the meta model with the average of the parameter updates of the copy models
- repeat 1. - 3. until the meta model performs well over all tasks (see the sketch below)
- reptile pretraining improves few-shot results on data and tasks similar enough to the reptile tasks
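A PyTorch-style sketch of one reptile meta update; the task API (`task.sample_batch()`), inner optimizer, and step sizes are assumptions:

```python
import copy
import torch
import torch.nn.functional as F

def reptile_step(meta_model, tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1, n_tasks=4):
    """One meta update: average the parameter changes of task-specific copy models."""
    sampled = [tasks[i] for i in torch.randperm(len(tasks))[:n_tasks]]   # 1. sample tasks
    deltas = [torch.zeros_like(p) for p in meta_model.parameters()]
    for task in sampled:
        copy_model = copy.deepcopy(meta_model)                           # 2. train a copy per task
        opt = torch.optim.SGD(copy_model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            x, y = task.sample_batch()                                   # placeholder task API
            loss = F.cross_entropy(copy_model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
        for d, p_copy, p_meta in zip(deltas, copy_model.parameters(), meta_model.parameters()):
            d += (p_copy.detach() - p_meta.detach()) / n_tasks           # average parameter update
    with torch.no_grad():
        for p_meta, d in zip(meta_model.parameters(), deltas):           # 3. move the meta model
            p_meta += meta_lr * d
```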
- visualization of positional encodings as used in Transformers
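A numpy sketch of the sinusoidal encodings being visualized, following the standard Transformer definition (sequence length and model dimension are placeholders):

```python
import numpy as np

def positional_encoding(max_len=100, d_model=64):
    """pe[pos, 2i] = sin(pos / 10000^(2i/d_model)), pe[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(max_len)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe  # e.g. visualize with plt.imshow(positional_encoding())
```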
- implementation of regular class activation maps (cam)
- cams show the importance of the individual input elements for the activation of the respective class
- cams are based on the gradient of a class activation w.r.t. the input, $\frac{\partial f_\theta(x)_{class}}{\partial x}$
- cams can e.g. be used for
- inferring weakly supervised labels (here: bounding boxes from classification labels)
- model debugging
- deducing model design decisions
- implementation of smoothgrad
- better cams by averaging over gradients for $n$ noisy versions of the input
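A PyTorch sketch of this averaging over input gradients; the noise level and number of samples are assumptions:

```python
import torch

def smoothgrad(model, x, target_class, n=25, sigma=0.1):
    """Average the class-score gradient w.r.t. n noisy copies of the input."""
    grads = torch.zeros_like(x)
    for _ in range(n):
        x_noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(x_noisy)[0, target_class]   # activation of the target class
        score.backward()
        grads += x_noisy.grad / n
    return grads                                   # smoothed saliency map
```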
- implementation of guided cam
- better cams by only propagating positive gradients back
- gradually masking out one object decreases its class score
- semi-supervised versatile training data
- generated using text block augmented LLM prompts
- substrings are used to annotate text
- allows detecting names in german natural language (probably not useful without grammatical structures)
- fine tuned bert
- evaluation set is formed as sampled collection of edge cases
- pistachio type prediction, 4 initially labeled instances, model queries the next label per policy:
- baseline: label a random unlabeled instance
- min_confidence: label unlabeled instance that has lowest predicted max. class confidence (approximation of prediction entropy)
- informative initially labeled instances are chosen as training data cluster centers (approximately represent the training data best; exploration paradigm)
- min_confidence policy leads to a higher val accuracy mean and a smaller val accuracy std when informative initially labeled instances are used (otherwise even worse than random)
- weakly supervised method:
- data instances come in bags
- only the labels of few instances are known
- predictions are made on bag level
- binary classification: bag is of positive class, if it contains a "2", else bag is of negative class
- attention pooling can be used to deduce instance-level predictions
- model pays high attention to instances of positive class (even to those that are not labeled)
- mil-attention-pooling paper: http://proceedings.mlr.press/v80/ilse18a/ilse18a.pdf
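A PyTorch sketch of attention pooling over a bag of instance embeddings, following the attention form from the linked paper (embedding and attention dimensions are assumptions):

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pools a bag of instance embeddings into one bag embedding; the attention
    weights can be read out as instance-level relevance."""
    def __init__(self, embed_dim=128, attn_dim=64):
        super().__init__()
        self.V = nn.Linear(embed_dim, attn_dim)
        self.w = nn.Linear(attn_dim, 1)

    def forward(self, instances):                       # instances: (n_instances, embed_dim)
        scores = self.w(torch.tanh(self.V(instances)))  # (n_instances, 1)
        attention = torch.softmax(scores, dim=0)        # one weight per instance
        bag_embedding = (attention * instances).sum(dim=0)
        return bag_embedding, attention.squeeze(-1)

# a bag-level classifier head then predicts from bag_embedding
# (binary here: does the bag contain a "2"?)
```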