GenesisBlock3301/probability_statistics_and_machine_learning

Learning Roadmap of Probability and Statistics

Statistics roadmap for ML:

Probability theory:

  • Probability, random variables, probability distributions, and conditional probability are crucial for modeling uncertainty in ML.
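
As an illustration of conditional probability, here is a minimal sketch of Bayes' rule. The function name `bayes_posterior` and the numbers (1% prior, 95% true positive rate, 5% false positive rate) are made up for this example and do not come from any dataset in this repo.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs, chosen only to show the arithmetic.
posterior = bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(posterior, float(posterior))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the prior is so small, which is the kind of reasoning conditional probability makes explicit.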

Descriptive statistics:

  • Measures of central tendency (mean, median, mode)
  • Measures of dispersion (variance, standard deviation)
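
The measures above can be computed with Python's standard `statistics` module. This is a small sketch on a made-up list of numbers; it uses the population variance and standard deviation (`pvariance`, `pstdev`), and the sample versions (`variance`, `stdev`) would divide by n - 1 instead.

```python
import statistics

# Made-up data, chosen so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
pvar = statistics.pvariance(data)   # population variance
pstdev = statistics.pstdev(data)    # population standard deviation

print(mean, median, mode, pvar, pstdev)
```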

Inferential statistics:

  • Hypothesis testing, confidence intervals, and p-values are essential for making inferences and drawing conclusions from data samples.
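
A confidence interval for a mean can be sketched as follows. The helper `mean_confidence_interval` is a hypothetical name, the sample is made up, and the interval uses a normal approximation. For small samples a t-distribution would usually be more appropriate, so treat this as a rough illustration only.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

low, high = mean_confidence_interval([10, 12, 11, 13, 9, 12, 11, 10])
print(low, high)
```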

Regression analysis:

  • Linear regression and its variants are widely used in ML for modeling relationships between variables and making predictions.
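
For a single input variable, ordinary least squares has a closed-form solution. The sketch below (with the hypothetical helper `fit_line` and made-up points that lie exactly on y = 2x + 1) shows the slope and intercept formulas. Real projects would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # made-up points on the line y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```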

Probability distributions:

  • The Gaussian (normal), binomial, and Poisson distributions are useful for understanding the behavior of data and for stating modeling assumptions.
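
The three distributions named above can be evaluated with the standard library alone. In this sketch the function names `binomial_pmf` and `poisson_pmf` are hypothetical helpers written for illustration, and the normal density comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

two_heads_in_four = binomial_pmf(2, 4, 0.5)    # fair coin, 4 tosses
no_events = poisson_pmf(0, 2.0)                # rate of 2 events per interval
normal_peak = NormalDist(mu=0, sigma=1).pdf(0) # density at the mean
print(two_heads_in_four, no_events, normal_peak)
```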

Sampling techniques:

  • Understanding different sampling techniques, such as random sampling and stratified sampling, is important for collecting representative training and test datasets.
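
Stratified sampling can be sketched as splitting each class separately so that the test set keeps roughly the same class proportions as the full data. The helper `stratified_split` and the 80/20 label mix are assumptions made for this example. Libraries such as scikit-learn offer a ready-made equivalent.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), sampled per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random sampling within each stratum
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

labels = ["a"] * 80 + ["b"] * 20  # made-up, imbalanced labels
train, test = stratified_split(labels)
print(len(train), len(test))
```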

Statistical hypothesis testing:

  • Knowing how to perform hypothesis tests, interpret the results, and make decisions based on statistical significance is crucial for evaluating ML models.
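
One way to see what a p-value means is a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The sketch below is one possible approach, not the only one. The helper name `permutation_test` and both samples are made up.

```python
import random

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between value and group
        perm_a = pooled[: len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (count + 1) / (n_permutations + 1)

p_separated = permutation_test([1, 2, 3, 4, 5, 6, 7, 8],
                               [11, 12, 13, 14, 15, 16, 17, 18])
p_identical = permutation_test([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

A small p-value for the clearly separated groups and a p-value of 1 for the identical groups match the intuition that only the first difference is unlikely under the null hypothesis.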

Statistical modeling:

  • Knowledge of techniques like maximum likelihood estimation (MLE) and Bayesian inference can be helpful for parameter estimation and building probabilistic models.
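
The difference between the two approaches can be sketched on a coin-flip (Bernoulli) model. The MLE of the success probability is the observed fraction of successes, while a Bayesian estimate with a Beta prior pulls that fraction toward the prior. The function names and the 7-out-of-10 data are assumptions made for illustration.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of the success probability."""
    return successes / trials

def beta_posterior_mean(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + failures).
    """
    return (alpha + successes) / (alpha + beta + trials)

mle = bernoulli_mle(7, 10)                 # made-up data: 7 successes in 10 trials
posterior_mean = beta_posterior_mean(7, 10)  # uniform Beta(1, 1) prior
print(mle, posterior_mean)
```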

Experimental design:

  • Understanding principles of experimental design, such as randomization, control groups, and factorial designs, helps in conducting rigorous experiments and A/B testing in ML.
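
A basic A/B test combines randomization with a significance test. The sketch below randomly assigns users to two groups and then compares conversion rates with a pooled two-proportion z-test. The helper names, the user IDs, and the conversion counts are all hypothetical, and the z-test relies on a normal approximation that needs reasonably large groups.

```python
import math
import random
from statistics import NormalDist

def assign_groups(user_ids, seed=0):
    """Randomly split users into a control group and a treatment group."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

control, treatment = assign_groups(range(2000))
# Made-up outcome counts: 100 vs 150 conversions out of 1000 users each.
z, p_value = two_proportion_z_test(100, len(control), 150, len(treatment))
print(z, p_value)
```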

Multivariate statistics:

  • Techniques like principal component analysis (PCA), factor analysis, and cluster analysis provide tools for dimensionality reduction, feature selection, and pattern recognition in high-dimensional datasets.
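
PCA can be sketched in a few lines with NumPy by centering the data and taking a singular value decomposition. This assumes NumPy is installed. The function name `pca` and the synthetic two-column data are made up for the example, and a library implementation (for instance in scikit-learn) would be the usual choice in practice.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S**2) / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance[:n_components]

# Synthetic data: the second column is roughly twice the first.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])

projected, components, variance = pca(X, n_components=1)
print(projected.shape, components, variance)
```

Because the two columns are almost perfectly correlated, a single component captures nearly all of the variance, which is the dimensionality-reduction idea in miniature.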

Exploratory data analysis:

  1. Scatter plot
  2. Pair plot
  3. Histogram
  4. Cumulative distribution
  5. Mean and standard deviation
  6. Median, percentile, quantile
  7. MAD, box plot, and violin plot

  1. EDA on cancer dataset
  2. Gaussian (normal) distribution
  3. Skewness and kurtosis
  4. Sampling distribution, standard normal variate (z), and standardization
  5. Quantile-quantile (Q-Q) plot
  6. Chebyshev's inequality
  7. Uniform distribution
  8. Bernoulli vs. binomial vs. normal vs. Pareto distribution
  9. Box-Cox transformation
  10. Covariance
  11. Pearson correlation
  12. Spearman rank correlation coefficient
  13. Correlation vs. causation and confidence intervals
  14. Confidence interval with an underlying Gaussian distribution
  15. Hypothesis testing and p-values
  16. t-test vs. chi-square test vs. ANOVA
About

This repo is for learning ML-related concepts and tools.
