Skip to content

The data sets used in this exercise contain world population evolution (from 1960 to 2017) and the country's income classification for 264 observations. Firstly, the variables in the data sets have been carefully examined in order to get a proper understanding of the data sets. The two data sets have been merged using the common variable of Coun…

Notifications You must be signed in to change notification settings

UdeshikaDissa/Data-Preprocessing_R

Repository files navigation

Data-Preprocessing_R

Data Pre-Processing for World Population dynamics Data-set

The data sets used in this exercise contain world population evolution (from 1960 to 2017) and the country's income classification for 264 observations. Firstly, the variables in the data sets have been carefully examined to get a proper understanding of the data sets. The two data sets have been merged using the common variable of Country Code. Then, the structure and attributes of the merged data set have been carefully checked. Data types of a few variables have been converted to have a better representation of the data. To make the data set tidier, unnecessary variables for the exercise have been dropped. Also, a few variables have been relabeled to have a better representation. After that, the data set have been transformed from wide format to long format. Subsequently, the missing values and special values in the data set have been appropriately treated. The outliers of the data set have been investigated using the z-score method. The numerical variable of “Total_population” has been checked for its distribution using Histogram and identified that it's not normal, but strongly right-skewed. By using logarithm base e (ln) transformation, this variable has been converted to a normally distributed representation for convenient analysis.

Data Sourse:

https://data.worldbank.org/indicator/SP.POP.TOTL

R codes:

MATH2349_1850_Assignment_3_submission.rmd

Data-sets folder:

Contains data-sets used for the project

Report:

MATH2349_1850 Assignment 3_s3400652.pdf

About

The data sets used in this exercise contain world population evolution (from 1960 to 2017) and the country's income classification for 264 observations. Firstly, the variables in the data sets have been carefully examined in order to get a proper understanding of the data sets. The two data sets have been merged using the common variable of Coun…

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published