From the given database Find out the personality using this personality traits. Applications in psychology Factor analysis has been used in the study of human intelligence and human personality as a method for comparing the outcomes of (hopefully) objective tests and to construct matrices to define correlations between these outcomes, as well as…

Sagar-Darji/Personality-prediction

Personality-prediction

Using the survey data in responses.csv, find out personality traits with this personality prediction project.

Let's get started

The system predicts a person's personality and traits through a basic survey. It can help human resources select the right candidate for a desired job profile, which in turn provides an expert workforce for the organization.

Applications in psychology:

Factor analysis has been used in the study of human intelligence and human personality as a method for comparing the outcomes of (hopefully) objective tests and to construct matrices to define correlations between these outcomes, as well as finding the factors for these results. The field of psychology that measures human intelligence using quantitative testing in this way is known as psychometrics (psycho=mental, metrics=measurement).

Advantages:

  • Offers a much more objective method of testing traits such as intelligence in humans
  • Allows for a satisfactory comparison between the results of intelligence tests
  • Provides support for theories that would be difficult to prove otherwise

Algorithm

  1. Refine the data
  2. Prepare the data
  3. Choose the factors
    • Select the variables
    • Build the correlation matrix
    • Apply a method of factor analysis such as EFA
    • Decide the number of factors
    • Compute the factor loadings
    • Rotate the factor loadings
    • Settle on an appropriate number of factors


Now let's understand and implement the code

Import all the libraries needed to run this Python code

# Libraries
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

Load the data into a DataFrame using pandas and check its shape

#Data
df = pd.read_csv("responses.csv")
df.shape

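Out: (1010, 150)

responses.csv has 1010 rows and 150 columns.

This means the data was collected by surveying 1010 individuals, and it covers 150 different preferences and fields. They fall into the following groups (number of columns in parentheses, followed by the column range):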

MUSIC PREFERENCES (19) 0:19

MOVIE PREFERENCES (12) 19:31

HOBBIES & INTERESTS (32) 31:63

PHOBIAS (10) 63:73

HEALTH HABITS (3) 73:76

PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) 76:133

SPENDING HABITS (7) 133:140

DEMOGRAPHICS (10) 140:150
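
The column ranges above can be kept in a small lookup table so that any group can be selected by name. This is only a sketch: the names GROUPS and select_group are illustrative and not part of the original notebook, and the demo uses a dummy DataFrame with 150 placeholder columns instead of responses.csv.

```python
import numpy as np
import pandas as pd

# Column ranges copied from the list above (start inclusive, end exclusive).
GROUPS = {
    "MUSIC PREFERENCES": slice(0, 19),
    "MOVIE PREFERENCES": slice(19, 31),
    "HOBBIES & INTERESTS": slice(31, 63),
    "PHOBIAS": slice(63, 73),
    "HEALTH HABITS": slice(73, 76),
    "PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS": slice(76, 133),
    "SPENDING HABITS": slice(133, 140),
    "DEMOGRAPHICS": slice(140, 150),
}


def select_group(frame, name):
    """Return the columns of `frame` that belong to the named group (hypothetical helper)."""
    return frame.iloc[:, GROUPS[name]]


# Dummy stand-in for the real data: 3 rows and 150 placeholder columns.
dummy = pd.DataFrame(np.zeros((3, 150)), columns=[f"col_{i}" for i in range(150)])

# Number of columns in each group, derived from the slices.
group_sizes = {name: s.stop - s.start for name, s in GROUPS.items()}
personality = select_group(dummy, "PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS")

print(group_sizes)
print(personality.shape)
```

The group sizes add up to the 150 columns of the file, which is a quick way to check that no range overlaps or leaves a gap.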

We will take only: PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) 76:133

df = df.iloc[:, 76:133]
df.head(5)

Out:

Daily events Prioritising workload Writing notes Workaholism Thinking ahead Final judgement Reliability Keeping promises Loss of interest Friends versus money ... Happiness in life Energy levels Small - big dogs Personality Finding lost valuables Getting up Interests or hobbies Parents' advice Questionnaires or polls Internet usage
0 2.0 2.0 5.0 4.0 2.0 5.0 4.0 4.0 1.0 3.0 ... 4.0 5.0 1.0 4.0 3.0 2.0 3.0 4.0 3.0 few hours a day
1 3.0 2.0 4.0 5.0 4.0 1.0 4.0 4.0 3.0 4.0 ... 4.0 3.0 5.0 3.0 4.0 5.0 3.0 2.0 3.0 few hours a day
2 1.0 2.0 5.0 3.0 5.0 3.0 4.0 5.0 1.0 5.0 ... 4.0 4.0 3.0 3.0 3.0 4.0 5.0 3.0 1.0 few hours a day
3 4.0 4.0 4.0 5.0 3.0 1.0 3.0 4.0 5.0 2.0 ... 2.0 2.0 1.0 2.0 1.0 1.0 NaN 2.0 4.0 most of the day
4 3.0 1.0 2.0 3.0 5.0 5.0 5.0 4.0 2.0 3.0 ... 3.0 5.0 3.0 3.0 2.0 4.0 3.0 3.0 3.0 few hours a day

5 rows × 57 columns

1. Prepare the Data

# Drop rows with missing values
df = df.dropna()

# Encode categorical data as integers
from sklearn.preprocessing import LabelEncoder

df = df.apply(LabelEncoder().fit_transform)
df

The dropna() method removes every row that contains a null (missing) value from the DataFrame. Here it reduces the 1010 rows to 864.
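
As a small illustration of what dropna() does, the sketch below builds a toy DataFrame and drops every row that has at least one missing value. The values are made up; only the two column names are borrowed from the survey.

```python
import numpy as np
import pandas as pd

# Toy data (invented values): rows 1 and 2 each contain one missing answer.
toy = pd.DataFrame({
    "Daily events": [2.0, 3.0, np.nan, 4.0],
    "Workaholism": [4.0, np.nan, 3.0, 5.0],
})

# By default dropna() removes a row if ANY of its values is missing.
clean = toy.dropna()

rows_before, rows_after = len(toy), len(clean)
print(rows_before, rows_after)
print(clean.index.tolist())
```

The surviving rows keep their original index labels, which is why the encoded output in this README jumps from row 2 to row 4: row 3 contained a NaN and was dropped.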

Why are we encoding the data?

The analysis requires all input and output variables to be numeric. This means that if our data contains categorical data, we must encode it as numbers before we can fit and evaluate a model.

There are two types of encoding

  1. Integer encoding

Each unique label is mapped to an integer.

  2. One hot encoding

A column of categorical data is split into several columns, one for each category present in that column. Each new column contains “0” or “1” depending on whether the row belongs to that category.

Example of integer encoding:

Height (before encoding)    Height (after encoding)
Tall                        0
Short                       1
Medium                      2
Medium                      2
Short                       1
Tall                        0

Here, we have used integer encoding: LabelEncoder maps each unique value in a column to an integer.

Out:

Daily events Prioritising workload Writing notes Workaholism Thinking ahead Final judgement Reliability Keeping promises Loss of interest Friends versus money ... Happiness in life Energy levels Small - big dogs Personality Finding lost valuables Getting up Interests or hobbies Parents' advice Questionnaires or polls Internet usage
0 1 1 4 3 1 4 3 3 0 2 ... 3 4 0 3 2 1 2 3 2 0
1 2 1 3 4 3 0 3 3 2 3 ... 3 2 4 2 3 4 2 1 2 0
2 0 1 4 2 4 2 3 4 0 4 ... 3 3 2 2 2 3 4 2 0 0
4 2 0 1 2 4 4 4 3 1 2 ... 2 4 2 2 1 3 2 2 2 0
5 1 1 2 2 2 0 2 3 2 1 ... 2 3 3 2 2 2 4 2 3 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1005 2 1 0 3 1 2 2 2 3 3 ... 3 2 2 2 3 4 3 3 2 0
1006 0 2 0 4 4 4 4 3 0 1 ... 3 3 2 4 2 0 2 3 2 1
1007 2 0 0 0 3 0 2 4 0 3 ... 2 0 2 1 2 4 0 3 4 2
1008 2 0 4 0 2 3 3 3 4 2 ... 2 1 1 3 0 4 2 2 2 2
1009 2 4 3 4 3 2 4 4 2 3 ... 3 1 2 3 0 1 1 2 4 0

864 rows × 57 columns
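
To make the difference between the two encodings described in this step concrete, here is a minimal sketch on the Height example. It assumes scikit-learn and pandas are installed; the variable names are illustrative, and pd.get_dummies is used for the one-hot part only as one convenient option, not because the notebook uses it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The Height column from the example table.
heights = pd.DataFrame({"Height": ["Tall", "Short", "Medium", "Medium", "Short", "Tall"]})

# Integer encoding: one integer per unique label.
encoder = LabelEncoder()
integer_encoded = encoder.fit_transform(heights["Height"])

# One-hot encoding: one 0/1 column per unique label.
one_hot = pd.get_dummies(heights["Height"])

print(list(encoder.classes_))   # labels in the order LabelEncoder numbers them
print(list(integer_encoded))
print(one_hot.astype(int))
```

LabelEncoder numbers the labels in sorted order, so on this column it assigns Medium = 0, Short = 1, Tall = 2 rather than the mapping shown in the illustrative table. The same applies to text columns such as "Internet usage": the integers follow alphabetical order, which is not necessarily a meaningful order for the answers.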

2. Choose the Factors

pip install factor_analyzer 
Requirement already satisfied: factor_analyzer in c:\users\dell\anaconda3\lib\site-packages (0.3.2)
Requirement already satisfied: pandas in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (0.25.1)
Requirement already satisfied: scipy in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (1.3.1)
Requirement already satisfied: numpy in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (1.16.5)
Requirement already satisfied: scikit-learn in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (0.21.3)
Requirement already satisfied: pytz>=2017.2 in c:\users\dell\anaconda3\lib\site-packages (from pandas->factor_analyzer) (2019.3)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\dell\anaconda3\lib\site-packages (from pandas->factor_analyzer) (2.8.0)
Requirement already satisfied: joblib>=0.11 in c:\users\dell\anaconda3\lib\site-packages (from scikit-learn->factor_analyzer) (0.13.2)
Requirement already satisfied: six>=1.5 in c:\users\dell\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas->factor_analyzer) (1.12.0)
Note: you may need to restart the kernel to use updated packages.

Factor Analyzer

Factor analysis reduces a large number of variables to a smaller number of factors. factor_analyzer is a Python module to perform exploratory factor analysis (EFA) with several optional rotations. It also includes a class to perform confirmatory factor analysis (CFA), with certain predefined constraints.

What is Factor Rotation

Rotation minimizes the complexity of the factor loadings to make the structure simpler to interpret.

There are two types of rotation

  1. Orthogonal rotation

Constrains the factors to be uncorrelated. Although often favored, in many cases it is unrealistic to expect the factors to be uncorrelated, and forcing them to be uncorrelated makes it less likely that the rotation produces a solution with simple structure. Methods:

    1. varimax

      It maximizes the sum of the variances of the squared loadings, which makes the structure simpler. (Figure: mathematical equation of varimax)

    2. quartimax
    3. equimax

  2. Oblique rotation

Permits the factors to be correlated with one another, which often produces a solution with a simpler structure.

Here, we assume the factors are uncorrelated, so we have used the orthogonal varimax rotation method.
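
To show what "maximizing the sum of the variances of the squared loadings" means in practice, here is a rough NumPy sketch of a varimax rotation. It uses a widely circulated SVD-based iteration and is not the implementation inside factor_analyzer; the function names, the toy loadings and the 30 degree mixing angle are all assumptions made for the illustration.

```python
import numpy as np


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(loadings ** 2, axis=0).sum())


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Matrix whose SVD gives the next orthogonal rotation.
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        target = loadings.T @ (rotated ** 3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        d_new = s.sum()
        # Stop when the objective no longer improves noticeably.
        if d_old != 0 and d_new / d_old < 1 + tol:
            break
        d_old = d_new
    return loadings @ rotation, rotation


# Toy loadings with a clean two-factor structure (invented numbers) ...
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
# ... deliberately blurred by a 30 degree rotation.
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(varimax_criterion(unrotated), varimax_criterion(rotated))
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of its squared loadings) is unchanged; only the way it is split across factors changes.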

Now we determine the number of factors using a scree plot

We could also use the eigenvalues directly to determine the number of factors, but that is more complex; with a scree plot it is easy to find.

#Try the model with all the variables 
from factor_analyzer import FactorAnalyzer         # pip install factor_analyzer 
fa = FactorAnalyzer(rotation="varimax")
fa.fit(df) 

# Check Eigenvalues
ev, v = fa.get_eigenvalues()
ev

# Create scree plot using matplotlib
plt.scatter(range(1,df.shape[1]+1),ev)
plt.plot(range(1,df.shape[1]+1),ev)
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()

Out:

(Figure: scree plot of eigenvalue against factor number)

How do we find the number of factors?

A scree plot shows the eigenvalues on the y-axis and the number of factors on the x-axis. It always displays a downward curve. The point where the slope of the curve clearly levels off (the “elbow”) indicates the number of factors that should be generated by the analysis.

As you can see, the most useful factors for explaining the data are the first 5 or 6; after that the eigenvalues level off.
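
The eigenvalues themselves can also be used to suggest a number of factors. One common rule of thumb is the Kaiser criterion: keep the factors whose eigenvalue is greater than 1. The original notebook does not use this rule, so the sketch below is only an illustration. It uses synthetic data with two built-in factors instead of responses.csv, and computes the eigenvalues of the correlation matrix with NumPy rather than with fa.get_eigenvalues().

```python
import numpy as np

# Synthetic survey: 6 variables driven by 2 hidden factors (assumed setup).
rng = np.random.default_rng(0)
n = 2000
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
columns = [0.8 * f + 0.6 * rng.normal(size=n) for f in (f1, f1, f1, f2, f2, f2)]
X = np.column_stack(columns)

# Eigenvalues of the correlation matrix, largest first (what a scree plot shows).
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: count the eigenvalues greater than 1.
n_factors_kaiser = int((eigenvalues > 1).sum())

print(np.round(eigenvalues, 2))
print(n_factors_kaiser)
```

The eigenvalues of a correlation matrix always sum to the number of variables, so an eigenvalue above 1 means the factor explains more than a single variable would on its own. On real survey data this rule often suggests more factors than the elbow does, so it is best read together with the scree plot.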

We will fit the model with 5 Factors:

# Factor analysis with 5 factors
fa = FactorAnalyzer(n_factors=5, rotation="varimax")
fa.fit(df)

# Loadings as a DataFrame: one row per variable, one column per factor
AF = pd.DataFrame(fa.loadings_, index=df.columns)
AF

Out:

0 1 2 3 4
Daily events 0.250416 0.058953 0.206877 0.026094 0.028915
Prioritising workload -0.012803 -0.150045 0.555946 0.078913 0.128156
Writing notes -0.006039 -0.015927 0.420849 0.225307 0.261380
Workaholism 0.069524 0.029275 0.527082 0.088573 0.032979
Thinking ahead 0.023475 0.127909 0.530457 0.035213 0.055426
Final judgement 0.046188 0.112493 0.119861 0.381338 -0.039756
Reliability 0.061028 -0.102481 0.539373 0.073534 -0.003491
Keeping promises 0.053358 -0.034661 0.420538 0.121450 -0.033511
Loss of interest 0.273777 0.226286 0.003524 -0.149262 0.101882
Friends versus money 0.021279 -0.111839 0.022026 0.381357 -0.045824
Funniness 0.312861 0.131400 -0.043014 -0.018258 -0.026083
Fake 0.091188 0.469616 -0.024535 -0.191798 0.019356
Criminal damage 0.154868 0.177732 -0.112659 -0.240721 0.266761
Decision making -0.287128 0.102033 0.267415 0.129336 0.158694
Elections 0.074306 -0.015585 0.222003 0.131404 -0.083563
Self-criticism -0.016858 0.398420 0.229116 0.114144 0.069707
Judgment calls 0.182082 -0.010461 0.102263 0.035675 0.086474
Hypochondria -0.040254 0.258913 -0.034874 0.042981 0.213548
Empathy -0.050152 -0.073697 0.059441 0.324982 0.133754
Eating to survive -0.010608 0.183045 0.003261 -0.015131 -0.018874
Giving 0.082276 -0.154549 0.112481 0.376723 0.234000
Compassion to animals -0.083505 -0.002767 -0.010424 0.262183 0.192734
Borrowed stuff -0.097017 -0.023047 0.323253 0.171017 0.071189
Loneliness -0.199197 0.542350 -0.019272 0.045942 0.190369
Cheating in school 0.216223 -0.063183 -0.384634 -0.083940 0.208210
Health -0.012267 0.027867 0.131645 0.184296 0.437826
Changing the past -0.016622 0.482307 -0.161320 0.073843 0.159231
God 0.047894 0.032281 0.027136 0.453873 -0.025963
Dreams 0.207076 -0.187723 0.078634 0.037709 -0.124853
Charity 0.163161 0.116834 0.156898 0.354953 -0.067795
Number of friends 0.514994 -0.321738 -0.086711 0.241070 -0.006859
Punctuality 0.004662 0.090531 -0.143569 0.069648 0.078111
Lying -0.095933 -0.193370 0.001775 0.138092 0.006950
Waiting 0.032019 -0.067715 -0.000820 0.075966 -0.329606
New environment 0.470076 -0.129745 -0.058912 0.005400 -0.230743
Mood swings -0.086477 0.353226 -0.041005 0.031490 0.404388
Appearence and gestures 0.227246 -0.004762 0.105894 0.068825 0.303119
Socializing 0.537811 -0.096245 -0.048127 0.135323 -0.039204
Achievements 0.252835 0.048658 -0.042799 -0.082401 0.111902
Responding to a serious letter -0.126985 0.087976 -0.026876 0.022940 0.013346
Children 0.079877 -0.134254 0.033040 0.440103 0.075663
Assertiveness 0.353462 -0.094372 0.002509 -0.067185 0.044117
Getting angry 0.051167 0.176922 -0.086069 -0.070837 0.532025
Knowing the right people 0.478657 0.022868 0.113503 -0.045359 0.227230
Public speaking -0.385674 0.104662 0.069712 0.030447 0.190834
Unpopularity -0.082146 0.229228 0.079173 0.241031 -0.031212
Life struggles -0.226293 0.057892 -0.059615 0.384875 0.392060
Happiness in life 0.288585 -0.541050 0.158473 0.051235 -0.064525
Energy levels 0.499978 -0.478860 0.037918 0.122773 -0.025001
Small - big dogs 0.206696 0.040211 -0.143225 -0.203991 -0.131298
Personality 0.259646 -0.393197 0.064236 0.049013 -0.056988
Finding lost valuables -0.127907 -0.011367 0.163354 0.391951 -0.101749
Getting up 0.012217 0.150551 -0.312297 0.082580 0.121198
Interests or hobbies 0.465627 -0.253289 0.065015 0.144827 -0.078694
Parents' advice 0.022594 -0.032871 0.243628 0.282252 0.113225
Questionnaires or polls -0.045177 0.114865 0.154309 0.188501 -0.032532
Internet usage -0.046077 0.075435 -0.007799 -0.081575 0.048144
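
# Get the top variables for each factor
F = AF.unstack()                     # Series indexed by (factor, variable)
F = pd.DataFrame(F).reset_index()    # columns: level_0 (factor), level_1 (variable), 0 (loading)
# Keep the 5 largest loadings per factor (signed values, so negative loadings are not selected)
F = F.sort_values(['level_0', 0], ascending=False).groupby('level_0').head(5)    # Top 5
F = F.sort_values(by="level_0")
# Note: the "Varianza_Explica" column holds the factor loading of each variable
F.columns = ["FACTOR", "Variable", "Varianza_Explica"]
F = F.reset_index().drop(["index"], axis=1)
F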

Out:

FACTOR Variable Varianza_Explica
0 0 New environment 0.470076
1 0 Energy levels 0.499978
2 0 Number of friends 0.514994
3 0 Socializing 0.537811
4 0 Knowing the right people 0.478657
5 1 Mood swings 0.353226
6 1 Self-criticism 0.398420
7 1 Fake 0.469616
8 1 Changing the past 0.482307
9 1 Loneliness 0.542350
10 2 Writing notes 0.420849
11 2 Workaholism 0.527082
12 2 Thinking ahead 0.530457
13 2 Prioritising workload 0.555946
14 2 Reliability 0.539373
15 3 Friends versus money 0.381357
16 3 Life struggles 0.384875
17 3 Finding lost valuables 0.391951
18 3 Children 0.440103
19 3 God 0.453873
20 4 Appearence and gestures 0.303119
21 4 Life struggles 0.392060
22 4 Mood swings 0.404388
23 4 Health 0.437826
24 4 Getting angry 0.532025

# Show the top variables for each factor, one column per factor
F = F.pivot(columns='FACTOR')["Variable"]
F.apply(lambda x: pd.Series(x.dropna().to_numpy()))

Out:

FACTOR 0 1 2 3 4
0 New environment Mood swings Writing notes Friends versus money Appearence and gestures
1 Energy levels Self-criticism Workaholism Life struggles Life struggles
2 Number of friends Fake Thinking ahead Finding lost valuables Mood swings
3 Socializing Changing the past Prioritising workload Children Health
4 Knowing the right people Loneliness Reliability God Getting angry
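
The selection above ranks the signed loadings, so a variable with a strong negative loading never reaches the top 5. A possible variation, sketched here with six rows and the first two factor columns copied from the loadings table, ranks by absolute value instead. The helper name top_by_abs is hypothetical and not part of the notebook.

```python
import pandas as pd

# A few rows of the loadings table (factor columns 0 and 1 only).
loadings = pd.DataFrame(
    {
        0: [0.537811, 0.514994, -0.199197, 0.288585, -0.016622, -0.385674],
        1: [-0.096245, -0.321738, 0.542350, -0.541050, 0.482307, 0.104662],
    },
    index=[
        "Socializing",
        "Number of friends",
        "Loneliness",
        "Happiness in life",
        "Changing the past",
        "Public speaking",
    ],
)


def top_by_abs(frame, n):
    """For each factor, list the n variables with the largest absolute loading."""
    return {factor: frame[factor].abs().nlargest(n).index.tolist()
            for factor in frame.columns}


top_abs = top_by_abs(loadings, 2)
# Same ranking on the signed values, for comparison.
top_signed = {factor: loadings[factor].nlargest(2).index.tolist()
              for factor in loadings.columns}

print(top_abs)
print(top_signed)
```

With absolute values, "Happiness in life" (loading -0.541050) shows up for the second factor, which fits the reading of that factor given below: people who score high on it tend to report lower happiness.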

FACTOR 1 (column 0): Energy levels, Number of friends, Socializing...

Could be: Extraversion


FACTOR 2 (column 1): Self-criticism, Fake, Loneliness...

Looks very similar to "Neuroticism"


FACTOR 3 (column 2): Thinking ahead, Prioritising workload...

Very similar to "Conscientiousness"


FACTOR 4 (column 3): Children, God, Finding lost valuables

This factor could be something like "religious" or "conservative"; it may correspond to the lowest scores on "Openness" in the Big Five model.


FACTOR 5 (column 4): Appearence and gestures, Mood swings

Hmm, it could be "Agreeableness". What do you think it could represent?


Conclusion

The first three factors are very clear: Extraversion, Neuroticism and Conscientiousness. The other two are not so clear. Anyway, it is a very interesting approximation.

Maybe first running a PCA to remove highly correlated variables like "God" and "Final judgement" could help.
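
Before trying PCA, a simpler first check is to list the pairs of variables whose correlation is very high. The sketch below is only an illustration: the data is synthetic (the three column names are borrowed from the survey, the values are invented), and the 0.7 threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Synthetic data: two columns share a common signal, the third is independent.
rng = np.random.default_rng(1)
n = 500
shared = rng.normal(size=n)
data = pd.DataFrame({
    "God": shared + 0.3 * rng.normal(size=n),
    "Final judgement": shared + 0.3 * rng.normal(size=n),
    "Punctuality": rng.normal(size=n),
})

# Absolute correlations, keeping only the upper triangle so each pair appears once.
corr = data.corr().abs()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs above the (assumed) threshold.
threshold = 0.7
high_pairs = pairs[pairs > threshold].index.tolist()

print(high_pairs)
```

On the real data the same check could be run on the 57 personality columns after encoding, and one variable of each flagged pair could be dropped or the pair combined before fitting the factor model.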

What do you think?

Thank you

I especially appreciate your Heart
