University Admission Match Predictor

Domain             : Recommendation System

Description

Analyzed university admission statistics.
Developed tools for university matching (expressed as a percentile) using CGPA and GRE (Verbal, Quantitative, Analytical Writing) scores.
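
The matching method itself is not spelled out in this description. As a minimal sketch of what a percentile-based match could look like, the snippet below ranks one applicant's score against scores previously reported as accepted at a single university. The function name `percentile_rank` and the sample scores are hypothetical, and the repository code may work differently.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(accepted_scores, applicant_score):
    """Return the percentile (0-100) of applicant_score among accepted_scores.

    Uses the mid-rank convention: tied scores count half below, half above.
    """
    if not accepted_scores:
        raise ValueError("accepted_scores must not be empty")
    ordered = sorted(accepted_scores)
    below = bisect_left(ordered, applicant_score)          # strictly lower scores
    at_or_below = bisect_right(ordered, applicant_score)   # lower or equal scores
    ties = at_or_below - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical GRE Quantitative scores reported as "Accepted" at one university.
accepted_quant = [160, 162, 165, 167, 168, 170, 170, 170]
rank = percentile_rank(accepted_quant, 167)
print(f"Applicant sits at percentile {rank:.2f} of accepted applicants")
```

The same function could be applied separately to CGPA and to each GRE section; how those per-field percentiles would be combined into one match score is left open here.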

Code

GitHub Link      : University Admission Analysis (GitHub)
GitLab Link      : University Admission Analysis (GitLab)
Portfolio        : Anjana Tiha's Portfolio

Dataset

Dataset Name     : Gradcafe Data
Dataset Link     : Gradcafe Data (GitHub)

Dataset Details

Dataset Subtype	Number of Items	Size
CS	27,822	5.12 MB
All	271,807	71.5 MB

Schema

Credit: Thanks to Debarghya Das (deedy) for the data and schema.

The schema of the all.csv file, which contains all the content on GradCafe, and of the allgrad table in all.sql is:

Column Name	Type	Description
rowid	INTEGER PRIMARY KEY	A unique integer ID identifying the row. There are 271,807 rows.
uni_name	TEXT	The name of the university. The uncleaned field is user-supplied and very noisy, containing 10,297 distinct strings. The cleaned version reduces this number to 2,708. 98.5% of university names are clean.
major	TEXT	The intended major. This field isn't cleaned and is user-supplied and also noisy. It contains 18,957 distinct strings, the most common of which are "Computer Science", "Economics" and "English".
degree	TEXT(5)	The degree to be earned. This field is cleaned and takes the following values: "PhD", "MS", "MEng", "MBA", "MFA", "MA", and "Other". The top 3 are "PhD", "MS" and "Other".
season	TEXT(3)	The season is a three-letter string of the form [SF][0-9]{2}. "S" is for admission into the Spring semester and "F" represents Fall. The two digits represent the year for which admission is being sought.
decision	TEXT(15)	The decision being reported. This field takes the following values: "Accepted", "Rejected", "Wait listed", "Interview" and "Other".
decision_method	TEXT(15)	The method in which the decision was reported. The field takes the following values: "E-mail", "Website", "Phone", "Postal Service" and "Other".
decision_date	TEXT(10)	The date the decision was made in the form "dd-mm-yyyy".
decision_timestamp	INTEGER	The timestamp since epoch that the decision was made.
ugrad_gpa	FLOAT	The candidate's self-reported undergraduate GPA. Typically on a 4.0 scale, but often scores on 10.0 scales are reported with no clear disambiguation.
gre_verbal	INTEGER	The candidate's self-reported GRE Verbal score. If `is_new_gre` is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive.
gre_quant	INTEGER	The candidate's self-reported GRE Quantitative score. If `is_new_gre` is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive.
gre_writing	FLOAT	The candidate's self-reported GRE Writing score. It is on a scale of 0.0 to 6.0.
is_new_gre	INTEGER	Whether the candidate took the new GRE examination (where scores range from 130 to 170).
gre_subject	INTEGER	The candidate's self-reported GRE Subject Test score. Scores can reach the 900s. Given that this is a CS dataset, the subject in question is presumably Computer Science.
status	TEXT(28)	The status of the candidate. Can take on 4 different values - "American", "International", "International with US degree" and "Other".
post_data	TEXT(10)	The date on which this report was posted by the candidate in the form "dd-mm-yyyy".
post_timestamp	INTEGER	The timestamp since epoch that the post was made.
comments	BLOB	All comments the user added to the submitted post.
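
The `season`, `decision_date` and `post_data` columns are stored as formatted text, so they usually need parsing before analysis. The following is a minimal sketch using only the Python standard library; the helper names `parse_season` and `parse_schema_date` are hypothetical and not taken from the repository.

```python
import re
from datetime import date, datetime

# Pattern quoted from the schema: "S" or "F" followed by two digits.
SEASON_PATTERN = re.compile(r"[SF][0-9]{2}")


def parse_season(season):
    """Split a season code such as "F15" into (term, two_digit_year)."""
    if not SEASON_PATTERN.fullmatch(season):
        raise ValueError(f"not a valid season code: {season!r}")
    term = "Spring" if season[0] == "S" else "Fall"
    # The year is kept as two digits; the schema does not state the century.
    return term, int(season[1:])


def parse_schema_date(text):
    """Parse a "dd-mm-yyyy" string as used by decision_date and post_data."""
    return datetime.strptime(text, "%d-%m-%Y").date()


print(parse_season("F15"))
print(parse_schema_date("25-12-2018"))
```

Rows whose values do not match these formats raise `ValueError`, which makes it easy to count or skip malformed user-supplied entries.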

The schema of the cs.csv file, which contains all results that have a word beginning with "computer", and of the computer table in cs.sql is:

Column Name	Type	Description
rowid	INTEGER PRIMARY KEY	A unique integer ID identifying the row
uni_name	TEXT	The name of the university. The uncleaned field is user-supplied and very noisy, containing 2,325 distinct strings. The cleaned version reduces this number to 415.
major	TEXT	The intended major. This field is cleaned and takes the following values: "CS", "ECE", "HCI", "IS" and "Other".
degree	TEXT(5)	The degree to be earned. This field is cleaned and takes the following values: "PhD", "MS", "MEng", "MBA", "MFA" and "Other".
season	TEXT(3)	The season is a three-letter string of the form [SF][0-9]{2}. "S" is for admission into the Spring semester and "F" represents Fall. The two digits represent the year for which admission is being sought.
decision	TEXT(15)	The decision being reported. This field takes the following values: "Accepted", "Rejected", "Wait listed", "Interview" and "Other".
decision_method	TEXT(15)	The method in which the decision was reported. The field takes the following values: "E-mail", "Website", "Phone", "Postal Service" and "Other".
decision_date	TEXT(10)	The date the decision was made in the form "dd-mm-yyyy".
decision_timestamp	INTEGER	The timestamp since epoch that the decision was made.
ugrad_gpa	FLOAT	The candidate's self-reported undergraduate GPA. Typically on a 4.0 scale, but often scores on 10.0 scales are reported with no clear disambiguation.
gre_verbal	INTEGER	The candidate's self-reported GRE Verbal score. If `is_new_gre` is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive.
gre_quant	INTEGER	The candidate's self-reported GRE Quantitative score. If `is_new_gre` is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive.
gre_writing	FLOAT	The candidate's self-reported GRE Writing score. It is on a scale of 0.0 to 6.0.
is_new_gre	INTEGER	Whether the candidate took the new GRE examination (where scores range from 130 to 170).
gre_subject	INTEGER	The candidate's self-reported GRE Subject Test score. Scores can reach the 900s. Given that this is a CS dataset, the subject in question is presumably Computer Science.
status	TEXT(28)	The status of the candidate. Can take on 4 different values - "American", "International", "International with US degree" and "Other".
post_data	TEXT(10)	The date on which this report was posted by the candidate in the form "dd-mm-yyyy".
post_timestamp	INTEGER	The timestamp since epoch that the post was made.
comments	BLOB	All comments the user added to the submitted post.
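
Because the GRE fields are self-reported, a range check against `is_new_gre` is a natural first cleaning step. The sketch below follows the ranges in the schema; the function names are hypothetical. The old-scale bound is taken literally as "exclusive" from the schema text, which is an assumption, and it can be relaxed to `<=` if endpoint scores such as 800 should count as valid.

```python
def gre_section_in_range(score, is_new_gre):
    """Check a GRE Verbal or Quantitative score against the schema's ranges."""
    if is_new_gre == 1:
        # New scale: 130 to 170 inclusive.
        return 130 <= score <= 170
    # Old scale: "between 200 and 800 exclusive", read literally (assumption).
    return 200 < score < 800


def gre_writing_in_range(score):
    """Check a GRE Writing score against the 0.0 to 6.0 scale."""
    return 0.0 <= score <= 6.0


# Hypothetical rows shaped like the schema, for illustration only.
rows = [
    {"gre_verbal": 165, "gre_quant": 168, "gre_writing": 4.5, "is_new_gre": 1},
    {"gre_verbal": 650, "gre_quant": 165, "gre_writing": 5.0, "is_new_gre": 0},
]
for row in rows:
    ok = (
        gre_section_in_range(row["gre_verbal"], row["is_new_gre"])
        and gre_section_in_range(row["gre_quant"], row["is_new_gre"])
        and gre_writing_in_range(row["gre_writing"])
    )
    print(row, "consistent" if ok else "inconsistent")
```

The second sample row mixes an old-scale Verbal score with a new-scale Quantitative score, so it is flagged as inconsistent.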

Tools / Libraries

Languages               : Python
Tools/IDE               : Anaconda
Libraries               : Recommendation System

Dates

Duration                : November 2018 – January 2019
Current Version         : v1.0.0.4
Last Update             : 12.25.2018
