Advanced-Data-analytics

Aim and motivation

* If we can predict homeless people’s behaviour, we will be able to provide help and services to them.
* Machine learning techniques have been shown to improve decision making in the health-care sector (Chen et al., 2019).
* Session-based recommenders are useful when user interaction history is available, because they learn from short-term interactions (Wang et al., 2022). These methods are emerging in healthcare systems to recommend the next treatment (Haas, n.d.).
* Our aim is to predict the next event within a session.
* We used the Word2vec model (Rong, 2016), which captures semantic similarities between events, to predict the next event.
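
To illustrate what "semantic similarity between events" means, here is a minimal sketch that uses only the Python standard library. It is not the Word2vec model used in this repository: it swaps in plain co-occurrence counts and cosine similarity as a simple stand-in, and the event names and the `window` parameter are hypothetical. How the similarity scores are turned into a next-event prediction in the actual model is not shown here.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sessions, window=1):
    """For each event, count the events seen within `window` steps of it."""
    vectors = defaultdict(Counter)
    for session in sessions:
        for i, event in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[event][session[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar_events(vectors, event):
    """Rank all other events by how similar their contexts are to `event`."""
    target = vectors[event]
    scores = {e: cosine(target, v) for e, v in vectors.items() if e != event}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sessions: each one is an ordered list of events.
sessions = [
    ["admit", "blood_test", "discharge"],
    ["admit", "xray", "discharge"],
]
vectors = cooccurrence_vectors(sessions, window=1)
ranked = rank_similar_events(vectors, "blood_test")
print(ranked[0])
```

In this toy data, `blood_test` and `xray` appear in the same contexts, so they come out as the most similar pair. Word2vec learns dense vectors with a comparable property from much larger session data.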

Data set

* In this work we used the MLB public dataset as a stand-in for medical data.
* The features in this dataset are correlated with the features that we expect to see in the real dataset, which is why it can represent a health-care dataset.
* The data contains a series of discrete events, including medical tests that can come back with good or bad results, and vital crashes that need emergency or intensive medical aid.
* Other events in our database stretch over a period of time. These events have a starting point and an ending point.

Preprocessing Method

To deal with imbalanced classes, we used a weighted random sampler.
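
The idea behind weighted random sampling can be sketched with the standard library alone. This is an illustration under assumptions, not the repository's code: the labels, the inverse-frequency weighting scheme, and the helper names are hypothetical. In PyTorch, per-sample weights like these are what `torch.utils.data.WeightedRandomSampler` takes as input.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def weighted_sample(labels, num_samples, seed=0):
    """Draw sample indices with replacement, proportional to the weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Hypothetical imbalanced data: 90 samples of one class, 10 of the other.
labels = ["common"] * 90 + ["rare"] * 10
weights = inverse_frequency_weights(labels)
indices = weighted_sample(labels, num_samples=1000)
drawn = Counter(labels[i] for i in indices)
print(drawn)
```

Because every class has the same total weight, the rare class is drawn about as often as the common one, even though it makes up only 10% of the data.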

Machine learning Method

Results and cluster performance

We used job arrays and a loop in the shell to run the experiments:
* We submitted several jobs to the GPU partition.
* Each job had a unique input for hyperparameter optimization.
* We successfully received the results for about 200 jobs.
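
One common way to give each array job a unique input is to map the job's array index to one point of a hyperparameter grid. The sketch below assumes a Slurm job array, where Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable for each task. The grid, its parameter names, and `config_for_task` are hypothetical and are not taken from this repository's scripts.

```python
import itertools
import os

# Hypothetical search grid: 3 * 2 * 2 = 12 combinations.
GRID = {
    "learning_rate": [0.1, 0.01, 0.001],
    "embedding_dim": [32, 64],
    "window": [2, 5],
}


def config_for_task(task_id, grid=GRID):
    """Return the hyperparameter combination for one array index."""
    keys = list(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    if not 0 <= task_id < len(combos):
        raise IndexError(f"task_id {task_id} is outside 0..{len(combos) - 1}")
    return dict(zip(keys, combos[task_id]))


# Falls back to index 0 when run outside a Slurm job array.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
config = config_for_task(task_id)
print(task_id, config)
```

With this grid, submitting the job script with `sbatch --array=0-11` would run one task per combination.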

Conclusion and Reflection

* I submitted several jobs to the GPU partition.
* In each job I trained the model for 1500 epochs.
* Each job took about 40 minutes on the GPU partition (about 6 hours on the CPU partition).
* While observing the jobs on the cluster, I saw that 12 jobs were running in parallel at a time.
* All of the experiments took about 5 days on the cluster, which is roughly equal to 60 days on a personal laptop.
* I could use the cluster to find the best parameters almost 10 times faster than with my own resources.
* I found the cluster resources very useful and time-saving for running experiments.
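
As a rough cross-check, the reported figures can be combined in a short back-of-the-envelope calculation. It uses only the rounded numbers stated above and ignores queueing time, failed jobs, and differences between a laptop and a CPU node, so the results are estimates rather than measurements.

```python
jobs = 200                 # about 200 jobs returned results
gpu_minutes_per_job = 40   # reported time per job on the GPU partition
cpu_hours_per_job = 6      # reported time per job on the CPU partition
parallel_jobs = 12         # jobs observed running at the same time

# Total compute time if every job ran one after another.
sequential_cpu_days = jobs * cpu_hours_per_job / 24
sequential_gpu_days = jobs * gpu_minutes_per_job / 60 / 24

# Wall-clock time on the GPU partition if 12 jobs always ran in parallel.
ideal_parallel_gpu_hours = jobs * gpu_minutes_per_job / 60 / parallel_jobs

# Speedup of one GPU job over one CPU job.
per_job_speedup = cpu_hours_per_job * 60 / gpu_minutes_per_job

print(f"sequential CPU time: {sequential_cpu_days:.1f} days")
print(f"sequential GPU time: {sequential_gpu_days:.1f} days")
print(f"ideal parallel GPU time: {ideal_parallel_gpu_hours:.1f} hours")
print(f"per-job GPU speedup: {per_job_speedup:.1f}x")
```

The per-job speedup of about 9x is in line with the "almost 10 times faster" observation, and the sequential CPU estimate of about 50 days is of the same order as the 60 days estimated for a laptop.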

Folders and files

* data
* README.md
* dataloader.py
* main_preprocess_clean_data.py
* main_program.py
* model.py
* preprocessing.py
* slurm_script_do_main_program.sh
* slurm_script_do_preprocessing.sh
* top_level_script_fuzzy.sh
* train.py

Fuzzy-sh/Advanced-Data-analytics
