db Modeling and db related discussion. #42

Open
asiyani opened this issue Jun 3, 2017 · 10 comments

Comments

@asiyani

asiyani commented Jun 3, 2017

There are a few topics we need to discuss regarding the database.

```javascript
const mongoose = require('mongoose');
const Schema = mongoose.Schema;

// Schema for logged-in users (our site's users).
const userSchema = new Schema({
    _id: Schema.Types.ObjectId,
    githubId: Number,
    login: String,
    name: String,
    html_url: String,
    accessToken: String,
});

// Schema for the usernames of stargazers on GitHub.
const usernameSchema = new Schema({
    _id: Schema.Types.ObjectId,
    githubId: Number,
    login: String,
    name: String,
    html_url: String,
    location: String,
    bio: String,
    public_repos: Number,
    public_gists: Number,
    followers: Number,
    dbLastUpdated: Date,
    starredIds: [Schema.Types.ObjectId],
});

// Schema for starred repositories.
const repositorySchema = new Schema({
    _id: Schema.Types.ObjectId,
    name: String,
    html_url: String,
    description: String,
    stargazers_count: Number,
    forks_count: Number,
    created_at: Date,
    updated_at: Date,
    language: String,
});
```
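The stargazer fields above line up with fields of the GitHub REST API's user object (`id`, `login`, `name`, `html_url`, `location`, `bio`, `public_repos`, `public_gists`, `followers`). A minimal sketch, assuming we fetch a user payload from the API and want a plain object shaped like `usernameSchema` before saving it; `toStargazerDoc` is a hypothetical helper, not existing project code, and the sample payload is made up.

```javascript
// Hypothetical helper: copy only the fields usernameSchema stores
// from a GitHub REST API user object; everything else is dropped.
function toStargazerDoc(apiUser, now = new Date()) {
  return {
    githubId: apiUser.id, // GitHub's numeric user id
    login: apiUser.login,
    name: apiUser.name,
    html_url: apiUser.html_url,
    location: apiUser.location,
    bio: apiUser.bio,
    public_repos: apiUser.public_repos,
    public_gists: apiUser.public_gists,
    followers: apiUser.followers,
    dbLastUpdated: now, // when we last refreshed this record
    starredIds: [], // filled in separately from the starred repos
  };
}

// Example with a made-up, trimmed payload.
const doc = toStargazerDoc({
  id: 12345,
  login: 'example-user',
  html_url: 'https://github.com/example-user',
  followers: 3,
  site_admin: false, // extra API field, not copied
});
console.log('site_admin' in doc); // false
```

Trimming at this point also makes it easy to drop fields like `location` and `bio` later if we decide we don't need them.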
**Questions**
1. Are the properties of each schema enough, or do we need to store more data for each collection?
2. What should the relationship between the username and repository schemas be: embedded, one-to-N, or N-to-N?
3. DB server for development: local Mongo or mLab?

I think

  1. userSchema and repositorySchema are fine, but usernameSchema has a lot of fields we might not need, like location and bio.

  2. This one depends on the queries we will run against the DB and on the amount of data. If I am right, at the moment we query usernames to get repositories. In that case:

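```javascript
const { Schema } = require('mongoose');

const usernameSchema = new Schema({
    _id: Schema.Types.ObjectId,
    name: String,
    // ...other fields
    repositoryIDs: [Schema.Types.ObjectId], // [ObjectId1, ObjectId2, ..., N]
});
```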

The problem with this is that some users, like 'tj', have 1.7k starred repositories! That's too many IDs to put in an array.
Another solution: because we have a limited number of usernames, we can do this:

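```javascript
const { Schema } = require('mongoose');

const repositorySchema = new Schema({
    _id: Schema.Types.ObjectId,
    name: String,
    // ...other fields
    usernameIDs: [Schema.Types.ObjectId], // [ObjectId1, ObjectId2, ..., N]
});
```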

The problem is that it will be difficult to simply query repositories by username.
Don't know 😖

  3. A local DB requires initial setup; mLab requires creating an account and has a max size of 0.5 GB (I think this should be more than enough 😉). I personally prefer a local DB server for development.
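To put the 1.7k figure from point 2 in perspective, here is rough arithmetic based on the BSON format: an array is stored as a document whose keys are the indexes "0", "1", ..., and each ObjectId element costs 1 type byte, the index key as a null-terminated string, and 12 bytes of ObjectId. `objectIdArrayBytes` is a hypothetical helper and it ignores the rest of the enclosing document; MongoDB's maximum document size is 16 MB.

```javascript
// Rough BSON size of an array of n ObjectIds:
// int32 length prefix + n elements + trailing 0x00 byte.
// Each element: 1 type byte + index key as a cstring + 12-byte ObjectId.
function objectIdArrayBytes(n) {
  let bytes = 4 + 1; // int32 document length + terminating null byte
  for (let i = 0; i < n; i++) {
    const keyBytes = String(i).length + 1; // "0", "1", ... plus null terminator
    bytes += 1 + keyBytes + 12;
  }
  return bytes;
}

const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

const tjBytes = objectIdArrayBytes(1700);
console.log(tjBytes); // 29495
```

By this estimate, 1.7k IDs take roughly 30 KB, well under the limit, so raw size alone may not rule out the array approach; the queries we need still decide it.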

Let's discuss answers to all 3 questions, or any other DB-related questions.

@mubaris
Owner

mubaris commented Jun 7, 2017

  1. I don't think we need to collect more info about the stargazers. We need their username and, preferably, their follower count or something like that; nothing more.

  2. If we store the repos as an array in the format authorUsername/repoName (for example, if tj starred curiositylab/curiosity and addyosmani/xyz), we can do it like this:

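```javascript
const { Schema } = require('mongoose');

const usernameSchema = new Schema({
    _id: Schema.Types.ObjectId,
    name: String,    // e.g. 'tj'
    // ...other fields
    repos: [String], // e.g. ['curiositylab/curiosity', 'addyosmani/xyz', ...]
});
```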

Is that efficient?

  3. A local DB is okay for the dev stage 😄
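For the `authorUsername/repoName` format in point 2, GitHub's REST API already returns this string as a repository's `full_name`. A minimal in-memory sketch, assuming each stargazer document holds a `repos` array of such strings; `splitFullName`, `reposOf` and `starrersOf` are hypothetical helpers, and the sample data is made up.

```javascript
// Split 'owner/name' into its parts (a GitHub full name has exactly one '/').
function splitFullName(fullName) {
  const slash = fullName.indexOf('/');
  if (slash <= 0 || slash === fullName.length - 1) {
    throw new Error(`not an owner/name string: ${fullName}`);
  }
  return { owner: fullName.slice(0, slash), name: fullName.slice(slash + 1) };
}

// In-memory stand-in for the stargazers collection (made-up data).
const stargazers = [
  { name: 'tj', repos: ['curiositylab/curiosity', 'addyosmani/xyz'] },
  { name: 'someone-else', repos: ['curiositylab/curiosity'] },
];

// "Get all repos of tj": find one document, then read its array.
function reposOf(login) {
  const doc = stargazers.find((s) => s.name === login);
  return doc ? doc.repos.map(splitFullName) : [];
}

// The reverse question ("who starred this repo?") has to look inside
// every document's array in this in-memory version.
function starrersOf(fullName) {
  return stargazers.filter((s) => s.repos.includes(fullName)).map((s) => s.name);
}

console.log(reposOf('tj'));
console.log(starrersOf('curiositylab/curiosity'));
```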

@asiyani
Author

asiyani commented Jun 7, 2017

> If we store the repos as an array in the format authorUsername/repoName (for example, if tj starred curiositylab/curiosity and addyosmani/xyz), we can do it like this:
>
> ```javascript
> const usernameSchema = new Schema({
>     _id: Schema.Types.ObjectId,
>     name: String,    // e.g. 'tj'
>     // ...other fields
>     repos: [String], // e.g. ['curiositylab/curiosity', 'addyosmani/xyz', ...]
> });
> ```

But then the repos array can have thousands of entries for each stargazer (username).

Question: how are we querying GitHub at the moment? I know we query each stargazer, but is there any sort or filter applied when calling the GitHub API?

In NoSQL, you design the database around the queries you will be running.

@raulvillares
Contributor

> Question: how are we querying GitHub at the moment? I know we query each stargazer, but is there any sort or filter applied when calling the GitHub API?

No, there is no real query filter at the moment. If the user selects a language, the array filter function is used, but it's applied after we fetch all the projects starred by each user.

```javascript
response.data
    .filter(filterFunction)
    .slice(0, MAX_PROJECTS_PER_USER)
    .forEach((entry) => {
        // ...
    });
```

When a language is selected, I tried to query only the projects written in that language, but it seems there is no language parameter in the GitHub API.
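Since the language filter runs client-side after all starred repos are fetched, here is a minimal sketch of what `filterFunction` could look like, assuming each `entry` is a GitHub repository object with its `language` field (which can be `null`). `makeLanguageFilter`, `pickProjects` and the value given to `MAX_PROJECTS_PER_USER` are hypothetical, not the project's actual code.

```javascript
const MAX_PROJECTS_PER_USER = 5; // hypothetical cap; the real value lives in the project code

// Build a predicate for Array.prototype.filter.
// No language selected -> keep everything; otherwise compare case-insensitively,
// since GitHub reports names like 'JavaScript'.
function makeLanguageFilter(language) {
  if (!language) return () => true;
  const wanted = language.toLowerCase();
  return (entry) =>
    typeof entry.language === 'string' && entry.language.toLowerCase() === wanted;
}

function pickProjects(starredRepos, language) {
  return starredRepos.filter(makeLanguageFilter(language)).slice(0, MAX_PROJECTS_PER_USER);
}

// Made-up sample; `language` is null for repos GitHub could not classify.
const sample = [
  { full_name: 'a/one', language: 'JavaScript' },
  { full_name: 'b/two', language: 'Go' },
  { full_name: 'c/three', language: null },
];
console.log(pickProjects(sample, 'javascript').map((r) => r.full_name)); // [ 'a/one' ]
```

Filtering before `slice` means the cap applies to matching repos only, which matches the order in the snippet above.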

@asiyani
Author

asiyani commented Jun 7, 2017

OK, let's just start by writing down the queries we think we will run against the DB.

  1. Get all repos of, say, 'tj'.
  2. Get all repos written in JavaScript.
  3. Get the repos starred by all/most stargazers.
  4. Get the repos updated in the last 24/48 hours.
    (Here, "repos" means starred repos.)

Anything else you guys can think of?
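The four queries above can be prototyped against plain arrays before committing to a schema. A sketch under assumptions: each starred repo is a hypothetical record `{ fullName, language, updatedAt, starredBy }`, where `starredBy` lists stargazer logins; the function names and sample data are made up.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// 1. All repos starred by one login.
const reposStarredBy = (repos, login) => repos.filter((r) => r.starredBy.includes(login));

// 2. All repos in one language (case-insensitive).
const reposInLanguage = (repos, language) =>
  repos.filter((r) => (r.language || '').toLowerCase() === language.toLowerCase());

// 3. Repos ranked by how many of our stargazers starred them (most first).
const reposByStarCount = (repos) =>
  [...repos].sort((a, b) => b.starredBy.length - a.starredBy.length);

// 4. Repos updated within the last `hours` hours, relative to `now`.
const reposUpdatedWithin = (repos, hours, now = Date.now()) =>
  repos.filter((r) => now - r.updatedAt.getTime() <= hours * HOUR_MS);

const now = Date.UTC(2017, 5, 7, 12); // fixed "now" for a reproducible example
const repos = [
  { fullName: 'x/alpha', language: 'JavaScript', updatedAt: new Date(now - 2 * HOUR_MS), starredBy: ['user-a', 'user-b'] },
  { fullName: 'y/beta', language: 'Go', updatedAt: new Date(now - 30 * HOUR_MS), starredBy: ['user-a'] },
  { fullName: 'z/gamma', language: 'JavaScript', updatedAt: new Date(now - 72 * HOUR_MS), starredBy: ['user-a', 'user-b', 'user-c'] },
];
console.log(reposByStarCount(repos)[0].fullName); // z/gamma
console.log(reposUpdatedWithin(repos, 24, now).map((r) => r.fullName)); // [ 'x/alpha' ]
```

Writing the queries down this way shows which field each one filters or sorts on, which is what the schema choice should optimize for.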

@mubaris
Owner

mubaris commented Jun 8, 2017

@alejandronanez What do you think about this?

@alejandronanez
Contributor

1 & 2. Have you tried querying the GraphQL endpoint instead of the REST endpoints? GQL lets us 'filter' what data we get back from the server.
3. A local DB is fine! 😊
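To illustrate the GraphQL suggestion: GitHub's GraphQL API (v4) is a single endpoint, `POST https://api.github.com/graphql`, and the query names exactly the fields we want back. A minimal sketch that only builds the request body; `buildStarredQuery` is a hypothetical helper, the field selection is one possible choice, and actually sending it needs an access token (`TOKEN` below is a placeholder). This trims the fields in the response; it is not a server-side language filter.

```javascript
// Build the JSON body for a GitHub GraphQL request listing a user's starred repos,
// asking only for the repo's full name, primary language and last update time.
function buildStarredQuery(login, first = 100) {
  const query = `
    query ($login: String!, $first: Int!) {
      user(login: $login) {
        starredRepositories(first: $first) {
          nodes {
            nameWithOwner
            primaryLanguage { name }
            updatedAt
          }
        }
      }
    }`;
  return JSON.stringify({ query, variables: { login, first } });
}

const body = buildStarredQuery('tj');
// Sending it (needs a token; TOKEN is a placeholder):
// fetch('https://api.github.com/graphql', {
//   method: 'POST',
//   headers: { Authorization: `bearer ${TOKEN}` },
//   body,
// });
console.log(JSON.parse(body).variables); // { login: 'tj', first: 100 }
```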

@asiyani
Author

asiyani commented Jun 9, 2017

@alejandronanez Good shout about GQL! I don't know how to do that yet 😉 but it will be fun to learn. 😄

I think instead of creating an array of repos in usernameSchema, we should add usernames to the repository schema:

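```javascript
const { Schema } = require('mongoose');

const repositorySchema = new Schema({
    name: String,           // e.g. 'curiosity'
    // ...other repository fields
    language: String,       // e.g. 'javascript'
    githubLogins: [String], // e.g. ['asiyani', 'alejandronanez', 'mubaris', ...]
});
```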

This way we don't have to search the username collection at all; we can just query the repository collection. Of course, this only works if GitHub logins are unique, and I am sure they are.

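```javascript
// Repository is assumed to be the model compiled from repositorySchema,
// e.g. mongoose.model('Repository', repositorySchema).
// Model.find(filter) returns a query; run it with a callback or .exec().

// All repos starred by 'asiyani'.
Repository.find({ githubLogins: { $in: ['asiyani'] } });

// All repos starred by 'asiyani' with language 'javascript':
// both conditions go in the same filter object.
Repository.find({ githubLogins: { $in: ['asiyani'] }, language: 'javascript' });
```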
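These `find` calls rely on MongoDB matching an array field when any element matches, so `{ githubLogins: 'asiyani' }` and `{ githubLogins: { $in: ['asiyani'] } }` both match repos whose array contains that login, and separate keys in one filter are ANDed. A tiny in-memory imitation of just those two rules, which could help unit-test query logic without a database; `matchesFilter` is hypothetical and supports nothing else from MongoDB's query language.

```javascript
// Does one field value satisfy one filter condition?
// Arrays match if any element matches (MongoDB's behaviour for array fields).
function fieldMatches(value, condition) {
  const candidates = Array.isArray(value) ? value : [value];
  if (condition !== null && typeof condition === 'object' && '$in' in condition) {
    return candidates.some((v) => condition.$in.includes(v));
  }
  return candidates.some((v) => v === condition);
}

// Every top-level filter key must match (implicit AND).
function matchesFilter(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => fieldMatches(doc[key], cond));
}

const repositories = [
  { name: 'curiosity', language: 'javascript', githubLogins: ['asiyani', 'mubaris'] },
  { name: 'other', language: 'go', githubLogins: ['asiyani'] },
];
const starredByAsiyaniInJs = repositories.filter((r) =>
  matchesFilter(r, { githubLogins: { $in: ['asiyani'] }, language: 'javascript' }));
console.log(starredByAsiyaniInJs.map((r) => r.name)); // [ 'curiosity' ]
```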

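If we do need info about the stargazers, we can query it separately:

```javascript
// findOne is a model method, not a schema method; Username is assumed to be
// the model compiled from usernameSchema, e.g. mongoose.model('Username', usernameSchema).
Username.findOne({ login: 'asiyani' });
```

But most of the time we will be querying the Repository collection rather than the username collection, so there won't be any application-level joins.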
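When stargazer details are needed, the "join" stays small: one repository query, then one lookup by login for the stargazers listed on it. A minimal in-memory sketch, assuming stargazer records keyed by their unique `login`; the `Map` stands in for a second query (for example a single `$in` query on `login`), and the helper names and data are hypothetical.

```javascript
// Index stargazer records by login (logins are unique on GitHub).
function indexByLogin(stargazers) {
  return new Map(stargazers.map((s) => [s.login, s]));
}

// Return a copy of the repo with stargazer details attached,
// skipping logins we have no record for.
function withStargazerDetails(repo, stargazersByLogin) {
  const stargazers = repo.githubLogins
    .map((login) => stargazersByLogin.get(login))
    .filter(Boolean);
  return { ...repo, stargazers };
}

const byLogin = indexByLogin([
  { login: 'user-a', followers: 10 },
  { login: 'user-b', followers: 20 },
]);
const repo = { name: 'curiosity', githubLogins: ['user-a', 'user-b', 'unknown-user'] };
console.log(withStargazerDetails(repo, byLogin).stargazers.length); // 2
```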

Also, I am thinking of renaming the username collection to stargazers, just to avoid confusion between userSchema (our site's users) and usernameSchema (stargazers).

Let me know what you think.

@mubaris
Owner

mubaris commented Jun 9, 2017

I like this new way of storing repository details. It makes the details easy to get.

@alejandronanez
Contributor

@asiyani I like this new approach too. I have experience with GQL, so let me know if you hit any roadblocks.
FWIW, you don't need any fancy framework to use GQL, so I suggest keeping it simple at the beginning.

@asiyani
Author

asiyani commented Jun 9, 2017

Good, then we will go with this schema.
Before we move to GQL on the client, we need to work on the GitHub API side to populate the data.
I will start with that first so we have some data to send via GQL.
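When populating from the GitHub API, each (stargazer, starred repo) pair should add the login to that repo's `githubLogins` without duplicating it, creating the repo if it is new. In MongoDB this is what an update with `$addToSet` plus the `upsert` option does; the sketch below imitates that behaviour in memory so a populate loop can be tested without a database. `recordStar` and the `{ fullName, language }` record shape are hypothetical.

```javascript
// In-memory imitation of:
//   updateOne({ fullName }, { $set: { language }, $addToSet: { githubLogins: login } }, { upsert: true })
function recordStar(reposByName, { fullName, language }, login) {
  let repo = reposByName.get(fullName);
  if (!repo) {
    repo = { fullName, language, githubLogins: [] }; // upsert: create when missing
    reposByName.set(fullName, repo);
  }
  repo.language = language; // $set: refresh metadata
  if (!repo.githubLogins.includes(login)) { // $addToSet: no duplicates
    repo.githubLogins.push(login);
  }
  return repo;
}

const reposByName = new Map();
const curiosity = { fullName: 'curiositylab/curiosity', language: 'JavaScript' };
recordStar(reposByName, curiosity, 'asiyani');
recordStar(reposByName, curiosity, 'mubaris');
recordStar(reposByName, curiosity, 'asiyani'); // re-running the import adds nothing
console.log(reposByName.get('curiositylab/curiosity').githubLogins); // [ 'asiyani', 'mubaris' ]
```

Making each write idempotent like this means the import can be re-run safely whenever we refresh the data.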


4 participants