Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding suffix tree #323

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

Arvind-raj06
Copy link
Member

@Arvind-raj06 Arvind-raj06 commented Jan 31, 2021

Suffix Tree

References to other Issues or PRs or Relevant literature

"Fixes #290". See
#290

Brief description of what is fixed or changed

Addition of suffix tree with reference to the pypi project.

Other comments

The code works fine to me, the PR is in count of SWOC and if you find any bug pls ping me.

@codecov
Copy link

codecov bot commented Jan 31, 2021

Codecov Report

Merging #323 (e3586fa) into master (bcf9753) will decrease coverage by 0.181%.
The diff coverage is 95.312%.

@@              Coverage Diff              @@
##            master      #323       +/-   ##
=============================================
- Coverage   98.550%   98.369%   -0.181%     
=============================================
  Files           25        26        +1     
  Lines         3243      3435      +192     
=============================================
+ Hits          3196      3379      +183     
- Misses          47        56        +9     
Impacted Files Coverage Δ
pydatastructs/utils/__init__.py 100.000% <ø> (ø)
pydatastructs/strings/suffix_tree.py 94.230% <94.230%> (ø)
pydatastructs/strings/__init__.py 100.000% <100.000%> (ø)
pydatastructs/utils/misc_util.py 99.019% <100.000%> (+0.189%) ⬆️

Impacted file tree graph

@czgdp1807 czgdp1807 added this to the v0.0.1 milestone Feb 1, 2021
'SuffixTree'
]

class Suffix_Node():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
return {x for n in self.transition_links.values() for x in n._get_leaves()}

class SuffixTree():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 1 to 7
from pydatastructs import SuffixTree

def test_suffixtree():
s = SuffixTree("HelloworldHe")
assert s.find("Hel") == 0
assert s.find_all("He") == {0, 10}
assert s.find("Win") == -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide references for the tests for verifying the results. You may use examples from university lecture notes, wikipedia, books, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still awaited.

@czgdp1807
Copy link
Member

There are several uncovered lines in https://codecov.io/gh/codezonediitj/pydatastructs/pull/323/diff (see red). Use steps in https://github.com/codezonediitj/pydatastructs#testing for checking the coverage for your patch locally and add tests accordingly to fully cover your code by tests.

@Arvind-raj06
Copy link
Member Author

Thank you for informing me, I haven't noticed these a while

@Arvind-raj06
Copy link
Member Author

I have added some test to cover that area, I can't get it to max but given my level best

@Arvind-raj06
Copy link
Member Author

@czgdp1807 any more corrections needed!?

@czgdp1807
Copy link
Member

Please add documentation for public methods (the ones which don't start with _).

@Arvind-raj06
Copy link
Member Author

Arvind-raj06 commented Feb 5, 2021

Please add documentation for public methods (the ones which don't start with _).

No issue found that

@Arvind-raj06
Copy link
Member Author

@czgdp1807 They are requesting the leaderboard update at SWoC, today is the last date

@czgdp1807
Copy link
Member

Done.

@czgdp1807
Copy link
Member

I am a bit busy these days. There might be a delay in reviews.

@Arvind-raj06
Copy link
Member Author

Arvind-raj06 commented Feb 5, 2021

I am a bit busy these days. There might be a delay in reviews.

Yeah sure take your time

@Arvind-raj06
Copy link
Member Author

Done.

Thank you 😉

i += 1
return i

def lcs(self, stringIdxs = -1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using short forms. Use the full name, largest_common_substring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some methods added to algorithms under this module related with longest common substring I believe. With that backtracking thing. How is this method different from that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backtracking just searches all the strings in the given list by going in reverse direction from the child to root and finds the longest among that but by using this method we give users the freedom to search the string from the index they want. And this function just reduces the comparison time by removing all the non subset of longest sequence

@czgdp1807
Copy link
Member

You can continue with your other PRs. I will finalize all of these on Saturday, next week.

@Arvind-raj06
Copy link
Member Author

You can continue with your other PRs. I will finalize all of these on Saturday, next week.

Yeah 👍

@Arvind-raj06
Copy link
Member Author

Arvind-raj06 commented Feb 23, 2021

I hope that you might have gone through the changes I made to your SkipList PR. Would request to make this better by learning from there. The code you write works correctly and efficiently but a bit of consistency (both in APIs and docs) is needed with respect to the existing things in master. For example, in your code of SkipList.insert(num) was the API but for every other insert method in master, a key value and an optional data value is taken as input.

Yeah It was like literally wow! I have studied it and got some good idea on consistency, Thank you
Yes later having seen the test change it's clear, I think it will take practice for me to obtain that consistency!

@Arvind-raj06
Copy link
Member Author

Shall I resolve the conflicts out here

@czgdp1807
Copy link
Member

Shall I resolve the conflicts out here

Preferrable resolve the conflicts locally. When you will update your master and try to merge it into Allgood branch, git will raise conflicts. If you are using VSCode then it will highlight the files of conflict and the changes you want to accept. After doing that just make a commit and conflicts will be resolved.

@Arvind-raj06
Copy link
Member Author

Arvind-raj06 commented Feb 26, 2021

Shall I resolve the conflicts out here

Preferrable resolve the conflicts locally. When you will update your master and try to merge it into Allgood branch, git will raise conflicts. If you are using VSCode then it will highlight the files of conflict and the changes you want to accept. After doing that just make a commit and conflicts will be resolved.

No worries I'm using atom which will also do the same. I had the experience of same in my previous PR this time will try to implement it perfectly.

@czgdp1807
Copy link
Member

Have you implemented this algorithm?

@czgdp1807
Copy link
Member

Note this PR will not be counted towards SWoC but it will be counted for GSSoC. This needs some more work both at the design level and at the implementation level.

@Arvind-raj06
Copy link
Member Author

Yeah sure then

@Arvind-raj06
Copy link
Member Author

Arvind-raj06 commented Mar 11, 2021

I'm back did you find anything to be added to make code better!
PS you can change the label to GSSoC

@czgdp1807 czgdp1807 added GSSoC and removed SWoC labels Mar 11, 2021
@Arvind-raj06
Copy link
Member Author

Cool happy to work with you back

@czgdp1807
Copy link
Member

czgdp1807 commented Mar 11, 2021

Well, IMHO, the code has a lot of room for improvement. It should be as simple as possible and easy to understand. I would suggest instead of jumping to code let's pick some examples and solve it via fake APIs and keep refining those APIs until a consensus is reached. Then we can proceed with coding and all. This is an important data structure and should be added after a lot of discussions. We are in no hurry.

Fault is mine too, I shouldn't have approved the previous designs and stuff for this DS. Let's pick an example to programmatically duplicate some example here.

@Arvind-raj06
Copy link
Member Author

Well, IMHO, the code has a lot of room for improvement. It should be as simple as possible and easy to understand. I would suggest instead of jumping to code let's pick some examples and solve it via fake APIs and keep refining those APIs until a consensus is reached. Then we can proceed with coding and all. This is an important data structure and should be added after a lot of discussions. We are in no hurry.

Yeah as you say

@Arvind-raj06
Copy link
Member Author

Hey @czgdp1807 I think this is the best suffix tree structure I could obtain from the examples. But as for the property side it's still lacking some potential ! Should I work on that rather I make a new PR for Suffix tree properties!?

@czgdp1807
Copy link
Member

We have two options with us, one to keep spending time on Suffix Tree which is a hard data structure. This will delay our first release. Another option is to put this on hold and add it to the next version. I think we should go for the second one since once people will start using our project, contributions will automatically scale. So, let us ignore it for now. Let's keep this PR open and add it to 0.0.2 milestone.

P.S. We should aim to complete the first release by first half of the year that is by August 2021. Let me know what you think.

@czgdp1807 czgdp1807 modified the milestones: v0.0.1, v0.0.2 Apr 13, 2021
@czgdp1807
Copy link
Member

I have updated the list of features to be added to 0.0.1. We should focus on graph algorithms and KMP in strings. The framework for former is already set (just algorithms are required to be implemented) and KMP shouldn't be that hard to implement.

@Arvind-raj06
Copy link
Member Author

That's really a nice idea! I will work on other issues as this is time consuming I felt the same way.

@Arvind-raj06
Copy link
Member Author

I have updated the list of features to be added to 0.0.1. We should focus on graph algorithms and KMP in strings. The framework for former is already set (just algorithms are required to be implemented) and KMP shouldn't be that hard to implement.

Cool gonna learn something new on graphs

@czgdp1807
Copy link
Member

czgdp1807 commented Apr 13, 2021

Here is the updated list, https://github.com/codezonediitj/pydatastructs/wiki/Planned-Features-for-v0.0.1

Not much is to be done for the first release. We will postpone the harder parts for the second half of the year. Otherwise we will never be able to release the package. Just three graph algorithms and one KMP is to be implemented. All of these are easy medium and can be done in one month's time. After that we will be able to focus on building docs and upload it on readthedocs.io

@czgdp1807 czgdp1807 added the Please take over PRs that can be continued by anyone. label Oct 21, 2021
@czgdp1807 czgdp1807 removed this from the v0.0.2 milestone Oct 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Please take over PRs that can be continued by anyone. strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Suffix Tree
2 participants