How to get Comments and Docstring in TreeSitter #2470
I have written the following code:

from tree_sitter import Language, Parser

PYTHON_LANGUAGE = Language('build/my-languages.so', 'python')
parser = Parser()
parser.set_language(PYTHON_LANGUAGE)

start_points = []
end_points = []

def walk(node):
    """
    node: takes ast.root_node to start traversing
    returns-> start_points: starting (row, column) of each token
              end_points: ending (row, column) of each token
    """
    start_points = []
    end_points = []
    stack = [node]
    while stack:
        curr_node = stack.pop()
        if len(curr_node.children) == 0:
            # Leaf node: record where the token starts and ends.
            start_points.append(curr_node.start_point)
            end_points.append(curr_node.end_point)
        else:
            # Push children in reverse so leaves are visited in source order.
            stack.extend(reversed(curr_node.children))
    return start_points, end_points

def break_into_lines(code_sample):
    """
    code_sample: code taken as a single string
    return: each line in the corresponding code
    """
    return code_sample.split('\n')

def get_tokens(code_sample, start_points, end_points):
    """
    code_sample: full code in string
    start_points: starting (row, column) of each token
    end_points: ending (row, column) of each token
    """
    tokens = []
    lines_in_code = break_into_lines(code_sample)
    assert len(start_points) == len(end_points), 'problem in finding the start and end points in the code'
    for start, end in zip(start_points, end_points):
        # Slices only the starting row, so each token is assumed to sit on one line.
        tokens.append(lines_in_code[start[0]][start[1]:end[1]])
    return tokens

In the |
Answered by Arnab9Codes on Aug 6, 2023
I found the answer myself. Use the following logic:
in tree-sitter, a docstring is an expression_statement node with a string node as its child,
and a single-line comment is a comment node.
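
The logic above can be sketched roughly as follows. This is only an illustration, not the exact code from the answer: the function name collect_comments_and_docstrings is made up here, and it assumes the tree-sitter Python grammar's node types comment, expression_statement and string. It relies only on each node's type, children, start_byte and end_byte. Note that it treats any statement consisting solely of a string as a docstring, so a bare string in the middle of a function body would also be picked up.

```python
def collect_comments_and_docstrings(root, source):
    """Return (comments, docstrings) found under `root`, in source order.

    root:   a tree-sitter node (anything exposing .type, .children,
            .start_byte and .end_byte works)
    source: the parsed code as bytes, i.e. the same bytes given to the parser
    """
    comments, docstrings = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type == "comment":
            comments.append(source[node.start_byte:node.end_byte].decode("utf8"))
        elif (node.type == "expression_statement"
              and len(node.children) == 1
              and node.children[0].type == "string"):
            # Assumption: a statement that is only a string literal is a docstring.
            string_node = node.children[0]
            docstrings.append(
                source[string_node.start_byte:string_node.end_byte].decode("utf8"))
            continue  # no need to descend into the string's own children
        # Push children in reverse so they are popped in source order.
        stack.extend(reversed(node.children))
    return comments, docstrings
```

With the parser from the question, it would probably be called along the lines of source = bytes(code_sample, "utf8"), then tree = parser.parse(source), then comments, docstrings = collect_comments_and_docstrings(tree.root_node, source). Slicing the source bytes by start_byte and end_byte, rather than by row and column, also keeps multi-line docstrings intact.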