`reviseParagraphClassification` returns `paragraphs`, but `paragraphs` does not appear to be modified anywhere in the function; the newly filtered data ends up in `reviseParagraphs` instead. Is `reviseParagraphs` supposed to be returned instead of `paragraphs`?
I have included the Python reference code below for convenience.
def revise_paragraph_classification(paragraphs, max_heading_distance=MAX_HEADING_DISTANCE_DEFAULT):
    """
    Context-sensitive paragraph classification. Assumes that
    classify_paragraphs has already been called.
    """
    # copy classes
    for paragraph in paragraphs:
        paragraph.class_type = paragraph.cf_class

    # good headings
    for i, paragraph in enumerate(paragraphs):
        if not (paragraph.heading and paragraph.class_type == 'short'):
            continue
        j = i + 1
        distance = 0
        while j < len(paragraphs) and distance <= max_heading_distance:
            if paragraphs[j].class_type == 'good':
                paragraph.class_type = 'neargood'
                break
            distance += len(paragraphs[j].text)
            j += 1

    # classify short
    new_classes = {}
    for i, paragraph in enumerate(paragraphs):
        if paragraph.class_type != 'short':
            continue
        prev_neighbour = get_prev_neighbour(i, paragraphs, ignore_neargood=True)
        next_neighbour = get_next_neighbour(i, paragraphs, ignore_neargood=True)
        neighbours = set((prev_neighbour, next_neighbour))
        if neighbours == set(['good']):
            new_classes[i] = 'good'
        elif neighbours == set(['bad']):
            new_classes[i] = 'bad'
        # it must be set(['good', 'bad'])
        elif (prev_neighbour == 'bad' and get_prev_neighbour(i, paragraphs, ignore_neargood=False) == 'neargood') or \
             (next_neighbour == 'bad' and get_next_neighbour(i, paragraphs, ignore_neargood=False) == 'neargood'):
            new_classes[i] = 'good'
        else:
            new_classes[i] = 'bad'

    for i, c in new_classes.items():
        paragraphs[i].class_type = c

    # revise neargood
    for i, paragraph in enumerate(paragraphs):
        if paragraph.class_type != 'neargood':
            continue
        prev_neighbour = get_prev_neighbour(i, paragraphs, ignore_neargood=True)
        next_neighbour = get_next_neighbour(i, paragraphs, ignore_neargood=True)
        if (prev_neighbour, next_neighbour) == ('bad', 'bad'):
            paragraph.class_type = 'bad'
        else:
            paragraph.class_type = 'good'

    # more good headings
    for i, paragraph in enumerate(paragraphs):
        if not (paragraph.heading and paragraph.class_type == 'bad' and paragraph.cf_class != 'bad'):
            continue
        j = i + 1
        distance = 0
        while j < len(paragraphs) and distance <= max_heading_distance:
            if paragraphs[j].class_type == 'good':
                paragraph.class_type = 'good'
                break
            distance += len(paragraphs[j].text)
            j += 1
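
One thing worth noting about the reference code: it has no `return` statement and instead updates `class_type` on each paragraph object in place, so the caller sees the result through the list it passed in. The sketch below may help when comparing against the port. It reproduces only the "copy classes" and "good headings" passes; the `Paragraph` dataclass, the `promote_headings` name, and the default distance of 200 are assumptions made for illustration, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical stand-in for the library's paragraph object.
    text: str
    cf_class: str            # context-free class, set by an earlier pass
    heading: bool = False
    class_type: str = ""     # context-sensitive class, filled in below


def promote_headings(paragraphs, max_heading_distance=200):
    """Run only the 'copy classes' and 'good headings' passes, in place.

    The default of 200 is an assumed value for this sketch.
    """
    # copy classes
    for paragraph in paragraphs:
        paragraph.class_type = paragraph.cf_class

    # good headings: a short heading followed closely by a good
    # paragraph is marked 'neargood'
    for i, paragraph in enumerate(paragraphs):
        if not (paragraph.heading and paragraph.class_type == "short"):
            continue
        j = i + 1
        distance = 0
        while j < len(paragraphs) and distance <= max_heading_distance:
            if paragraphs[j].class_type == "good":
                paragraph.class_type = "neargood"
                break
            distance += len(paragraphs[j].text)
            j += 1
    # No return statement: the objects in the caller's list were mutated.


paragraphs = [
    Paragraph("Intro", cf_class="short", heading=True),
    Paragraph("A long, content-rich paragraph.", cf_class="good"),
]
result = promote_headings(paragraphs)
print(result, [p.class_type for p in paragraphs])
```

If the port builds a separate revised collection rather than mutating the input objects, returning the untouched input would lose the revised classes, which seems to be the situation the question describes.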