small changes to parsing.py to address issue #559 #564

Open: mor3dr3ad wants to merge 2 commits into coursera-dl:master from mor3dr3ad:master

mor3dr3ad

🚨 Please review the guidelines for contributing to this repository.

Proposed changes

Update edx_dl/parsing.py to account for the new edX page structure. The CSS classes of sections and subsections have changed; updating lines 385 and 397 of parsing.py fixes the issue.

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here
to help! This is simply a reminder of what we are going to look for before
merging your code.

  • I have read the CONTRIBUTING doc
  • I agree to contribute my changes under the project's LICENSE
  • I have checked that the unit tests pass locally with my changes
  • I have checked the style of the new code (lint/pep).
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

If this is a relatively large or complex change, please explain why you chose
the solution you did and what alternatives you considered, etc.

Reviewers

@rbrito

Updated the parsing on lines 385 and 397 to account for a slight change in the structure of edX's website. Sections can now have one of two classes: `outline-item section` or `outline-item section scored`.

The code for subsections has been changed accordingly.

edX recently changed their website's structure slightly for some courses:
- sections now have one of two classes: `outline-item section` or `outline-item section scored`
- subsections now have one of two classes: `vertical outline-item focusable` or `vertical outline-item focusable scored`

As a result, 0 downloadable sections were found for some courses, e.g. https://courses.edx.org/courses/course-v1:MITx+14.750x+3T2019/course/.

A small change to the `NewEdxPageExtractor` class in edx_dl/parsing.py resolves the issue: the single class identifier has been replaced with a list covering both possibilities. This should therefore work for new and old courses, as well as for courses that mix both variants.
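
For readers who want to see the idea in isolation, here is a minimal sketch of matching against a list of accepted class strings instead of a single one. It is not the code in edx_dl/parsing.py: it uses only the standard library's `html.parser`, and the names `OutlineCounter`, `count_outline_items` and the `SAMPLE` markup are hypothetical, invented for illustration. Only the four class strings come from the description above.

```python
from html.parser import HTMLParser

# Class strings quoted in this PR; both variants of each must be accepted.
SECTION_CLASSES = ["outline-item section", "outline-item section scored"]
SUBSECTION_CLASSES = [
    "vertical outline-item focusable",
    "vertical outline-item focusable scored",
]


class OutlineCounter(HTMLParser):
    """Hypothetical helper: counts <li> elements whose class attribute
    matches one of the accepted class strings."""

    def __init__(self):
        super().__init__()
        self.sections = 0
        self.subsections = 0

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # Normalize whitespace so "a  b" and "a b" compare equal.
        css_class = " ".join((dict(attrs).get("class") or "").split())
        if css_class in SECTION_CLASSES:
            self.sections += 1
        elif css_class in SUBSECTION_CLASSES:
            self.subsections += 1


def count_outline_items(html):
    """Return (number of sections, number of subsections) found in html."""
    parser = OutlineCounter()
    parser.feed(html)
    return parser.sections, parser.subsections


# Made-up markup mixing the old and the new ("scored") variants.
SAMPLE = """
<ol>
  <li class="outline-item section">
    <ol><li class="vertical outline-item focusable">Unit 1</li></ol>
  </li>
  <li class="outline-item section scored">
    <ol><li class="vertical outline-item focusable scored">Quiz</li></ol>
  </li>
</ol>
"""

print(count_outline_items(SAMPLE))  # (2, 2)
```

Matching against a single class string would miss the `scored` variants in this sample, which is consistent with the "0 downloadable sections" symptom on courses where the classes changed.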
@coveralls

Coverage Status

Coverage remained the same at 47.7% when pulling 006045f on mor3dr3ad:master into 265718c on coursera-dl:master.

@bran22

bran22 commented Nov 7, 2019

I'm just a random user of edx-dl, but I wanted to say that I tested this change on 2 non-free courses that I'm taking and it works for me. One course recently had the li classes changed to `outline-item section scored`, which was what caused it to stop working for that course, I guess. I'm now able to download the course content for that one again!

The other course has always been `li class="vertical outline-item focusable"`, and that course still works with no issues.

I really appreciate your contribution! I didn't realize it was such a simple change that caused the parser to fail.

3 participants