blurred pages not showing up #62

Open
peppeamend98 opened this issue Jan 17, 2024 · 8 comments

Comments

@peppeamend98

[screenshot]
What do I need to do?

@PhammyJr

I also have this problem. Some pages will say "Blurred Content in Page ___" and all you see is blank white page.

@isaackogan

yep

@ArtyomCZ

For me, the page loads, but the overlay remains. I'm using the newest version, 0.5.6.

[Screenshot 2024-01-19 181907]

@chrisfeldkircher

@peppeamend98 The issue here is that they use two different CDNs. One is hosted on their own server (https://pieces.studocu.com), and the other is hosted by Amazon (https://d3tvd1u91rr79.cloudfront.net). For some reason (possibly geographic, to load resources more efficiently; Spotify and co. use the same strategy for efficient global distribution), they serve some documents from their own CDN, where the plugin works just fine, and others from the Amazon one. For those documents, replacing the /blurred part of the image src returns a 403 (Forbidden) error.

When a document is uploaded, it gets processed: the text of the PDF is extracted, and only the remaining parts (graphics, images, and so on) are stored as images. When you then view the document, the image is loaded as a background and the text is laid over it. As you may have noticed, no text is overlaid when the image is blurred. This makes it hard to circumvent, because getting the unblurred images is only part of the problem.

I am currently trying to understand how the text is loaded. However, the JS code is quite messy, as it uses extensive libraries (React, PerimeterX for anti-scraping, and so on) that are loaded in chunks.

@KiSa04

KiSa04 commented Jan 27, 2024

@peppeamend98

This is the script that fetches the text, and this is the part of the code that loads it (more precisely, the part where the text is not loaded, because the pages are blurred):

if(o&&i&&!j()(l).call(l,r)){var u,d="pages/blurred/page".concat(r,".webp"),h=E.Z.get("blurred_content_of_page",{pageNumber:r}),v=S()(u='<div class="blurred-container"><img alt="'.concat(h,'" src="')).call(u,d,'" />');`

(The variable s, defined just above the code I pasted, contains the query parameters, so if you manipulate the script directly, you need to switch from "blurredPage" to page0, which is always unblurred.)

And this is how far I got:

const originalUrl = bluredContainer.firstChild.src; let regex = /cloudfront.net/([^\/]+)/html/; let match = originalUrl.match(regex);

let dynamicValue; if (match && match[1]) { dynamicValue = match[1]; console.log("Dynamic Value:", dynamicValue); }

let ovalue; regex = /Signature=([^\/]+)&/; match = originalUrl.match(regex); if (match && match[1]) { ovalue = match[1]; }

regex = /Policy=([^&]+)&Signature/; match = originalUrl.match(regex);

let modifiedUrl = originalUrl .replace("pages/blurred/page", dynamicValue) .replace(".webp", ".page") .replace(ovalue, "${Premium_Signature from free trial}");

if (match && match[1]) { const base64String = match[1]; const decodedString = atob(base64String); regex = /Time":([^&]+)}}}]}/; match = decodedString.match(regex); if (match && match[1]) { const encodedString = btoa(decodedString.replace('pages\/blurred\/', '').replace(match[1], '1706411507')).replace('=', '_'); modifiedUrl = modifiedUrl.replace(base64String, encodedString); }
}

bluredContainer.firstChild.src = modifiedUrl;

Issues:

Apparently, the validation process reads the timestamp in the Policy token. [Edit: fixed by replacing the timestamp in the decoded token.]
Though I haven't tried it, it's almost certain that the signature is only valid for a given document.
Edit: yes, the signature is only valid within a given document.

@KXTOD

KXTOD commented Feb 3, 2024

@KiSa04, have you made any progress? I'm stuck at the signature part: a signature that works for one page does not work for another. Did you find a solution for this?

@KiSa04

KiSa04 commented Feb 7, 2024

@KXTOD I haven't looked into it any further, as I haven't had time, but I doubt we will be able to crack the signature. I'd say the best approach right now would be to reverse engineer their Android app API and find out whether they use AWS there as well (I'm fairly confident their approach is different there; it would be a mess to replicate the way they handle it on the website).

@KiSa04

KiSa04 commented Feb 14, 2024

It seems they are migrating from their own servers to AWS.

A large number of documents have already been migrated, so this extension will no longer work for them.


7 participants