fix for user info (user info gql) #1868

Open
wants to merge 11 commits into
base: master

@SaeidB SaeidB commented Apr 9, 2024

__a=1&__d=dis no longer works; replaced with web_profile_info.

claell commented Apr 10, 2024

While you probably found the correct workaround, this doesn't seem to be compatible with the already existing methods and setup of instagrapi.

@subzeroid (or @tajbowness), as this is a rather important problem, do you possibly have the time to look into fixing this, based on this PR? The solution itself looks pretty easy and just needs to be fit into the existing code base.

SaeidB commented Apr 10, 2024

What do you mean? Did you try it and get an error? I tried it and it worked for me without any problem. The native public request of instagrapi uses __a=1, so I used plain requests in this part to fix it. You can use "extract_gql..." or not; I commented that line out. If you use the commented-out line, the result is exactly the same as before, and you can easily get user info without needing to log in.

@claell claell left a comment

I reviewed your code to highlight the current problems.

Of course, I am sure that it works right now. It just does not fit the project structure and would thus create new issues and maintainability problems if merged like this.

Let me know if you need more help with refactoring your code.

instagrapi/mixins/user.py (outdated, resolved)
@@ -141,7 +141,10 @@ def user_info_by_username_gql(self, username: str) -> User:
             An object of User type
         """
         username = str(username).lower()
-        return extract_user_gql(self.public_a1_request(f"/{username!s}/")["user"])
+        headers = {'Host': 'www.instagram.com','X-Requested-With': 'XMLHttpRequest','Sec-Ch-Prefers-Color-Scheme': 'dark','Sec-Ch-Ua-Platform': '"Linux"','X-Ig-App-Id': '936619743392459','Sec-Ch-Ua-Model': '""','Sec-Ch-Ua-Mobile': '?0','User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.112 Safari/537.36','Accept': '*/*','X-Asbd-Id': '129477','Sec-Fetch-Site': 'same-origin','Sec-Fetch-Mode': 'cors','Sec-Fetch-Dest': 'empty','Referer': 'https://www.instagram.com/','Accept-Language': 'en-US,en;q=0.9','Priority': 'u=1, i'}
Contributor

Why do you set up new headers here? Please use the existing methods. Otherwise, users cannot set their own headers.

Contributor Author

The new endpoint for user info needs some new headers and a new user agent, and the old headers won't work. That is why I put the headers in directly. Also, instagrapi users won't need to change headers in GQL requests like this one, so the better idea is to set the headers directly in endpoints like this. Understood?

@claell claell Apr 10, 2024

Hm. But simply hard-coding new headers is not a clean way to go in that case, either. This might also result in Instagram banning clients if it sees requests with different user agents from the same IP.

So, how do the new headers differ from the old expected ones? Can they somehow get translated into each other? Or be set explicitly somewhere, like the old headers?

Contributor Author

As I said, this is a public request (GQL, i.e. a non-authenticated request), and the headers are the same every time you send it. Even the current code uses the same headers every time for public requests (monitor the instagrapi requests and you will see), so Instagram will not ban you.

Contributor

No, if I set a different user agent, this won't get used for this request. Also, and I repeat, hardcoding the header here is the wrong place.

I am happy to assist you with improving the code quality. But I hope, you can understand why this is not good coding style.

Contributor Author

Check the instagrapi GQL requests: the headers are the same every time. They use their own user agent, not the logged-in client's (mobile) user agent and not base_headers. If you use another user agent for this request, it will return a "mismatch user agent" error or similar.

instagrapi/mixins/user.py (outdated, resolved)
instagrapi/mixins/user.py (outdated, resolved)
remove commented line
SaeidB commented Apr 10, 2024

The commented-out line has been removed. Thank you!

claell commented Apr 10, 2024

I hope you understand my points. Hard-coding values like this and not relying on existing functions is in general bad practice (maintainability, generalizability, ...).

I understand that the endpoint probably requires different requests, so in that case the way to go is probably to generalize the existing functions for the new use cases.

SaeidB commented Apr 10, 2024

As I said, this is a public request (GQL, i.e. a non-authenticated request), and the headers are the same every time you send it. Even the current code uses the same headers every time for public requests (monitor the instagrapi requests and you will see), so Instagram will not ban you. Rotating headers is for private (authenticated) requests; my PR is for the non-authenticated user info request. Do you understand now?

claell commented Apr 10, 2024

Here, you can see the current setup:

def public_a1_request(self, endpoint, data=None, params=None, headers=None):

You can possibly also create a new function there (if needed) and then rely on that one. Please note that there is also already a function for public requests:

def public_request(

SaeidB commented Apr 10, 2024

That is for __a=1, and they removed that.

SaeidB commented Apr 10, 2024

Check the instagrapi GQL requests: the headers are the same every time. They use their own user agent, not the logged-in client's (mobile) user agent and not base_headers. If you use another user agent for this request, it will return a "mismatch user agent" error or similar.

claell commented Apr 10, 2024

My main argument stands. Hard-coding this like you did is not good practice. Also, there is no error handling for the request.

The existing functions already provide that. So I hope you can find a way to rely on them (or refactor them appropriately). I will help if needed.

claell commented Apr 10, 2024

I meant hard coding like you did. You hard coded this in the wrong place, where it can't be reused. There is no error handling, no reliance on good, already existing functions.

SaeidB commented Apr 10, 2024

You can see headers hard-coded by the author in the line linked below; hard-coding headers for a public request is not a problem:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 "

That's why I told you to inspect instagrapi's public requests through a MITM proxy.

Thank you for your contribution.

claell commented Apr 10, 2024

(You can also edit GitHub comments, no need to delete and send a new one)

instagrapi/mixins/user.py (outdated, resolved)
@SaeidB SaeidB left a comment

That's not personalized; it's the app ID.

instagrapi/mixins/user.py (outdated, resolved)
claell commented Apr 11, 2024

Here, you can see some part of the already existing request error handling code:

https://github.com/subzeroid/instagrapi/blob/b83a00b6d5c23029d9ab22b573ef26ebc79ef262/instagrapi/mixins/public.py#L190C1-L202C54

So I hope this helps you understand why it is important to rely on existing functions, and why just using requests like you did is error-prone.

SaeidB commented Apr 11, 2024

The current code in my PR works. However, if you mean adding error handling, it can be done like this:

return extract_user_gql(json.loads(cl.public_request(f'https://www.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=headers))['data']['user'])

But the custom headers are still needed.

added self.public_request for better error handling.
claell commented Apr 11, 2024

Already better. But still needs some more refactoring, from my perspective.

One possible problem now is this line:

https://github.com/subzeroid/instagrapi/blob/b83a00b6d5c23029d9ab22b573ef26ebc79ef262/instagrapi/mixins/public.py#L130C13-L130C48

Not sure, but probably we don't want to update the internal headers with the ones that are only used for that request?

Additionally, probably some more refactoring is sensible (like moving the custom headers to a different place. Defining them here like you do is not really sensible, as it won't allow any other functions to make use of them.).

SaeidB commented Apr 11, 2024

Correct, we don't need to use these headers in the other methods, so it makes sense to pass them directly to self.public_request instead of updating the headers.

claell commented Apr 12, 2024

Try to have a look at my referenced line again. It shows that if headers are passed to public_request, the headers will then get updated (which is not wanted). So that needs to be slightly refactored, probably.

SaeidB commented Apr 12, 2024

Yeah, I got you:

default_public_headers = self.public.headers
temporary_public_headers = {'Host': 'www.instagram.com','X-Requested-With': 'XMLHttpRequest','Sec-Ch-Prefers-Color-Scheme': 'dark','Sec-Ch-Ua-Platform': '"Linux"','X-Ig-App-Id': '936619743392459','Sec-Ch-Ua-Model': '""','Sec-Ch-Ua-Mobile': '?0','User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.112 Safari/537.36','Accept': '*/*','X-Asbd-Id': '129477','Sec-Fetch-Site': 'same-origin','Sec-Fetch-Mode': 'cors','Sec-Fetch-Dest': 'empty','Referer': 'https://www.instagram.com/','Accept-Language': 'en-US,en;q=0.9','Priority': 'u=1, i'}
data = extract_user_gql(json.loads(self.public_request(f'https://www.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=temporary_public_headers))['data']['user'])
self.public.headers.update(default_public_headers)
return data
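
One caveat with the save-and-restore approach above: in Python, assigning a dict-like object to a second name does not copy it, so updating from that alias afterwards restores nothing, and update() never removes keys that were added in between. A minimal sketch with plain dicts, assuming the session headers behave like an ordinary mutable mapping (the temporary_headers helper is hypothetical and not part of instagrapi):

```python
from contextlib import contextmanager


@contextmanager
def temporary_headers(headers: dict, overrides: dict):
    """Apply overrides to a mutable headers mapping, then restore it exactly."""
    saved = dict(headers)  # real copy; `saved = headers` would only alias
    headers.update(overrides)
    try:
        yield headers
    finally:
        # update() alone would keep keys added by overrides, so reset fully
        headers.clear()
        headers.update(saved)


session_headers = {"User-Agent": "default-agent"}

# Aliasing pitfall: both names point at the same object.
alias = session_headers
alias_is_same_object = alias is session_headers

overrides = {"User-Agent": "web-agent", "X-Ig-App-Id": "936619743392459"}
with temporary_headers(session_headers, overrides) as active:
    inside = dict(active)  # snapshot of the headers while overridden

after = dict(session_headers)  # snapshot after the block has exited
```

Inside the with block the overrides are active; afterwards the mapping is back to its original content, including the removal of the extra key.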

claell commented Apr 12, 2024

Your solution fixes the problem, but I think, there is a better way:

Refactoring some functions in public.py.

Why? Because I am rather confident that the headers will be needed for other calls as well in the future. So one possible suggestion would be adding another function similar to public_a1_request in public.py.

That function will have the headers (there, they can probably be hardcoded, then), and it will call public_request in a similar manner, as you showed in your code.

I suggest though, to also refactor public_request and _send_public_request with an additional parameter update_headers=True. Then, you can set this parameter to False in your calls and that will prevent updating the headers in the first place (which seems to be cleaner, compared to setting them first and then reversing that afterwards).

This was kind of what I had in mind all along, and I thought, it was rather obvious, but apparently, my suggestions weren't clear to you, before.

What do you think?
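
The update_headers idea could look roughly like the sketch below. It is not the actual instagrapi code: PublicClientSketch, WEB_PROFILE_HEADERS, web_profile_request and the fake transport are hypothetical stand-ins so that the example runs without network access. Only the parameter name update_headers and the endpoint path come from this thread.

```python
from typing import Callable, Dict, Optional

# Hypothetical constant; in the thread these values are hard-coded in user.py.
WEB_PROFILE_HEADERS = {
    "X-Ig-App-Id": "936619743392459",
    "X-Requested-With": "XMLHttpRequest",
}


class PublicClientSketch:
    """Stand-in for a client that keeps default headers on a session."""

    def __init__(self, transport: Callable[[str, Dict[str, str]], dict]):
        self.session_headers: Dict[str, str] = {"User-Agent": "default-agent"}
        self._transport = transport  # injected so the sketch needs no network

    def public_request(
        self,
        url: str,
        headers: Optional[Dict[str, str]] = None,
        update_headers: bool = True,
    ) -> dict:
        if headers and update_headers:
            # Sticky behaviour: the extra headers stay on the session.
            self.session_headers.update(headers)
            effective = dict(self.session_headers)
        else:
            # One-off headers: merge into a copy, leave the session untouched.
            effective = {**self.session_headers, **(headers or {})}
        return self._transport(url, effective)

    def web_profile_request(self, username: str) -> dict:
        url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"
        response = self.public_request(
            url, headers=WEB_PROFILE_HEADERS, update_headers=False
        )
        return response["data"]["user"]


def fake_transport(url: str, headers: Dict[str, str]) -> dict:
    # Echo what would have been sent instead of calling Instagram.
    return {"data": {"user": {"url": url, "sent_headers": headers}}}


client = PublicClientSketch(fake_transport)
user = client.web_profile_request("example_user")
```

With update_headers=False the request carries the extra headers, while client.session_headers still holds only the defaults, so no restore step is needed afterwards.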

user info gql fix + implemented in public_request + avoid headers to be updated
user info gql fix + implemented in public_request + avoid headers to be updated
SaeidB commented Apr 12, 2024

Check the changes I made now. I added update_headers=False, and I think everything is good now and enough for this step.

user info gql fix + implemented in public_request + avoid headers to be updated
@SaeidB SaeidB requested a review from claell April 15, 2024 12:19
claell commented Apr 16, 2024

Thanks, some more nitpicks, though.

instagrapi/mixins/public.py (resolved)
instagrapi/mixins/public.py (resolved)
instagrapi/mixins/user.py (resolved)
instagrapi/mixins/user.py (resolved)
claell commented Apr 23, 2024

@subzeroid, can you give a review and possibly make some calls on discussed topics?

@marek-knappe

Just to add one thing: hardcoding the user agent can lead to Instagram automatically detecting the bot, so I would recommend making it exactly the same globally for every request, or at least allowing it to be set as a config variable.
Also, having that variable (temporary_public_headers) available globally (instead of in the user scope) would help if any other request needs it, but that can be tomorrow's problem :)
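
A rough sketch of the config-variable suggestion, with the caveat that whether the endpoint accepts other user agents is disputed in this thread. PublicSettings, DEFAULT_WEB_USER_AGENT and build_headers are hypothetical names, not existing instagrapi settings; the user agent and app ID values are copied from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict

# Default taken from the headers proposed in this PR.
DEFAULT_WEB_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.6261.112 Safari/537.36"
)


@dataclass
class PublicSettings:
    """Hypothetical container for headers shared by public web requests."""

    user_agent: str = DEFAULT_WEB_USER_AGENT
    extra_headers: Dict[str, str] = field(default_factory=dict)

    def build_headers(self) -> Dict[str, str]:
        # Fixed values first, then caller-supplied overrides on top.
        headers = {
            "User-Agent": self.user_agent,
            "X-Ig-App-Id": "936619743392459",
        }
        headers.update(self.extra_headers)
        return headers


default_headers = PublicSettings().build_headers()
custom_headers = PublicSettings(user_agent="my-agent/1.0").build_headers()
```

Keeping the values in one place means any later request that needs the same headers can reuse them, and a caller can still swap the user agent without editing user.py.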

SaeidB commented May 2, 2024

Hi, thanks for your attention.
Actually, the user agent will not be detected.

marek-knappe commented May 7, 2024

@SaeidB

Traceback (most recent call last):
  File "/Users/mknappe/Projects/inst/instagrapi/test.py", line 2, in <module>
    from instagrapi import Client
  File "/Users/mknappe/Projects/inst/instagrapi/instagrapi/__init__.py", line 31, in <module>
    from instagrapi.mixins.public import (
  File "/Users/mknappe/Projects/inst/instagrapi/instagrapi/mixins/public.py", line 83
    update_headers=None
                   ^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?

Forgotten comma on that line.

also:

name 'json' is not defined
Traceback (most recent call last):
  File "/Users/mknappe/Projects/inst/instagrapi/instagrapi/mixins/user.py", line 194, in user_info_by_username
    user = self.user_info_by_username_gql(username)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mknappe/Projects/inst/instagrapi/instagrapi/mixins/user.py", line 146, in user_info_by_username_gql
    data = extract_user_gql(json.loads(self.public_request(f'https://www.instagram.com/api/v1/users/web_profile_info/?username={username}', headers=temporary_public_headers))['data']['user'], update_headers=update_headers)
                            ^^^^
NameError: name 'json' is not defined

json is not imported for json.loads.

Also:

TypeError: extract_user_gql() got an unexpected keyword argument 'update_headers'

It looks like user.py:147 passes update_headers after the closing parenthesis of the public_request call.
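
The last error can be reproduced in isolation. In the sketch below, extract_user_gql and public_request are simplified stand-ins (not the real instagrapi functions) that only mimic the call shape seen in the traceback, so it shows where the keyword argument ends up in each variant.

```python
import json


def extract_user_gql(data: dict) -> dict:
    # Stand-in extractor: accepts only the payload, like the real call site expects.
    return {"username": data["username"]}


def public_request(url: str, headers=None, update_headers=True) -> str:
    # Stand-in that returns a JSON string, as the json.loads call in the thread implies.
    return json.dumps({"data": {"user": {"username": "example_user"}}})


# Misplaced: update_headers is passed to extract_user_gql and raises TypeError.
try:
    extract_user_gql(
        json.loads(public_request("https://example.invalid/"))["data"]["user"],
        update_headers=False,
    )
    misplaced_error = None
except TypeError as exc:
    misplaced_error = str(exc)

# Intended: the keyword belongs inside the public_request(...) call.
raw = public_request("https://example.invalid/", headers={}, update_headers=False)
user = extract_user_gql(json.loads(raw)["data"]["user"])
```

Splitting the nested call into two statements, as in the second variant, also makes this kind of misplaced argument easier to spot in review.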

SaeidB added 2 commits May 7, 2024 09:58
 fix missing comma
import json
SaeidB commented May 7, 2024

Hi, thanks for mentioning this.
The missing comma and the missing json import have been fixed. For the last issue (unexpected keyword), I think you forgot to update public.py. Can you update user.py and public.py and try again?

@marek-knappe

@SaeidB Works now, thanks. I also see that there are some conflicts, so you need to resolve them before anyone can merge it.
