
Twint doesn't get all followers list #340

Open
3 tasks done
mmosleh opened this issue Jan 29, 2019 · 26 comments

Comments

@mmosleh

mmosleh commented Jan 29, 2019

It seems Twint doesn't get the full list of followers for accounts with a large number of followers and stops abruptly at some random number. For example, I tried twint -u nasa --followers and each time the script stopped at some random point with only a few thousand screen_names.

  • Python version is 3.6;
  • Updated Twint with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
  • I have searched the issues and there are no duplicates of this issue/question/request.
@pielco11
Member

It could be that Twitter stops returning new entities because you (like everyone else in that situation) have made too many requests.

@mmosleh
Author

mmosleh commented Jan 29, 2019

Thanks @pielco11. I was wondering what the way around it is, e.g. whether it could raise an error and then continue pulling the information from that point, or throttle the requests to stay within some limit so that this doesn't happen. Thanks.

@pielco11
Member

I'll look deeper (can't promise a deadline); for now I can say that the issue does not seem to have a unique pattern. I tried a couple of queries and got a long list, ~100k users. Also, when one stopped I started a new one, and that one lasted a long time too. So I guess it's not Twitter blocking you; from what I tested, I think that using a VPN will not get you around the issue.

A solution could be to retry the query when it fails; in any case, the code should only be changed after a deeper look at what is going on.

@mmosleh
Author

mmosleh commented Feb 4, 2019

Thank you @pielco11 !

@castrovictor

castrovictor commented Feb 9, 2019

> I'll look deeper (can't promise a deadline); for now I can say that the issue does not seem to have a unique pattern. I tried a couple of queries and got a long list, ~100k users. Also, when one stopped I started a new one, and that one lasted a long time too. So I guess it's not Twitter blocking you; from what I tested, I think that using a VPN will not get you around the issue.
>
> A solution could be to retry the query when it fails; in any case, the code should only be changed after a deeper look at what is going on.

Hi, first of all, thanks for making such an amazing tool and publishing the code; I am sure I will learn a lot from your work. I tried getting a long list, ~130k, and it stopped at a random number of followers on each query.
On the other hand, I am writing a script to get all the tweet links of a user, because I think your tool does not do that. It works without logging in and without using the API, but after making a lot of queries (with my script), Twitter somehow blocks the search of your user's tweets. Using a VPN, the problem was solved. This is just to give you some information.

Finally, if you have a PayPal account, I would like to buy you a coffee for publishing the source code, because, as I said, I would like to learn how you built the tool, which would be impossible without the source code.

@pielco11
Member

pielco11 commented Feb 9, 2019

~130k followers is a lot, so Twitter might be blocking requests at a random point.

Regarding the second point: Twitter blocks an IP if it makes too many requests, which is why using a VPN solves the problem.

What we could try is handling that "followers count" issue by asking the user to change IP and then retrying the query, and seeing whether this solves the issue.

Unfortunately I do not have enough time to solve every issue, so the patch will be delayed. Every kind of help with development is very welcome.

@pielco11
Member

@mmosleh Here is what's going on

[screenshot: the followers page with the "show more" button present]

[screenshot: the followers page after the button has disappeared]

In the first case there is a "show more" button; Twint extracts that link and makes a new request. Then that button vanishes, so Twint is not able to make a new request.
If I take the last cursor-id and make a new request, changing the IP and so on, nothing changes.

I think we have found the origin of the issue, and sadly we can't do anything about it, at least for now.
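
For readers less familiar with how Twint walks the followers list, the mechanism described above amounts to cursor-following: each page of followers carries a cursor taken from the "show more" link, and once Twitter stops returning that cursor there is nothing left to request. A minimal sketch of that pattern (fetch_followers_page() is a hypothetical stand-in for the request Twint makes, not a real Twint function):

```python
def collect_followers(username):
    """Follow the "show more" cursor until Twitter stops returning one."""
    followers, cursor = [], None
    while True:
        # fetch_followers_page() is hypothetical: it stands in for the request for
        # one page of followers, returning usernames plus the cursor extracted from
        # the "show more" link (or no cursor at all once the button is gone).
        page = fetch_followers_page(username, cursor)
        followers.extend(page["users"])
        cursor = page.get("next_cursor")
        if not cursor:
            break  # the "show more" link vanished, so there is nothing left to request
    return followers
```

When Twitter withholds the cursor early, the loop simply ends, which is why the scrape stops at a seemingly random count.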

@mmosleh
Author

mmosleh commented Feb 11, 2019

@pielco11 I made a quick and dirty patch to the previous version of Twint (the one with a single file): just a few retries on the last cursor-id when the error message is received. I managed to download all 32M NASA followers this way. (I'm not familiar with the code base of the new version, though.)
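
For anyone curious what such a patch might look like, here is a rough sketch of the retry idea, not mmosleh's actual code; it reuses the hypothetical fetch_followers_page() helper from the sketch above and simply retries the same cursor a few times when the request fails:

```python
import time

def fetch_with_retries(username, cursor, retries=5, wait=10):
    """Retry the same cursor a few times before giving up (sketch, not the real patch)."""
    for _ in range(retries):
        try:
            return fetch_followers_page(username, cursor)
        except IndexError:
            # The failure shows up in this thread as twint.feed:Follow:IndexError;
            # back off briefly and ask for the same page again.
            time.sleep(wait)
    raise RuntimeError(f"gave up on cursor {cursor} after {retries} attempts")
```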

@pielco11
Member

@mmosleh oh, nice... could you give me the commit id? git rev-parse HEAD

@pielco11 pielco11 reopened this Feb 11, 2019
@castrovictor

> @mmosleh oh, nice... could you give me the commit id? git rev-parse HEAD

So, was the update uploaded? Is it possible to download a large list of followers, as @mmosleh managed to do?

@pielco11
Member

Adding a timeout seems to solve the issue.

Without timeouts I'm able to get up to 40 followers/following; adding time.sleep(3) at line 161 in twint/get.py allows me to get up to 440 followers/following.
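
In other words, the change being tested is simply a pause before each new page request; the exact line number will differ between versions of get.py. Schematically, using the same hypothetical paging helper as in the earlier sketch:

```python
import time

def collect_followers_throttled(username, delay=3):
    """Same cursor-following loop as before, but throttled between page requests."""
    followers, cursor = [], None
    while True:
        page = fetch_followers_page(username, cursor)  # hypothetical helper, as above
        followers.extend(page["users"])
        cursor = page.get("next_cursor")
        if not cursor:
            break
        time.sleep(delay)  # the time.sleep(3) described above, applied before each new page
    return followers
```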

@KrisM-tor

In the current iteration of get.py, has this issue been resolved? I'm not seeing the time.sleep(3) line within the script.

thanks once again!

@pielco11
Member

@KristopherMakuch I did not apply that "patch" since I'm not sure it really is one. More testing is needed; everyone is welcome to look for a workaround.

@Matiusco

Matiusco commented Jun 14, 2019

How could I include a control file to know on which page it stopped?

Example:
twint -u username --followers -o username_followers.txt username_page.txt -t 3 -r 15

username_page.txt = file with the id of the last followers page.
-t = 3 (base time before fetching a new followers page)
-r = 15 (random extra time before fetching a new followers page)
The time before moving to the next followers page would be the sum of t + r. r would always be random and could be 2, 3 or 15, so the time would vary.

If processing is interrupted, it could try to run again after a while and continue from the followers page whose id is stored in the file.

My original command:
twint -u username --followers -o username_followers.txt

My error today:
CRITICAL:root:twint.feed:Follow:IndexError

The file has 1,036 followers, but this profile has 3,800 followers.

Thanks again for all the work.
Congrats!

Sorry for my poor English...
My first language is Portuguese.

@pielco11
Member

Hi @Matiusco

To add some timeouts you just have to add a line as described above.

You could resume the scrape with something like twint -u username --followers -o user_followers.txt --resume username_followers_resume.txt

When Twint stops (most probably because Twitter does not return more data) you will just have to re-run the command to resume from where it stopped.
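
If you prefer to drive this from Python rather than the CLI, the module-level equivalent would look roughly like the snippet below; the Output and Resume attributes are assumed to mirror the -o and --resume flags, so check the wiki for the exact names in your version:

```python
import twint

c = twint.Config()
c.Username = "username"
c.Output = "user_followers.txt"              # same role as -o
c.Resume = "username_followers_resume.txt"   # same role as --resume: stores the last cursor

# Re-running this appends to the output file and continues from the saved cursor.
twint.run.Followers(c)
```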

@Matiusco

Matiusco commented Jun 17, 2019

Hi @pielco11,
thanks for the information. I will try this.

Edited report:
Yes, it works perfectly now.
resume.txt is [] when finished. ;)

@Matiusco

Matiusco commented Jun 19, 2019

I cannot get all the followers.
Sometimes it reaches up to 15,000, other times it ends at 9,000, but the resume file is empty ([]).

my command in terminal:

twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt -t 15 -l 50

-t 15 and -l 50 are not working...

How could I control the time of each request from inside a Python file, so as to leave a much longer time between requests?

=====
The maximum number of followers at the moment is:

wc -l user_followers.txt

16470 user_followers.txt

Thanks for all the help.

@pielco11
Member

-t is not implemented yet (at least for now); -l is for the language, --limit is for the limit. If you want to control the time for each request, you have to play with get.py.

Your query should be something like twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt --limit 60

I also tried the resume option and it works fine.

@Matiusco

Thanks @pielco11, I'll try.

@mmosleh
Author

mmosleh commented Jun 20, 2019 via email

@Matiusco

OK @mmosleh, but I do not know how I could change that part of the code in get.py.

I still cannot get all the followers.

@nxhuy-github

How can I get the IDs of the followers instead of the usernames, please? Thank you.

@pielco11
Member

@nxhuy-github please keep comments on the topic of the issue. Anyway, you can do that using .Lookup as shown in the wiki.
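
For reference, a rough sketch of that approach with the module API: collect the follower usernames first, then look each one up to read its numeric ID. The Store_object option and the twint.output.follows_list / twint.output.users_list containers are assumptions based on the wiki examples, so verify the exact attribute names against your version:

```python
import twint

# 1) Collect follower usernames in memory.
c = twint.Config()
c.Username = "nasa"
c.Store_object = True
twint.run.Followers(c)
usernames = list(twint.output.follows_list)

# 2) Look up each username to get its numeric user ID.
follower_ids = {}
for name in usernames:
    lc = twint.Config()
    lc.Username = name
    lc.Store_object = True
    twint.run.Lookup(lc)
    follower_ids[name] = twint.output.users_list[-1].id
```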

@datduong

datduong commented Oct 2, 2019

Hi, what is the current status of the code that retrieves all the followers of one account? I am still having the problem that only a subset of followers is downloaded. I am using the command
twint -u SpeakerPelosi --followers but am unable to get all 3 million followers (my result is only about 30k users). I saw that line 161 has a timeout. Would increasing this timeout help?

@pielco11
Member

pielco11 commented Oct 2, 2019

@datduong Twitter actively works to keep Twint from getting all the followers; I highly suggest you use the API.

@yuiseki
Contributor

yuiseki commented Nov 15, 2019

Hi. I'm facing the same issue now.
I've gone through a lot of trial and error, and I think I have found a workaround for this issue.

My findings are these:

  • For example, twint -u nasa --following --resume nasa_following_resume.txt --limit 60 basically works well.
  • When repeating the above command within a short period, we get CRITICAL:root:twint.feed:Follow:IndexError.
  • But after waiting several seconds, we can resume the above command once again.
  • By waiting and resuming the above command, I can collect hundreds of followings perfectly.

My proposal is this:

  • The twint command should add a command-line argument like --wait-random 120, for example.
  • When twint hits CRITICAL:root:twint.feed:Follow:IndexError, it should wait a random number of seconds and try the command again.
  • The final, ideal command would look like this: twint -u nasa --following --wait-random 120.
    • --resume filename should be determined automatically or stored only in memory.
    • --limit 60 should get an appropriate default value.
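
Until something like --wait-random exists, the proposal above can be approximated with a small wrapper around the module API: run the scrape with a resume file, and when it stops early, wait a random interval and run it again. A rough sketch follows; the "[] when finished" check is based on the earlier comment about the resume file, and since the IndexError may be logged rather than raised, the file check is what actually decides whether to keep going:

```python
import os
import random
import time

import twint

def scrape_following(username, resume_file, limit=60, max_rounds=50):
    """Re-run the scrape until the resume file reports nothing left to fetch."""
    for _ in range(max_rounds):
        c = twint.Config()
        c.Username = username
        c.Resume = resume_file   # each round continues from the last saved cursor
        c.Limit = limit
        try:
            twint.run.Following(c)
        except Exception:
            pass                 # the Follow:IndexError case; fall through and back off
        # As reported earlier in the thread, the resume file contains "[]" once
        # everything has been fetched.
        if os.path.exists(resume_file) and open(resume_file).read().strip() == "[]":
            break
        time.sleep(random.randint(30, 120))  # the proposed --wait-random behaviour

scrape_following("nasa", "nasa_following_resume.txt")
```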
