SOMJANG/Instagram_Crawler

Instagram_Crawler

Extract Data From Instagram Using Selenium/Python.

Description

Instagram Crawler is a Python module for crawling Instagram data.

⚠️ Instagram stops loading posts once you have accessed more than a certain number of them, so only about 100 to 300 posts can be crawled.

Install

Simply run:

pip install -r requirements.txt

You can also install additional dependencies (for running examples, generating documentation, etc.).

⚠️ Python ≥ 3.6 required
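
The version requirement can be checked before installing anything. The following is a minimal sketch; `MIN_PYTHON` and `python_is_supported` are illustrative names and are not part of the repository.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True when the given version tuple meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("Python 3.6 or newer is required")
    print("Python version OK")
```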

Get Started

The full documentation contains more detailed tutorials, but to get a taste of the framework, you can take a look at the examples folder.
Let's look at the easy example, main.py. You can run it with the following command:

$ python3 main.py --id=[user_id] \
  --password=[user_password] \
  --hash_tag=[hash_tag] \
  --display=[0 or 1] \
  --extract_num=[extract_num: int] \
  --login_option=[instagram or facebook] \
  --extract_file=[file name] \
  --extract_tag_file=[tag file name] \
  --driver_path=[chromedriver path]
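
The same command can also be assembled from Python, which may be convenient when the crawler is launched from another script. This is only a sketch: `build_command` is a hypothetical helper, the option values are placeholders, and it assumes `main.py` is in the current working directory.

```python
import shlex
import sys


def build_command(options, script="main.py"):
    """Turn a dict of option names and values into an argv list."""
    command = [sys.executable, script]
    for name, value in options.items():
        command.append("--{}={}".format(name, value))
    return command


# Placeholder values only; replace them with real ones before running.
options = {
    "id": "my_user",
    "password": "my_password",
    "hash_tag": "travel",
    "display": 0,
    "driver_path": "./chromedriver",
}

command = build_command(options)
# Show the command in a form that is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in command))
# To actually launch the crawler from the repository root:
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a value such as the password contains spaces or special characters.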

The script run by this command, main.py, parses these options and starts the crawl:

# -*- coding:utf-8 -*-

import argparse
from instagram_crawler.metadata import EXTRACT_NUM, LOGIN_OPTION, SAVE_FILE_NAME, SAVE_FILE_NAME_TAG
from instagram_crawler.extract_data import crawling_instagram


parser = argparse.ArgumentParser(description='Crawling Instagram Post - Comment',
                                 formatter_class=argparse.RawTextHelpFormatter)


def get_arguments():
    parser.add_argument("--driver_path", 
                        help="selenium chrome driver path", 
                        required=True, type=str)

    parser.add_argument("--id", 
                        help="instagram or facebook id", 
                        required=True, type=str)

    parser.add_argument("--password", 
                        help="instagram or facebook password", 
                        required=True, type=str)

    parser.add_argument("--hash_tag", 
                        help="The hashtag you want to extract.", 
                        required=True, type=str)

    parser.add_argument("--display",
                        help="display selenium chromedriver or not 0 or 1",
                        required=True, type=int)


    parser.add_argument("--extract_num", 
                        help="The number of posts I want to extract.", 
                        default=EXTRACT_NUM, type=int)

    parser.add_argument("--login_option", 
                        help="select login account [facebook, instagram]", 
                        default=LOGIN_OPTION, type=str)

    parser.add_argument("--extract_file",
                        help="set extract file name", 
                        default=SAVE_FILE_NAME, type=str)

    parser.add_argument("--extract_tag_file",
                        help="set extract tag file name", 
                        default=SAVE_FILE_NAME_TAG, type=str)

    _args = parser.parse_args()

    return _args


def instagram_main():
    args = get_arguments()
    is_file_save, is_tag_file_save = crawling_instagram(args=args)

    if is_file_save:
        print("file save success - {}".format(args.extract_file))

    if is_tag_file_save:
        print("file save success - {}".format(args.extract_tag_file))


if __name__ == "__main__":
    instagram_main()
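
To see how the required and optional arguments behave without starting a browser, the parser can be exercised on its own. The sketch below mirrors only a subset of the options above. The values given to `EXTRACT_NUM` and `LOGIN_OPTION` are assumptions, since the real ones live in `instagram_crawler.metadata`, and the `choices` restriction is an optional addition that the original script does not use.

```python
import argparse

# Assumed stand-ins for values defined in instagram_crawler.metadata.
EXTRACT_NUM = 100
LOGIN_OPTION = "instagram"


def build_parser():
    """Build a parser with a subset of the crawler's options."""
    parser = argparse.ArgumentParser(description="argument handling sketch")
    parser.add_argument("--id", required=True, type=str)
    parser.add_argument("--display", required=True, type=int)
    parser.add_argument("--extract_num", default=EXTRACT_NUM, type=int)
    # choices is an addition here: it rejects any other login option.
    parser.add_argument("--login_option", default=LOGIN_OPTION, type=str,
                        choices=["instagram", "facebook"])
    return parser


# Passing a list to parse_args avoids touching sys.argv.
args = build_parser().parse_args(
    ["--id=my_user", "--display=1", "--extract_num", "50"]
)
print(args.id, args.display, args.extract_num, args.login_option)
```

Omitting a required option such as `--id` makes `parse_args` print a usage message and exit, while an omitted optional argument falls back to its default.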

Stack

Library used to create the result CSV file.

Selenium: library used to extract Instagram data in the Chrome browser.
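
As an illustration of the CSV step, the sketch below writes a few rows with Python's standard-library `csv` module. The project may use a different library for this, and the `save_rows` helper and the `post_id` and `text` columns are hypothetical.

```python
import csv
import io


def save_rows(rows, fieldnames, stream):
    """Write a list of dicts to an open text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows; the crawler's real columns are not documented here.
rows = [
    {"post_id": 1, "text": "first post #travel"},
    {"post_id": 2, "text": "second post, with a comma"},
]

# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
save_rows(rows, ["post_id", "text"], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `stream`; the `newline=""` argument is what the `csv` documentation recommends for files.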

Contribute

To contribute, simply clone the repository, add your code in a new branch, and open a pull request!