BaiduSpider

A perfect tool for crawling Baidu
简体中文 | 繁體中文 | English
Getting Started »

View Demo · Report Issue · Feature Request

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Disclaimer
  9. Contributors
  10. Acknowledgements

About The Project

A search engine is a very powerful tool, and it becomes even more powerful when other tools can build on most of its features. However, I had not found any web spider that extracts search results accurately. So, with that goal in mind, I developed this project to crawl Baidu: BaiduSpider.

Here's why:

  • Reduces the time spent extracting data, which speeds up the development of projects such as deep-learning applications.

  • Extracts data accurately, without ads.

  • Provides detailed search results and supports multiple search types and return models.

Of course, nothing is perfect, including this project. Any open-source project needs the community's help. You can help BaiduSpider by opening an issue or submitting a PR! 😄

Some helpful documentation and tools are listed in the acknowledgements.

Built With

Some open-source packages used in BaiduSpider.

Getting Started

Please follow the steps below to install BaiduSpider.

Prerequisites

Before installing BaiduSpider, please make sure you have Python 3.6+ installed:

$ python --version

If the version is lower than 3.6.0, please download and install a newer version of Python from python.org.
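The same check can also be done from inside Python. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of BaiduSpider.

```python
import sys

# Minimum version stated in the prerequisites.
MINIMUM = (3, 6)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.6+ is required, found:", sys.version.split()[0])
```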

Installation

Installing Using pip

Please enter the following command in the terminal:

$ pip install baiduspider

Installing Manually Using GitHub

$ git clone git@github.com:BaiduSpider/BaiduSpider.git

# ...

$ python setup.py install

Usage

You can get search results with just a few lines of code using BaiduSpider:

# Import BaiduSpider
from baiduspider import BaiduSpider
from pprint import pprint

# Create the BaiduSpider object
spider = BaiduSpider()

# Search the web
pprint(spider.search_web(query='Python'))

For more examples and configurations, please refer to the documentation.
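To run several queries in a row, the `search_web` call shown above can be wrapped in a small loop. This is only a sketch: `search_many` and its `delay` parameter are hypothetical helpers, not part of BaiduSpider, and the only BaiduSpider call assumed is `search_web(query=...)` as used above. The pause between requests keeps the request rate low, in keeping with the disclaimer about not crawling large amounts of data.

```python
import time


def search_many(spider, queries, delay=1.0):
    """Run `spider.search_web` for each query and collect results by query.

    `spider` is any object exposing `search_web(query=...)`, for example a
    `BaiduSpider()` instance. `delay` (in seconds) is a hypothetical
    throttle applied between consecutive requests.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            time.sleep(delay)  # pause between requests
        results[query] = spider.search_web(query=query)
    return results
```

With the package installed, `search_many(BaiduSpider(), ['Python', 'BaiduSpider'])` would return a dict mapping each query to whatever `search_web` returns for it.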

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b NewFeatures)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin NewFeatures)
  5. Open a Pull Request

License

Distributed under the GPL-V3 License. See LICENSE for more information.

Contact

samzhangjy - @samzhangjy - [email protected]

Project Link: https://github.com/BaiduSpider/BaiduSpider

Disclaimer

This project may only be used for learning purposes; it must not be used in commercial projects or to crawl large amounts of data. Also, BaiduSpider is distributed under the GPL-V3 license, meaning any project using BaiduSpider must be open-source and link to this project. The author of this project bears no legal risk arising from its use; offenders are responsible for the consequences.

Contributors

Acknowledgements