BaiduSpider

A perfect tool for crawling Baidu
简体中文 | 繁體中文 | English
Getting Started »

View Demo · Report Issue · Feature Request

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Disclaimer
Contributors
Acknowledgements

About The Project

A search engine is a very powerful tool. If other tools could build on most of a search engine's features, they would be even more powerful. However, I could not find a web spider that extracts search results accurately. With that goal in mind, I developed this project to crawl Baidu: BaiduSpider.

Here's why:

Reduces the time spent extracting data, which speeds up the development of projects such as deep-learning applications.
Extracts data accurately, without ads.
Provides detailed search results and supports multiple search types and return models.

Of course, nothing is perfect, including this project. Any open-source project needs the community's help. You can help BaiduSpider by opening an issue or submitting a PR! 😄

Some helpful documentation and tools are listed in the acknowledgements.

Built With

BaiduSpider uses the following open-source packages:

BeautifulSoup 4
requests

Getting Started

Please follow the steps below in order to install BaiduSpider.

Prerequisites

Before installing BaiduSpider, please make sure you have Python 3.6 or later installed:

$ python --version

If the version is lower than 3.6.0, please go to python.org to download and install a higher version of Python.
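The same check can also be done from inside Python, which helps when `python` on the command line points to a different interpreter than the one you actually run. The following is a minimal sketch; the helper name `python_is_supported` is made up for illustration and is not part of BaiduSpider.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    # %-formatting is used so the message also works on interpreters older than 3.6.
    print("Python %d.%d or newer is required" % MIN_VERSION)
```

Tuples are compared element by element, so `(3, 10)` is correctly treated as newer than `(3, 6)`, which a plain string comparison would get wrong.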

Installation

Installing Using `pip`

Please enter the following command in the terminal:

$ pip install baiduspider

Installing Manually Using GitHub

$ git clone git@github.com:BaiduSpider/BaiduSpider.git

# ...

$ python setup.py install
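After either installation method, you may want to confirm that the package can be found by the interpreter you intend to use. The sketch below relies only on the standard library; `is_installed` is a hypothetical helper name, and the module name `baiduspider` is taken from the import statement in the Usage section.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("baiduspider"):
    print("baiduspider is available")
else:
    print("baiduspider was not found; try installing it again")
```

This only checks that the module can be located. It does not import the package or verify its version.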

Usage

You can get search results with just a few lines of code using BaiduSpider:

# Import BaiduSpider
from baiduspider import BaiduSpider
from pprint import pprint

# Generate the BaiduSpider object
spider = BaiduSpider()

# Search the web
pprint(spider.search_web(query='Python'))

For more examples and configurations, please refer to the documentation.
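If you need results for several queries, it is worth pausing between requests, since the disclaimer asks users not to crawl large amounts of data. The following is a rough sketch under stated assumptions: `search_many` and its `delay_seconds` parameter are hypothetical and not part of BaiduSpider. The only BaiduSpider call used is `search_web(query=...)`, as shown above.

```python
import time


def search_many(spider, queries, delay_seconds=1.0):
    """Call spider.search_web for each query, pausing between requests.

    `spider` is expected to be a BaiduSpider instance; any object with a
    compatible search_web(query=...) method works as well.
    Returns a dict that maps each query to its search result.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Wait before every request except the first one.
            time.sleep(delay_seconds)
        results[query] = spider.search_web(query=query)
    return results
```

With BaiduSpider installed, this could be used as `search_many(BaiduSpider(), ['Python', 'Baidu'])`. The one-second default delay is an arbitrary choice, not a documented limit.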

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b NewFeatures)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin NewFeatures)
Open a Pull Request

License

Distributed under the GPL-V3 License. See LICENSE for more information.

Contact

samzhangjy - @samzhangjy - samzhang951@outlook.com

Project Link: https://github.com/BaiduSpider/BaiduSpider

Disclaimer

This project may only be used for learning purposes. It must not be used in commercial projects or to crawl large amounts of data. BaiduSpider is distributed under the GPL-V3 license, meaning any project using BaiduSpider must be open-source and link to this project. The author of this project assumes no legal liability for its use; anyone who violates these terms is responsible for the consequences.

Contributors

Acknowledgements

BeautifulSoup 4
requests
Img Shields
Gitmoji
Best-README-Template
Choose an Open Source License
GitHub Pages
