Problems with the stops data #11

Open
jaimeorrego opened this issue Apr 26, 2018 · 7 comments
Comments

@jaimeorrego

Hello Will,

Thank you for the API; it is very nice. I was testing it with different cities around the world. It worked very well in Portland, OR, but I found two issues in other cities. I tested Lisbon, Portugal, using:

python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox
When downloading the transit operators, the API finds routes and stops but 0 schedule stop pairs. Here is an example from one of the largest operators:

o-eyck-carris 7 / 8
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-eyck-carris
217 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-eyck-carris
2093 stops found.
 
http://transit.land/api/v1/schedule_stop_pairs?date=2018-04-26&per_page=10000&sort_min_id=0&operator_onestop_id=o-eyck-carris
0 schedule stop pairs found.

Another test I did was in Santiago, Chile. Here the API has problems downloading the stops data.

python transitflow.py --name=Santiago --bbox=-70.673777,-33.460993,-70.595499,-33.394518 --clip_to_bbox

And it seems it cannot connect:

o-66jc-transantiago 2 / 2
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-66jc-transantiago
383 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-66jc-transantiago
retry 1 / 5: HTTP Error 504: Gateway Time-out
retry 2 / 5: HTTP Error 504: Gateway Time-out
retry 3 / 5: HTTP Error 504: Gateway Time-out
retry 4 / 5: HTTP Error 504: Gateway Time-out
retry 5 / 5: HTTP Error 504: Gateway Time-out
failed:
HTTP Error 504: Gateway Time-out
1 operators successfully downloaded.
1 operators failed.

I think that in the Lisbon case it may be a problem with the structure of the GTFS data, and in Santiago maybe the file is too large?

Do you have any clues?

Thanks!

@willgeary
Contributor

Thanks for noting these issues, @jaimeorrego.

I can confirm the same errors for Lisbon and Santiago. I believe this is happening because large bus systems have a lot of stop_times to download, and the API is stalling with so many big requests.

I tried decreasing the API request size from 10,000 items per page to 1,000 items per page, and this seemed to help things! There are 10x more API requests, but each is 10x smaller. I also increased the API retry limit from 5 to 20, just in case.
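A minimal sketch of the change described above, not the actual transitflow code. The helper names `build_url` and `fetch_json_with_retry` are hypothetical, and the URL shape simply mirrors the schedule_stop_pairs requests shown in the logs earlier in this thread. The `fetch` argument is any callable that takes a URL and returns parsed JSON, so the retry logic can be tried without network access.

```python
import time
from urllib.parse import urlencode

# Endpoint as it appears in the logs in this thread.
BASE_URL = "http://transit.land/api/v1/schedule_stop_pairs"


def build_url(operator_onestop_id, date, per_page=1000, sort_min_id=0):
    """Build a schedule_stop_pairs URL with a smaller page size."""
    params = {
        "date": date,
        "per_page": per_page,
        "sort_min_id": sort_min_id,
        "operator_onestop_id": operator_onestop_id,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_json_with_retry(fetch, url, retry_limit=20, delay=0.0):
    """Call fetch(url), retrying up to retry_limit times on any exception."""
    if retry_limit < 1:
        raise ValueError("retry_limit must be at least 1")
    last_error = None
    for attempt in range(1, retry_limit + 1):
        try:
            return fetch(url)
        except Exception as error:  # e.g. HTTP Error 504: Gateway Time-out
            last_error = error
            print("retry %d / %d: %s" % (attempt, retry_limit, error))
            time.sleep(delay)
    # Every attempt failed; surface the last error to the caller.
    raise last_error
```

Smaller pages mean more round trips, so a non-zero `delay` between retries may be kinder to the server than retrying immediately.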

Santiago looks better:

[Image: screen shot 2018-04-27 at 8 21 55 pm]

Strangely, for Lisbon, it fails for me on today's date, but if I try this past Wednesday's date, the stop times for o-eyck-carris do successfully download:

transitflow will$ python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox --date=2018-04-23

[Image: screen shot 2018-04-27 at 8 36 00 pm]

I think I will add a new command line argument --per_page to allow for the user to determine the number of items per page of each API request, as well as --retrylimit.
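The proposed options could be wired up roughly as follows. This is only a sketch, not the tool's actual argument parser: `build_parser` is a hypothetical name, the defaults (1000 and 20) are taken from the values tried above, and only the flags relevant to this issue are shown.

```python
import argparse


def build_parser():
    """Return a parser with the two proposed options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the proposed options")
    parser.add_argument("--name", required=True,
                        help="name for the output, as in the commands above")
    parser.add_argument("--per_page", type=int, default=1000,
                        help="number of items per page of each API request")
    parser.add_argument("--retrylimit", type=int, default=20,
                        help="maximum number of retries per API request")
    return parser
```

Calling `build_parser().parse_args(["--name=lisbon", "--per_page=500"])` would then yield `per_page` as the integer 500 and leave `retrylimit` at its default.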

Does this sound good to you?

Best,
Will

@AnthonyLovesBikes

AnthonyLovesBikes commented Apr 29, 2018

Thanks, this is very helpful. I have been having both of the issues above while working on the Toronto, Canada area. The TTC operator seems to be too large and fails for all dates I have tried, even with the API query set to 1000. Could you test this on your end? The error I get is "[Errno 34] Result too large".
Thanks, I love this tool!

python transitflow.py --name=TTC --operator=o-dpz8-ttc

python transitflow.py --name=Toronto --bbox=-79.472351,43.597798,-79.280777,43.709083 --clip_to_bbox

@willgeary
Contributor

Thanks @AnthonyLovesBikes, I can confirm the same error for Toronto area. Yes, the TTC operator seems to be too large. Although, I have seen at least one example of somebody using this tool to visualize Toronto transit flows (they even wrote a program to convert transit frequency into audio!): See: https://rami-codes.github.io/2017/11/07/transitland-audiolizer/

Frankly, I am not sure if downloading massive schedules via the paginated transitland API is the best approach. It is much faster to download the raw GTFS zip file and process it locally with a python script. I would love to add a "drag and drop" capability to this tool, such that a user could decide to use the transitland API or to use a local GTFS zip file. Any thoughts on this functionality are welcome!
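As a rough illustration of the local approach, the sketch below reads one table from a GTFS zip using only the standard library. The function name `read_gtfs_table` is hypothetical, and the code assumes the feed's text files sit at the root of the archive, which is the usual layout but not guaranteed for every feed.

```python
import csv
import io
import zipfile


def read_gtfs_table(gtfs_zip, table_name):
    """Return the rows of one GTFS text file (e.g. "stop_times.txt") as dicts.

    gtfs_zip may be a path or a file-like object holding the zip archive.
    """
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops the byte order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For very large feeds, loading every row of stop_times.txt into a list may use a lot of memory; iterating over the reader inside the `with` block would be a lighter alternative.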

Best,
Will

@AnthonyLovesBikes

AnthonyLovesBikes commented Apr 29, 2018 via email

@jaimeorrego
Author

Thank you @willgeary! After changing the request size it works fine. I am just entering the world of GTFS data, and the drag and drop option would definitely be interesting. Maybe this API is not exactly the place for it, but it would also be nice to have a GTFS data processor that, after setting some variables, lets you obtain an output.csv (for example, the number of the route). The idea, of course, would be to use the data in other kinds of applications. Thanks!
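One possible reading of such a processor, sketched under assumptions: it takes rows already parsed from a feed's trips.txt (dicts with a `route_id` key, as defined by GTFS) and writes a small CSV with the number of trips per route. The function name `write_trip_counts` and the output column names are hypothetical, and this is not part of transitflow.

```python
import csv
from collections import Counter


def write_trip_counts(trip_rows, out_file):
    """Write one CSV row per route_id with its number of trips to out_file."""
    counts = Counter(row["route_id"] for row in trip_rows)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["route_id", "trip_count"])
    # Sort by route_id so the output order is stable between runs.
    for route_id, trip_count in sorted(counts.items()):
        writer.writerow([route_id, trip_count])
    return counts
```

Here `out_file` is any writable text file object, for example one opened with `open("output.csv", "w", newline="")`.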

@willgeary willgeary reopened this May 1, 2018
@willgeary
Contributor

Great, glad to hear that things are working for you @jaimeorrego.

I agree that a GTFS data processor would be nice. Frankly, I am considering whether that should belong within this project or as a standalone project.

@temospena

Hi,
I have the same problem as @jaimeorrego with data for Lisbon.
But strangely, I can only successfully download the data for weekends or national holidays, maybe because the frequency of the buses (Carris) is lower then. I tried 1st May, 25th April, and 1st April, and it was successful. I tried 23rd April, a regular day (as it seems @willgeary did, but the screenshot then shows 25th April), and it doesn't fetch the data, nor does it for other regular days in April.
I changed the request size and limit as you suggested.

I agree that an option to process the data locally would be better.

Thanks for the API!
