Skip to content

A curated list of example code to collect data from Web APIs using DataPrep.Connector.

Notifications You must be signed in to change notification settings

sfu-db/APIConnectors

Repository files navigation

Data Collection From Web APIs

All Contributors

A curated list of example code to collect data from Web APIs using DataPrep.Connector.

How to Contribute?

You can contribute to this project in two ways. Please check the contributing guide.

  1. Add your example code on this page
  2. Add a new configuration file to this repo

Why Contribute?

Index

Art

Harvard Art Museum -- Collect Museums' Collection Data

Find the objects with dog in their titles and were made in 1990.
from dataprep.connector import connect

# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('object', title='dog', yearmade=1990)
df[['title', 'division', 'classification', 'technique', 'department', 'century', 'dated']]
title division classification technique department century dated
0 Paris (black dog on street) Modern and Contemporary Art Photographs Gelatin silver print Department of Photographs 20th century 1990s
1 Pregnant Woman with Dog Modern and Contemporary Art Photographs Gelatin silver print Department of Photographs 20th century 1990
2 Pompeii Dog Modern and Contemporary Art Prints Drypoint Department of Prints 20th century 1990
Find 10 people that are Dutch.
from dataprep.connector import connect

# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('person', q='culture:Dutch', size=10)
df[['display name', 'gender', 'culture', 'display date', 'object count', 'birth place', 'death place']]
display name gender culture display date object count birth place death place
0 Joris Abrahamsz. van der Haagen unknown Dutch 1615 - 1669 7 Arnhem or Dordrecht, Netherlands The Hague, Netherlands
1 François Morellon de la Cave unknown Dutch 1723 - 65 1 None None
2 Cornelis Vroom unknown Dutch 1590/92 - 1661 3 Haarlem(?), Netherlands Haarlem, Netherlands
3 Constantijn Daniel van Renesse unknown Dutch 1626 - 1680 2 Maarssen Eindhoven
4 Dirck Dalens, the Younger unknown Dutch 1654 - 1688 3 Amsterdam, Netherlands Amsterdam, Netherlands
Find all exhibitions that take place at a Harvard Art Museums venue after 2020-01-01.
from dataprep.connector import connect

# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('exhibition', venue='HAM', after='2020-01-01')
df
title begin date end date url
0 Painting Edo: Japanese Art from the Feinberg Collection 2020-02-14 2021-07-18 https://www.harvardartmuseums.org/visit/exhibitions/5909
Find 5 records for publications that were published in 2013.
from dataprep.connector import connect

# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('publication', q='publicationyear:2013', size=5)
df[['title','publication date','publication place','format']]
title publication date publication place format
0 19th Century Paintings, Drawings & Watercolours January 23, 2013 London Auction/Dealer Catalogue
1 "With Éclat" The Boston Athenæum and the Orig... 2013 Boston, MA Book
2 "Review: Fragonard's Progress of Love at the F... 2013 London Article/Essay
3 Alternative Narratives February 2013 None Article/Essay
4 Victorian & British Impressionist Art July 11, 2013 London Auction/Dealer Catalogue
Find 5 galleries that are on floor (Level) 2 in the Harvard Art Museums building.
from dataprep.connector import connect

# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('gallery', floor=2, size=5)
df[['id','name','theme','object count']]
id name theme object count
0 2200 European and American Art, 17th–19th century The Emergence of Romanticism in Early Nineteen... 20
1 2210 West Arcade None 6
2 2340 European and American Art, 17th–19th century The Silver Cabinet: Art and Ritual, 1600–1850 73
3 2460 East Arcade None 2
4 2700 European and American Art, 19th century Impressionism and the Late Nineteenth Century 19

Business

Yelp -- Collect Local Business Data

What's the phone number of Capilano Suspension Bridge Park?
from dataprep.connector import connect

# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)

df = await conn_yelp.query("businesses", term = "Capilano Suspension Bridge Park", location = "Vancouver", _count = 1)

df[["name","phone"]]
id name phone
0 Capilano Suspension Bridge Park +1 604-985-7474
Which yoga store has the highest review count in Vancouver?
from dataprep.connector import connect

# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 1)

  # Check all supported categories: https://www.yelp.ca/developers/documentation/v3/all_category_list
df = await conn_yelp.query("businesses", categories = "yoga", location = "Vancouver", sort_by = "review_count", _count = 1)
df[["name", "review_count"]]
id name review_count
0 YYOGA Downtown Flow 107
How many Starbucks stores in Seattle and where are they?
from dataprep.connector import connect

# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)
df = await conn_yelp.query("businesses", term = "Starbucks", location = "Seattle", _count = 1000)

# Remove irrelevant data
df = df[(df['city'] == 'Seattle') & (df['name'] == 'Starbucks')]
df[['name', 'address1', 'city', 'state', 'country', 'zip_code']].reset_index(drop=True)
id name address1 city state country zip_code
0 Starbucks 515 Westlake Ave N Seattle WA US 98109
1 Starbucks 442 Terry Avenue N Seattle WA US 98109
... ....... ....... ...... .. .. ....
126 Starbucks 17801 International Blvd Seattle WA US 98158
What are the ratings for a list of resturants?
from dataprep.connector import connect
import pandas as pd
import asyncio
# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)

names = ["Miku", "Boulevard", "NOTCH 8", "Chambar", "VIJ’S", "Fable", "Kirin Restaurant", "Cafe Medina", \
 "Ask for Luigi", "Savio Volpe", "Nicli Pizzeria", "Annalena", "Edible Canada", "Nuba", "The Acorn", \
 "Lee's Donuts", "Le Crocodile", "Cioppinos", "Six Acres", "St. Lawrence", "Hokkaido Santouka Ramen"]

query_list = [conn_yelp.query("businesses", term=name, location = "Vancouver", _count=1) for name in names]
results = asyncio.gather(*query_list)
df = pd.concat(await results)
df[["name", "rating", "city"]].reset_index(drop=True)
ID Name Rating City
0 Miku 4.5 Vancouver
1 Boulevard Kitchen & Oyster Bar 4.0 Vancouver
... ... ... ...
20 Hokkaido Ramen Santouka 4.0 Vancouver

Hunter -- Collect and Verify Professional Email Addresses

Who are executives of Asana and what are their emails?
from dataprep.connector import connect

# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token":'hunter_access_token'})

df = await conn_hunter.query('all_emails', domain='asana.com', _count=10)

df[df['department']=='executive']
first_name last_name email position department
0 Dustin Moskovitz [email protected] Cofounder executive
1 Stephanie Heß [email protected] CEO executive
2 Erin Cheng [email protected] Strategic Initiatives executive
What is Dustin Moskovitz's email?
from dataprep.connector import connect

# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token":'hunter_access_token'})

df = await conn_hunter.query("individual_email", full_name='dustin moskovitz', domain='asana.com')

df
first_name last_name email position
0 Dustin Moskovitz [email protected] Cofounder
Are the emails of Asana executives valid?
from dataprep.connector import connect

# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token":'hunter_access_token'})

employees = await conn_hunter.query("all_emails", domain='asana.com', _count=10)
executives = employees.loc[employees['department']=='executive']
emails = executives[['email']]

for email in emails.iterrows():
status = await conn_hunter.query("email_verifier", email=email[1][0])
emails['status'] = status

emails
email status
0 [email protected] valid
3 [email protected] NaN
4 [email protected] NaN
How many available requests do I have left?
from dataprep.connector import connect

# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token":'hunter_access_token'})

df = await conn_hunter.query("account")
df
requests available
0 19475
What are the counts of each level of seniority of Intercom employees?
from dataprep.connector import connect

# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token":'hunter_access_token'})

df = await conn_hunter.query("email_count", domain='intercom.io')
df.drop('total', axis=1)
junior senior executive
0 0 2 2

Calendar

Holiday -- Collect Holiday, Workday Data

What are the supported countries, their country codes and languages supported?
from dataprep.connector import connect

# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})

df = await dc.query("country")
df
code name languages
0 AD Andorra ['ca'
1 AE United Arab Emirates ['ar']
.. .. ... ...
249 ZW Zimbabwe ['en']
What are the public holidays of Canada in 2020?
from dataprep.connector import connect

# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})

df = await dc.query('holiday', country='CA', year=2020, public=True)
df
name date public observed weekday
0 New Year's Day 2020-01-01 True 2020-01-01 Wednesday
1 Good Friday 2020-04-10 True 2020-04-10 Friday
2 Victoria Day 2020-05-18 True 2020-05-18 Monday
3 Canada Day 2020-07-01 True 2020-07-01 Wednesday
4 Labor Day 2020-09-07 True 2020-09-07 Monday
5 Christmas Day 2020-12-25 True 2020-12-25 Friday
Which day is the 100th workday starting from 2020-01-01, in Canada?
from dataprep.connector import connect

# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})

df = await dc.query('workday', country='CA', start='2020-01-01', days=100)
df
date weekday
0 2020-5-22 Friday

Crime

JailBase -- Collect Prisoner Data

What is the URL for the mugshot of Almondo Smith?
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})

df = await dc.query('search', source_id='wi-wcsd', last_name='smith', first_name='almondo')

df['mugshot'][0]

'https://imgstore.jailbase.com/small/arrested/wi-wcsd/2017-12-29/almondo-smith-679063bf90e389938d70b0b49caf7944.pic1.jpg'

Who were the 10 most recently arrested people by Wood County Sheriff's Department?
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})
sources = await dc.query('sources')
department = sources[sources['name']=='Wood County Sheriff\'s Dept']

df = await dc.query('recent', source_id=department['source_id'].values[0])

df
id name mugshot charges more_info_url
0 23917656 Curtis Joseph https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdcurtis-josep...
1 23917654 Taner Summers https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdtaner-summer...
2 23901411 Maryann Randolph https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdmaryann-rand...
3 23821284 Antonia Cinodijay https://imgstore.jailbase.com/widgets/NoMug.gif [[]] http://www.jailbase.com/en/wi-wcsdantonia-cino...
4 23821280 Deangelo Barker https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsddeangelo-bar...
5 23811811 Tekeisha Faucibus https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdtekeisha-fau...
6 23811810 Tariq Nunoke https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdtariq-nunoke...
7 23811808 Sarah Jusakaja https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdsarah-jusaka...
8 23791805 Angela Burch https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdangela-burch...
9 23775367 Suzanne Nicholson https://imgstore.jailbase.com/small/arrested/w... [[]] http://www.jailbase.com/en/wi-wcsdsuzanne-nich...
How many police offices are in each US state in the JailBase system?
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})

df = await dc.query('sources')

state_counts = df['state'].value_counts()
state_counts
North Carolina    81
Kentucky          75
Missouri          73
Arkansas          70
Iowa              67
Texas             57
Virginia          47
Florida           46
Mississippi       44
Indiana           38
New York          37
South Carolina    35
Ohio              29
Colorado          27
Tennessee         26
Alabama           26
Idaho             23
New Mexico        18
California        18
Michigan          17
Georgia           17
Illinois          14
Washington        13
Wisconsin         11
Oregon            10
Nevada             9
Arizona            9
Louisiana          8
New Jersey         7
Oklahoma           6
Utah               5
Minnesota          5
Pennsylvania       4
Maryland           4
Kansas             3
North Dakota       3
South Dakota       2
Wyoming            2
Alaska             1
West Virginia      1
Nebraska           1
Montana            1
Connecticut        1
Name: state, dtype: int64

Finance

Finnhub -- Collect Financial, Market, Economic Data

How to get a list of cryptocurrencies and their exchanges
import pandas as pd
from dataprep.connector import connect

# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

df = await conn_finnhub.query('crypto_exchange')
exchanges = df['exchange'].to_list()
symbols = []
for ex in exchanges:
    data = await df.query('crypto_symbols', exchange=ex)
    symbols.append(data)
df_symbols = pd.concat(symbols)
df_symbols
id description displaySymbol symbol
0 Binance FRONT/ETH FRONT/ETH BINANCE:FRONTETH
1 Binance ATOM/BUSD ATOM/BUSD BINANCE:ATOMBUSD
... ... ... ...
281 Poloniex AKRO/BTC AKRO/BTC POLONIEX:BTC_AKRO
Which ipo in the current month has the highest total share values?
import calendar
from datetime import datetime
from dataprep.connector import connect

# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

today = datetime.today()
days_in_month = calendar.monthrange(today.year, today.month)[1]
date_from = today.replace(day=1).strftime('%Y-%m-%d')
date_to = today.replace(day=days_in_month).strftime('%Y-%m-%d')
ipo_df = await conn_finnhub.query('ipo_calender', from_=date_from, to=date_to)
ipo_df[ipo_df['totalSharesValue'] == ipo_df['totalSharesValue'].max()]
id date exchange name numberOfShares ... totalSharesValue
5 2021-02-03 NYSE TELUS International (Cda) Inc. 33333333 ... 9.58333e+08
What are the average acutal earnings from the last 4 seasons of a list of 10 popular stocks?
import asyncio
import pandas as pd
from dataprep.connector import connect

# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

stock_list = ['TSLA', 'AAPL', 'WMT', 'GOOGL', 'FB', 'MSFT', 'COST', 'NVDA', 'JPM', 'AMZN']
query_list = [conn_finnhub.query('earnings', symbol=symbol) for symbol in stock_list]
query_results = asyncio.gather(*query_list)
stocks_df = pd.concat(await query_results)
stocks_df = stocks_df.groupby('symbol', as_index=False).agg({'actual': ['mean']})
stocks_df.columns = stocks_df.columns.get_level_values(0)
stocks_df = stocks_df.sort_values(by='actual', ascending=False).rename(columns={'actual': 'avg_actual'})
stocks_df.reset_index(drop=True)
id symbol avg_actual
0 GOOGL 12.9375
1 AMZN 8.5375
2 FB 2.4475
.. ... ...
9 TSLA 0.556
What is the earnings of last 4 quarters of a given company? (e.g. TSLA)
from dataprep.connector import connect
from datetime import datetime, timedelta, timezone

# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

today = datetime.now(tz=timezone.utc)
oneyear = today - timedelta(days = 365)
start = int(round(oneyear.timestamp()))

result = await conn_finnhub.query('earnings_calender', symbol='TSLA', from_=start, to=today)
result = result.set_index('date')
result
id date epsActual epsEstimate hour quarter ... symbol year
0 2021-01-27 0.8 1.37675 amc 4 ... TSLA 2020
1 2020-10-21 0.76 0.600301 amc 3 ... TSLA 2020
2 2020-07-22 0.436 -0.0267036 amc 2 ... TSLA 2020
.. ... ... ... ... ... ... ... ...
3 2011-02-15 -0.094 -0.101592 amc 4 ... TSLA 2010

CoinGecko -- Collect Cryptocurrency Data

What are the 10 cryptocurrencies with highest market cap and their current information?
from dataprep.connector import connect

conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('markets', vs_currency='usd', order='market_cap_desc', per_page=10, page=1)
df
name symbol current_price market_cap market_cap_rank high_24h low_24h price_change_24h price_change_percentage_24h market_cap_change_24h market_cap_change_percentage_24h last_updated
0 Bitcoin btc 36811 6.86613e+11 1 37153 35344 1440.68 4.0731 3.10933e+10 4.7433 2021-02-03T19:24:09.271Z
1 Ethereum eth 1628.99 1.87035e+11 2 1645.73 1486.42 132.91 8.88404 1.64296e+10 9.63018 2021-02-03T19:22:32.413Z
.. ... ... ... ... ... ... ... ... ... ... ... ...
9 Binance Coin bnb 51.47 7.60256e+09 10 51.63 49.76 1.24 2.47631 1.64863e+08 2.21659 2021-02-03T19:25:45.456Z
What are the cryptocurrencies with highest increasing and decreasing percentage?
from dataprep.connector import connect

conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('markets', vs_currency='usd', per_page=1000, page=1)
df = df.sort_values(by=['price_change_percentage_24h']).reset_index(drop=True).dropna()
print("Coin with highest decreasing percetage: {}, which decreases {}%".format(df['name'].iloc[0], df['price_change_percentage_24h'].iloc[0]))
print("Coin with highest increasing percetage: {}, which increases {}%".format(df['name'].iloc[-1], df['price_change_percentage_24h'].iloc[-1]))

Coin with the highest decreasing percentage: PancakeSwap, which decreases -13.79622%

Coin with the highest increasing percentage: StormX, which increases 101.24182%

Which cryptocurrencies are trending in CoinGecko?
from dataprep.connector import connect

conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('trend')
df
id name symbol market_cap_rank score
0 bao-finance Bao Finance BAO 175 0
1 milk2 MILK2 MILK2 634 1
2 unitrade Unitrade TRADE 529 2
3 pancakeswap-token PancakeSwap CAKE 110 3
4 fsw-token Falconswap FSW 564 4
5 zeroswap ZeroSwap ZEE 550 5
6 storm StormX STMX 211 6
What are the 10 US exchanges with highest trade volume in the past 24 hours?
from dataprep.connector import connect

conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('exchanges')
result = df[df['country']=='United States'].reset_index(drop=True).head(10)
result
id name year_established ... trade_volume_24h_btc_normalized
0 gdax Coinbase Pro 2012 ... 90085.6
1 kraken Kraken 2011 ... 48633.1
2 binance_us Binance US 2019 ... 7380.83
.. ... ... ... ... ...
What are the 3 latest traded derivatives with perpetual contract?
from dataprep.connector import connect
import pandas as pd

conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('derivatives')
perpetual_df = df[df['contract_type'] == 'perpetual'].reset_index(drop=True)
perpetual_df['last_traded_at'] = pd.to_datetime(perpetual_df['last_traded_at'], unit='s')
perpetual_df.sort_values(by=['last_traded_at'], ascending=False).head(3).reset_index(drop=True)
market symbol index_id contract_type index basis funding_rate open_interest volume_24h last_traded_at
0 Huobi Futures MATIC-USDT MATIC perpetual 0.0433357 -0.606296 0.247604 nan 1.43338e+06 2021-02-03 20:14:24
1 Biki (Futures) 1 BTC perpetual 36769.8 -0.153111 -0.0519 nan 1.00131e+08 2021-02-03 20:14:23
2 Huobi Futures CVC-USDT CVC perpetual 0.178268 -0.336302 0.106314 nan 876960 2021-02-03 20:14:23

Geocoding

MapQuest -- Collect Driving Directions, Maps, Traffic Data

Where is the Simon Fraser University? Give all the places if there is more than one campus.
from dataprep.connector import connect

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)

BC_BBOX = "-139.06,48.30,-114.03,60.00"
campus = await conn_map.query("place", q = "Simon Fraser University", sort = "relevance", bbox = BC_BBOX, _count = 50)
campus = campus[campus["name"] == "Simon Fraser University"].reset_index()
id index name country state city address postalCode coordinates details
0 0 Simon Fraser University CA BC Burnaby 8888 University Drive E V5A 1S6 [-122.90416, 49.27647] ...
1 2 Simon Fraser University CA BC Vancouver 602 Hastings St W V6B 1P2 [-123.113431, 49.284626] ...
How many KFC are there in Burnaby? What are their address?
from dataprep.connector import connect

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)

BC_BBOX = "-139.06,48.30,-114.03,60.00"
kfc = await conn_map.query("place", q = "KFC", sort = "relevance", bbox = BC_BBOX, _count = 500)
kfc = kfc[(kfc["name"] == "KFC") & (kfc["city"] == "Burnaby")].reset_index()
print("There are %d KFCs in Burnaby" % len(kfc))
print("Their addresses are:")
kfc['address']

There are 1 KFCs in Burnaby

Their addresses are:

id address
0 5094 Kingsway
The ratio of Starbucks to Tim Hortons in Vancouver?
from dataprep.connector import connect

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
VAN_BBOX = '-123.27,49.195,-123.020,49.315'
starbucks = await conn_map.query('place', q='starbucks', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)
timmys = await conn_map.query('place', q='Tim Hortons', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)

is_vancouver_sb = starbucks['city'] == 'Vancouver'
is_vancouver_tim = timmys['city'] == 'Vancouver'
sb_in_van = starbucks[is_vancouver_sb]
tim_in_van = timmys[is_vancouver_tim]
print('The ratio of Starbucks:Tim Hortons in Vancouver is %d:%d' % (len(sb_in_van), len(tim_in_van)))

The ratio of Starbucks:Tim Hortons in Vancouver is 188:120

What is the closest gas station from Metropolist and how far is it?
from dataprep.connector import connect
from numpy import radians, sin, cos, arctan2, sqrt

def distance_in_km(cord1, cord2):
    R = 6373.0

    lat1 = radians(cord1[1])
    lon1 = radians(cord1[0])
    lat2 = radians(cord2[1])
    lon2 = radians(cord2[0])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * arctan2(sqrt(a), sqrt(1 - a))
    distance = R * c

    return(distance)

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
METRO_TOWN = [-122.9987, 49.2250]
METRO_TOWN_string = '%f,%f' % (METRO_TOWN[0], METRO_TOWN[1])
nearest_petro = await conn_map.query('place', q='gas station', sort='distance', location=METRO_TOWN_string, page='1', pageSize = '1')
print('Metropolist is %fkm from the nearest gas station' % distance_in_km(METRO_TOWN, nearest_petro['coordinates'][0]))
print('The gas station is %s at %s' % (nearest_petro['name'][0], nearest_petro['address'][0]))

Metropolist is 0.376580km from the nearest gas station

The gas station is Chevron at 4692 Imperial St

In BC, which city has the most amount of shopping centers?
from dataprep.connector import connect

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
BC_BBOX = "-139.06,48.30,-114.03,60.00"
GROCERY = 'sic:541105'
shop_list = await conn_map.query("place", sort="relevance", bbox=BC_BBOX, category=GROCERY, _count=500)
shop_list = shop_list[shop_list["state"] == "BC"]
shop_list.groupby('city')['name'].count().sort_values(ascending=False).head(10)
city count
Vancouver 42
Victoria 24
Surrey 15
Burnaby 14
... ...
North Vancouver 8
Where is the nearest grocery of SFU? How many miles far? And how much time estimated for driving?
from dataprep.connector import connect

# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
SFU_LOC = '-122.90416, 49.27647'
GROCERY = 'sic:541105'
nearest_grocery = await conn_map.query("place", location=SFU_LOC, sort="distance", category=GROCERY)
destination = nearest_grocery.iloc[0]['details']
name = nearest_grocery.iloc[0]['name']
route = await conn_map.query("route", from_='8888 University Drive E, Burnaby', to=destination)
total_distance = sum([float(i)for i in route.iloc[:]['distance']])
total_time = sum([int(i)for i in route.iloc[:]['time']])
print('The nearest grocery of SFU is ' + name + '. It is ' + str(total_distance) + ' miles far, and It is expected to take ' + str(total_time // 60) + 'm' + str(total_time % 60)+'s of driving.')
route

The nearest grocery of SFU is Nesters Market. It is 1.234 miles far, and It is expected to take 3m21s of driving.

id index narrative distance time
0 0 Start out going east on University Dr toward Arts Rd. 0.348 57
1 1 Turn left to stay on University Dr. 0.606 84
2 2 Enter next roundabout and take the 1st exit onto University High St. 0.28 60
3 3 9000 UNIVERSITY HIGH STREET is on the left. 0 0

Jobs

The Muse -- Collect Job Ads, Company Information

What are the data science jobs in Vancouver on the fisrt page?
from dataprep.connector import connect

# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})

df = await dc.query('jobs', page=1, category='Data Science', location='Vancouver, Canada')
df[['id', 'name', 'company', 'locations', 'levels', 'publication_date']]
id name company locations levels publication_date
0 5126286 Senior Data Scientist Discord [{'name': 'Flexible / Remote'}] [{'name': 'Senior Level', 'short_name': 'senio... 2021-03-15T11:10:24Z
1 5543215 Data Scientist-AI/ML (Remote) Dell Technologies [{'name': 'Chicago, IL'}, {'name': 'Flexible /... [{'name': 'Mid Level', 'short_name': 'mid'}] 2021-04-02T11:45:57Z
2 4959228 Senior Data Scientist Humana [{'name': 'Flexible / Remote'}] [{'name': 'Senior Level', 'short_name': 'senio... 2021-01-05T11:28:23.814281Z
3 5172631 Data Scientist - Marketing Stash [{'name': 'Flexible / Remote'}] [{'name': 'Mid Level', 'short_name': 'mid'}] 2021-03-26T23:09:33Z
4 5372353 Data Science Intern, Machine Learning Coursera [{'name': 'Flexible / Remote'}] [{'name': 'Internship', 'short_name': 'interns... 2021-04-05T23:04:40Z
5 5298606 Senior Machine Learning Engineer Affirm [{'name': 'Flexible / Remote'}] [{'name': 'Senior Level', 'short_name': 'senio... 2021-03-17T23:10:51Z
6 5166882 Data Scientist Postmates [{'name': 'Bellevue, WA'}, {'name': 'Los Angel... [{'name': 'Mid Level', 'short_name': 'mid'}] 2021-02-01T17:49:53.238832Z
7 5375212 Director, Data Science & Analytics UKG [{'name': 'Flexible / Remote'}, {'name': 'Lowe... [{'name': 'management', 'short_name': 'managem... 2021-03-31T23:17:53Z
8 5130731 Senior Data Scientist Humana [{'name': 'Flexible / Remote'}] [{'name': 'Senior Level', 'short_name': 'senio... 2021-01-26T11:42:44.232111Z
9 5306269 Director of Data Sourcing and Strategy Opendoor [{'name': 'Flexible / Remote'}] [{'name': 'management', 'short_name': 'managem... 2021-03-31T23:05:22Z
What are the senior-level data science positions at Amazon on the first page?
from dataprep.connector import connect

# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})

df = await dc.query('jobs', page=1, category='Data Science', company='Amazon', level='Senior Level')
df[:10][['id', 'name', 'company', 'locations', 'publication_date']]
id name company locations publication_date
0 5153796 Sr. Data Architect, Data Lake & Analytics - Na... Amazon [{'name': 'San Diego, CA'}] 2021-02-01T22:54:14.002653Z
1 4083477 Principal Data Architect, Data Lake & Analytics Amazon [{'name': 'Chicago, IL'}] 2021-02-01T23:14:17.251814Z
2 4149878 Principal Data Architect, Data Warehousing & MPP Amazon [{'name': 'Arlington, VA'}] 2021-02-01T23:15:22.017573Z
3 4497753 Data Architect - Data Lake & Analytics - Natio... Amazon [{'name': 'Irvine, CA'}] 2021-02-01T23:15:22.439949Z
4 4870271 Data Scientist Amazon [{'name': 'Seattle, WA'}] 2021-02-01T23:04:25.967878Z
5 4603482 Data Scientist - Prime Gaming Amazon [{'name': 'Seattle, WA'}] 2021-02-01T23:10:37.628292Z
6 5193240 Data Scientist Amazon [{'name': 'Seattle, WA'}] 2021-02-04T23:56:19.176327Z
7 4678426 Sr Data Architect - Streaming Amazon [{'name': 'Roseville, CA'}] 2021-02-01T22:51:25.598645Z
8 4150011 Data Architect - Data Lake & Analytics - Natio... Amazon [{'name': 'Tampa, FL'}] 2021-02-04T23:56:18.281215Z
9 4346719 Sr. Data Scientist - ML Labs Amazon [{'name': 'London, United Kingdom'}] 2021-02-01T23:12:42.038111Z
What are the top 10 companies in engineering? (sorted by factors such as trendiness, uniqueness, newness, etc)?
from dataprep.connector import connect

# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})

df = await dc.query('companies', industry='Engineering', page=1)
df[:10]
id name locations size publication_date url
0 706 Appian [{'name': 'Tysons Corner, VA'}] Medium Size 2015-11-25T18:17:50.926146Z https://www.themuse.com/companies/appian
1 12168 Bristol Myers Squibb [{'name': 'Boudry, Switzerland'}, {'name': 'De... Large Size 2020-12-15T15:55:56.940074Z https://www.themuse.com/companies/bristolmyers...
2 11897 McMaster-Carr [{'name': 'Atlanta, GA'}, {'name': 'Chicago, I... Large Size 2020-02-10T21:57:15.338561Z https://www.themuse.com/companies/mcmastercarr
3 12162 ServiceNow [{'name': 'Santa Clara, CA'}] Large Size 2021-01-26T23:48:13.066632Z https://www.themuse.com/companies/servicenow
4 11731 Tenaska [{'name': 'Boston, MA'}, {'name': 'Dallas, TX'... Large Size 2019-03-14T14:01:54.465873Z https://www.themuse.com/companies/tenaska
5 11885 Brex [{'name': 'Flexible / Remote'}, {'name': 'New ... Medium Size 2020-02-05T23:16:44.780028Z https://www.themuse.com/companies/brex
6 1483 Inline Plastics [{'name': 'Shelton, CT'}] Medium Size 2017-09-11T14:49:24.153633Z https://www.themuse.com/companies/inlineplastics
7 12113 Dematic [{'name': 'Atlanta, GA'}, {'name': 'Banbury, U... Large Size 2020-09-17T20:29:19.400892Z https://www.themuse.com/companies/dematic
8 11967 Kairos Power [{'name': 'Albuquerque, NM'}, {'name': 'Charlo... Medium Size 2020-12-07T21:29:33.538815Z https://www.themuse.com/companies/kairospower
9 11913 Siemens [{'name': 'Munich, Germany'}] Large Size 2020-01-23T21:35:56.937727Z https://www.themuse.com/companies/siemens

Lifestyle

Spoonacular -- Collect Recipe, Food, and Nutritional Information Data

Which foods are unhealthy, i.e.,have high carbs and high fat content?
from dataprep.connector import connect
import pandas as pd

dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)

df = await dc.query('recipes_by_nutrients', minFat=65, maxFat=100, minCarbs=75, maxCarbs=100, _count=20)

df["calories"] = pd.to_numeric(df["calories"]) # convert string type to numeric
df = df[df['calories']>1100] # considering foods with more than 1100 calories per serving to be unhealthy

df[["title","calories","fat","carbs"]].sort_values(by=['calories'], ascending=False)
id title calories fat carbs
2 Brownie Chocolate Chip Cheesecake 1210 92g 79g
8 Potato-Cheese Pie 1208 80g 96g
0 Stuffed Shells with Beef and Broc 1192 72g 81g
3 Coconut Crusted Rockfish 1187 72g 92g
4 Grilled Ratatouille 1143 82g 88g
7 Pecan Bars 1121 84g 91g
Which meat dishes are rich in proteins?
from dataprep.connector import connect

dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)

df = await dc.query('recipes', query='beef', diet='keto', minProtein=25, maxProtein=60, _count=5)
df = df[["title","nutrients"]]

# Output of 'nutrients' column : [{'title': 'Protein', 'amount': 22.3768, 'unit': 'g'}]
g = [] # to extract the exact amount of Proteins in grams and store as list
for i in df["nutrients"]:
  z = i[0]
  g.append(z['amount'])
  
df.insert(1,'Protein(g)',g)
df[["title","Protein(g)"]].sort_values(by='Protein(g)',ascending=False)
id title Protein(g)
3 Strip steak with roasted cherry tomatoes and v... 56.2915
0 Low Carb Brunch Burger 53.7958
2 Entrecote Steak with Asparagus 41.6676
1 Italian Style Meatballs 35.9293
Which Italian Vegan dishes are popular?
from dataprep.connector import connect

dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)

df = await dc.query('recipes', query='popular veg dishes', cuisine='italian', diet='vegan', _count=20)
df[["title"]]
id Title
0 Vegan Pea and Mint Pesto Bruschetta
1 Gluten Free Vegan Gnocchi
2 Fresh Tomato Risotto with Grilled Green Vegeta...
What are the top 5 liked chicken recipes with common ingredients?
from dataprep.connector import connect
import pandas as pd

dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)

df= await dc.query('recipes_by_ingredients', ingredients='chicken,buttermilk,salt,pepper')
df['likes'] = pd.to_numeric(df['likes'])

df[['title', 'likes']].sort_values(by=['likes'], ascending=False).head(5)
id title likes
9 Oven-Fried Ranch Chicken 561
1 Fried Chicken and Wild Rice Waffles with Pink ... 78
6 CCC: Carla Hall’s Fried Chicken 47
2 Buttermilk Fried Chicken 12
0 My Pantry Shelf 10
What is the average calories for high calorie Korean foods?
from dataprep.connector import connect
from statistics import mean 

dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)

df = await dc.query('recipes', query='korean', minCalories = 500)
nutri = df['nutrients'].tolist()

calories = []
for i in range(len(nutri)):
  calories.append(nutri[i][0]['amount'])

print('Average calories for high calorie Korean foods:', mean(calories),'kcal')

Average calories for high calorie Korean foods: 644.765 kcal

Music

MusixMatch -- Collect Music Lyrics Data

What is Katy Perry's Twitter URL?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("artist_info", artist_mbid = "122d63fc-8671-43e4-9752-34e846d62a9c")

df[['name', 'twitter_url']]
name twitter_url
0 Katy Perry https://twitter.com/katyperry
What album is the song "Gone, Gone, Gone" in?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("track_matches", q_track = "Gone, Gone, Gone")

df[['name', 'album_name']]
name album_name
0 Gone, Gone, Gone The World From the Side of the Moon
Which artist/artists group is most popular in Canada?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("top_artists", country = "Canada")

df['name'][0]
'BTS'
How many genres are in the Musixmatch database?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("genres")

len(df)
362
Who is the most popular American artist named Michael?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("artists", q_artist = "Michael")
df = df[df['country'] == "US"].sort_values('rating', ascending=False)

df['name'].iloc[0]
'Michael Jackson'
What is the genre of the album "Atlas"?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

album = await conn_musixmatch.query("album_info", album_id = 11339785)
genres = await conn_musixmatch.query("genres")
album_genre = genres[genres['id'] == album['genre_id'][0][0]]['name']

album_genre.iloc[0]
'Soundtrack'
What is the link to lyrics of the most popular song in the album "Yellow"?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("album_tracks", album_id = 10266231)
df = df.sort_values('rating', ascending=False)

df['track_share_url'].iloc[0]
'https://www.musixmatch.com/lyrics/Coldplay/Yellow?utm_source=application&utm_campaign=api&utm_medium=SFU%3A1409620992740'
What are Lady Gaga's albums from most to least recent?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, update = True)

df = await conn_musixmatch.query("artist_albums", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848", s_release_date = "desc")

df.name.unique()
array(['Chromatica', 'Stupid Love',
       'A Star Is Born (Original Motion Picture Soundtrack)', 'Your Song'],
      dtype=object)
Which artists are similar to Lady Gaga?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("related_artists", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848")

df
id name rating country twitter_url updated_time artist_alias_list
0 6985 Cast 41 2015-03-29T03:32:49Z [キャスト]
1 7014 black eyed peas 77 US https://twitter.com/bep 2016-06-30T10:07:05Z [The Black Eyed Peas, ブラック・アイド・ピーズ, heiyandoud...
2 269346 OneRepublic 74 US https://twitter.com/OneRepublic 2015-01-07T08:21:52Z [ワンリパブリツク, Gong He Shi Dai, Timbaland presents...
3 276451 Taio Cruz 60 GB 2016-06-30T10:32:58Z [タイオ クルーズ, tai ou ke lu zi, Trio Cruz, Jacob M...
4 409736 Inna 54 RO https://twitter.com/inna_ro 2014-11-13T03:37:43Z [インナ]
5 475281 Skrillex 62 US https://twitter.com/Skrillex 2013-11-05T11:28:57Z [スクリレックス, shi qi lei ke si, Sonny, Skillrex]
6 13895270 Imagine Dragons 82 US https://twitter.com/Imaginedragons 2013-11-05T11:30:28Z [イマジン・ドラゴンズ, IMAGINE DRAGONS]
7 27846837 Shawn Mendes 80 CA 2015-02-17T10:33:56Z [ショーン・メンデス, xiaoenmengdezi]
8 33491890 Rihanna 81 GB https://twitter.com/rihanna 2018-10-15T20:32:58Z [りあーな, Rihanna, 蕾哈娜, Rhianna, Riannah, Robyn R...
9 33491981 Avicii 74 SE https://twitter.com/avicii 2018-04-20T18:27:01Z [アヴィーチー, ai wei qi, Avicci]
What are the highest rated songs in Canada from highest to lowest popularity?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("top_tracks", country = 'CA')

df[df['is_explicit'] == 0].sort_values('rating', ascending = False).reset_index()
index id name rating commontrack_id has_instrumental is_explicit has_lyrics has_subtitles album_id album_name artist_id artist_name track_share_url updated_time genres
0 5 201621042 Dynamite 99 114947355 0 0 1 1 39721115 Dynamite - Single 24410130 BTS https://www.musixmatch.com/lyrics/BTS/Dynamite... 2021-01-15T16:40:48Z [Pop]
1 9 187880919 Before You Go 99 103153140 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-11-20T08:44:05Z [Pop, Alternative]
2 7 189704353 Breaking Me 98 105304416 0 0 1 1 34892017 Keep On Loving 42930474 Topic feat. A7S https://www.musixmatch.com/lyrics/Topic-8/Brea... 2021-01-19T16:57:29Z [House, Dance]
3 3 189626475 Watermelon Sugar 95 103096346 0 0 1 1 36101498 Fine Line 24505463 Harry Styles https://www.musixmatch.com/lyrics/Harry-Styles... 2020-02-14T08:07:12Z [Music]
What are other songs in the same album as the song "Before You Go"?
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

song = await conn_musixmatch.query("track_info", commontrack_id = 103153140)
album = await conn_musixmatch.query("album_tracks", album_id = song["album_id"][0])

album
id name rating commontrack_id has_instrumental is_explicit has_lyrics has_subtitles album_id album_name artist_id artist_name track_share_url updated_time genres
0 186884178 Grace 31 87857108 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-04-09T10:21:29Z [Folk-Rock]
1 186884184 Bruises 68 70395936 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2020-07-31T12:58:04Z [Music, Alternative]
2 186884187 Hold Me While You Wait 89 95176135 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2020-08-02T07:23:21Z [Music]
3 186884189 Someone You Loved 95 89461086 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2020-06-22T15:34:07Z [Pop, Alternative]
4 186884190 Maybe 31 95541701 0 1 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-05-20T11:41:00Z [Music]
5 186884191 Forever 67 95541702 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-11-18T10:46:36Z [Music]
6 186884192 One 31 95541699 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-05-19T04:08:23Z [Music]
7 186884193 Don't Get Me Wrong 31 95541698 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-12-20T08:25:26Z [Music]
8 186884194 Hollywood 31 95541700 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2019-05-21T08:00:54Z [Music]
9 186884195 Lost on You 31 73530089 0 0 1 1 35611759 Divinely Uninspired To A Hellish Extent (Exten... 33258132 Lewis Capaldi https://www.musixmatch.com/lyrics/Lewis-Capald... 2020-03-17T08:35:18Z [Alternative]

Spotify -- Collect Albums, Artists, and Tracks Metadata

How many followers does Eminem have?
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

df = await conn_spotify.query("artist", q="Eminem", _count=500)

df.loc[df['# followers'].idxmax(), '# followers']
41157398
How many singles does Pink Floyd have that are available in Canada?
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

artist_name = "Pink Floyd"
df = await conn_spotify.query("album", q = artist_name, _count = 500)

df = df.loc[[(artist_name in x) for x in df['artist']]]
df = df.loc[[('CA' in x) for x in df['available_markets']]]
df = df.loc[df['total_tracks'] == '1']
df.shape[0]
12
In the last quarter of 2020, which artist released the album with the most tracks?
from dataprep.connector import connect
import pandas as pd

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

df = await conn_spotify.query("album", q = "2020", _count = 500)

df['date'] = pd.to_datetime(df['release_date'])
df = df[df['date'] > '2020-10-01'].drop(columns = ['image url', 'external urls', 'release_date'])
df['total_tracks'] = df['total_tracks'].astype(int)
df = df.loc[df['total_tracks'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", tracks: " + str(df['total_tracks']))
ASOT 996 - A State Of Trance Episode 996 (Top 50 Of 2020 Special), by Armin van Buuren ASOT Radio, tracks: 172
Who is the most popular artist: Eminem, Beyonce, Pink Floyd and Led Zeppelin
# and what are their popularity ratings?
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

artists_and_num_followers = []
for artist in ['Beyonce', 'Pink Floyd', 'Eminem', 'Led Zeppelin']:
    df = await conn_spotify.query("artist", q = artist, _count = 500) 
    num_followers = df.loc[df['# followers'].idxmax(), 'popularity']
    artists_and_num_followers.append((artist, num_followers))

print(sorted(artists_and_num_followers, key=lambda x: x[1], reverse=True))
[('Eminem', 94.0), ('Beyonce', 88.0), ('Pink Floyd', 83.0), ('Led Zeppelin', 81.0)]```python
Who are the top 5 artists with the most followers from the current Billboard top 100 artists?
from dataprep.connector import connect
from bs4 import BeautifulSoup
import requests

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

web_page = requests.get("https://www.billboard.com/charts/artist-100")
html_soup = BeautifulSoup(web_page.text, 'html.parser')
artist_100 = html_soup.find_all('span', class_ = 'chart-list-item__title-text')

artists = {}
artists_top5 = []
for artist in artist_100:
    df_temp = await conn_spotify.query("artist", q = artist.text.strip(), _count = 10)
    df_temp = df_temp.loc[df_temp['popularity'].idxmax()]
    artists[df_temp['name']] = df_temp['# followers']
artists_top5 = sorted(artists, key = artists.get, reverse = True)[:5]
artists_top5
['Ed Sheeran', 'Ariana Grande', 'Drake', 'Justin Bieber', 'Eminem']
For a list of top 10 most popular albums from rollingstone.com which album has most selling markets (countries) around the world in 2020?
from dataprep.connector import connect
import asyncio

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

def count_markets(text):
    lst = text.split(',')
    return len(lst)

album_artists = ["Folklore", "Fetch the Bolt Cutters", "YHLQMDLG", "Rough and Rowdy Ways", "Future Nostalgia",
                 "RTJ4", "Saint Cloud", "Eternal Atake", "What’s Your Pleasure", "Punisher"]

album_list = [conn_spotify.query("album", q = name, _count = 1) for name in album_artists]
combined = asyncio.gather(*album_list)
df = pd.concat(await combined).reset_index()
df = df.drop(columns = ['image url', 'external urls', 'index'])
df['market_count'] = df['available_markets'].apply(lambda x: count_markets(x))
df = df.loc[df['market_count'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", with " + str(df['market_count']) + " avalible countries")
folklore, by Taylor Swift, with 92 avalible countries

iTunes -- Collect iTunes Data

What are all Jack Johnson audio and video content?
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query('search', term="jack+johnson")
df
id Type kind artistName collectionName trackName trackTime
0 track song Jack Johnson Jack Johnson and Friends: Sing-A-Longs and Lul... Upside Down 208643
1 track song Jack Johnson In Between Dreams (Bonus Track Version) Better Together 207679
2 track song Jack Johnson In Between Dreams (Bonus Track Version) Sitting, Waiting, Wishing 183721
... ... ... ... ... ... ...
49 track song Jack Johnson Sleep Through the Static While We Wait 86112
How to compute the average track time of Rich Brian's music videos?
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="rich+brian", entity="musicVideo")
avg_track_time = df['trackTime'].mean()/(1000*60)
print("The average track time is {:.3} minutes.".format(avg_track_time))

The average track time is 4.13 minutes.

How to get all Ang Lee's movies which are made in the Unite States?
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="Ang+Lee", entity="movie", country="us")
df = df[df['artistName']=='Ang Lee']
df
id type kind artistName collectionName trackName trackTime
0 track feature-movie Ang Lee Fox 4K HDR Drama Collection Life of Pi 7642675
1 track feature-movie Ang Lee None Gemini Man 7049958
... ... ... ... ... ... ...
11 track feature-movie Ang Lee None Ride With the Devil 8290498

Networking

IPLegit -- Collect IP Address Data

How can I check if an IP address is bad, so I can block it from accessing my website?
from dataprep.connector import connect

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184']

for ip in ip_addresses:
    ip_status = await conn_iplegit.query('status', ip=ip)
    bad_status = ip_status['bad_status'].get(0)
    if bad_status == True:
        print('block ip address: ', ip_status['ip'].get(0))

block ip address: 98.124.198.1

What country are most people from who have visited my website?
from dataprep.connector import connect
import pandas as pd

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184',
                '85.94.168.133', 
                '98.14.201.52', 
                '98.57.106.207', 
                '185.254.139.250', 
                '206.246.126.82', 
                '147.44.75.68', 
                '123.42.224.40', 
                '253.29.140.44', 
                '97.203.209.153', 
                '196.63.36.253']

ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))

df = pd.concat(ip_details)
df.country.mode().get(0)

'UNITED STATES'

Make a map showing locations of people who have visited my website.
from dataprep.connector import connect
import pandas as pd
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184',
                '85.94.168.133', 
                '98.14.201.52', 
                '98.57.106.207', 
                '185.254.139.250', 
                '206.246.126.82', 
                '147.44.75.68', 
                '123.42.224.40', 
                '253.29.140.44', 
                '97.203.209.153', 
                '196.63.36.253']

ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))

df = pd.concat(ip_details)
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
gdf = GeoDataFrame(df, geometry=geometry)   

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);

png

News

Guardian -- Collect Guardian News Data

Which news section contain most mentions related to bitcoin ?
from dataprep.connector import connect, info, Connector
import pandas as pd

conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, concurrency=3)
df3 = await conn_guardian.query('article', _q='covid 19', _count=1000)
df3.groupby('section').count().sort_values("headline", ascending=False)
section headline url publish_date
World news 378 378 378
Business 103 103 103
US news 76 76 76
Opinion 72 72 72
Sport 53 53 53
Australia news 49 49 49
Society 44 44 44
Politics 34 34 34
Football 28 28 28
Global development 26 26 26
UK news 26 26 26
Education 17 17 17
Environment 14 14 14
Technology 10 10 10
Film 10 10 10
Science 8 8 8
Books 8 8 8
Life and style 7 7 7
Television & radio 6 6 6
Media 4 4 4
Culture 4 4 4
Stage 4 4 4
News 4 4 4
Travel 2 2 2
WEHI: Brighter together 2 2 2
Xero: Resilient business 2 2 2
Money 2 2 2
The new rules of work 1 1 1
LinkedIn: Hybrid workplace 1 1 1
Global 1 1 1
Getting back on track 1 1 1
Westpac Scholars: Rethink tomorrow 1 1 1
Food 1 1 1
All together 1 1 1
Find articles with covid precautions ?
from dataprep.connector import connect, Connector

conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, concurrency=3)
df2 = await conn_guardian.query('article', _q='covid 19 protect',  _count=100)
df2[df2.section=='Opinion']
id headline section url publish_date
0 Billionaires made $1tn since Covid-19. They ca... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-09T11:32:20Z
1 Jeff Bezos became even richer thanks to Covid-... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-13T07:30:00Z
20 Here's how to tackle the Covid-19 anti-vaxxers... Opinion https://www.theguardian.com/commentisfree/2020... 2020-11-26T16:02:14Z
41 Can the UK deliver on the Covid vaccine rollou... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-11T09:00:24Z
68 Covid-19 has turned back the clock on working ... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-10T14:19:27Z
84 The Guardian view on Covid-19 promises: season... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-14T18:42:10Z
88 The Guardian view on responding to the Covid-1... Opinion https://www.theguardian.com/commentisfree/2020... 2020-12-30T18:58:05Z

Times -- Collect New York Times Data

Who is the author of article 'Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines'
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q='Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines')
df[["authors"]]
id authors
0 By Alan Rappeport
What is the newest news from Ottawa
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="ottawa",sort='newest')
df[['headline','authors','abstract','url','pub_date']].head(1)
headline ... pub_date
0 21 Men Accuse Lincoln Project Co-Founder of Online Harassment ... 2021-01-31T14:48:35+0000
What are Headlines of articles where Trump was mentioned in the last 6 months of 2020 in the technology news section
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="Trump",fq='section_name:("technology")',begin_date='20200630',end_date='20201231',sort='newest', _count=50)

print(df['headline'])
print("Trump was mentioned in " + str(len(df)) + " articles")
id headline
0 No, Trump cannot win Georgia’s electoral votes through a write-in Senate campaign.
1 How Misinformation ‘Superspreaders’ Seed False Election Theories
2 No, Trump’s sister did not publicly back him. He was duped by a fake account.
.. ...
49 Trump Official’s Tweet, and Its Removal, Set Off Flurry of Anti-Mask Posts

Trump was mentioned in 50 articles

What is the ranking of times a celebrity is mentioned in a headline in latter half of 2020?
from dataprep.connector import connect
import pandas as pd
# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
celeb_list = ['Katy Perry', 'Taylor Swift', 'Lady Gaga', 'BTS', 'Rihanna', 'Kim Kardashian']
number_of_mentions = []
for i in celeb_list:
    df1 = await conn_times.query('ac',q=i,begin_date='20200630',end_date='20201231')
    df1 = df1[df1['headline'].str.contains(i)]
    a = len(df1['headline'])
    number_of_mentions.append(a)

print(number_of_mentions)
    
ranking_df = pd.DataFrame({'name': celeb_list, 'number of mentions': number_of_mentions})
ranking_df = ranking_df.sort_values(by=['number of mentions'], ascending=False)
ranking_df

[2, 6, 3, 6, 1, 0]

name number of mentions
1 Taylor Swift 6
3 BTS 6
2 Lady Gaga 3
0 Katy Perry 2
4 Rihanna 1
5 Kim Kardashian 0

Currents -- Collect Currents News Data

How to get latest Chinese news?
from dataprep.connector import connect

# You can get ”currents_access_token“ by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('latest_news', language='zh')
df.head()
id title category ... author published
0 為何上市公司該汰換了 [entrepreneur] ... 經濟日報 2021-02-03 08:48:39 +0000
How to get the political news about 'Trump'?
from dataprep.connector import connect

# You can get ”currents_access_token“ by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('search', keywords='Trump', category='politics')
df.head(3)
title category description url author published
0 Biden Started The Process Of Unwinding Trump's Assault On Immigration, But Activists Want Him To Move Faster ['politics', 'world'] "These people cannot continue to wait." https://www.buzzfeednews.com/article/adolfoflores/biden-immigration-executive-orders-review Adolfo Flores 2021-02-03 08:39:51 +0000
1 Pro-Trump lawyer Lin Wood reportedly under investigation for voter fraud ['politics', 'world'] A source told CBS Atlanta affiliate WGCL that Lin Wood is being investigated for allegedly voting "out of state." https://www.cbsnews.com/news/pro-trump-lawyer-lin-wood-under-investigation-for-alleged-illegal-voting-2020-02-03/ April Siese 2021-02-03 08:21:25 +0000
2 Trump Supporters Say They Attacked The Capitol Because He Told Them To, Undercutting His Impeachment Defense ['politics', 'world'] “President Trump told Us to ‘fight like hell,’” one Trump supporter reportedly posted online after the assault on the Capitol. https://www.buzzfeednews.com/article/zoetillman/trump-impeachment-capitol-rioters-fight-like-hell Zoe Tillman 2021-02-03 07:25:34 +0000
How to get the news about COVID-19 from 2020-12-25?
from dataprep.connector import connect

# You can get ”currents_access_token“ by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('search', keywords='covid', start_date='2020-12-25',end_date='2020-12-25')
df.head(1)
title category ... published
0 Commentary: Let our charitable giving equal our political donations ['opinion'] ... 2020-12-25 00:00:00 +0000

Science

DBLP -- Collect Computer Science Publication Data

Who wrote this paper?
from dataprep.connector import connect
conn_dblp = connect("dblp")
df = await conn_dblp.query("publication", q = "Scikit-learn: Machine learning in Python", _count = 1)
df[["title", "authors", "year"]]
id title authors year
0 Scikit-learn - Machine Learning in Python. [Fabian Pedregosa, Gaël Varoquaux, Alexandre G... 2011
How to fetch all publications of Andrew Y. Ng?
from dataprep.connector import connect

conn_dblp = connect("dblp", _concurrency = 5)
df = await conn_dblp.query("publication", author = "Andrew Y. Ng", _count = 2000)
df[["title", "authors", "venue", "year"]].reset_index(drop=True)
id title authors venue year
0 The 1st Agriculture-Vision Challenge - Methods... [Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennife... [CVPR Workshops] 2020
... ... ... ... ...
242 An Experimental and Theoretical Comparison of ... [Michael J. Kearns, Yishay Mansour, Andrew Y. ... [COLT] 1995
How to fetch all publications of NeurIPS 2020?
from dataprep.connector import connect

conn_dblp = connect("dblp", _concurrenncy = 5)
df = await conn_dblp.query("publication", q = "NeurIPS 2020", _count = 5000)

# filter non-neurips-2020 papers
mask = df.venue.apply(lambda x: 'NeurIPS' in x)
df = df[mask]
df = df[(df['year'] == '2020')]
df[["title", "venue", "year"]].reset_index(drop=True)
id title venue year
0 Towards More Practical Adversarial Attacks on ... [NeurIPS] 2020
... ... ... ...
1899 Triple descent and the two kinds of overfittin... [NeurIPS] 2020

NASA -- Collect NASA Data.

What are the title of Astronomy Picture of the Day from 2020-01-01 to 2020-01-10?
from dataprep.connector import connect

# You can get ”nasa_access_key“ by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})

df = await conn_nasa.query("apod", start_date='2020-01-01', end_date='2020-01-10')
df['title']
id title
0 Betelgeuse Imagined
1 The Fainting of Betelgeuse
2 Quadrantids over the Great Wall
... ...
9 Nacreous Clouds over Sweden
What are Coronal Mass Ejection(CME) data from 2020-01-01 to 2020-02-01?
from dataprep.connector import connect

# You can get ”nasa_access_key“ by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})

df = await conn_nasa.query('cme', startDate='2020-01-01', endDate='2020-02-01')
df
id activity_id catalog start_time ... link
0 2020-01-05T16:45:00-CME-001 M2M_CATALOG 2020-01-05T16:45Z ... https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15256/-1
1 2020-01-14T11:09:00-CME-001 M2M_CATALOG 2020-01-14T11:09Z ... https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15271/-1
.. ... ... ... ... ...
4 2020-01-25T18:54:00-CME-001 M2M_CATALOG 2020-01-25T18:54Z ... https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15296/-1
How many Geomagnetic Storms(GST) have occurred from 2020-01-01 to 2021-01-01? When is it?
from dataprep.connector import connect

# You can get ”nasa_access_key“ by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})

df = await conn_nasa.query('gst', startDate='2020-01-01', endDate='2021-01-01')
print("Geomagnetic Storms have occurred %s times from 2020-01-01 to 2021-01-01." % len(df))
df['start_time']

Geomagnetic Storms have occurred 1 times from 2020-01-01 to 2021-01-01.

id start_time
0 2020-09-27T21:00Z
How many Solar Flare(FLR) have occurred and completed from 2020-01-01 to 2021-01-01? How long did they last?
import pandas as pd
from dataprep.connector import connect

# You can get ”nasa_access_key“ by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})

df = await conn_nasa.query('flr', startDate='2020-01-01', endDate='2021-01-01')
df = df.dropna(subset=['end_time']).reset_index(drop=True)
df['duration'] = pd.to_datetime(df['end_time']) - pd.to_datetime(df['begin_time'])
print('Solar Flare have occurred %s times from 2020-01-01 to 2021-01-01.' % len(df))
print(df['duration'])

There are 1 times Geomagnetic Storms(GST) have occurred from 2020-01-01 to 2021-01-01.

id duration
0 0 days 01:07:00
1 0 days 00:23:00
2 0 days 00:47:00
What are Solar Energetic Particle(SEP) data from 2019-01-01 to 2021-01-01?
import pandas as pd
from dataprep.connector import connect

# You can get ”nasa_access_key“ by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})

df = await conn_nasa.query('sep', startDate='2019-01-01', endDate='2021-01-01')
df
id sep_id event_time instruments ... link
0 2020-11-30T04:26:00-SEP-001 2020-11-30T04:26Z ['STEREO A: IMPACT 13-100 MeV'] ... https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/SEP/16166/-1
1 2020-11-30T14:16:00-SEP-001 2020-11-30T14:16Z ['STEREO A: IMPACT 13-100 MeV'] ... https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/SEP/16169/-1

Shopping

Etsy -- Collect Handmade Marketplace Data.

What are the products I can get when I search for "winter jackets"?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)
# Item search
df = await conn_etsy.query("items", keywords = "winter jackets")
df[['title',"url","description","price","currency"]]
id title url description price currency quantity
0 White coat,cashmere coat,wool jacket with belt... https://www.etsy.com/listing/646692584/white-c... ★Please leave your phone number to me while yo... 183.00 USD 1
1 Vintage 90's Nike ACG Parka Jacket Large N... https://www.etsy.com/listing/937300597/vintage... Vintage 90's Nike ACG Parka Jacket Large N... 110.00 USD 1
... ... ... ... ... ... ... ... .... ..
24 Miss yo 2018 Vintage Checker Jacket for Blythe... https://www.etsy.com/listing/613790308/miss-yo... ~~ Welcome to our shop ~~\n\nSet include:\n1 Vin... 52.00 SGD 1
What's the favorites for the shop “CrazedGaming”?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)

# Shop search
df = await conn_etsy.query("shops", shop_name = "CrazedGaming",  _count = 1)
df[["name", "url", "favorites"]]
id Name Url Favorites
0 CrazedGaming https://www.etsy.com/shop/CrazedGaming?utm_sou... 265
What are the top 10 custom photo pillows ranked by number of favorites?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search sort by favorites
df_cp_pillow = await conn_etsy.query("items", keywords = "custom photo pillow", _count = 7000)
df_cp_pillow = df_cp_pillow.sort_values(by = ['favorites'], ascending = False)
df_top10_cp_pillow = df_cp_pillow.iloc[:10]
df_top10_cp_pillow[['title', 'price', 'currency', 'favorites', 'quantity']]
id title price currency favorites quantity
68 Custom Pet Photo Pillow, Valentines Day Gift, ... 29.99 USD 9619.0 320.0
193 Custom Shaped Dog Photo Pillow Personalized Mo... 29.99 USD 5523.0 941.0
374 Custom PILLOW Pet Portrait - Pet Portrait Pill... 49.95 USD 5007.0 74.0
196 Personalized Cat Pillow Mothers Day Gift for M... 29.99 USD 3839.0 939.0
69 Photo Sequin Pillow Case, Personalized Sequin ... 25.49 USD 3662.0 675.0
637 Family photo sequin pillow | custom image reve... 28.50 USD 3272.0 540.0
44 Custom Pet Pillow Custom Cat Pillow best cat l... 20.95 USD 2886.0 14.0
646 Sequin Pillow with Photo Personalized Photo Re... 32.00 USD 2823.0 1432.0
633 Personalized Name Pillow, Baby shower gift, Ba... 16.00 USD 2511.0 6.0
4416 Letter C pillow Custom letter Alphabet pillow ... 24.00 USD 2284.0 4.0
What are the prices of active products for quantities (>10) for a particular searched keyword "blue 2021 weekly spiral planner"?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)

# Item search and filters
planner_df = await conn_etsy.query("items", keywords = "blue 2021 weekly spiral planner", _count = 100)

result_df = planner_df[((planner_df['state'] == 'active') & (planner_df['quantity'] > 10))]
result_df
id title state url description price currency quantity views favorites
1 2021 Plaid About You Medium Daily Weekly Month... active https://www.etsy.com/listing/789842329/2021-pl... Planning and organizing life is a snap with th... 15.99 USD 496 100 11
2 2021 Undated Diary Planner , Notebook Weekly D... active https://www.etsy.com/listing/917640414/2021-un... A6 2021 Yearly Monthly Weekly Agenda Planner ,... 12.00 GBP 792 3433 168
. ... ... ... ... ... ... ... ... .. ... ... ...
85 July 2020-June 2021 Big Blue Year Large Daily ... active https://www.etsy.com/listing/776300099/july-20... This 12-month academic year planner offers a c... 6.95 USD 493 454 31
What's the average price for blue denim frayed jacket on Etsy selling in USD currency?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search and filters 
df_dbfjacket = await conn_etsy.query("items", keywords = "blue denim frayed jacket", _count = 500)
df_dbfjacket = df_dbfjacket[df_dbfjacket['currency'] == 'USD'].astype(float)

# Calculate average price
average_price = round(df_dbfjacket['price'].mean(), 2)
print("The average price for blue denim frayed jacket is: $", average_price)

The average price for blue denim frayed jacket is: $ 58.82

What are the top 10 viewed for keyword “ceramic wind chimes” with a given word “handmade” present in the description?
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search
df = await conn_etsy.query("items", keywords = "ceramic wind chimes",  _count = 2000)

# Filter and sorting
df = df[(df["description"].str.contains('handmade'))]
new_df = df[["title", "url", "views"]]
new_df.sort_values(by="views", ascending=False).reset_index(drop=True).head(10)
id title url views
0 Hanging ceramic wind chime in gloss white glaz... https://www.etsy.com/listing/101462779/hanging... 24406
1 Trending Now! Best Seller Birthday Gift for Mo... https://www.etsy.com/listing/555128094/trendin... 17058
2 Beautiful Ceramic outdoor hanging wind chime -... https://www.etsy.com/listing/155966922/beautif... 9758
3 Wind Chime, Garden Yard Art for Outdoor Home D... https://www.etsy.com/listing/159252106/wind-ch... 8850
4 Ceramic cow bells | wind chime bell | wall han... https://www.etsy.com/listing/538608210/ceramic... 6540
5 Mom Gift Ideas Housewarming Gifts Garden Decor... https://www.etsy.com/listing/171539253/mom-gif... 6123
6 Ceramic Wind Chimes single strand Wall Hanging... https://www.etsy.com/listing/598234797/ceramic... 5288
7 Handcraft Ceramic Bird Wind Chime/ Bird Windch... https://www.etsy.com/listing/697798625/handcra... 4733
8 Glass Wind Chime Green Leaves Windchime Garden... https://www.etsy.com/listing/744753959/glass-w... 4579
9 Handmade ceramic and driftwood wind chimes Bea... https://www.etsy.com/listing/615210251/handmad... 2774

Social

Twitch -- Collect Twitch Streams and Channels Information

How many followers does the Twitch user "Logic" have?
from dataprep.connector import connect

# You can get ”twitch_access_token“ by registering https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("channels", query="logic", _count = 1000)

df = df.where(df['name'] == 'logic').dropna()
df = df[['name', 'followers']]
df.reset_index()
index name followers
0 0 logic 540274.0
Which 5 Twitch users that speak English have the most views and what games do they play?
from dataprep.connector import connect

# You can get ”twitch_access_token“ by registering https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("channels",query="%", _count = 1000)

df = df[df['language'] == 'en']
df = df.sort_values('views', ascending = False)
df = df[['name', 'views', 'game', 'language']]
df = df.head(5)
df.reset_index()
index name views game language
0 495 Fextralife 1280705870 The Elder Scrolls Online en
1 9 Riot Games 1265668908 League of Legends en
2 16 ESL_CSGO 548559390 Counter-Strike: Global Offensive en
3 160 BeyondTheSummit 462493560 Dota 2 en
4 1 shroud 433902453 Rust en
Which channel has the most viewers for each of the top 10 games?
from dataprep.connector import connect

# You can get ”twitch_access_token“ by registering https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("streams", query="%", _count = 1000)

# Group by games, sum viewers and sort by total viewers
df_new = df.groupby(['game'], as_index = False)['viewers'].agg('sum').rename(columns = {'game':'games', 'viewers':'total_viewers'})
df_new = df_new.sort_values('total_viewers',ascending = False)

# Select the channel with most viewers from each game 
df_2 = df.loc[df.groupby(['game'])['viewers'].idxmax()]

# Select the most popular channels for each of the 10 most popular games
df_new = df_new.head(10)['games']
best_games = df_new.tolist()
result_df = df_2[df_2['game'].isin(best_games)]
result_df = result_df.head(10)
result_df = result_df[['game','channel_name', 'viewers']]
result_df.reset_index()
index game channel_name viewers
0 3 seonghwazip 32126
1 21 Call of Duty: Warzone FaZeBlaze 7521
2 9 Dota 2 dota2mc_ru 16118
3 2 Escape From Tarkov summit1g 33768
4 15 Fortnite Fresh 10371
5 8 Hearthstone SilverName 16765
6 22 Just Chatting Trainwreckstv 6927
7 0 League of Legends LCK_Korea 77613
8 10 Minecraft Tfue 15209
9 11 VALORANT TenZ 13617
(1) What is the number of Fortnite and Valorant streams in the past 24 hours? (2) Is there any relationship between viewers and channel followers?
from dataprep.connector import connect
import pandas as pd

# You can get ”twitch_access_token“ by registering https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth = {"access_token":twitch_access_token}, _concurrency = 3)

df = await conn_twitch.query("streams", query = "%fortnite%VALORANT%", _count = 1000)

df = df[['stream_created_at', 'game', 'viewers', 'channel_followers']]
df['stream_created_at'] = df['stream_created_at'].astype('str') # Convert date to string

for idx, value in enumerate(df['stream_created_at']):
    df.loc[idx,'stream_created_at'] = value[0:9] + ' ' + value[-9:-1] # Extract datetime

df['stream_created_at'] = pd.to_datetime(df['stream_created_at']) 
df['diff'] = pd.Timestamp.now().normalize() - df['stream_created_at'] 
df['diff'] = df['diff'].dt.total_seconds().astype('int') 

df2 = df[['channel_followers', 'viewers']].corr(method='pearson') # Find correlation (part 2)

df = df[df['diff'] > 864000] # Find streams in last 24 hours

options = ['Fortnite', 'VALORANT']
df = df[df['game'].isin(options)]
df = df.groupby(['game'], as_index=False)['diff'].agg('count').rename(columns={'diff':'count'})

# Print correlation part 2
print("Correlation between viewers and channel followers:")
print(df2)

# Print part 1
print('Number of streams in the past 24 hours:')
df
Correlation between viewers and channel followers:
                   channel_followers   viewers
channel_followers           1.000000  0.851698
viewers                     0.851698  1.000000

Number of streams in the past 24 hours:

game count
0 Fortnite 3
1 VALORANT 3

Twitter -- Collect Tweets Information

What are the 10 latest english tweets by SFU handle (@SFU) ?
from dataprep.connector import connect

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

# Querying 100 tweets from @SFU
df = await dc.query("tweets", _q="from:@SFU -is:retweet", _count=100)

# Filtering english language tweets
df = df[df['iso_language_code'] == 'en'][['created_at', 'text']]

# Displaying latest 10 tweets
df = df.iloc[0:10,]
print('-----------')
for index, row in df.iterrows():   
    print(row['created_at'], row['text'])
    print('-----------')
-----------
Mon Feb 01 23:59:16 +0000 2021 Thank you to these #SFU student athletes for sharing their insights. #BlackHistoryMonth2021 https://t.co/WGCvGrQOzu
-----------
Mon Feb 01 23:00:56 +0000 2021 How can #SFU address issues of inclusion & access for #Indigenous students & work with them to support their educat… https://t.co/knEM0SSHYu
-----------
Mon Feb 01 21:37:30 +0000 2021 DYK: New #SFU research shows media gender bias; men are quoted 3 times more often than women. #GenderGapTracker loo… https://t.co/c77PsNUIqV
-----------
Mon Feb 01 19:55:03 +0000 2021 With the temperatures dropping, how will you keep warm this winter? Check out our tips on what to wear (and footwea… https://t.co/EOCuYbio4P
-----------
Mon Feb 01 18:06:49 +0000 2021 COVID-19 has affected different groups in unique ways. #SFU researchers looked at the stresses facing “younger” old… https://t.co/gMvcxOlWvb
-----------
Mon Feb 01 16:18:51 +0000 2021 Please follow @TransLink for updates. https://t.co/nQDZQ5JYlt
-----------
Fri Jan 29 23:00:02 +0000 2021 #SFU researchers Caroline Colijn and Paul Tupper performed a modelling exercise to see if screening with rapid test… https://t.co/07aU3SP0j2
-----------
Fri Jan 29 19:01:32 +0000 2021 un/settled, a towering photo-poetic piece at #SFU's Belzberg Library, aims to centre Blackness & celebrate Black th… https://t.co/F6kp0Lwu5A
-----------
Fri Jan 29 17:02:34 +0000 2021 Learning that it’s okay to ask for help is an important part of self-care—and so is recognizing when you don't have… https://t.co/QARn1CRLyp
-----------
Fri Jan 29 00:44:11 +0000 2021 @shashjayy @shashjayy Hi Shashwat, I've spoken to my colleagues in Admissions. They're looking into it and will respond to you directly.
-----------
What are top 10 users based on retweet count ?
from dataprep.connector import connect

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

# Querying 1000 retweets and filtering only english language tweets
df = await dc.query("tweets", q='RT AND is:retweet', _count=1000)
df = df[df['iso_language_code'] == 'en']

# Iterating over tweets to get users and Retweet Count
retweets = {}
for index, row in df.iterrows():
  if row['text'].startswith('RT'):
      # Eg. tweet 'RT @Crazyhotboye: NMS?\nLeveled up to 80' 
      user_retweeted = row['text'][4:row['text'].find(':')]
      if user_retweeted in retweets:
          retweets[user_retweeted] += 1
      else:
          retweets[user_retweeted] = 1
          
# Sorting and displaying top 10 users
cols = ['User', 'RT_Count']
retweets_df = pd.DataFrame(list(retweets.items()), columns=cols)
retweets_df = retweets_df.sort_values(by=['RT_Count'], ascending=False).reset_index(drop=True).iloc[0:10,:]
retweets_df
id User RT_Count
0 John_Greed 195
1 uEatCrayons 85
2 Demo2020cracy 78
3 store_pup 75
4 miknitem_oasis 61
5 MarkCrypto23 54
6 realmamivee 52
7 trailblazers 50
8 devilsvalentine 40
9 SharingforCari1 38
What are the trending topics (Top 10) in twitter now based on hashtags count?
from dataprep.connector import connect
import pandas as pd
import json

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

pd.options.mode.chained_assignment = None
df = await dc.query("tweets", q=False, _count=2000)

def extract_tags(tags):
  tags_tolist = json.loads(tags.replace("'", '"'))
  only_tag = [str(t['text']) for t in tags_tolist]
  return only_tag

# remove tweets which do not have hashtag
has_hashtags = df[df['hashtags'].str.len() > 2]
# only 'en' tweets are our interests
has_hashtags = has_hashtags[has_hashtags['iso_language_code'] == 'en']
has_hashtags['tag_list'] = has_hashtags['hashtags'].apply(lambda t: extract_tags(t))
tags_and_text = has_hashtags[['text','tag_list']]
tag_count = tags_and_text.explode('tag_list').groupby(['tag_list']).agg(tag_count=('tag_list', 'count'))
# remove tag with only one occurence
tag_count = tag_count[tag_count['tag_count'] > 1]
tag_count = tag_count.sort_values(by=['tag_count'], ascending=False).reset_index()
# Top 10 hashtags
tag_count = tag_count.iloc[0:10,:]
tag_count
id tag_list tag_count
0 jobs 52
1 TractorMarch 24
2 corpsehusbandallegations 22
3 SidNaazians 10
4 GodMorningTuesday 8
5 SupremeGodKabir 7
6 hiring 7
7 نماز_راہ_نجات_ہے 6
8 London 5
9 TravelTuesday 5

Sports

NHL -- Collect National Hockey League Data

How long was the 2000 - 2001 season?
from dataprep.connector import connect
import pandas as pd

conn = Connector(config_path="./config")
df = await conn.query("seasons")

#string to datetime
df["regularSeasonStartDate"] = pd.to_datetime(df["regularSeasonStartDate"])
df["regularSeasonEndDate"] = pd.to_datetime(df["regularSeasonEndDate"])
df['season_length'] = df["regularSeasonEndDate"] - df["regularSeasonStartDate"]

#selecting 2000/2001 season
df.loc[df["seasonId"] == "20002001"][["seasonId", "season_length"]]
seasonId season_length
20002001 186 days
What is the venue of the Pittsburgh Penguins called?
from dataprep.connector import connect

conn = Connector(config_path="./config")
df = await conn.query("teams", expand="team.roster", season="20202021")

df.loc[df["name"]=="Pittsburgh Penguins"][["name","venue_name"]]
name venue_name
Pittsburgh Penguins PPG Paints Arena
What are the names of all the Team awards in the NHL?
from dataprep.connector import connect

conn = Connector(config_path="./config")
df = await conn.query("awards")

df.loc[df["recipientType"] == "Team"][["name"]]
name
Stanley Cup
Clarence S. Campbell Bowl
Presidents’ Trophy
Prince of Wales Trophy

TheSportsDB -- Collect Team and League Data

What were scores of the last 10 NBA games?
from dataprep.connector import connect

conn_thesportsdb = connect('thesportsdb')

NBA_LEAGUE_ID = 4387
df = await conn_thesportsdb.query('events', id=NBA_LEAGUE_ID)

df.drop(['id', 'sport', 'spectators'], axis=1).iloc[:10]
home_team away_team home_score away_score
0 Toronto Raptors Phoenix Suns 100 104
1 Milwaukee Bucks Boston Celtics 114 122
2 Detroit Pistons Brooklyn Nets 111 113
3 Sacramento Kings Golden State Warriors 141 119
4 Los Angeles Lakers Philadelphia 76ers 101 109
5 San Antonio Spurs Los Angeles Clippers 85 98
6 New York Knicks Washington Wizards 106 102
7 Miami Heat Portland Trail Blazers 122 125
8 Utah Jazz Brooklyn Nets 118 88
9 Sacramento Kings Atlanta Hawks 110 108
What is the oldest sports team in Toronto?
from dataprep.connector import connect

conn_thesportsdb = connect('thesportsdb')

df = await conn_thesportsdb.query('teams_by_city', t='toronto')

df = df[df.inaugural_year!=0]
df[df.inaugural_year==df.inaugural_year.min()]
id team inaugural_year league_id facebook twitter instagram
7 135005 Toronto Argonauts 1873 4405 www.facebook.com/ArgosFootball twitter.com/torontoargos instagram.com/torontoargos
What are all team sports supported by TheSportsDB?
from dataprep.connector import connect

conn_thesportsdb = connect('thesportsdb')

df = await conn_thesportsdb.query('sports')

df[df.type=='TeamvsTeam']
id sports type
0 102 Soccer TeamvsTeam
3 105 Baseball TeamvsTeam
4 106 Basketball TeamvsTeam
5 107 American Football TeamvsTeam
6 108 Ice Hockey TeamvsTeam
8 110 Rugby TeamvsTeam
10 112 Cricket TeamvsTeam
12 114 Australian Football TeamvsTeam
14 116 Volleyball TeamvsTeam
15 117 Netball TeamvsTeam
16 118 Handball TeamvsTeam
18 120 Field Hockey TeamvsTeam
Which NBA stadium has highest seating capacity?
from dataprep.connector import connect

conn_thesportsdb = connect('thesportsdb')

NBA_LEAGUE_ID = 4387
df = await conn_thesportsdb.query('teams_by_league', id=NBA_LEAGUE_ID)

df[df.stadium_capacity==df.stadium_capacity.max()]
id team inaugural_year league_id facebook twitter instagram stadium_capacity
4 134870 Chicago Bulls 1966 4387 facebook.com/chicagobulls twitter.com/chicagobulls instagram.com/chicagobulls 23000
What are social media links of the Vancouver Canucks?
from dataprep.connector import connect

conn_thesportsdb = connect('thesportsdb')

CANUCKS_ID = 134850
df = await conn_thesportsdb.query('team', id=CANUCKS_ID)

df[['facebook', 'twitter', 'instagram']]
facebook twitter instagram
0 www.facebook.com/Canucks twitter.com/VanCanucks instagram.com/Canucks

Travel

Amadeus -- Collect Twitch Streams and Channels Information

What are the hotels within 5 km of the Sydney city center, available from 2021-05-01 to 2021-05-02?
from dataprep.connector import connect
import pandas as pd

# You can get ”client_id“ and "cliend_secret" by following https://developers.amadeus.com/
dc = connect('amadeus', _auth={'client_id':client_id, 'client_secret':client_secret})

# Query a date in the future
df = await dc.query('hotel', cityCode="SYD", radius=5,
                    checkInDate='2021-05-01', checkOutDate='2021-05-02',
                    roomQuantity=1)
df  
name rating latitude longitude lines city contact description amenities
0 PARK REGIS CITY CENTRE 4 -33.87318 151.20901 [27 PARK STREET] SYDNEY 61-2-92676511 Park Regis City Centre boasts 122 stylishly ap... [BUSINESS_CENTER, ICE_MACHINES, DISABLED_FACIL...
1 ibis Sydney King Street Wharf 3 -33.86679 151.20256 [22 SHELLEY STREET] SYDNEY 61/2/82430700 Enjoying pride of place near the waterfront in... [ELEVATOR, 24_HOUR_FRONT_DESK, PARKING, INTERN...
2 Best Western Plus Hotel Stellar 3 -33.87749 151.2118 [4 WENTWORTH AVENUE] SYDNEY +61 2 92649754 Located on the bustling corner of Hyde Park an... [HIGH_SPEED_INTERNET, RESTAURANT, 24_HOUR_FRON..
3 ibis Sydney World Square 3 -33.87782 151.20759 [382-384 PITT STREET] SYDNEY 61/2/92820000 Located in Sydney CBD within Sydney's vibrant ... [ELEVATOR, SAFE_DEPOSIT_BOX, PARKING, INTERNET...
What are the available flights from Sydney to Bangkok on 2021-05-02?
from dataprep.connector import connect
import pandas as pd

# You can get ”client_id“ and "cliend_secret" by following https://developers.amadeus.com/
dc = connect('amadeus', _auth={'client_id':client_id, 'client_secret':client_secret})

# Query a date in the future
df = await dc.query('air', originLocationCode="SYD",
                    destinationLocationCode="BKK",
                    departureDate="2021-05-02",
                    adults=1, max=250)
df
source duration departure time arrival time number of bookable seats total price currency one way itineraries
0 GDS PT28H30M 2021-05-02T11:35:00 2021-05-03T12:05:00 9 385.42 EUR False [{'departure': {'iataCode': 'SYD', 'terminal':...
1 GDS PT14H15M 2021-05-02T11:35:00 2021-05-02T21:50:00 9 387.10 EUR False [{'departure': {'iataCode': 'SYD', 'terminal':...
. .. ... ... ... ... ... ... ... ...
68 GDS PT35H30M 2021-05-02T20:55:00 2021-05-04T05:25:00 9 5932.38 EUR False [{'departure': {'iataCode': 'SYD', 'terminal':...
What are the best tours and activities in Barcelona?
from dataprep.connector import connect
import pandas as pd

# You can get ”client_id“ and "cliend_secret" by following https://developers.amadeus.com/
dc = connect('amadeus', _auth={'client_id':client_id, 'client_secret':client_secret})

df = await dc.query('activity', latitude=41.397158, longitude=2.160873)
df[['name','short description', 'rating', 'price', 'currency']]
name short description rating price currency
0 Sagrada Familia fast-track tickets and guided ... Explore unfinished masterpiece with fast-track... 4.400000 39.00 EUR
1 Guided tour of Sagrada Familia with entrance t... Admire the astonishing views of Barcelona from... 4.400000 51.00 EUR
2 La Pedrera Night Experience: A Behind-Closed-D... In Barcelona, go inside one of Antoni Gaudi’s ... 4.500000 34.00 EUR
. ... ... ... ... ...
19 Barcelona: Casa Batlló Entrance Ticket with Sm... Discover Casa Batlló, one of Gaudí’s masterpie... 4.614300 25.00 EUR
What are the best places to visit in Barcelona?
from dataprep.connector import connect
import pandas as pd

# You can get ”client_id“ and "cliend_secret" by following https://developers.amadeus.com/
dc = connect('/amadeus', _auth={'client_id':client_id, 'client_secret':client_secret})

df = await dc.query('interest', latitude=41.397158, longitude=2.160873, limit=30)
df
name category rank tags
0 Casa Batlló SIGHTS 5 [sightseeing, sights, museum, landmark, tourgu...
1 La Pepita RESTAURANT 30 [restaurant, tapas, pub, bar, sightseeing, com...
2 Brunch & Cake RESTAURANT 30 [vegetarian, restaurant, breakfast, shopping, ...
3 Cervecería Catalana RESTAURANT 30 [restaurant, tapas, sightseeing, traditionalcu...
4 Botafumeiro RESTAURANT 30 [restaurant, seafood, sightseeing, professiona...
5 Casa Amatller SIGHTS 100 [sightseeing, sights, museum, landmark, restau...
6 Tapas 24 RESTAURANT 100 [restaurant, tapas, traditionalcuisine, sights...
7 Dry Martini NIGHTLIFE 100 [bar, restaurant, nightlife, club, sightseeing...
8 Con Gracia RESTAURANT 100 [restaurant, sightseeing, commercialplace, pro...
9 Osmosis RESTAURANT 100 [restaurant, shopping, transport, professional...
How safe is Barcelona?
from dataprep.connector import connect
import pandas as pd

# You can get ”client_id“ and "cliend_secret" by following https://developers.amadeus.com/
dc = connect('amadeus', _auth={'client_id':client_id, 'client_secret':client_secret})

df = await dc.query('safety', latitude=41.397158, longitude=2.160873)
df
id name subtype lgbtq score medical score overall score physical harm score political freedom score theft score
0 Q930402719 Barcelona CITY 39 69 45 36 50 44
1 Q930402720 Antiga Esquerra de l'Eixample (Barcelona) DISTRICT 37 69 44 34 50 42
2 Q930402721 Baix Guinardó (Barcelona) DISTRICT 37 69 44 34 50 42
3 Q930402724 Can Baró (Barcelona) DISTRICT 37 69 44 34 50 42
4 Q930402731 El Born (Barcelona) DISTRICT 42 69 47 39 50 49
5 Q930402732 El Camp de l'Arpa del Clot (Barcelona) DISTRICT 37 69 45 35 50 43
6 Q930402733 El Camp d'en Grassot i Gràcia Nova (Barcelona) DISTRICT 37 69 44 34 50 42
7 Q930402736 El Coll (Barcelona) DISTRICT 37 69 44 34 50 42
8 Q930402738 El Fort Pienc (Barcelona) DISTRICT 37 69 44 34 50 42
9 Q930402740 El Parc i la Llacuna del Poblenou (Barcelona) DISTRICT 37 69 45 35 50 43

OurAirport -- Collect Travel Data

What is the country given GeoNames ID?
from dataprep.connector import connect

# You can get ”ourairport_access_token“ by registering as a developer https://rapidapi.com/sujayvsarma/api/ourairport-data-search/details
conn_ourairport = connect('ourairport', _auth={'access_token':ourairport_access_token})

id = '302634'
df = await conn_ourairport.query('country', name_or_id_or_keyword=id)

df
id name continent
0 302634 India AS
What are region names of a range of ID numbers?
from dataprep.connector import connect
import pandas as pd

# You can get ”ourairport_access_token“ by registering as a developer https://rapidapi.com/sujayvsarma/api/ourairport-data-search/details
conn_ourairport = connect('ourairport', _auth={'access_token':ourairport_access_token})

df = pd.DataFrame()
for id in range(303294, 303306):
    id = str(id)
    row = await conn_ourairport.query('region', name_or_id_or_keyword=id)
    df = pd.concat([df, pd.DataFrame(row.iloc[0].values)], axis=1)

df = df.transpose()
df.columns = ['id', 'name', 'country']
df.reset_index(drop=True)
id name country
0 303294 Alberta CA
1 303295 British Columbia CA
2 303296 Manitoba CA
3 303297 New Brunswick CA
4 303298 Newfoundland and Labrador CA
5 303299 Nova Scotia CA
6 303300 Northwest Territories CA
7 303301 Nunavut CA
8 303302 Ontario CA
9 303303 Prince Edward Island CA
10 303304 Quebec CA
11 303305 Saskatchewan CA
What are all countries in Asia?
from dataprep.connector import connect

# You can get ”ourairport_access_token“ by registering as a developer https://rapidapi.com/sujayvsarma/api/ourairport-data-search/details
conn_ourairport = connect('ourairport', _auth={'access_token':ourairport_access_token})


df = await conn_ourairport.query('country', name_or_id_or_keyword='302742')


df = pd.DataFrame()
for id in range(302556, 302742):
    id = str(id)
    row = await conn_ourairport.query('country', name_or_id_or_keyword=id)
    df = pd.concat([df, pd.DataFrame(row.iloc[0].values)], axis=1)

df = df.transpose()
df.columns = ['id', 'name', 'continent']
df = df[df.continent=='AS'] 
df.reset_index(drop=True)
id name continent
0 302618 United Arab Emirates AS
1 302619 Afghanistan AS
2 302620 Armenia AS
3 302621 Azerbaijan AS
4 302622 Bangladesh AS
5 302623 Bahrain AS
6 302624 Brunei AS
7 302625 Bhutan AS
8 302626 Cocos (Keeling) Islands AS
9 302627 China AS
10 302628 Christmas Island AS
11 302629 Cyprus AS
12 302630 Georgia AS
13 302631 Hong Kong AS
14 302632 Indonesia AS
15 302633 Israel AS
16 302634 India AS
17 302635 British Indian Ocean Territory AS
18 302636 Iraq AS
19 302637 Iran AS
20 302638 Jordan AS
21 302639 Japan AS
22 302640 Kyrgyzstan AS
23 302641 Cambodia AS
24 302642 North Korea AS
25 302643 South Korea AS
26 302644 Kuwait AS
27 302645 Kazakhstan AS
28 302646 Laos AS
29 302647 Lebanon AS
30 302648 Sri Lanka AS
31 302649 Burma AS
32 302650 Mongolia AS
33 302651 Macau AS
34 302652 Maldives AS
35 302653 Malaysia AS
36 302654 Nepal AS
37 302655 Oman AS
38 302656 Philippines AS
39 302657 Pakistan AS
40 302658 Palestinian Territory AS
41 302659 Qatar AS
42 302660 Saudi Arabia AS
43 302661 Singapore AS
44 302662 Syria AS
45 302663 Thailand AS
46 302664 Tajikistan AS
47 302665 Timor-Leste AS
48 302666 Turkmenistan AS
49 302667 Turkey AS
50 302668 Taiwan AS
51 302669 Uzbekistan AS
52 302670 Vietnam AS
53 302671 Yemen AS

Video

OMDB -- Collect Movie Data

List Avengers movies from most to least popular
from dataprep.connector import connect
import pandas as pd

# You can get ”omdb_access_token“ by registering as a developer http://www.omdbapi.com/apikey.aspx
conn_omdb = connect('omdb', _auth={'access_token':omdb_access_token})

df = await conn_omdb.query('by_search', s='avengers')
df = df.head(4)

movies_df = pd.DataFrame()
for movie in df.iterrows():
    movies_df = movies_df.append(await conn_omdb.query('by_id_or_title', i=movie[1]['imdb_id']))

movies_df = movies_df.sort_values('imdb_rating', ascending=False)
movies_df.reset_index(drop=True)
title year rated released runtime genre director writers actors plot ... awards poster metascore imdb_rating imdb_votes imdb_id type box_office producer website
0 Avengers: Infinity War 2018 PG-13 27 Apr 2018 149 min Action, Adventure, Sci-Fi Anthony Russo, Joe Russo Christopher Markus (screenplay by), Stephen Mc... Robert Downey Jr., Chris Hemsworth, Mark Ruffa... The Avengers and their allies must be willing ... ... Nominated for 1 Oscar. Another 46 wins & 73 no... https://m.media-amazon.com/images/M/MV5BMjMxNj... 68 8.4 839,788 tt4154756 movie $678,815,482 Marvel Studios N/A
1 Avengers: Endgame 2019 PG-13 26 Apr 2019 181 min Action, Adventure, Drama, Sci-Fi Anthony Russo, Joe Russo Christopher Markus (screenplay by), Stephen Mc... Robert Downey Jr., Chris Evans, Mark Ruffalo, ... After the devastating events of Avengers: Infi... ... Nominated for 1 Oscar. Another 69 wins & 102 n... https://m.media-amazon.com/images/M/MV5BMTc5MD... 78 8.4 816,700 tt4154796 movie $858,373,000 Marvel Studios, Walt Disney Pictures N/A
2 The Avengers 2012 PG-13 04 May 2012 143 min Action, Adventure, Sci-Fi Joss Whedon Joss Whedon (screenplay), Zak Penn (story), Jo... Robert Downey Jr., Chris Evans, Mark Ruffalo, ... Earth's mightiest heroes must come together an... ... Nominated for 1 Oscar. Another 38 wins & 79 no... https://m.media-amazon.com/images/M/MV5BNDYxNj... 69 8 1,263,208 tt0848228 movie $623,357,910 Marvel Studios N/A
3 Avengers: Age of Ultron 2015 PG-13 01 May 2015 141 min Action, Adventure, Sci-Fi Joss Whedon Joss Whedon, Stan Lee (based on the Marvel com... Robert Downey Jr., Chris Hemsworth, Mark Ruffa... When Tony Stark and Bruce Banner try to jump-s... ... 8 wins & 49 nominations. https://m.media-amazon.com/images/M/MV5BMTM4OG... 66 7.3 748,735 tt2395427 movie $459,005,868 Marvel Studios N/A
What is the order of the following movies from highest to lowest amount of money made: Titanic, Avatar, Skyfall
from dataprep.connector import connect
import pandas as pd

# You can get ”omdb_access_token“ by registering as a developer http://www.omdbapi.com/apikey.aspx
conn_omdb = connect('omdb', _auth={'access_token':omdb_access_token})

df = await conn_omdb.query('by_id_or_title', t='titanic')
df = df.append(await conn_omdb.query('by_id_or_title', t='avatar'))
df = df.append(await conn_omdb.query('by_id_or_title', t='skyfall'))

df = df.sort_values('box_office', ascending=False)
df.reset_index(drop=True)
title year rated released runtime genre director writers actors plot ... awards poster metascore imdb_rating imdb_votes imdb_id type box_office
0 Avatar 2009 PG-13 18 Dec 2009 162 min Action, Adventure, Fantasy, Sci-Fi James Cameron James Cameron Sam Worthington, Zoe Saldana, Sigourney Weaver... A paraplegic Marine dispatched to the moon Pan... ... Won 3 Oscars. Another 86 wins & 130 nominations. https://m.media-amazon.com/images/M/MV5BMTYwOT... 83 7.8 1,120,847 tt0499549 movie $760,507,625
1 Titanic 1997 PG-13 19 Dec 1997 194 min Drama, Romance James Cameron James Cameron Leonardo DiCaprio, Kate Winslet, Billy Zane, K... A seventeen-year-old aristocrat falls in love ... ... Won 11 Oscars. Another 112 wins & 83 nominations. https://m.media-amazon.com/images/M/MV5BMDdmZG... 75 7.8 1,048,704 tt0120338 movie $659,363,944
2 Skyfall 2012 PG-13 09 Nov 2012 143 min Action, Adventure, Thriller Sam Mendes Neal Purvis, Robert Wade, John Logan, Ian Flem... Daniel Craig, Judi Dench, Javier Bardem, Ralph... James Bond's loyalty to M is tested when her p... ... Won 2 Oscars. Another 63 wins & 122 nominations. https://m.media-amazon.com/images/M/MV5BMWZiNj... 81 7.7 631,795 tt1074638 movie $304,360,277
Is "Anomalisa" a positive or negative movie?
from dataprep.connector import connect

# download nltk with command: pip3 install nltk
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# You can get ”omdb_access_token“ by registering as a developer http://www.omdbapi.com/apikey.aspx
conn_omdb = connect('omdb', _auth={'access_token':omdb_access_token})

df = await conn_omdb.query('by_id_or_title', t='anomalisa')

plot = df['plot']
sia = SentimentIntensityAnalyzer()
if sia.polarity_scores(plot[0])['compound'] > 0:
    print(df['title'][0], 'is a positive movie')
else:
    print(df['title'][0], 'is a negative movie')

Anomalisa is a negative movie

Youtube -- Collect Youtube's Content MetaData.

What are the top 10 Fitness Channels?
from dataprep.connector import connect, info

dc = connect('youtube', _auth={'access_token': auth_token})

df = await dc.query('videos', q='Fitness', part='snippet', type='channel', _count=10)
df[['title', 'description']]
id title description
0 Jordan Yeoh Fitness Hey! Welcome to my Youtube channel! I got noth...
1 FitnessBlender 600 free full length workout videos & counting...
2 The Fitness Marshall Get early access to dances by clicking here: h...
3 POPSUGAR Fitness POPSUGAR Fitness offers fresh fitness tutorial...
4 LiveFitness Hi, I am Nicola and I love all things fitness!...
5 TpindellFitness Strive for progress, not perfection.
6 Love Sweat Fitness My personal weight loss journey of 45 pounds c...
7 Martial Arts Fitness Welcome To My Channel. I love Martial Arts 🥇 ...
8 Zuzka Light My name is Zuzka Light, and my channel is all ...
9 Fitness Factory Lüdenscheid Schaut unter ff-luedenscheid.com Kostenlos übe...
Whats the top Playlists of a list of Singers?
from dataprep.connector import connect, info
import pandas as pd

dc = connect('youtube', _auth={'access_token': auth_token})

df = pd.DataFrame()
singers = [
  'taylor swift',
  'ed sheeran',
  'shawn mendes',
  'ariana grande',
  'michael jackson',
  'selena gomez',
  'lady gaga',
  'shreya ghoshal',
  'bruno mars',
  ]

for singer in singers:
  df1 = await dc.query('videos', q=singer, part='snippet', type='playlist',
                 _count=1)
  df = df.append(df1, ignore_index=True)

df[['title', 'description', 'channelTitle']]
id title description channelTitle
0 Taylor Swift Discography Sarah Bella
1 Ed Sheeran - New And Best Songs (2021) Best Of Ed Sheeran 2021 || Ed Sheeran Greatest... Full Albums!
2 Shawn Mendes: The Album 2018 (Full Album) WorldMusicStream
3 Ariana Grande - Positions (Full Album) October 30, 2020. lo115
4 Michael Jackson Mix Michael Jackson's Songs. Leo Meneses
5 Selena Gomez - Rare [FULL ALBUM 2020] selena gomez,selena gomez rare album,selena go... THUNDERS
6 Lady Gaga - Greatest Hits Lady Gaga - Greatest Hits 01 The Edge Of Glory... Gunther Ruymen
7 Shreya Ghoshal Tamil Hit Songs | #TamilSongs |... Sony Music South
8 The Best of Bruno Mars Warner Music Australia
What are the top 10 sports activities?
from dataprep.connector import connect, info
import pandas as pd
dc = connect('youtube', _auth={'access_token': auth_token})

df = await dc.query('videos', q='Sports', part='snippet', type='activity', _count=10)
df[['title', 'description', 'channelTitle']]
title description channelTitle
0 Sports Tak Sports Tak, as the name suggests, is all about... Sports Tak
1 Sports sport : an activity involving physical exertio... Sports
2 Greatest Sports Moments UPDATE: I AM IN THE PROCESS OF MAKING REVISION... WTD Productions
3 Viagra Boys - Sports (Official Video) Director: Simon Jung DOP: Paul Evans Producer:... viagra boys
4 Volleyball Open Tournament, Jagdev Kalan || 12... Volleyball Open Tournament, Jagdev Kalan || 12... Fine Sports
5 Beach Bunny - Sports booking/inquires: [email protected] hu... Beach Bunny
6 Top 100 Best Sports Bloopers 2020 Watch the Top 100 best sports bloopers from 20... Crazy Laugh Action
7 Memorable Moments in Sports History Memorable Moments in Sports History! SUBSCRİBE... Cenk Bezirci
8 Craziest “Saving Lives” Moments in Sports History Craziest “Saving Lives” Moments in Sports Hist... Highlight Reel
9 Most Savage Sports Highlights on Youtube (S01E01) I do these videos ever year or so, they are ba... Joseph Vincent

Weather

OpenWeatherMap -- Collect Current and Historical Weather Data

What is the temperature of London, Ontario?
from dataprep.connector import connect

owm_connector = connect("openweathermap", _auth={"access_token":access_token})
df = await owm_connector.query('weather',q='London,Ontario,CA')
df[["temp"]]
id temp
0 267.96
What is the wind speed in each provincial capital city?
from dataprep.connector import connect
import pandas as pd
import asyncio

conn = connect("openweathermap", _auth={'access_token':'899b50a47d4c9dad99b6c61f812b786e'}, _concurrency = 5)

names = ["Edmonton", "Victoria", "Winnipeg", "Fredericton", "St. John's", "Halifax", "Toronto", "Charlottetown", \
 "Quebec City", "Regina", "Yellowknife", "Iqaluit", "Whitehorse"]

query_list = [conn.query("weather", q = name) for name in names]
results = asyncio.gather(*query_list)
df = pd.concat(await results)
df['name'] = names
df[["name", "wind"]].reset_index(drop=True)
id name wind
0 Edmonton 6.17
1 Victoria 1.34
2 Winnipeg 2.57
3 Fredericton 4.63
4 St. John's 5.14
5 Halifax 5.14
6 Toronto 1.76
7 Charlottetown 5.14
8 Quebec City 3.09
9 Regina 4.12
10 Yellowknife 3.60
11 Iqaluit 5.66
12 Whitehorse 9.77

⬆️ Back to Index

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Weiyuan Wu

💻 🚧

peiwangdb

💻 🚧

nick-zrymiak

💻 🚧

Pallavi Bharadwaj

💻

Hilal Asmat

📖

Wukkkinz-0725

💻 🚧

Yizhou

💻 🚧

Lakshay-sethi

💻 🚧

kla55

💻

hwec0112

💻

Yi Xie

💻 🚧

Yejia Liu

💻

sahmad11

💻