TypeError: argument of type 'azureml.dataprep.rslex.StreamInfo' is not iterable #1959

Open
lhrotk opened this issue Feb 19, 2024 · 4 comments


lhrotk commented Feb 19, 2024

Hi, I encountered the error below with azureml-opendatasets==1.55.0.

Calling to_spark_dataframe()
Traceback (most recent call last):
  File "/mnt/c/users/cruiseli/OneDrive - Microsoft/Desktop/workspace/SynapseML-Utils/test_aml.py", line 30, in <module>
    nyc_tlc_df2 = nyc_tlc.to_spark_dataframe()
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/accessories/_loggerfactory.py", line 139, in wrapper
    return func(*args, **kwargs)
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/accessories/open_dataset_base.py", line 164, in to_spark_dataframe
    return self._to_spark_dataframe()
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/accessories/open_dataset_base.py", line 305, in _to_spark_dataframe
    return self._blob_accessor.get_spark_dataframe(
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/dataaccess/_blob_accessor.py", line 303, in get_spark_dataframe
    paths = [wasab_format % (self._blob_container_name, self._blob_account_name,
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/dataaccess/_blob_accessor.py", line 304, in <listcomp>
    self._get_relative_path(path)) for path in target_paths]
  File "/home/cruise/mambaforge/lib/python3.10/site-packages/azureml/opendatasets/dataaccess/_blob_accessor.py", line 470, in _get_relative_path
    if "blob.core.windows.net" in url:
TypeError: argument of type 'azureml.dataprep.rslex.StreamInfo' is not iterable
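
For readers unfamiliar with this error class, here is a minimal sketch of why the last frame fails. It does not use azureml at all: `FakeStreamInfo` and its `resource_id` attribute are hypothetical stand-ins for `azureml.dataprep.rslex.StreamInfo`, and `contains_blob_host` only imitates the `"blob.core.windows.net" in url` check shown in the traceback. The assumption is that `_get_relative_path` expected a string but received an object that supports neither `__contains__` nor iteration.

```python
class FakeStreamInfo:
    """Hypothetical stand-in for azureml.dataprep.rslex.StreamInfo.

    It defines neither __contains__ nor __iter__, so the `in` operator
    cannot be applied to it.
    """

    def __init__(self, resource_id: str) -> None:
        self.resource_id = resource_id  # hypothetical attribute name


def contains_blob_host(url) -> bool:
    """Imitate the membership check from the last traceback frame."""
    return "blob.core.windows.net" in url


# A plain string behaves the way the check expects.
ok = contains_blob_host("https://account.blob.core.windows.net/container/file.parquet")

# An object that is not a string (and not a container) raises TypeError.
try:
    contains_blob_host(FakeStreamInfo("https://account.blob.core.windows.net/x"))
    error = None
except TypeError as exc:
    error = exc

print(ok)
print(type(error).__name__, error)
```

The exact wording of the message varies between Python versions, but it names the offending type, which is how the traceback above points at `StreamInfo`.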

Code to reproduce:

import azureml.core
from azureml.core import Datastore, Dataset
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.core.authentication import InteractiveLoginAuthentication

import logging
import pandas as pd
import time
import sys

print("Testing opendatasets -- start")
from azureml.opendatasets import NycTlcYellow
from datetime import datetime
from dateutil import parser

# Request three days of NYC yellow taxi data.
end_date = parser.parse('2018-05-30')
start_date = parser.parse('2018-05-28')
nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)

# Time the pandas conversion.
print("Calling to_pandas_dataframe()")
ts = time.time()
nyc_tlc_df = nyc_tlc.to_pandas_dataframe()
te = time.time()
print("Time taken to perform to_pandas_dataframe():" + str(te-ts))

# Time the Spark conversion; this is the call that raises the TypeError.
print("Calling to_spark_dataframe()")
ts2 = time.time()
nyc_tlc_df2 = nyc_tlc.to_spark_dataframe()
te2 = time.time()
nyc_tlc_df2.show(2, truncate=False)
print("Time taken to perform to_spark_dataframe():" + str(te2-ts2))

print("Testing opendatasets -- end")

lhrotk changed the title from "'azureml.dataprep.rslex.StreamInfo' is not iterable" to "TypeError: argument of type 'azureml.dataprep.rslex.StreamInfo' is not iterable" on Feb 19, 2024

anliakho2 (Member) commented

Thank you for reporting this issue. I have investigated it and found the underlying bug. It is now fixed, and the fix will be released in the next update to the open-datasets package.

anliakho2 (Member) commented

@lhrotk The patch, opendatasets==1.55.0.post1, is now released, so this should work.

pausbyte commented

Hi @lhrotk, has the issue been resolved for you? I'm asking because I'm receiving the same error with 1.55.0.post1...

HarshitChandani commented

Hey @anliakho2, I am also getting the same error even after upgrading azureml.opendatasets to 1.55.0.post1...
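
Since two reports above still see the error after upgrading, one thing worth ruling out is that the interpreter running the script is loading a different install than the one that was upgraded. The sketch below is only a diagnostic, not a fix: it uses the standard-library `importlib.metadata` to print the version visible to the current interpreter. The helper name `installed_version` is made up for illustration, and the distribution name `azureml-opendatasets` is taken from the original report.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Run this with the same interpreter that runs the failing script.
print("azureml-opendatasets:", installed_version("azureml-opendatasets"))
```

If this prints a version other than 1.55.0.post1, the upgrade went to a different environment. If it prints 1.55.0.post1 and the error persists, the environment is not the cause.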
