Low success rate on downloading laion400m #400

Open
tchaton opened this issue Feb 3, 2024 · 26 comments

Comments

@tchaton

tchaton commented Feb 3, 2024

Hey there, @rom1504,

I have been trying to download laion400m with the scripts from an m5n.8xlarge EC2 instance, and the success rate is quite poor.

I am getting a success rate of 10 images for 10k requests with the default command in the README.

Any idea what I am doing wrong?

Best,
T.C

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Oh, interesting. I haven't. Let me try again. What's Knot Resolver?

@tchaton
Author

tchaton commented Feb 3, 2024

I am also getting errors when trying to install Knot Resolver.

~ wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
--2024-02-03 11:16:00--  https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
Resolving secure.nic.cz (secure.nic.cz)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘secure.nic.cz’
⚡ ~ sudo dpkg -i knot-resolver-release.deb
sudo: unable to resolve host ip-10-192-12-27: Temporary failure in name resolution
dpkg: error: cannot access archive 'knot-resolver-release.deb': No such file or directory

@tchaton
Author

tchaton commented Feb 3, 2024

Here are the regular logs. It looks like wandb hit a network error (TransientError) and entered a retry loop.

~ img2dataset --url_list the-eye.eu/public/AI/cah/laion400m-met-release/laion400m-meta/ --input_format "parquet"\
>          --url_col "URL" --caption_col "TEXT" --output_format webdataset\
>            --output_folder laion400m-data --processes_count 32 --thread_count 128 --image_size 256\
>              --save_additional_columns '["NSFW","similarity","LICENSE"]' --enable_wandb True
Starting the downloading of this file
Sharding file number 1 of 32 called /teamspace/studios/this_studio/the-eye.eu/public/AI/cah/laion400m-met-release/laion400m-meta/part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
0it [00:00, ?it/s]File sharded in 1294 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
wandb: Currently logged in as: thomas-chaton. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.2
wandb: Run data is saved locally in /teamspace/studios/this_studio/wandb/run-20240203_111216-t4t3ohoz
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run woven-microwave-1
wandb: ⭐️ View project at https://wandb.ai/thomas-chaton/img2dataset
wandb: 🚀 View run at https://wandb.ai/thomas-chaton/img2dataset/runs/t4t3ohoz
wandb: Network error (TransientError), entering retry loop.
1it [04:07, 247.25s/it]worker  - success: 0.002 - failed to download: 0.998 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.002 - failed to download: 0.998 - failed to resize: 0.000 - images per sec: 42 - count: 10000
17it [04:11,  1.61s/it]wandb: Network error (TransientError), entering retry loop.
22it [04:14,  1.16it/s]wandb: Network error (TransientError), entering retry loop.
24it [04:15,  1.61it/s]worker  - success: 0.008 - failed to download: 0.992 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 83 - count: 20000
worker  - success: 0.004 - failed to download: 0.996 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.004 - failed to download: 0.996 - failed to resize: 0.000 - images per sec: 124 - count: 30000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 165 - count: 40000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 207 - count: 50000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 245 - count: 60000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 285 - count: 70000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 326 - count: 80000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 366 - count: 90000
worker  - success: 0.004 - failed to download: 0.996 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 406 - count: 100000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 447 - count: 110000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 487 - count: 120000
worker  - success: 0.002 - failed to download: 0.998 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 528 - count: 130000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 569 - count: 140000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 609 - count: 150000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 650 - count: 160000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 690 - count: 170000
worker  - success: 0.004 - failed to download: 0.996 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 731 - count: 180000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 772 - count: 190000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 812 - count: 200000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 853 - count: 210000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 894 - count: 220000
worker  - success: 0.007 - failed to download: 0.993 - failed to resize: 0.000 - images per sec: 41 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 934 - count: 230000
worker  - success: 0.002 - failed to download: 0.999 - failed to resize: 0.000 - images per sec: 42 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 975 - count: 240000
28it [04:21,  1.13s/it]worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 40 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1004 - count: 250000
worker  - success: 0.004 - failed to download: 0.996 - failed to resize: 0.000 - images per sec: 40 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1038 - count: 260000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 40 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1071 - count: 270000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 40 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1111 - count: 280000
31it [04:26,  1.32s/it]worker  - success: 0.008 - failed to download: 0.992 - failed to resize: 0.000 - images per sec: 39 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1134 - count: 290000
worker  - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 39 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1173 - count: 300000
worker  - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 39 - count: 10000
total   - success: 0.005 - failed to download: 0.995 - failed to resize: 0.000 - images per sec: 1207 - count: 310000
32it [04:51,  8.13s/it]worker  - success: 0.037 - failed to download: 0.963 - failed to resize: 0.000 - images per sec: 35 - count: 10000
total   - success: 0.006 - failed to download: 0.994 - failed to resize: 0.000 - images per sec: 1133 - count: 320000

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Thanks @rom1504. The machine has 32 CPUs, so I thought it would be fine. I am running inside a Docker container, so I am having some issues installing Knot Resolver.

I will keep you updated.

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Hey @rom1504, any idea what I should look for on the Docker or cloud-provider side as a possible source of issues?

Also, should I use Knot or bind9?

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Thanks, @rom1504 I will check this out.

I managed to install Knot on the host, but it isn't visible inside the container and networking seems broken. Have you ever tried this setup?
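
One way to check whether the container can see a resolver running on the host is to look at which nameservers its /etc/resolv.conf lists. With Docker's default bridge network, loopback entries such as 127.0.0.1 from the host's file are typically filtered out, because 127.0.0.1 inside the container is the container itself; running with `--network host` is one common workaround. The helper below is a hypothetical sketch, not part of img2dataset or Knot:

```python
from __future__ import annotations

from pathlib import Path


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    conf = Path("/etc/resolv.conf")
    if conf.exists():
        # Run this inside the container and compare with the host's output.
        print(nameservers(conf.read_text()))
```

If the host lists 127.0.0.1 (Knot) but the container lists something else, lookups from img2dataset inside the container never reach Knot.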

@tchaton
Author

tchaton commented Feb 3, 2024

I am also curious: what kind of numbers do you get without Knot Resolver?

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Hey @rom1504, I am trying to get it working on https://lightning.ai/, where it runs in Docker. Yes, my success rate is far from this, so something is wrong.

@tchaton
Author

tchaton commented Feb 3, 2024

@rom1504 Here is the PR I am working on: Lightning-AI/pytorch-lightning#19400 and the API:

I am trying to make data processing efficient while keeping it easy to hack on. Here is an example that downloads laion400m. It still needs some extra optimizations.

import os
from multiprocessing.pool import ThreadPool
from lightning.data import optimize
from lightning.data.processing.readers import ParquetReader
from lightning.data.processing.image import download_image
from PIL import Image

input_dir = "the-eye.eu/public/AI/cah/laion400m-met-release/laion400m-meta"
parquet_files = [os.path.join(input_dir, f) for f in os.listdir(input_dir) if f.endswith(".parquet")]

def process(row):
    image_id, url, text, height, width, image_license, nsfw, similarity = row
    # Returns (image, error); the image is assumed to be a file-like object that PIL can open.
    img, err = download_image(url, 1, timeout=5)
    if err:
        return None, err

    try:
        # Decode the downloaded image (not the URL string in row[1]) and resize it.
        return [image_id, Image.open(img).resize((224, 224)), text, image_license, nsfw, similarity], None
    except Exception as e:
        # Return the decoding error; returning `err` here (always None) would let failed rows through.
        return None, e

class Fetcher:

    def __init__(self, max_threads=32):
        self.max_threads = max_threads

    def __call__(self, df):
        rows = [list(row) for row in df.iter_rows() if row[0] is not None]
        with ThreadPool(self.max_threads) as thread_pool:
            for row, err in thread_pool.imap_unordered(process, rows):
                if err is not None:
                    continue

                yield row

optimize(
    fn=Fetcher(max_threads=16),
    inputs=parquet_files,
    output_dir="/teamspace/datasets/laion400m",
    num_workers=os.cpu_count(),
    reader=ParquetReader(num_rows=2048, to_pandas=False),
    chunk_bytes="64MB",
)

And the associated Streaming library I have been working on:

https://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries

If this is ok, I will make a PR to add a Lightning Data writer to img2dataset.

@rom1504
Owner

rom1504 commented Feb 3, 2024 via email

@tchaton
Author

tchaton commented Feb 3, 2024

Image downloading speeds seemed quite similar between optimize and img2dataset, but I need to be more principled and collect the same metrics to make a better-informed comparison.

But first, I need to figure out why the download speed and success rate are so low.

That said, StreamingDataset is faster than WebDataset. You can try it yourself by duplicating my Studio: lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries. It contains everything: Python deps, code, data, etc.

I am happy to get on a call to chat more about design and optimizations if you are interested.

@tchaton
Author

tchaton commented Feb 3, 2024

The distribution is already fully handled by the optimize and map operators. Check this example: https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset?view=public&section=data+processing

Here is an example that tokenizes SlimPajama:

import json
from pathlib import Path
import zstandard as zstd
from lightning.data import optimize
from tokenizer import Tokenizer
from functools import partial
from lightning_sdk import Machine

# 1. Function to tokenize the text contained within the Slimpajama files
def tokenize_fn(filepath, tokenizer=None):
    with zstd.open(open(filepath, "rb"), "rt", encoding="utf-8") as f:
        for row in f:
            record = json.loads(row)  # parse each JSON line once
            if record["meta"]["redpajama_set_name"] == "RedPajamaGithub":
                continue  # exclude the GitHub data since it overlaps with starcoder
            text_ids = tokenizer.encode(record["text"], bos=False, eos=True)
            yield text_ids

# 2. Generate the inputs (we are going to optimize all the compressed json files from SlimPajama dataset)
input_dir = "/teamspace/studios/SlimPajama_Dataset/data/train"
inputs = [str(file) for file in Path(input_dir).rglob("*.jsonl.zst")]

# 3. Store the optimized data wherever you want under "/teamspace/datasets" or "/teamspace/s3_connections"
outputs = optimize(
    fn=partial(tokenize_fn, tokenizer=Tokenizer("./checkpoints/Llama-2-7b-hf")), # Note: You can use HF tokenizer or any others
    inputs=inputs,
    output_dir="/teamspace/datasets/slimpajama/train/",
    chunk_size=(2049 * 8012),
    num_nodes=16,
    machine=Machine.DATA_PREP, # use 32 CPU machine
)

This processes the full dataset remotely across 16 nodes and makes it readable by the StreamingDataset.


Or this one to embed Wikipedia in 15 min: https://lightning.ai/lightning-ai/studios/embed-english-wikipedia-under-5-dollars

@tchaton
Author

tchaton commented Feb 5, 2024

Hey @rom1504 I am able to get 1.1k images/sec.

I think I now have a working version of Knot Resolver. I am also using HTTP/2 through the httpx client, and I sorted the parquet files by URL in the hope of slightly helping DNS resolution.

But the ratio of success is around 60%, so quite far from yours though. I will try img2dataset again. Possibly something is misconfigured in Docker.

Best,
T.C

@rom1504
Owner

rom1504 commented Feb 5, 2024

Be careful with sorting the URLs, as you risk DoS-ing the hosts. I randomly shuffled them in the LAION datasets to mitigate this.
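
The effect of sorting is easy to see by measuring how many consecutive requests go to the same host. A minimal sketch, assuming the URL list fits in memory; `shuffled` and `longest_same_host_run` are illustrative names, not img2dataset functions:

```python
import random
from urllib.parse import urlsplit


def shuffled(urls, seed=0):
    """Return a shuffled copy of urls (seeded, so the order is reproducible)."""
    out = list(urls)
    random.Random(seed).shuffle(out)
    return out


def longest_same_host_run(urls):
    """Length of the longest streak of consecutive URLs sharing a hostname."""
    best = run = 0
    prev = None
    for url in urls:
        host = urlsplit(url).hostname
        run = run + 1 if host == prev else 1
        prev = host
        best = max(best, run)
    return best


if __name__ == "__main__":
    urls = [f"https://{h}.example.com/{i}.jpg" for h in "abcd" for i in range(250)]
    print(longest_same_host_run(sorted(urls)))    # 250: each host is hit back to back
    print(longest_same_host_run(shuffled(urls)))  # typically a small number
```

Sorting turns each host's URLs into one long burst of back-to-back requests, while a seeded shuffle spreads them across the whole run and keeps the order reproducible.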

Some people have recently had success by querying Knot with all the unique domains to get its cache ready.
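
A sketch of that cache-warming step, assuming the process's system resolver points at Knot, so that a lookup through `socket.getaddrinfo` populates Knot's cache. `unique_hosts` and `warm_dns_cache` are hypothetical helpers, not part of img2dataset:

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Collect the distinct (lowercased) hostnames in urls, skipping malformed ones."""
    hosts = set()
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:  # e.g. an unbalanced IPv6 bracket
            continue
        if host:
            hosts.add(host)
    return sorted(hosts)


def warm_dns_cache(hosts, resolve=socket.getaddrinfo, max_workers=64):
    """Resolve each host once; return how many hosts resolved successfully."""
    def lookup(host):
        try:
            resolve(host, 443)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(lookup, hosts))
```

For example, call `warm_dns_cache(unique_hosts(urls))` before starting the download. The `resolve` parameter exists only so the lookup can be swapped out, e.g. in tests.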

Usually I didn't hit DNS issues when using Knot, though. Issues only happen in some environments with a restricted DNS setup.

> But the ratio of success is around 60%, so quite far from yours though.

You can log the errors to try to understand the cause. img2dataset has a wandb table for this.
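
Outside img2dataset, for example in a custom pipeline like the `Fetcher` posted earlier in this thread, where each task returns a `(row, error)` pair, a `collections.Counter` keyed by the error message gives a similar breakdown. A minimal sketch; `tally` is a hypothetical helper:

```python
from collections import Counter


def tally(results):
    """Count outcomes from (row, error) pairs, keyed by the error message."""
    counts = Counter()
    for _, err in results:
        counts["success" if err is None else str(err)] += 1
    return counts


if __name__ == "__main__":
    results = [
        ("row", None),
        (None, OSError(99, "Cannot assign requested address")),
        ("row", None),
        (None, "timed out"),
    ]
    for message, n in tally(results).most_common():
        print(n, message)
```

Printing `most_common()` periodically is usually enough to spot a single dominant failure mode.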

> Hey @rom1504 I am able to get 1.1k images/sec.

Nice! How many cores are you using?

@tchaton
Author

tchaton commented Feb 5, 2024

Hey @rom1504,

> Be careful with sorting the URLs, as you risk DoS-ing the hosts. I randomly shuffled them in the LAION datasets to mitigate this.

Interesting. Yes, I didn't think of that. Good call!

> Some people have recently had success by querying Knot with all the unique domains to get its cache ready.

This is a good idea. I will see if there is a simple way to add support for this.

> Issues only happen in some environments with a restricted DNS setup.

I am capturing the errors and printing them. I will share what I am getting in a couple of hours.

> Nice! How many cores are you using?

I am using a 32-CPU machine, so slightly below what you told me to expect. I will try img2dataset again to get numbers.

@tchaton
Author

tchaton commented Feb 5, 2024

# main ones
- [Errno 101] Network is unreachable
- [Errno 99] Cannot assign requested address
- [Errno -2] Name or service not known

# the rest
- [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'site.aimbulance.com'. (_ssl.c:997)
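
`[Errno 99] Cannot assign requested address` (EADDRNOTAVAIL) on outgoing connections commonly points to exhaustion of local ephemeral ports, for example when many recently closed connections are still in TIME_WAIT. That is a hypothesis, not something confirmed in this thread. On Linux, the usable range is in /proc/sys/net/ipv4/ip_local_port_range; a quick sketch to size it (`ephemeral_port_count` is a hypothetical helper):

```python
from pathlib import Path

# Linux-only kernel setting holding the "low high" ephemeral port bounds.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(range_text: str) -> int:
    """Count the ports in an ip_local_port_range value such as '32768 60999'."""
    low, high = (int(x) for x in range_text.split())
    return high - low + 1


if __name__ == "__main__":
    if PORT_RANGE_FILE.exists():
        print(ephemeral_port_count(PORT_RANGE_FILE.read_text()))
```

With the common default range of 32768 to 60999 this gives 28,232 ports. That number can be compared with the connection concurrency (the earlier command's `--processes_count 32 --thread_count 128` allows up to 32 × 128 = 4,096 download threads, assuming `thread_count` is per process) and with how quickly closed connections are recycled.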

@tchaton
Author

tchaton commented Feb 5, 2024

Interestingly, the success rate with img2dataset is much lower:

worker  - success: 0.105 - failed to download: 0.894 - failed to resize: 0.001 - images per sec: 308 - count: 10000
total   - success: 0.082 - failed to download: 0.918 - failed to resize: 0.000 - images per sec: 2979 - count: 96608
worker  - success: 0.146 - failed to download: 0.854 - failed to resize: 0.000 - images per sec: 313 - count: 10000
total   - success: 0.088 - failed to download: 0.912 - failed to resize: 0.000 - images per sec: 3288 - count: 106608
worker  - success: 0.128 - failed to download: 0.872 - failed to resize: 0.000 - images per sec: 300 - count: 10000
total   - success: 0.091 - failed to download: 0.909 - failed to resize: 0.000 - images per sec: 3497 - count: 116608
worker  - success: 0.124 - failed to download: 0.876 - failed to resize: 0.000 - images per sec: 311 - count: 10000
total   - success: 0.094 - failed to download: 0.906 - failed to resize: 0.000 - images per sec: 3797 - count: 126608
worker  - success: 0.174 - failed to download: 0.825 - failed to resize: 0.001 - images per sec: 343 - count: 10000
total   - success: 0.100 - failed to download: 0.900 - failed to resize: 0.000 - images per sec: 4097 - count: 136608
worker  - success: 0.159 - failed to download: 0.840 - failed to resize: 0.001 - images per sec: 178 - count: 5536
total   - success: 0.102 - failed to download: 0.898 - failed to resize: 0.000 - images per sec: 4263 - count: 142144
worker  - success: 0.090 - failed to download: 0.909 - failed to resize: 0.001 - images per sec: 317 - count: 10000
total   - success: 0.101 - failed to download: 0.898 - failed to resize: 0.000 - images per sec: 4563 - count: 152144
worker  - success: 0.149 - failed to download: 0.851 - failed to resize: 0.000 - images per sec: 313 - count: 10000
total   - success: 0.104 - failed to download: 0.896 - failed to resize: 0.000 - images per sec: 4863 - count: 162144
worker  - success: 0.082 - failed to download: 0.918 - failed to resize: 0.000 - images per sec: 305 - count: 10000
total   - success: 0.103 - failed to download: 0.897 - failed to resize: 0.000 - images per sec: 5163 - count: 172144
worker  - success: 0.120 - failed to download: 0.880 - failed to resize: 0.000 - images per sec: 304 - count: 10000
total   - success: 0.104 - failed to download: 0.896 - failed to resize: 0.000 - images per sec: 5463 - count: 182144
worker  - success: 0.102 - failed to download: 0.897 - failed to resize: 0.001 - images per sec: 316 - count: 10000
total   - success: 0.104 - failed to download: 0.896 - failed to resize: 0.000 - images per sec: 5763 - count: 192144
worker  - success: 0.099 - failed to download: 0.901 - failed to resize: 0.000 - images per sec: 305 - count: 10000
total   - success: 0.104 - failed to download: 0.896 - failed to resize: 0.000 - images per sec: 6063 - count: 202144
worker  - success: 0.194 - failed to download: 0.806 - failed to resize: 0.000 - images per sec: 318 - count: 10000
total   - success: 0.108 - failed to download: 0.892 - failed to resize: 0.000 - images per sec: 6363 - count: 212144
worker  - success: 0.152 - failed to download: 0.848 - failed to resize: 0.000 - images per sec: 308 - count: 10000
total   - success: 0.110 - failed to download: 0.890 - failed to resize: 0.000 - images per sec: 6644 - count: 222144
{
    "count": 10000,
    "successes": 900,
    "failed_to_download": 9093,
    "failed_to_resize": 7,
    "duration": 31.51988196372986,
    "start_time": 1707166867.7824914,
    "end_time": 1707166899.3023734,
    "status_dict": {
        "<urlopen error [Errno -2] Name or service not known>": 996,
        "<urlopen error [Errno -3] Temporary failure in name resolution>": 31,
        "success": 900,
        "Image decoding error": 7,
        "HTTP Error 404: Not Found": 105,
        "timed out": 1,
        "HTTP Error 403: Forbidden": 23,
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for '0.realtorpage.io'. (_ssl.c:997)>": 1,
        "<urlopen error [Errno 99] Cannot assign requested address>": 7936
    }
}
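
Grouping the `status_dict` above into coarse categories makes the dominant failure obvious. A sketch; the category names and string-matching rules are illustrative, not img2dataset's:

```python
from collections import Counter

# status_dict copied from the img2dataset shard stats above.
status_dict = {
    "<urlopen error [Errno -2] Name or service not known>": 996,
    "<urlopen error [Errno -3] Temporary failure in name resolution>": 31,
    "success": 900,
    "Image decoding error": 7,
    "HTTP Error 404: Not Found": 105,
    "timed out": 1,
    "HTTP Error 403: Forbidden": 23,
    "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
    "Hostname mismatch, certificate is not valid for '0.realtorpage.io'. (_ssl.c:997)>": 1,
    "<urlopen error [Errno 99] Cannot assign requested address>": 7936,
}


def category(message: str) -> str:
    """Map a raw status message to a coarse, illustrative category."""
    if message == "success":
        return "success"
    if "[Errno 99]" in message:
        return "Errno 99"
    if "[Errno -2]" in message or "[Errno -3]" in message:
        return "DNS"
    if message.startswith("HTTP Error"):
        return "HTTP error"
    return "other"


def share_by_category(statuses: dict) -> dict:
    """Fraction of all requests in each category, largest first."""
    counts = Counter()
    for message, n in statuses.items():
        counts[category(message)] += n
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


if __name__ == "__main__":
    for cat, share in share_by_category(status_dict).items():
        print(f"{cat:>10}: {share:.1%}")
```

For this 10,000-sample shard it gives 79.4% `Errno 99`, 10.3% DNS errors (`Errno -2`/`-3`), 9.0% successes and only 1.3% HTTP 403/404, so local connection and name-resolution errors, rather than dead links, account for most of the failures.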

@tchaton
Author

tchaton commented Feb 8, 2024

Hey @rom1504, I found this interesting issue: pola-rs/polars#14358. I still need to add profiling, but it seems you got around this by creating shards from the parquet files to optimize the distribution: https://github.com/rom1504/img2dataset/blob/main/img2dataset/reader.py#L189.
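
The core of that idea is splitting each file's rows into contiguous ranges that workers can pick up independently (the img2dataset logs earlier in this thread show workers reporting 10,000 samples at a time). A minimal sketch of the splitting step only; `shard_ranges` is a hypothetical helper, and how each range is then read (for example by parquet row group) is left open:

```python
from __future__ import annotations


def shard_ranges(num_rows: int, shard_size: int) -> list[tuple[int, int]]:
    """Split rows [0, num_rows) into contiguous (start, end) ranges of at most shard_size rows."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [
        (start, min(start + shard_size, num_rows))
        for start in range(0, num_rows, shard_size)
    ]


if __name__ == "__main__":
    # 25 rows in shards of 10 -> [(0, 10), (10, 20), (20, 25)]
    print(shard_ranges(25, 10))
```

Because every shard is a fixed-size, independent unit of work, a slow file no longer holds a whole worker hostage, and the scheduler can balance shards across processes or nodes.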

This is a great idea. I am going to try this out.

@tchaton
Author

tchaton commented Feb 9, 2024

Hey @rom1504, I started a distributed job on 32 nodes to download the dataset. This is my first test run; I will keep you updated.


@SomnusQue

Sorry to bother you. Could you tell me how to download the laion400M dataset? I tried this command: img2dataset --url_list laion400m-meta --input_format "parquet" --url_col "URL" --caption_col "TEXT" --output_format webdataset --output_folder laion400m-data --processes_count 16 --thread_count 128 --image_size 256 --save_additional_columns '["NSFW","similarity","LICENSE"]' --enable_wandb True, but something went wrong.

@tchaton
Author

tchaton commented Feb 22, 2024

Hey @SomnusQue Here is the full blogpost explaining how to download the dataset: lightning.ai/lightning-ai/studios/download-stream-400m-images-text~01hg0zg8fyybp7p1sma6g9dkzm.

@rom1504 I would appreciate it if you could have a read and give me your thoughts.
