Using model weights on own dataset #227

Open
CSteele97 opened this issue Jun 8, 2020 · 26 comments

@CSteele97

CSteele97 commented Jun 8, 2020

Hi, @jakubczakon

I would like to use the model weights to detect buildings in my own imagery, but I'm not entirely sure how to do this. I notice there are two files at the following link (https://ui.neptune.ai/neptune-ai/Mapping-Challenge/e/MC-1057/artifacts), but I am not sure which file contains the model weights or how to apply it to my own imagery. I have also seen the 'Predict on new data' section of REPRODUCE_RESULTS, but I do not know what the pipeline_name or the prediction_path should be.

I hope this makes sense, I am very new to machine learning so do not yet understand a lot of things.

I would really appreciate it if you could provide some instructions on how I can achieve this. Thank you.

@kamil-kaczmarek
Contributor

hey @CSteele97,

  1. unet is for the segmentation task. Please take a look at this section for more info about unet and the second-level model. In general, you simply load the trained model and use it for your own task.
  2. prediction_path is the path where the results will be stored as a JSON file.

Hope this helps.

@kamil-kaczmarek
Contributor

For the simple case of predicting on new data, prepare the sources and the environment, then follow this section: https://github.com/neptune-ai/open-solution-mapping-challenge/blob/master/REPRODUCE_RESULTS.md#predict-on-new-data

@CSteele97
Author

Hi @kamil-kaczmarek thank you for your reply.

In the case of the REPRODUCE_RESULTS section for predict on new data, would the pipeline_name therefore be unet, as this is the trained model?

Thank you

@kamil-kaczmarek
Contributor

Hey @CSteele97,

There is a full command provided in the aforementioned section. It looks like this:

python main.py predict_on_dir \
--pipeline_name unet_tta_scoring_model \
--chunk_size 1000 \
--dir_path path/to/inference_directory \
--prediction_path path/to/predictions.json

There is a pipeline name provided: unet_tta_scoring_model
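If shell line continuations get in the way, the same invocation can be launched from Python with an explicit argument list. This is a minimal sketch: the helper names `build_predict_cmd` and `run_prediction` are hypothetical, the flags are copied from the command above, and it assumes the script is started with the interpreter of the activated environment (`sys.executable`) from the repository root where `main.py` lives.

```python
import subprocess
import sys
from typing import List


def build_predict_cmd(dir_path: str, prediction_path: str,
                      pipeline_name: str = "unet_tta_scoring_model",
                      chunk_size: int = 1000) -> List[str]:
    """Hypothetical helper: assemble the predict_on_dir call as an argv list."""
    return [
        sys.executable, "main.py", "predict_on_dir",
        "--pipeline_name", pipeline_name,
        "--chunk_size", str(chunk_size),
        "--dir_path", dir_path,
        "--prediction_path", prediction_path,
    ]


def run_prediction(repo_root: str, dir_path: str, prediction_path: str) -> None:
    # Each list element is passed as one argument, so no backslashes or quoting are needed.
    # check=True raises CalledProcessError if main.py exits with a non-zero status.
    subprocess.run(build_predict_cmd(dir_path, prediction_path), cwd=repo_root, check=True)
```

For example, `run_prediction(".", "path/to/inference_directory", "path/to/predictions.json")` mirrors the shell command above.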

Cheers,
Kamil

@CSteele97
Author

Thanks @kamil-kaczmarek

I have been trying to run the command you mentioned, but I get an error 'no module named neptune'. I have followed all the previous steps (without a Neptune registration) and am not sure why I am getting this error or how to resolve it.

I appreciate your time in helping me figure all of this out!

Thank you

@kamil-kaczmarek
Contributor

Did you install neptune?

@kamil-kaczmarek
Contributor

Installing it will be the simplest workaround.
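A "No module named ..." error means the interpreter that runs `main.py` cannot locate the package. A quick standard-library sketch for checking this from inside the same environment; the helper name `missing_modules` is hypothetical and the module names are only examples, so check whatever `environment.yml` actually lists.

```python
import importlib.util
from typing import Dict, Iterable


def missing_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it is NOT importable here."""
    # find_spec returns None when no finder on sys.path can locate the module.
    return {name: importlib.util.find_spec(name) is None for name in names}
```

Running `print(missing_modules(["neptune", "torch", "click"]))` with the same `python` you use for `main.py` shows which packages that interpreter is missing.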

@CSteele97
Author

I have managed to solve the neptune issue using pip install neptune-cli, thanks

@CSteele97
Author

I have tried to run the above command, but I am now receiving "Error: No such command 'predict_on_dir'."

@kamil-kaczmarek
Contributor

I see that you installed neptune-cli. This will very likely not work, as neptune-cli is our legacy library that we no longer support.

The best solution here is to create an environment using conda. Here is the full specification of the conda environment: https://github.com/neptune-ai/open-solution-mapping-challenge/blob/master/environment.yml
Conda docs about managing environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Regarding the error with predict_on_dir: please make sure that you run this command from the repo root. I think that will solve the problem. This command is defined in the main file: https://github.com/neptune-ai/open-solution-mapping-challenge/blob/master/main.py#L51
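Both suggestions can be checked before launching anything. A minimal sketch, assuming the repository root is recognizable by the `main.py` and `environment.yml` files mentioned above and that the environment was activated with `conda activate` (which sets the `CONDA_DEFAULT_ENV` variable); the helper `setup_problems` and the `expected_env` parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def setup_problems(cwd: Optional[str] = None, expected_env: Optional[str] = None) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems (empty list = looks fine)."""
    root = Path(cwd or os.getcwd())
    problems = []
    # main.py and environment.yml sit at the top level of the repository.
    for required in ("main.py", "environment.yml"):
        if not (root / required).is_file():
            problems.append(f"{required} not found in {root}; run from the repo root")
    # `conda activate <name>` exports CONDA_DEFAULT_ENV for the active environment.
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if expected_env is not None and active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    return problems
```

Calling `setup_problems(expected_env="<your env name>")` from the directory you launch `main.py` in should return an empty list.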

Hope this helps :)

@CSteele97
Author

Thanks Kamil,

I have updated my environment which seems to now be working.

I have been running the command from the open-solution-mapping-challenge directory. Is this correct?

Thank you

@kamil-kaczmarek
Contributor

Hey @CSteele97,

Yep, it should work.

@CSteele97
Author

Thanks Kamil,

I've tried running the command again from that directory, but it's still giving the predict_on_dir error. Any idea why this might be?

@kamil-kaczmarek
Contributor

Hey,

Can you paste the full error message?

@CSteele97
Author

/anaconda3/envs/mapping/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
/Users/open-solution-mapping-challenge/src/utils.py:132: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
/anaconda3/envs/mapping/lib/python3.6/site-packages/lightgbm/__init__.py:46: UserWarning: Starting from version 2.2.1, the library file in distribution wheels for macOS is built by the Apple Clang (Xcode_9.4.1) compiler.
This means that in case of installing LightGBM from PyPI via the pip install lightgbm command, you don't need to install the gcc compiler anymore.
Instead of that, you need to install the OpenMP library, which is required for running LightGBM on the system with the Apple Clang compiler.
You can install the OpenMP library by the following command: brew install libomp.
  "You can install the OpenMP library by the following command: brew install libomp.", UserWarning)
/Users/open-solution-mapping-challenge/src/utils.py:132: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
Usage: main.py [OPTIONS] COMMAND [ARGS]...
Try 'main.py --help' for help.

Error: No such command 'predict_on_dir'.

@kamil-kaczmarek
Copy link
Contributor

Great, thanks.

Can you also paste the full command that you used?

@CSteele97
Author

python main.py predict_on_dir
--pipeline_name unet_tta_scoring_model
--chunk_size 1000
--dir_path /test_images
--prediction_path /data/experiments/predictions.json

@jakubczakon
Collaborator

Hi @CSteele97

I have just successfully run:

python main.py predict_on_dir \
   --pipeline_name unet_tta_scoring_model \
   --chunk_size 100 \
   --dir_path data/paper_images \
   --prediction_path data/paper_images_predictions.json

Perhaps you didn't use the line-continuation backslashes (\) at the ends of the lines?
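In a POSIX shell, a trailing backslash joins a line with the next one; without it, every line runs as its own command, so `main.py` receives only `predict_on_dir` and the shell then tries to execute `--pipeline_name ...` as a separate program. The hypothetical helper below mimics just that joining rule to show how pasted text gets split; it deliberately ignores quoting and other escaping.

```python
from typing import List


def split_shell_lines(text: str) -> List[str]:
    """Group pasted lines into commands the way a POSIX shell joins continuation lines."""
    commands: List[str] = []
    current = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Backslash-newline is a line continuation: drop the backslash, keep accumulating.
            current += line[:-1]
            continue
        current += line
        if current.strip():
            # Collapse runs of whitespace purely to make the result easy to read.
            commands.append(" ".join(current.split()))
        current = ""
    if current.strip():  # the text ended with a dangling continuation
        commands.append(" ".join(current.split()))
    return commands
```

Feeding it the working command gives one command; feeding it the version without backslashes gives one command per line.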

@asahi417

asahi417 commented Jun 29, 2020

Hi, I got a different error when I ran the above command. Any idea?

ValueError: No transformer cached unet

I'm actually not sure where I should put the released checkpoints. Currently I've put them at /data/experiments/mapping_challenge_baseline/checkpoints/scoring_model and /data/experiments/mapping_challenge_baseline/checkpoints/unet.

@asahi417

You've released checkpoints for scoring_model and unet, but to run inference it seems we also need the transformers that produce predictions from those checkpoints. How can those be generated?

@jakubczakon
Collaborator

jakubczakon commented Jun 29, 2020

Hi @asahi417, transformers that don't have any state are created on the fly, so you only need unet and scoring_model.

Both of those trained models should be placed in the transformers subfolder of your experiment directory, i.e. /data/experiments/mapping_challenge_baseline/transformers. If you put them there, inference should run with no problems.
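The rule described above (fitted steps are loaded from the transformers folder, stateless ones are built on the fly) can be sketched as follows. This is not the repository's code: `get_transformer`, the `STATELESS` registry, and the use of `pickle` are illustrative assumptions; only the folder name and the wording of the `ValueError` come from this thread. It shows why files left in `checkpoints/` are not found.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical registry of steps that have no fitted state and can be rebuilt any time.
STATELESS: Dict[str, Callable[[], Any]] = {
    "resize": lambda: {"step": "resize"},
}


def get_transformer(name: str, experiment_dir: str) -> Any:
    """Return a fitted step from <experiment_dir>/transformers, or build a stateless one."""
    cached = Path(experiment_dir) / "transformers" / name
    if cached.is_file():
        # Stateful steps such as unet and scoring_model must already be on disk here.
        with cached.open("rb") as f:
            return pickle.load(f)
    if name in STATELESS:
        return STATELESS[name]()
    raise ValueError(f"No transformer cached {name}")
```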

I tried to explain this in REPRODUCE_RESULTS, but I am not sure it is clear:

project
|-- README.md
|-- ...
|-- data
    |-- raw
        |-- train
            |-- images
            |-- annotation.json
        |-- val
            |-- images
            |-- annotation.json
        |-- test_images
            |-- img1.jpg
            |-- img2.jpg
            |-- ...
    |-- meta
        |-- masks_overlayed_eroded_{}_dilated_{}  # it is generated automatically
            |-- train
                |-- distances
                |-- masks
                |-- sizes
            |-- val
                |-- distances
                |-- masks
                |-- sizes
    |-- experiments
        |-- mapping_challenge_baseline  # this is where your experiment files will be dumped
            |-- checkpoints  # neural network checkpoints
            |-- transformers  # serialized transformers after fitting
            |-- outputs  # outputs of transformers if you specified save_output=True anywhere
            |-- prediction.json  # prediction on valid
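Following that layout, the two downloaded files only need to end up in the `transformers` folder. A small `shutil` sketch; the helper `install_released_models`, the download location, and the assumption that the files are named exactly `unet` and `scoring_model` (as in the paths quoted earlier in the thread) are hypothetical, so adjust them to what you actually downloaded.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def install_released_models(download_dir: str, experiment_dir: str,
                            names: Iterable[str] = ("unet", "scoring_model")) -> List[Path]:
    """Copy released model files into <experiment_dir>/transformers/ (hypothetical helper)."""
    target = Path(experiment_dir) / "transformers"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(download_dir) / name
        if not src.is_file():
            raise FileNotFoundError(f"expected released file {src}")
        # Keep the original file name; the pipeline is assumed to look steps up by name.
        copied.append(Path(shutil.copy2(src, target / name)))
    return copied
```

For example: `install_released_models("downloads", "data/experiments/mapping_challenge_baseline")`.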

I hope this helps.

@asahi417

asahi417 commented Jul 1, 2020

Thanks, I finally managed to run inference with the released checkpoints, which is huge progress! However, the predictions look almost random. Do you have any idea why it produces such poor predictions?
[Attached images: 06OC18363_000010-rotate-0-crop-4, 06OC18363_000010-rotate-0-crop-1, 06OC18363_000010-rotate-0-crop-3]

@asahi417

asahi417 commented Jul 1, 2020

Also, I'm wondering whether it is possible to fine-tune the released checkpoints on my own dataset.

@jakubczakon
Collaborator

jakubczakon commented Jul 1, 2020

Hi there,

I think there may be something wrong with the indices of your images in the prediction file. It seems those predictions belong to different images, right?
A simple way to debug this is to run prediction on a folder with just one image in it.
I had this problem in the past, but I haven't encountered it in a while.
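One way to check the indices is to count how many predicted objects each `image_id` received and compare that with the images in the folder. A minimal sketch, assuming the prediction file is a COCO-style results list (one dict per predicted object, each carrying an `image_id`), as it is described later in this thread; `objects_per_image` is a hypothetical name.

```python
import json
from collections import Counter
from typing import Dict


def objects_per_image(prediction_path: str) -> Dict[int, int]:
    """Count predicted objects per image_id in a COCO-style results file."""
    with open(prediction_path) as f:
        results = json.load(f)
    # COCO results are a flat list of annotations, each tagged with the image it belongs to.
    return dict(Counter(r["image_id"] for r in results))
```

With a single-image folder, the result should contain exactly one `image_id`; anything else points at the file-to-id mapping rather than at the model.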

You can easily fine-tune by overriding (or simply pasting in) a snippet that loads the weights when you train, in steps/pytorch.models.py.
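A sketch of such a weight-loading snippet in plain PyTorch. It assumes the released `unet` file can be read with `torch.load` and holds a `state_dict` for the same architecture; that is an assumption, not something confirmed in this thread. `load_pretrained` is a hypothetical name; after it returns, training continues as usual on your own data.

```python
import torch
from torch import nn


def load_pretrained(model: nn.Module, checkpoint_path: str, strict: bool = True) -> nn.Module:
    """Initialise `model` from a saved state_dict before fine-tuning (hypothetical helper)."""
    # map_location="cpu" lets a checkpoint saved on GPU load on any machine.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # With strict=False, keys missing from or extra in the checkpoint are reported
    # instead of raising; parameters whose shapes differ still raise an error.
    result = model.load_state_dict(state_dict, strict=strict)
    if result.missing_keys or result.unexpected_keys:
        print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
    return model.train()  # switch back to training mode for fine-tuning
```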

@asahi417

asahi417 commented Jul 6, 2020

@jakubczakon Hi, thanks for your feedback. I've tried exporting the segmentation for a single image, but I still get similar results. Could you take a look at my code, where I export the segmentation map from the COCO-formatted prediction file produced by your python main.py predict_on_dir script?

https://github.com/asahi417/open-solution-mapping-challenge-script

@Gokul-S-Kumar

Gokul-S-Kumar commented Apr 11, 2021

Usage: main.py [OPTIONS] COMMAND [ARGS]...
Try 'main.py --help' for help.

Error: No such command 'predict_on_dir'.

I solved this error in a different way. Inside the main.py script, the line right before each function definition reads @main.command(). You need to pass a string to this click decorator: the name you type on the command line, i.e. predict_on_dir here. So the line before the predict_on_dir function should be @main.command('predict_on_dir'). Do the same for the other functions so that all of them can be run from the command line via click.
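A likely reason this works: starting with Click 7.0, a command declared with a bare `@group.command()` takes its name from the function, with underscores replaced by dashes, so `predict_on_dir` is registered as `predict-on-dir`. Passing the name explicitly restores the underscore spelling; with an unmodified `main.py` and Click 7.0 or later, `python main.py predict-on-dir ...` should also be accepted. The toy CLI below is not the repository's code; `train_evaluate` is a made-up command used only to show the default naming.

```python
import click


@click.group()
def main():
    """Toy stand-in for the repository's CLI group."""


# Explicit name: the command is reachable exactly as `predict_on_dir`.
@main.command("predict_on_dir")
@click.option("--pipeline_name", required=True)
@click.option("--dir_path", required=True)
def predict_on_dir(pipeline_name, dir_path):
    click.echo(f"would run {pipeline_name} on {dir_path}")


# No explicit name: Click >= 7.0 derives it from the function name with
# underscores turned into dashes, so this is invoked as `train-evaluate`.
@main.command()
def train_evaluate():
    click.echo("training")
```

Listing the commands with `python main.py --help` shows the names Click actually registered.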
