Indexing - performance warning - full index can result in a large number of pairs #187

Open
gajghaten opened this issue Feb 9, 2023 · 3 comments

Comments

@gajghaten

gajghaten commented Feb 9, 2023

image

Using warnings.filterwarnings("ignore") does not disable this warning.

I use this function on a dataframe much bigger than the one in the example above and that results in a bunch of these warnings being displayed on the screen.

Could I please get help in disabling them? There are not many resources online on how to disable these kinds of warnings.

@rohitgarud

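@gajghaten I think the package uses logging, rather than the warnings module, to display these messages, which would explain why warnings.filterwarnings has no effect. Try the following code: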

import logging
logging.getLogger("recordlinkage").setLevel(logging.ERROR)
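
To illustrate why the warnings filter has no effect on messages emitted through logging, here is a minimal standard-library sketch. The logger name "demo_pkg" and the messages are hypothetical stand-ins for a library's logger, not taken from recordlinkage itself.

```python
import io
import logging
import warnings

# "demo_pkg" is a hypothetical logger name standing in for a library's logger.
stream = io.StringIO()
demo_logger = logging.getLogger("demo_pkg")
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False  # keep the demo output away from the root logger

with warnings.catch_warnings():
    # The warnings filter only affects warnings.warn(), not logging calls.
    warnings.filterwarnings("ignore")
    demo_logger.warning("first message")  # still emitted

# Raising the logger's level drops WARNING records before they reach handlers.
demo_logger.setLevel(logging.ERROR)
demo_logger.warning("second message")  # suppressed
demo_logger.error("third message")     # still emitted

captured = stream.getvalue().splitlines()
print(captured)  # ['first message', 'third message']
```

Setting the level to ERROR hides warnings but still lets errors through, which is usually preferable to silencing the logger completely.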

@gajghaten
Author

gajghaten commented Feb 17, 2023

@rohitgarud Thanks! That worked!

The following helps too!

import logging

logger = logging.getLogger('recordlinkage')
logger.disabled = True  # silence every message from this logger

# Your code

logger.disabled = False  # re-enable the logger afterwards
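
One caveat with toggling the flag by hand is that the logger stays disabled if the code in between raises. A possible way around that, sketched here with a hypothetical helper named logger_disabled (not part of recordlinkage or the logging module), is a small context manager that restores the previous state on exit:

```python
import logging
from contextlib import contextmanager


@contextmanager
def logger_disabled(name):
    """Temporarily disable the named logger and restore its previous state on exit."""
    logger = logging.getLogger(name)
    previous = logger.disabled
    logger.disabled = True
    try:
        yield logger
    finally:
        # Runs even if the body raises, so the logger is never left disabled.
        logger.disabled = previous


with logger_disabled("recordlinkage"):
    pass  # your code goes here
```

Because the restore happens in a finally block, messages from the logger come back after the with block even when an exception interrupts the work.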

@rohitgarud

@gajghaten Glad to help. But I think you should avoid the full index on anything but small datasets and use blocking or sorted neighbourhood indexing instead. For deduplication, a full index gives n choose 2, i.e. n*(n-1)/2, candidate pairs. That number grows quadratically with the number of records and slows down the record linkage or deduplication process significantly.
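
To put rough numbers on this, here is a small standard-library sketch that compares the number of candidate pairs with and without blocking. It does not call recordlinkage. The helper names and the evenly distributed blocking key are assumptions made purely for illustration.

```python
from collections import Counter
from math import comb


def full_index_pairs(n):
    """Candidate pairs when every record is compared with every other record."""
    return comb(n, 2)  # n * (n - 1) / 2


def blocked_pairs(block_keys):
    """Candidate pairs when only records sharing a blocking key are compared."""
    sizes = Counter(block_keys)
    return sum(comb(size, 2) for size in sizes.values())


# Hypothetical data: 10,000 records spread evenly over 100 blocking key values.
keys = [i % 100 for i in range(10_000)]

print(full_index_pairs(len(keys)))  # 49995000
print(blocked_pairs(keys))          # 495000
```

In this made-up case blocking cuts the candidate pairs by a factor of about 100. Real data is rarely this evenly distributed, so the actual saving depends on how the blocking key splits the records.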
