Change .PERIOD ,COMMA to . and , #80

Open
campConservative opened this issue Oct 17, 2022 · 2 comments

@campConservative

I am using your punctuator for a school project, and I’m not very good at Python or ML. I noticed that when I punctuate my text, the output looks like this:

“this is ninety nine percent invisible ,COMMA i'm ,COMMA roman mars .PERIOD it started with a place called the stone wall in gay bars had been raided by police for decades ,COMMA......

I tried changing the following code in data.py:

PUNCTUATION_VOCABULARY = [SPACE, ",", ".", "?", "!", ":", ";", "-"]
#PUNCTUATION_MAPPING = {}

But the output still shows ,COMMA and .PERIOD.
How can I fix this so that only , and . are shown?
The demo site shows only , and .

Thank you

@ottokart
Owner

You can use simple post-processing to convert the output to a more readable format:

sed -e 's/ ,COMMA/\,/g;s/ \.PERIOD/\./g;s/ ?QUESTIONMARK/\?/g;s/ !EXCLAMATIONMARK/\!/g;s/ :COLON/\:/g;s/ ;SEMICOLON/\;/g;s/ -DASH/ \-/g' text.txt > text.clean.txt

Here text.txt is the raw output containing tokens such as .PERIOD, and text.clean.txt is the cleaned output.
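
If you would rather stay in Python (for example, inside a script that already calls the punctuator), the same clean-up can be sketched with plain string replacement. This is a minimal sketch, not part of the project: the function name clean_punctuation and the script name clean_punct.py are hypothetical, and it assumes the output contains exactly the seven tokens handled by the sed command above, each preceded by a single space.

```python
import sys

# Assumption: these are the only tokens in the raw output (the same seven the
# sed command handles). Each key includes the leading space, so removing the
# token also attaches the mark to the preceding word. The dash keeps a space
# before it, mirroring the sed version.
TOKEN_TO_PUNCT = {
    " ,COMMA": ",",
    " .PERIOD": ".",
    " ?QUESTIONMARK": "?",
    " !EXCLAMATIONMARK": "!",
    " :COLON": ":",
    " ;SEMICOLON": ";",
    " -DASH": " -",
}


def clean_punctuation(text: str) -> str:
    """Replace punctuator-style tokens such as ' ,COMMA' with plain marks."""
    for token, mark in TOKEN_TO_PUNCT.items():
        text = text.replace(token, mark)
    return text


# Hypothetical CLI usage: python clean_punct.py text.txt > text.clean.txt
# Runs only when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(clean_punctuation(f.read()))
```

For example, clean_punctuation("roman mars .PERIOD") returns "roman mars.". Plain str.replace is used instead of regular expressions, so characters such as . and ? inside the tokens need no escaping.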

@campConservative
Author

OK, yes, this makes sense. Once the punctuation is in place, it’s easy to convert. Thanks for your help!
