
FAQ: Better handling of Unicode value U+FE0F with Python+JavaScript. #405

Merged (4 commits) on Nov 5, 2022

Conversation

kolibril13
Contributor

Don't merge yet, it's only a draft pull request.
Attempt to solve #404.
@Joshix-1 can you have a look at this?
It's not yet a working solution, because sometimes this character is needed:

🦴 -> "1F9B4", can be found at https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F9B4.png
🐿️ -> "1F43F-FE0F", can be found at https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F43F.png
👩‍⚕️ -> "1F469-200D-2695-FE0F", can be found at https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F469-200D-2695-FE0F.png

Currently, the last example would break, because the FE0F should not be removed.
I think a case distinction is needed here. Am I right to assume that for all emojis with more than two code points (the hex values separated by "-"), the last FE0F should not be removed?
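
To see where U+FE0F sits in each of the three examples, a minimal Python sketch can list the code points of each string. The helper name `code_points` is made up for illustration, and the emoji are written as escape sequences so that the invisible variation selector is explicit.

```python
def code_points(emoji: str) -> list[str]:
    """Return the uppercase hex code point of every character in `emoji`."""
    return [f"{ord(c):04X}" for c in emoji]

# Escape sequences make U+FE0F and U+200D visible in the source.
examples = {
    "bone": "\U0001F9B4",
    "chipmunk": "\U0001F43F\uFE0F",
    "woman health worker": "\U0001F469\u200D\u2695\uFE0F",
}

for name, emoji in examples.items():
    print(name, code_points(emoji))
```

In the second and third examples FE0F is the last code point, which is why a rule based only on "ends with FE0F" cannot tell them apart.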

@Joshix-1

Joshix-1 commented Jul 17, 2022

🏳️ would break too as it is saved as 1F3F3-FE0F https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F3F3-FE0F.png

If it doesn't really matter if it is a real emoji, wouldn't it be better to do it like GitHub and always remove the -FE0F?

If it is only removed for the short sequences, the following code should work for all emojis except the white flag:

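# Python (str.removesuffix requires Python 3.9+)
emoji_code = "-".join(f"{ord(c):x}" for c in emoji).upper()
if len(emoji) == 2:
    emoji_code = emoji_code.removesuffix("-FE0F")

// JavaScript: count code points, not UTF-16 code units (emoji.length)
const codePoints = [...emoji];
let emojiCode = codePoints.map(e => e.codePointAt(0).toString(16)).join("-").toUpperCase();
if (codePoints.length === 2) emojiCode = emojiCode.replace("-FE0F", "");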
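
One detail worth checking when porting the length test between the two languages: Python's `len()` counts code points, while JavaScript's `String.length` counts UTF-16 code units, so an emoji outside the Basic Multilingual Plane counts as two units there. The sketch below reproduces both counts in Python; `utf16_units` is a hypothetical helper, not part of either implementation.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's `length` reports."""
    return len(text.encode("utf-16-le")) // 2

chipmunk = "\U0001F43F\uFE0F"   # U+1F43F followed by U+FE0F

print(len(chipmunk))            # number of code points
print(utf16_units(chipmunk))    # number of UTF-16 code units
```

For this input the first count is 2 and the second is 3, so a JavaScript check on the raw string length would not match the Python check; spreading the string into code points first (`[...emoji].length`) gives the same count as Python.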

Am I right to assume that for all emojis with more than two code points (the hex values separated by "-"), the last FE0F should not be removed?

I think none should have it removed, but I'm not sure.

@Joshix-1

Another issue with the code that I just noticed: it doesn't work with, e.g., https://openmoji.org/library/emoji-0035-FE0F-20E3/
(the leading zeros are missing).
Fix for Python:

"-".join(f"{ord(c):04x}" for c in emoji).upper()

Fix for JavaScript:

[...emoji].map(e => e.codePointAt(0).toString(16).padStart(4, '0')).join(`-`).toUpperCase()
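
The effect of the padding can be checked with the keycap emoji from the link above. A small sketch, with `to_code` as a made-up helper name that takes the format spec as a parameter:

```python
def to_code(emoji: str, spec: str) -> str:
    """Join the hex code points of `emoji` with "-", formatted with `spec`."""
    return "-".join(format(ord(c), spec) for c in emoji).upper()

keycap_five = "5\uFE0F\u20E3"  # digit five, U+FE0F, combining enclosing keycap

print(to_code(keycap_five, "x"))    # 35-FE0F-20E3
print(to_code(keycap_five, "04x"))  # 0035-FE0F-20E3
```

Code points that already have four or more hex digits are unaffected, since the width in `04x` is only a minimum.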

@github-actions

🏝 OpenMoji is on hold over summer (project maintainers are out of office until Oct 2022).

@kolibril13
Contributor Author

🏳️ would break too as it is saved as 1F3F3-FE0F https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F3F3-FE0F.png

For me, it does not break; I could also find this one: https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F3F3.png

Another issue with the code that I just noticed: it doesn't work with, e.g., https://openmoji.org/library/emoji-0035-FE0F-20E3/
(the leading zeros are missing).

Thanks for noting this!
I've just written a new Python script that should now handle all cases properly. @Joshix-1, do you want to test this?

from PIL import Image
import requests

def get_emoji(emoji):
    # One zero-padded, uppercase hex code per code point, joined with "-".
    emoji_code = "-".join(f"{ord(c):04x}" for c in emoji).upper()
    print(emoji_code)
    # Two-code-point sequences are looked up without the trailing FE0F.
    if len(emoji) == 2:
        emoji_code = emoji_code.removesuffix("-FE0F")  # Python 3.9+
    url = f"https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/{emoji_code}.png"
    print(url)
    response = requests.get(url, stream=True)
    response.raise_for_status()  # fail clearly if the file does not exist
    im = Image.open(response.raw)
    # image = np.array(im.convert("RGBA"))
    return im

imgs = []
imgs += [get_emoji("🦴")]  # Code: "1F9B4" > all good
imgs += [get_emoji("🐿️")]  # Code: "1F43F-FE0F" > can be found under "1F43F", so "FE0F" has to be removed
imgs += [get_emoji("🏳️")]  # Code: "1F3F3-FE0F" > can be found either under "1F3F3" or under "1F3F3-FE0F", so "FE0F" can be removed
imgs += [get_emoji("5️⃣")]  # Problem with missing zero? > solved with 04x
imgs += [get_emoji("👩‍⚕️")]  # Code: "1F469-200D-2695-FE0F" > here, FE0F should not be removed

# only for debugging:
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
columns = 5
for i, image in enumerate(imgs):
    plt.subplot(int(len(imgs) / columns + 1), columns, i + 1)
    plt.imshow(image)


@Joshix-1

For me, it does not break; I could also find this one: https://raw.githubusercontent.com/hfg-gmuend/openmoji/master/color/72x72/1F3F3.png

Yes, true. (I only looked for files ending with -FE0F and didn't check whether the file also exists without it.)
But it's odd that the flag has two representations and none of the others do.
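
Since a file may exist with or without the trailing FE0F, one way to avoid hard-coding the rule is to generate both candidate names and try them in order. This is only a sketch of that idea, not what the PR implements; `candidate_codes` is a hypothetical helper, and trying the full code first is an assumption.

```python
def candidate_codes(emoji: str) -> list[str]:
    """Return possible file stems for `emoji`, full code first."""
    full = "-".join(f"{ord(c):04X}" for c in emoji)
    candidates = [full]
    # Add a second candidate without the trailing variation selector.
    if full.endswith("-FE0F"):
        candidates.append(full[: -len("-FE0F")])
    return candidates

print(candidate_codes("\U0001F3F3\uFE0F"))  # ['1F3F3-FE0F', '1F3F3']
print(candidate_codes("\U0001F9B4"))        # ['1F9B4']
```

A caller could then request each candidate URL in turn and use the first one that exists.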

I've just written a new Python script that should now handle all cases properly

Yes, that looks better. I just don't like the += with the list. I think something like the following would be better:

imgs = [
    get_emoji("🦴"),  # Code: "1F9B4" > all good
    get_emoji("🐿️"),  # Code: "1F43F-FE0F" > can be found under "1F43F", so "FE0F" has to be removed
    get_emoji("🏳️"),  # Code: "1F3F3-FE0F" > can be found either under "1F3F3" or under "1F3F3-FE0F", so "FE0F" can be removed
    get_emoji("5️⃣"),  # Problem with missing zero? > solved with 04x
    get_emoji("👩‍⚕️"),  # Code: "1F469-200D-2695-FE0F" > here, FE0F should not be removed
]
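
The expectations written in the comments can also be checked without network access by separating the file-name logic from the download. A sketch under that assumption; `emoji_to_code` is a hypothetical name, and the expected values are taken from the comments in this thread.

```python
def emoji_to_code(emoji: str) -> str:
    """Build the OpenMoji-style file stem discussed in this thread."""
    code = "-".join(f"{ord(c):04X}" for c in emoji)
    # Only two-code-point sequences drop the trailing variation selector.
    if len(emoji) == 2 and code.endswith("-FE0F"):
        code = code[: -len("-FE0F")]
    return code

# Emoji written as escapes so that U+FE0F and U+200D are explicit.
expected = {
    "\U0001F9B4": "1F9B4",
    "\U0001F43F\uFE0F": "1F43F",
    "\U0001F3F3\uFE0F": "1F3F3",
    "5\uFE0F\u20E3": "0035-FE0F-20E3",
    "\U0001F469\u200D\u2695\uFE0F": "1F469-200D-2695-FE0F",
}

for emoji, code in expected.items():
    assert emoji_to_code(emoji) == code, (emoji, code)
print("all file stems match")
```

The slice is used instead of `str.removesuffix` so the sketch also runs on Python versions before 3.9.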

@hfg-gmuend hfg-gmuend deleted a comment from github-actions bot Nov 4, 2022
@b-g
Member

b-g commented Nov 4, 2022

Hi @kolibril13, hope you've had a nice summer! Sorry for the ultra late reply! Is this PR ready to merge? :)

@kolibril13 kolibril13 marked this pull request as ready for review November 5, 2022 14:12
@kolibril13
Contributor Author

Hi @b-g,
the summer was great, I hope yours was as well :)
I've just fixed the -FE0F issue in the JavaScript implementation, so the PR is ready to merge!

@b-g b-g merged commit fd2098e into hfg-gmuend:master Nov 5, 2022
@b-g
Member

b-g commented Nov 5, 2022

Hi @kolibril13, Great! Many thanks! + Merged.

@kolibril13 kolibril13 deleted the patch-1 branch November 5, 2022 16:24