Emoji support #571

Open
HakanKaraoglu opened this issue Sep 24, 2021 · 15 comments

@HakanKaraoglu

HakanKaraoglu commented Sep 24, 2021

I am trying to index the following Twitter data in two different ways. The first is a simple add through the SolrNet library. When I do this, the emojis don't show up properly in Solr.

Tweet: "aşı oluyorum 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤 test yenisi"

image

    var solrDocument = new SolrDocument();

    solrDocument.Id = "440550419";
    solrDocument.Title = "aşı oluyorum 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤 test yenisi";
    solrDocument.Body = "aşı oluyorum 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤 test yenisi";
    solrDocument.WorkspaceId = 114;

    var url = "http://localhost:8983/solr/2021";

    var connection = new SolrConnection(url);

    Startup.Init<SolrDocument>(connection);
    var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrDocument>>();

    solr.Add(solrDocument);

However, when I send the JSON with an HttpClient (the second option), it looks fine in Solr.

    var solrDocument = new SolrDocument();

    solrDocument.Id = "440550419";
    solrDocument.Title = "aşı oluyorum 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤 test yenisi";
    solrDocument.Body = "aşı oluyorum 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤 test yenisi";
    solrDocument.WorkspaceId = 114;

    var list = new List<SolrDocument>();
    list.Add(solrDocument);

    var json = JsonConvert.SerializeObject(list);

    var _client = new HttpClient();
    var response = await _client.PostAsync("http://localhost:8983/solr/2021/update?commitWithin=1000", new StringContent(json, Encoding.UTF8, MediaTypeNames.Application.Json));

image

How can I fix this problem? Is the SolrNet library somehow corrupting these emojis?

Solr version: 7.1.0
SolrNet version: 1.0.19

@HakanKaraoglu
Author

@mausch Do you have any guidance on this?

@mausch
Member

mausch commented Sep 29, 2021

@HakanKaraoglu
Author

> I just added a test that passed. Before that I've had similar tests in the codebase for over 10 years.

Yes, the test passes, but the document is not indexed properly on the Solr side, and that is what I want to discuss. Did you test it like this? I did, and the result is shown below. We collect content from social media, so emojis cause problems for us, which is why I opened this issue.

    const string name = "aşı 😀😁😍😆😟😱😼😻🙌😀🤲😹🧠🤤";
    await solr.AddAsync(new SolrDocument
    {
        Id = "440550422",
        Title = name
    });

    await solr.CommitAsync();

image

@HakanKaraoglu
Author

@mausch
Can you test what I wrote? You will see that no emojis are displayed in Solr. The code may pass the test, but is that acceptable if the emojis are not displayed?

@hoerup
Contributor

hoerup commented Oct 4, 2021

HakanKaraoglu, just to make sure:
Have you successfully added the document directly to Solr and read it back, and thus verified that Solr itself handles it correctly?

@HakanKaraoglu
Author

The code I wrote for the test is as above. If you see a mistake, I can correct it and try again.

@hoerup
Contributor

hoerup commented Oct 4, 2021

My point is: in a setup with a client library (SolrNet) and a server (Solr), it can seem unfair to blame the client without verifying that the server is doing what it's supposed to do.

So have you tried to inject a sample doc with emojis into Solr, either via the Solr UI or the CLI tool $solrhome/bin/post?

I haven't tried it, but since your emoji list includes some of the newer emojis, I would also expect your Solr installation to need at least Java 11, where Unicode 10 support was added.
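The server-side check suggested above (add a document directly, then read it back) can be sketched as a small round trip that bypasses SolrNet entirely. This is a minimal sketch under stated assumptions, not a definitive test: the field names `id` and `title` and the helper names `BuildUpdateBody`, `ExtractTitle`, and `RoundTripsAsync` are hypothetical and would need to match the actual schema and project.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class SolrRoundTrip
{
    // Builds a JSON update body: an array containing one document.
    // Field names "id" and "title" are assumptions about the schema.
    public static string BuildUpdateBody(string id, string title) =>
        JsonSerializer.Serialize(new[] { new { id, title } });

    // Pulls the first document's title out of a Solr JSON select response.
    // Returns null when no document or no title field is present.
    public static string? ExtractTitle(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        var docs = doc.RootElement.GetProperty("response").GetProperty("docs");
        if (docs.GetArrayLength() == 0) return null;
        if (!docs[0].TryGetProperty("title", out var title)) return null;
        // Multi-valued fields come back as JSON arrays.
        if (title.ValueKind == JsonValueKind.Array)
            return title.GetArrayLength() > 0 ? title[0].GetString() : null;
        return title.GetString();
    }

    // Posts the document straight to Solr, commits, queries it back by id,
    // and reports whether the stored title equals the one that was sent.
    public static async Task<bool> RoundTripsAsync(
        HttpClient client, string coreUrl, string id, string title)
    {
        var body = new StringContent(
            BuildUpdateBody(id, title), Encoding.UTF8, "application/json");
        var update = await client.PostAsync($"{coreUrl}/update?commit=true", body);
        update.EnsureSuccessStatusCode();

        var query = Uri.EscapeDataString($"id:\"{id}\"");
        var json = await client.GetStringAsync($"{coreUrl}/select?q={query}&wt=json");
        return ExtractTitle(json) == title;
    }
}
```

A call such as `await SolrRoundTrip.RoundTripsAsync(new HttpClient(), "http://localhost:8983/solr/2021", "emojitest1", "aşı 😀🤲🧠")` would return `true` if Solr stores and returns the emojis unchanged. The id `emojitest1` is made up for illustration. If the direct round trip succeeds while the same text added through the client library does not, that points away from the server.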

@HakanKaraoglu
Author

Yes, the emojis appear when I update the same document from the Solr UI. They also appear when I send them with the HTTP client.

@hoerup
Contributor

hoerup commented Oct 4, 2021

@mausch HakanKaraoglu might have a point.
I updated the AddAndQueryUnicode test to include his list of emojis, and it fails: https://github.com/hoerup/SolrNet/actions/runs/1304932806

@mausch
Member

mausch commented Oct 4, 2021

Yep, I also confirmed the issue a couple of days ago... very strange!

@HakanKaraoglu
Author

@mausch Maybe this problem can be fixed with a new version.
@hoerup Thank you for supporting my question. :)

@HakanKaraoglu
Author

@mausch Any update?

@omidontop

Any updates on this issue?

@mausch
Member

mausch commented Aug 19, 2022

Currently accepting PRs with a test+fix for this.
I don't even know whether SolrNet is at fault here. As I said before, I have tested Unicode scenarios since 2009 with no issues. It seems to be emojis in particular that cause the problem.
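One property that separates the emojis from the other non-ASCII text in this thread (such as "aşı") is that they lie outside the Basic Multilingual Plane, so a .NET string stores each one as a UTF-16 surrogate pair of two `char` values. The sketch below only makes that difference visible; it does not identify where the characters are lost. `EmojiInspector` and `CodePoints` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EmojiInspector
{
    // Returns the Unicode code points of a well-formed string,
    // combining each UTF-16 surrogate pair into a single value.
    public static List<int> CodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            result.Add(char.ConvertToUtf32(text, i));
            // A high surrogate means the next char is the second half
            // of the same code point, so skip over it.
            if (char.IsHighSurrogate(text[i])) i++;
        }
        return result;
    }

    public static void Main()
    {
        const string sample = "aşı 😀🤲🧠";
        Console.WriteLine($"UTF-16 code units: {sample.Length}");
        foreach (int cp in CodePoints(sample))
        {
            string kind = cp > 0xFFFF
                ? "supplementary plane (surrogate pair)"
                : "BMP (single char)";
            Console.WriteLine($"U+{cp:X4}  {kind}");
        }
    }
}
```

A Unicode test built only from BMP characters never exercises surrogate pairs, which would be consistent with such tests passing while the emoji case fails. Comparing `CodePoints` of the string sent with that of the string read back from Solr is one way to see exactly which code points went missing.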

mausch changed the title from "Unicode character support" to "Emoji support" on Aug 19, 2022
@mustafasariel

I updated the SolrNet.Core library from 1.0.19 to 1.1.1, but the problem persists.
