This repository has been archived by the owner on May 30, 2019. It is now read-only.

"Lago", Italian for lake, is considered two words, "L" and "ago" (abbr. of August), and interpreted as a date #206

Open
ibobo opened this issue Jan 16, 2017 · 3 comments

Comments

@ibobo
Contributor

ibobo commented Jan 16, 2017

I found that duckling understands text written in "CamelCase" and "UPPERCASElowercase" fashion. This is good, but it poses a problem when a valid word contains an abbreviation right after a case change. This is the case for some Italian words when entered in "title case", and I bet it can happen in other languages too.

Some examples:

  • "Lago" -> "ago" is short for August; this breaks many location names, like "Lago di Como", "Lago di Garda" and the like...
  • "CaprI" -> "apr" is short for April; it's a "strange" casing but it can happen

My proposal is to avoid breaking words at the "case barrier" if the whole text contains spaces, or if the only uppercase characters are single letters at the beginning of words (this is useful for texts formed by a single word).

This would break inputs like "SOMETHINGlike this" and "Atext", but it would solve some nastier problems.
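A rough sketch of the rule proposed above, in Python for illustration only. This is not duckling's code: the function name `allow_case_split` is hypothetical, and the reading of "single character at the beginning of words" (each alphabetic run may have an uppercase letter only in first position) is an assumption.

```python
import re


def allow_case_split(text: str) -> bool:
    """Return True if splitting at a case boundary is still allowed.

    Hypothetical helper: it only decides whether case-based splitting
    should be attempted, it does not perform the split itself.
    """
    # Rule 1: the text contains spaces, so words are already delimited
    # and case changes are not treated as word boundaries.
    if any(ch.isspace() for ch in text):
        return False

    # Rule 2: the only uppercase characters are single letters at the
    # beginning of words (plain title case such as "Lago").
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters
    if all(not any(ch.isupper() for ch in word[1:]) for word in words):
        return False

    # Anything else (e.g. "SOMETHINGlike") may still be split.
    return True


if __name__ == "__main__":
    for sample in ["Lago", "Lago di Como", "Atext", "SOMETHINGlike", "SOMETHINGlike this"]:
        print(sample, "->", allow_case_split(sample))
```

Under this reading, "Lago", "Lago di Como", "Atext" and "SOMETHINGlike this" are no longer split, while a single word like "SOMETHINGlike" still is. Note that "CaprI" would still be split by this sketch, since its trailing uppercase letter is not at the beginning of a word.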

@tedicela

tedicela commented Jan 17, 2017

A similar problem occurs with "Vorrei fare UNA prenotazione per domani" (I'm not translating it into English, since this job needs an Italian or someone who speaks Italian).

UNA -> this is recognized as 1 o'clock (as a datetime)

Take a look at this commit, maybe you can fix it:
bb8444c

@ibobo
Contributor Author

ibobo commented Jan 17, 2017

Yes, that commit should fix that: as far as I understand, a latent time should not show up "alone" as a winning result. We're facing that problem too, and I hope my pull request #203 gets merged soon.

@ibobo
Contributor Author

ibobo commented Jan 17, 2017

Btw, that problem is not related to the one in this ticket.
