"Lago", Italian for lake, is considered two words, "L" and "ago" (abbr. of August), and interpreted as a date #206

ibobo · 2017-01-16T12:34:50Z

I found that Duckling understands text in "CamelCase" and "UPPERCASElowercase" fashion. This is good, but it poses a problem when a valid word consists almost entirely of an abbreviation. This is the case for some Italian words when entered in title case, and I bet it can happen in other languages too.

Some examples:

"Lago" -> "ago" is short for August; this breaks many locations, like "Lago di Como", "Lago di Garda" and the like.
"CaprI" -> "apr" is short for April; it's a "strange" casing, but it can happen.

My proposal is to avoid breaking words at the "case barrier" if the whole text contains spaces, or if the only uppercase letters are single characters at the beginning of words (the latter is useful for texts consisting of a single word).

This would break "SOMETHINGlike this" and "Atext", which would no longer be split at the case change, but it would solve some nastier problems.
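
A minimal sketch of the rule proposed above, written in Python purely for illustration. It is not Duckling's code: the function name `should_split_on_case` is hypothetical, and it only decides whether case-boundary splitting would be attempted at all, not how the split itself is done.

```python
def should_split_on_case(text: str) -> bool:
    """Hypothetical helper: return True if the text may be split at case changes."""
    # Rule 1 of the proposal: text that contains spaces is never split
    # at the "case barrier".
    if any(ch.isspace() for ch in text):
        return False

    # Rule 2 of the proposal (single-word text): do not split when the only
    # uppercase letter is the first character, or when there is none at all.
    uppercase_positions = [i for i, ch in enumerate(text) if ch.isupper()]
    if uppercase_positions in ([], [0]):
        return False

    # Anything else (e.g. "CamelCase", "UPPERCASElowercase") keeps the
    # current splitting behaviour.
    return True


for sample in ["Lago di Como", "Lago", "Atext", "SOMETHINGlike this", "CaprI", "CamelCase"]:
    print(sample, "->", should_split_on_case(sample))
```

Under these assumptions, "Lago" and "Lago di Como" are left intact, "SOMETHINGlike this" and "Atext" are no longer split (the trade-off mentioned above), and a lone "CaprI" would still be split, since its odd casing is only protected when it appears in text with spaces.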

tedicela · 2017-01-17T14:04:24Z

A similar problem occurs with "Vorrei fare UNA prenotazione per domani" (I'm not translating it into English, as this job needs an Italian or someone who speaks Italian).

UNA -> it is recognized as 1 o'clock (as a datetime)

Take a look at this commit; maybe you can fix it:
bb8444c

ibobo · 2017-01-17T15:40:13Z

Yes, that commit should fix that: as far as I understand, a latent time should not show up "alone" as a winning result. We are facing that problem too; I hope my pull request #203 gets merged soon.

ibobo · 2017-01-17T15:42:29Z

Btw, that problem is not related to the one in this ticket.
