Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Windows path (\ instead of /) #108

Open
tdurieux opened this issue Jul 11, 2022 · 7 comments
Open

Support for Windows path (\ instead of /) #108

tdurieux opened this issue Jul 11, 2022 · 7 comments

Comments

@tdurieux
Copy link

Hello,

I have an issue with windows paths such as VOLUME C:\\data\\db C:\\data\\configdb
In this case, dockerfile-ast automatically removes the \ from the argument but I want to keep the \ in order to have a valid path. Is there a reason why the \\ are removed from the arguments?
I saw that the escaped character is ignored here: https://github.com/rcjsuen/dockerfile-ast/blob/master/src/instruction.ts#L296 is it intentional?

@tdurieux tdurieux changed the title Support for windows path Support for Windows path (\ instead of /) Jul 11, 2022
@rcjsuen
Copy link
Owner

rcjsuen commented Jul 13, 2022

Is there a reason why the \\ are removed from the arguments?

@tdurieux Hi, thank you for reporting this. From the top of my head, I think the reasoning there was that if you had an EXPOSE \1234 statement, the port that actually gets exposed is 1234. I know this is how it is on Linux containers. Do you know what happens when this is done in a Windows container?

@tdurieux
Copy link
Author

Good question, I did not see people using \ in a random location. I only saw it before a new line. I will investigate the \ usage in dockerfiles.

ps: I am a researcher and I am analyzing a lot of dockerfiles and your lib is part of my stack, thanks a lot!
I have thousands of dockerfiles, I could investigate how \\ are used. I will come back to you once I finished my analysis.

@tdurieux
Copy link
Author

From my quick investigation inside 178507 dockerfiles, what I saw is that \ before a letter is almost all the time inside a windows path or inside a regex.
I think that it makes sense to keep the \ when the next char is a letter. I think it is not perfect because I feel that the \ should be kept when there is a regex but I don't know how to detect that consistently.

Below, you will find the number of occurrences of \\<char> inside the 178507 dockerfiles that I have:

{
  "\\\n": 897407,
  "\\n": 31850,
  "\\\"": 19864,
  "\\ ": 47747,
  "\\(": 3352,
  "\\)": 3371,
  "\\1": 2353,
  "\\$": 4344,
  "\\H": 66,
  "\\m": 554,
  "\\-": 1643,
  "\\.": 3669,
  "\\d": 1116,
  "WINDOWS_PATH": 4974, // according to this regex /(?<ParentPath>(?:[a-zA-Z]\:|\\\\[\w\s\.]+\\[\w\s\.$]+)\\(?:[\w\s\.]+\\)*)(?<BaseName>[\w\s\.]*?)/gim
  "\\/": 7606,
  "\\;": 1067,
  "\\F": 258,
  "\\C": 477,
  "\\S": 1232,
  "\\s": 4284,
  "\\c": 548,
  "\\P": 701,
  "\\o": 261,
  "\\b": 1005,
  "\\t": 1298,
  "\\*": 496,
  "\\&": 299,
  "\\h": 137,
  "\\l": 135,
  "\\U": 73,
  "\\{": 174,
  "\\3": 14,
  "\\2": 376,
  "\\i": 355,
  "\\j": 115,
  "\\p": 392,
  "\\K": 28,
  "\\|": 276,
  "\\}": 123,
  "\\\t": 170,
  "\\W": 511,
  "\\M": 846,
  "\\v": 470,
  "\\T": 195,
  "\\B": 191,
  "\\N": 185,
  "\\z": 87,
  "\\J": 73,
  "\\O": 41,
  "\\A": 132,
  "\\L": 64,
  "\\g": 1449,
  "\\u": 275,
  "\\0": 348,
  "\\r": 607,
  "\\D": 217,
  "\\'": 397,
  "\\E": 87,
  "\\w": 471,
  "\\I": 118,
  "\\`": 96,
  "\\[": 738,
  "\\]": 723,
  "\\k": 15,
  "\\+": 352,
  "\\#": 168,
  "\\a": 166,
  "\\\r": 531,
  "\\e": 317,
  "\\?": 111,
  "\\=": 174,
  "\\_": 394,
  "\\f": 25,
  "\\V": 149,
  "\\<": 46,
  "\\:": 111,
  "\\R": 318,
  "\\^": 12,
  "\\>": 76,
  "\\G": 37,
  "\\x": 240,
  "\\@": 32,
  "\\q": 41,
  "\\~": 3,
  "\\Q": 26,
  "\\%": 106,
  "\\!": 50,
  "\\7": 30,
  "\\4": 6,
  "\\y": 7,
  "\\Z": 1,
  "\\”": 2,
  "\\,": 3,
  "\\5": 3,
  "\\9": 1
}

If windows paths are ignored, you can see that the number of matches with \\<letter> drastically decreased

{
  "\\\n": 896348,
  "\\n": 30972,
  "\\\"": 19585,
  "\\ ": 47518,
  "\\(": 3282,
  "\\)": 3300,
  "\\1": 2266,
  "\\$": 3940,
  "\\H": 6,
  "\\m": 212,
  "\\-": 1643,
  "\\.": 3521,
  "\\d": 319,
  "WINDOWS_PATH": 4974,
  "\\;": 968,
  "\\s": 3413,
  "\\c": 221,
  "\\/": 6075,
  "\\b": 256,
  "\\&": 299,
  "\\h": 71,
  "\\l": 49,
  "\\U": 20,
  "\\{": 158,
  "\\3": 14,
  "\\2": 345,
  "\\i": 44,
  "\\j": 39,
  "\\K": 28,
  "\\*": 390,
  "\\t": 767,
  "\\|": 268,
  "\\}": 118,
  "\\\t": 170,
  "\\p": 180,
  "\\W": 247,
  "\\M": 519,
  "\\F": 152,
  "\\v": 228,
  "\\z": 80,
  "\\J": 44,
  "\\O": 21,
  "\\P": 156,
  "\\0": 294,
  "\\r": 359,
  "\\S": 585,
  "\\w": 120,
  "\\I": 81,
  "\\`": 96,
  "\\C": 168,
  "\\+": 352,
  "\\u": 100,
  "\\#": 166,
  "\\L": 15,
  "\\\r": 384,
  "\\e": 217,
  "\\?": 97,
  "\\=": 174,
  "\\'": 274,
  "\\[": 647,
  "\\]": 631,
  "\\<": 45,
  "\\:": 111,
  "\\_": 347,
  "\\R": 56,
  "\\^": 12,
  "\\o": 19,
  "\\V": 45,
  "\\A": 52,
  "\\g": 47,
  "\\@": 30,
  "\\D": 32,
  "\\E": 50,
  "\\q": 31,
  "\\~": 3,
  "\\>": 50,
  "\\a": 48,
  "\\%": 80,
  "\\x": 222,
  "\\N": 112,
  "\\B": 99,
  "\\T": 69,
  "\\!": 49,
  "\\G": 7,
  "\\f": 11,
  "\\k": 2,
  "\\7": 7,
  "\\4": 3,
  "\\Q": 6,
  "\\Z": 1,
  "\\”": 2,
  "\\,": 3,
  "\\5": 1
}

@rcjsuen
Copy link
Owner

rcjsuen commented Jul 27, 2022

@tdurieux Thank you for your information and apologies for the delay in responding.

I am hesitant to change the API. Would it be sufficient to provide you an API to just get the raw textual content of the arguments as a single string? Thus, given VOLUME C:\\data\\db C:\\data\\configdb there would be some API function to return C:\\data\\db C:\\data\\configdb. That should help unblock your work hopefully?

@tdurieux
Copy link
Author

tdurieux commented Jul 27, 2022

Hi @rcjsuen, I do think that you already provide this API 'getRawArguments'. I think my workaround would be to change the escape character for the commands that have paths as parameters.

Did you see my PR?

@rcjsuen
Copy link
Owner

rcjsuen commented Jul 27, 2022

I do think that you already provide this API 'getRawArguments'.

You're right. I do have getRawArgumentsContent() here...

Did you see my PR?

I see #107 but I thought it was only related to JSON. Am I missing something?

@tdurieux
Copy link
Author

#107 is indeed unrelated to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants