Proposing the composite type: arrays #211

Open
hinshun opened this issue Nov 22, 2020 · 5 comments
Labels
design Design for a feature

Comments

@hinshun
Contributor

hinshun commented Nov 22, 2020

Currently, the types supported are string, int, bool, fs, option, and pipeline.

This proposal introduces composite types, allowing for []string, []int, []bool, []fs, []option, and []pipeline, as well as for loops.

Proposal

  1. Statements in array blocks append to the return register instead of modifying it.
  2. Arrays are splatted automatically.

For arrays to work well in practice, we want to ensure we have data flow analysis to return compile errors for dead code, so that will be a prerequisite.

string regions() {
	"us-east-1"
	^^^^^^^^^^^
	dead code
	"us-west-2"
}

Here is a function returning an array. Instead of overriding the value of the return register, each statement's value is appended, similar to declaring the elements of an array. (1. Statements in array blocks append to return register instead of modify)

[]string regions() {
	"us-east-1"
	"us-west-2"
}
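A rough way to picture rule 1, sketched in Python rather than HLB. The names `run_scalar_block` and `run_array_block` are hypothetical and only model how a return register might behave; they are not how HLB is implemented.

```python
def run_scalar_block(statements):
    """Model a non-array block: each statement overwrites the return register."""
    register = None
    for value in statements:
        register = value  # earlier values are discarded (dead code)
    return register


def run_array_block(statements):
    """Model an array block: each statement appends to the return register."""
    register = []
    for value in statements:
        register.append(value)
    return register


regions = ["us-east-1", "us-west-2"]
scalar_result = run_scalar_block(regions)
array_result = run_array_block(regions)
```

In this model the `string` version keeps only the last value, which is why the first literal in the earlier `string regions()` example is flagged as dead code.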

You can invoke functions that return the same array type, and their results are splatted automatically. (2. Arrays are splatted automatically)

[]string westRegions() {
	"us-west-1"
	"us-west-2"
}

[]string allRegions() {
	"us-east-1"
	westRegions
	^^^^^^^^^^^
	splatted
}
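Rule 2 could be modelled like this, again as a Python sketch and not HLB. The helper `run_array_block` is a hypothetical name, and treating "is a list" as the trigger for splatting is an assumption made for illustration.

```python
def run_array_block(statements):
    """Append scalars; extend with (splat) any statement that yields a list."""
    register = []
    for value in statements:
        if isinstance(value, list):
            register.extend(value)  # array result is splatted, not nested
        else:
            register.append(value)
    return register


west_regions = run_array_block(["us-west-1", "us-west-2"])
all_regions = run_array_block(["us-east-1", west_regions])
```

The result stays one-dimensional, which matches the idea that `westRegions` contributes its elements rather than a nested array.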

This is useful when combined with a for loop. Here's the proposed structure:

for (<type> <ident> in <expr>) {
    # statements affect register as usual
}

For example, when you need to publish an image to multiple registries:

fs publish() {
	build
	for (string region in regions) {
		dockerPush "registry-${region}.com/my-image"
	}
}
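One way to read "statements affect register as usual" inside an `fs` block, sketched in Python. Representing the filesystem register as a list of operation strings is purely an assumption for illustration.

```python
def publish(regions):
    """Model an fs block as a chain of operations applied to one register."""
    state = ["build"]  # the fs register after `build`
    for region in regions:
        # In a non-array block, each loop statement modifies the same register.
        state = state + [f"dockerPush registry-{region}.com/my-image"]
    return state


result = publish(["us-east-1", "us-west-2"])
```

Under this reading the loop behaves like a macro that unrolls into one `dockerPush` per region, chained after `build`.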

Since the for loop expects an expression, you can also provide a block literal as an expression.

fs publish() {
	build
	for (string region in []string {
		"us-east-1"
		westRegions
	}) {
		dockerPush "registry-${region}.com/my-image"
	}
}

Note that you can still do this inline, but it follows regular block syntax (i.e. statements are separated by ;).

fs publish() {
	build
	for (string region in []string { "us-east-1"; "us-west-2" }) {
	                                            ^
	                                            odd compared to other languages
	                                            but consistent within HLB
		dockerPush "registry-${region}.com/my-image"
	}
}

Elements or slices of an array can be accessed through array indexing, for example when we need to bind the digest of one of the registries we pushed to.

fs registryPush(fs ref) {
	for (string region in regions[1:]) {
		dockerPush "registry-${region}.com/${ref}"
	}
	dockerPush "registry-${regions[0]}.com/${ref}" as registryDigest
}
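Assuming HLB slices would behave like Python's (the proposal does not spell out the semantics), the split between "first element" and "the rest" looks like this. The region names and image name are illustrative values only.

```python
regions = ["us-east-1", "us-west-1", "us-west-2"]

rest = regions[1:]   # every element after the first
first = regions[0]   # the single element whose push result gets bound

pushed = [f"registry-{region}.com/my-image" for region in rest]
bound = f"registry-{first}.com/my-image"
```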

You can also pass arrays to functions that expect variadic arguments because they are splatted automatically.

# Builtin `stage` function that runs pipelines in parallel.
pipeline stage(variadic pipeline pipelines)

[]fs tests() {
	lint
	unitTest
}

pipeline testAll() {
	stage tests
}
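For comparison, Python needs an explicit `*` to pass a list to a variadic function, whereas the proposal makes that step implicit. The `stage` function below is a stand-in written for this sketch, not the real builtin.

```python
def stage(*pipelines):
    """Stand-in for the builtin `stage`: accepts any number of pipelines."""
    return list(pipelines)


tests = ["lint", "unitTest"]
staged = stage(*tests)  # explicit splat in Python; automatic in the proposal
```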

This proposal also addresses the special handling of option blocks, which behaved like arrays before arrays existed. The linter can rewrite with option -> with []option, and function declarations returning option -> []option.

fs npmInstall() {
	image "node:alpine"
	run "npm install" with []option {
	                       ^^^^^^^^
	                       not special anymore
	                       behaves as an array type
		dir "/in"
		mount manifest "/in"
		mount scratch "/in/node_modules" as nodeModules
	}
}

Suppose we have a builtin localWalk. We can combine it with a for loop to produce a list of options for mounting decrypted secrets.

# Return filenames walking the directory at `localPath`.
[]string localWalk(string localPath)

[]option::run mountSecrets() {
	for (string filename in localWalk("./decrypted")) {
		secret "./decrypted/${filename}" "/secrets/${filename}"
	}
}
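A Python sketch of what `localWalk` plus the loop might produce. `local_walk` and `mount_secrets` are hypothetical names, and modelling each `secret` option as a `(source, destination)` tuple is an assumption.

```python
import os
import tempfile


def local_walk(local_path):
    """Return file paths under local_path, relative to it, in sorted order."""
    found = []
    for root, _dirs, files in os.walk(local_path):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, local_path))
    return sorted(found)


def mount_secrets(local_path):
    """Build one (source, destination) secret option per file."""
    return [
        (os.path.join(local_path, name), "/secrets/" + name)
        for name in local_walk(local_path)
    ]


# Demonstrate with a throwaway directory holding two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("api.key", "db.pass"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("placeholder")
    options = mount_secrets(tmp)
    expected = [
        (os.path.join(tmp, "api.key"), "/secrets/api.key"),
        (os.path.join(tmp, "db.pass"), "/secrets/db.pass"),
    ]
```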
@hinshun hinshun added the design Design for a feature label Nov 22, 2020
@coryb
Contributor

coryb commented Nov 23, 2020

interesting ideas, the option vs []option seems fine

I do have a few clarifying questions and ideas, not sure they are terribly practical...

1) fs/pipeline array usage

so fs and pipeline are special and still operate on an implicit stack, but none of the others do?

can we do this? What does this return?

[]fs f2() {
  image "image"
  run "echo hi"
}

Similar for pipelines, what does this return:

[]pipeline p1() {
  stage fsA fsB
  stage fsC
}

2) array generation with for

Is this useful to reformat a list of strings? What is the return from this function? What if the return type was string instead of []string?

[]string hello(variadic string names) {
  for(string name in names) {
    format "hello %s" name
  }
}

3) how do we yield/produce a single value from a list?

ie can we sum up list of ints?

int sum(variadic int nums) {
  for(int num in nums) {
     # we can't assign, so not sure how to sum
  }
}

Maybe we just add a bunch of builtins to sum, multiply, concat, etc?

4) array operators

Possible builtin operators we might need:

  • len not sure this is useful except maybe for slices array[1:len(array)-3] but I actually prefer allowing negative indexing, so the last index would be array[-1] instead of array[len(array) - 1]
  • reverse since we don't have indexable looping like for i := len(array); i >= 0; i-- { ... } we will likely want some sorting functions.
  • shift pop push it would be awkward, but we could use this for producing results, something like:
int sum(variadic int nums) {
 # last element in list becomes our register
  push nums 0
  # iterate over all but last element (skip our register)
  for(int num in nums[0:-2]) {
   # pop off our register, add it to num, then push register back on end
   # this also uses the non-existent `+` builtin
    push nums ((pop nums) + num)
  }
  # return our register
  pop nums
}

Not advocating this approach, but we might need some array manipulation builtins?

  • sort filter not sure how we change the ordering (other than reverse) without conditionals though.
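To check that the push/pop approach above works mechanically, here is the same idea in Python. `stack_sum` is a hypothetical name, the list is copied so the caller's values are untouched, and Python's half-open `[:-1]` is used for "all but the last element", which may not match the slice bounds HLB would end up with.

```python
def stack_sum(values):
    """Sum a list by keeping the accumulator as the last element."""
    nums = list(values)   # copy so the caller's list is not mutated
    nums.append(0)        # last element becomes the register
    # Iterate over a snapshot of everything except the register.
    for num in nums[:-1]:
        # Pop the register, add the current value, push it back.
        nums.append(nums.pop() + num)
    return nums.pop()     # the register holds the total


total = stack_sum([1, 2, 3])
empty_total = stack_sum([])
```

It works, but every step juggles the register by hand, which supports the point that dedicated builtins or an exposed register would be less awkward.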

5) compound arrays?

Can we do arrays of arrays? Compound type?

[]{string,fs} # sort of a map of fs?
[][]int # matrix of ints?

@hinshun
Contributor Author

hinshun commented Nov 23, 2020

1) fs/pipeline array usage

so fs and pipeline are special and still operate on an implicit stack, but none of the others do?

can we do this? What does this return?

[]fs f2() {
  image "image"
  run "echo hi"
}

That's valid and equivalent of running this target:

pipeline f2() {
    stage fs {
        image "image"
    } fs {
        run "echo hi"
    }
}

I'm not sure how to combat this. One thought is to make use of ref.StatFile("echo") to do some compile-time checking.

Similar for pipelines, what does this return:

[]pipeline p1() {
  stage fsA fsB
  stage fsC
}

Good point, I think that's actually the correct way of writing pipelines and that my example is wrong. I'm not sure a single element pipeline is that useful.

2) array generation with for

Is this useful to reformat a list of strings? What is the return from this function? What if the return type was string instead of []string?

[]string hello(variadic string names) {
  for(string name in names) {
    format "hello %s" name
  }
}

So for this proposal, the statements in the body of a for loop behave the same as in its outer block. In a function with an array return type, those statements append, so you can think of the loop as a map function.

If the return type were string instead, then all elements except the last one would be dead code and hopefully produce a compile error.
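A minimal sketch of the kind of check this implies for a `string` block, written in Python. `DeadCodeError` and `check_string_block` are hypothetical names, and real data flow analysis would be considerably more involved than counting values.

```python
class DeadCodeError(Exception):
    """Raised when a scalar block produces values that get overwritten."""


def check_string_block(values):
    """Reject a string block whose earlier values would be discarded."""
    if len(values) > 1:
        dead = len(values) - 1
        raise DeadCodeError(f"{dead} statement(s) are dead code")
    return values[0] if values else None


single = check_string_block(["hello alice"])

try:
    check_string_block(["hello alice", "hello bob"])
    rejected = False
except DeadCodeError:
    rejected = True
```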

3) how do we yield/produce a single value from a list?

ie can we sum up list of ints?

int sum(variadic int nums) {
  for(int num in nums) {
     # we can't assign, so not sure how to sum
  }
}

Maybe we just add a bunch of builtins to sum, multiply, concat, etc?

I didn't mention it in this proposal, but we could possibly expose the return register as this or some keyword, that way you can utilize the for loop for reduce as well. Just an idea.
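Roughly what that could look like, modelled in Python with a local variable named `this` standing in for the exposed return register. The keyword, the zero initial value, and the `+` operator are all assumptions here; none of them exist in HLB today.

```python
def sum_block(nums):
    """Model an int block whose return register is readable as `this`."""
    this = 0  # assumed initial value of an int return register
    for num in nums:
        this = this + num  # each statement reads and replaces the register
    return this


total = sum_block([1, 2, 3])
```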

4) array operators

Possible builtin operators we might need:

  • len not sure this is useful except maybe for slices array[1:len(array)-3] but I actually prefer allowing negative indexing, so the last index would be array[-1] instead of array[len(array) - 1]
  • reverse since we don't have indexable looping like for i := len(array); i >= 0; i-- { ... } we will likely want some sorting functions.
  • shift pop push it would be awkward, but we could use this for producing results, something like:
int sum(variadic int nums) {
 # last element in list becomes our register
  push nums 0
  # iterate over all but last element (skip our register)
  for(int num in nums[0:-2]) {
   # pop off our register, add it to num, then push register back on end
   # this also uses the non-existent `+` builtin
    push nums ((pop nums) + num)
  }
  # return our register
  pop nums
}

Not advocating this approach, but we might need some array manipulation builtins?

  • sort filter not sure how we change the ordering (other than reverse) without conditionals though.

Yes I was thinking of len but I've been noodling some thoughts on generics. Something like:

int len<T>([]T slice)

But I don't have anything concrete atm. Another thought is that we should decouple real builtins like with and as from stdlib stuff like mkfile, etc. Languages like golang don't have proper signatures for builtins like append, so we could get away with len without generics.

I'm also not sure how far we should support functionality in this way, there are certainly better tools for this than a build language. I was hoping for loops were more of a macro than anything.

5) compound arrays?

Can we do arrays of arrays? Compound type?

[]{string,fs} # sort of a map of fs?
[][]int # matrix of ints?

Yeah I've been thinking about user-defined types of structs and multi-dimensional arrays and intentionally left them out... I'm not sure, probably not allowed in this current proposal.

@coryb
Contributor

coryb commented Nov 23, 2020

[]fs f2() {
  image "image"
  run "echo hi"
}

That's valid and equivalent of running this target:

pipeline f2() {
    stage fs {
        image "image"
    } fs {
        run "echo hi"
    }
}

This is pretty confusing to me. I was originally thinking something more like this:

pipeline p2() {
    stage fs {
        image "image"
    } fs {
        image "image"
        run "echo hi"
    }
}
# or maybe this:
pipeline p3() {
    stage fs {
        image "image"
        run "echo hi"
    }
}

The run statement dangling by itself is confusing. As I write hlb I think about compound statements like that as logically being one llb statement (ie llb.Image("image").Run("echo hi").Root()) so I would guess that every time we reset the "root" fs it might produce a new output? So:

[]fs fs3() {
  image "image1"
  run "echo hi"
  image "image2"
  run "echo hi"
}

or more used like map:

[]fs fs4() {
  for(string name in []string{
    "image1"
    "image2"
  }) {
    image name
    run "echo hi"
  }
}

Maybe these would be equivalent to:

pipeline p4() {
    stage fs {
        image "image1"
        run "echo hi"
    } fs {
        image "image2"
        run "echo hi"
    }
}
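The "new output every time the root is reset" reading can be sketched in Python. `split_on_root` is a hypothetical helper, and representing statements as plain strings is only for illustration.

```python
def split_on_root(statements):
    """Start a new filesystem each time an `image` statement resets the root."""
    outputs = []
    for stmt in statements:
        # Open a new group on `image`, or for a leading non-image statement.
        if stmt.startswith("image ") or not outputs:
            outputs.append([])
        outputs[-1].append(stmt)
    return outputs


result = split_on_root([
    'image "image1"',
    'run "echo hi"',
    'image "image2"',
    'run "echo hi"',
])
```

With this rule the four statements in `fs3` group into two filesystems, matching the two `fs` blocks in `p4`.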

So for this proposal, the statements in the body of a for loop behave the same as in its outer block. In a function with an array return type, those statements append, so you can think of the loop as a map function.

Sounds good to me.

Yes I was thinking of len but I've been noodling some thoughts on generics. Something like:

int len<T>([]T slice)

But I don't have anything concrete atm. Another thought is that we should decouple real builtins like with and as from stdlib stuff like mkfile, etc. Languages like golang don't have proper signatures for builtins like append, so we could get away with len without generics.

Yeah, I would personally skip len for now and just allow negative indexing (indexing from tail). len by itself will also need all the arithmetic operators working (ie arr[0:len(arr) - 2 + 1])
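Python already supports indexing from the tail, so it makes a convenient illustration of the trade-off. Note that Python slices are half-open, which may differ from whatever bounds HLB would adopt.

```python
arr = [10, 20, 30, 40]

# Last element: needs len and subtraction, or just a negative index.
last_by_len = arr[len(arr) - 1]
last_by_negative = arr[-1]

# Everything except the last two elements, both ways.
head_by_len = arr[0:len(arr) - 2]
head_by_negative = arr[0:-2]
```

The negative forms need neither `len` nor arithmetic operators, which is the argument for deferring both.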

@copperlight

Tangentially, have you considered adding a void return type?

In the managed artifacts workshop, there is more than one case of the following:

# @return a leftover filesystem. do not use.

@aaronlehmann
Contributor

Statements in array blocks append to return register instead of modify.

I may be way off here, but it feels like the arrays tend to be used as stacks. Is there any reason not to make them first-class stacks instead of arrays that typically are used as stacks? Do you think we will get value out of random access, slicing, etc?

That's not to say that there's anything wrong with providing functionality for random access and slicing, but there's a certain beauty in stack-based languages like FORTH, so I think it's worth considering making the primitive a stack.

So for this proposal, the statements in the body of a for loop behave the same as in its outer block. In a function with an array return type, those statements append, so you can think of the loop as a map function.

If the return type were string instead, then all elements except the last one would be dead code and hopefully produce a compile error.

Making the behavior so contextual feels like it would be confusing (to me, at least). What about having separate map and for statements that make this append vs. replace difference explicit?

I'm accustomed to thinking of for as an imperative construct, so having it act like map in certain cases feels a little strange to me.

Can we do arrays of arrays? Compound type?

My instinct is to design the syntax in a way that would eventually allow multidimensional arrays if we decide it's worth it, but start by only supporting one-dimensional arrays to keep things simple and get a better feel for how people use the feature.
