
Openstack: support for anti-affinity server groups. #577

Open
wants to merge 2 commits into master

Conversation

flashvoid (Contributor)

Hello,

We made this feature for one of our clients and are interested in whether it would be welcome upstream.

The goal is to provide better results for CPU-hungry applications on OpenStack clouds with CPU overcommit.

vm = self.nova_client.servers.create(node_name, image_id, flavor, **vm_start_args)
self._wait_for_status(vm, ["ACTIVE", "ERROR"], 10)

if vm.status == 'ERROR' and vm.fault['message'] == out_of_capacity:
Contributor

Is it possible that the out_of_capacity message gets localized and we miss it? Is there an error code, exception class, or other more invariant means of checking what the issue is?

Contributor Author

This is a very good point; unfortunately, there doesn't seem to be any way to determine the reason for an error other than the error message. The call is async, there is no distinct error code, and the error itself is similar to the one raised when the entire cloud runs out of resources.

Maybe it makes sense to drop the second conditional and just leave it at that. The logic would be to stop growing the group after the first instance that fails to start, regardless of the reason.
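
A minimal sketch of that simplified check, reusing the names from the snippet above (the fault-message comparison is simply dropped; the retry in a fresh group is an assumption about the surrounding logic):

vm = self.nova_client.servers.create(node_name, image_id, flavor, **vm_start_args)
self._wait_for_status(vm, ["ACTIVE", "ERROR"], 10)

# stop growing the current anti-affinity group after the first
# instance that fails to start, regardless of the failure reason:
# with an async `create()` call, the fault message is the only
# reason indicator, and matching on it is fragile
if vm.status == 'ERROR':
    group = self._get_next_anti_affinity_group()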

Contributor Author

Just made an extra test; this is what OpenStack reports when the scheduler fails to schedule an instance due to anti-affinity:

ipdb> pp vm.fault
{u'code': 500,
 u'created': u'2018-08-08T04:22:56Z',
 u'message': u'No valid host was found. There are not enough hosts available.'}
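
A hedged sketch of a check keyed on this fault (it assumes the message text above is stable across deployments; as noted, the code 500 alone does not distinguish anti-affinity exhaustion from a cloud-wide capacity problem):

NO_VALID_HOST = 'No valid host was found'

fault = getattr(vm, 'fault', {}) or {}
if vm.status == 'ERROR' and NO_VALID_HOST in fault.get('message', ''):
    # the scheduler could not place the instance; when using
    # anti-affinity groups this most likely means the current
    # group's hosts are exhausted
    ...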

# due to some `nova_client.servers.create()` implementation weirdness,
# the first three args need to be spelt out explicitly and cannot be
# conflated into `**vm_start_args`
vm = self.nova_client.servers.create(node_name, image_id, flavor, **vm_start_args)
Contributor

This branch of the `if` misses the `with ...__node_start_lock`.

Contributor Author (@flashvoid, Aug 7, 2018)

No, that was on purpose.
Cluster.start() allows two modes of node creation, sync and async. Async isn't compatible with AAF groups because we need feedback from the previous instance before making a decision about the next one.

So effectively, anti_affinity_group_prefix implies -p 1, and this line provides backward compatibility for people who want async and do not want to use AAF groups.

P.S. I would, however, point out that async mode doesn't save much time in the case of OpenStack: the create-instance API is async by nature, so -p 1 wouldn't make much difference.
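
A minimal sketch of the dispatch this describes (`nodes`, `_start_node()`, and `_start_node_async()` are illustrative names, not part of the patch):

if self.anti_affinity_group_prefix:
    # anti-affinity groups need feedback from each instance before
    # deciding whether the next one still fits the current group,
    # so nodes are started one at a time (effectively `-p 1`)
    for node in nodes:
        self._start_node(node)
else:
    # backward-compatible async path for users who want parallel
    # starts and do not use AAF groups
    for node in nodes:
        self._start_node_async(node)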


def _get_next_anti_affinity_group(self):
    group_name = self._make_next_group_name(self._get_current_anti_affinity_group().name)
    return self.nova_client.server_groups.create(name=group_name, policies='anti-affinity')
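
`_make_next_group_name()` is not shown in this excerpt; a plausible sketch, assuming group names of the form prefix + middle token + counter:

def _make_next_group_name(self, current_name):
    # split off the trailing counter and increment it,
    # e.g. "hpc.number.3" -> "hpc.number.4"
    base, _, counter = current_name.rpartition(self.middle_token)
    return base + self.middle_token + str(int(counter) + 1)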
Contributor

Will this work for any OpenStack user? Or does it require admin access?

Contributor Author

No admin privileges are needed in our cloud, and as far as I can tell the default policy allows regular users to use this feature.

if groups:
    group = sorted(groups, key=lambda g: g.name)[-1]
else:
    group = self.nova_client.server_groups.create(
        name=self.anti_affinity_group_prefix + self.middle_token + "1",
        policies='anti-affinity')
Contributor

Is the "1" really a constant here? If yes, why not make it part of self.middle_token?

Contributor Author

It is a constant, and it isn't part of `self.middle_token` because the latter is used in `self._list_anti_affinity_groups()` for substring matching, where the trailing counter must be left out.
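
A sketch of the substring matching this refers to (`_list_anti_affinity_groups()` itself is not shown in the excerpt; the filtering logic is an assumption):

def _list_anti_affinity_groups(self):
    # every ElastiCluster-managed group is named
    # <prefix><middle_token><counter>, so prefix + middle token is a
    # marker identifying "our" groups regardless of the counter value
    marker = self.anti_affinity_group_prefix + self.middle_token
    return [g for g in self.nova_client.server_groups.list()
            if marker in g.name]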

@@ -189,6 +193,10 @@ def __init__(self, username, password, project_name, auth_url,
self._instances = {}
self._cached_instances = {}

self.anti_affinity_group_prefix = anti_affinity_group_prefix
# constant for anti-affinity routines.
self.middle_token = ".number."
Contributor

Very minor point: I would prefer if self.middle_token was self._middle_token so it does not look like it's part of the public API / parameters of the class. (I know the ElastiCluster sources are inconsistent on this :-))

Contributor Author

OK, will do.

@riccardomurri (Contributor)

Thank you very much for this contribution! It would be fantastic to have it merged upstream; I think it would be extremely useful to whoever is running on mixed-workload oversubscribed clouds.

The code looks good to me; I have one important question: will it work for any OpenStack user or do anti-affinity groups require some kind of admin privileges?

I have noted a few other minor points in the sources; great if you have time to fix (or talk back) yourself, otherwise I can take care of that when merging.

Thanks again for taking the time to contribute!

@riccardomurri riccardomurri added this to the 1.4 milestone Aug 7, 2018
@riccardomurri (Contributor)

Thank you very much for following up! I'll be back on Friday: then I'll test the code and merge.

@riccardomurri (Contributor)

So the code looks fine and doesn't break existing functionality :-) I cannot really test the AAF feature, as it is not supported by our local OpenStack; I'll take your word that it works :-)

A couple more questions related to AAF:

  1. at the moment, the AAF prefix is set cloud-wide; does it make sense to set it cluster-wide? (i.e., different clusters on the same cloud use different prefixes) Otherwise, it seems to me that stopping a cluster could tear down the AAF groups used for other clusters.

  2. I still don't get the prefix + middle token + number thing: I understand that you want to list all AAF groups "belonging" to an ElastiCluster session (hence the need for a unique substring, whose role is currently played by prefix+middle), and that you need to generate multiple AAF groups (hence the addition of the final number). But then why is the middle token constant? If it's constant, it plays no uniqueness role and can be omitted; if it serves the purpose of distinguishing ElastiCluster-generated AAF groups from ones that users may legitimately create on their own, then we should probably choose something more specific than .number. (e.g., _elasticluster_ to indicate origin, or some randomly-generated string to minimize the probability of collisions).

@flashvoid (Contributor Author)

flashvoid commented Aug 19, 2018

@riccardomurri

I cannot really test the AAF feature, as it is not supported by our local OpenStack; I'll take your word that it works :-)

I will be happy to set up an account for you on our public cloud so you can test things, once we work out the reviews here.

then we should probably choose something more specific than .number.

I agree, will fix the middle token to be _elasticloud_

at the moment, the AAF prefix is set cloud-wide; does it make sense to set it cluster-wide?

Well, I was addressing a specific use case and might have missed the bigger picture (after all, that's what reviews are for :-)). What would be the desired way to pass cluster-wide configuration down to the OpenStack provider? Would you recommend passing it with the Node object?

@riccardomurri (Contributor)

Hello again!

Sorry this had completely fallen off my radar -- I have restructured the code a bit to work with parallel starting of instances, and for using a single configuration item (boolean use_anti_affinity_groups):

https://github.com/gc3-uzh-ch/elasticluster/compare/master...riccardomurri:pr/%23577?expand=1

I think we still need to record the AAF group names somehow; I understand "delete all" will only be performed by OpenStack if all server groups are empty, but I think it would be cleaner if every instance just cleaned up its own server group.

@riccardomurri (Contributor)

I think it would be cleaner if every instance just cleaned up its own server group.

This is now implemented in my pr/#577 branch, on top of your commits. Would you have time to test it? I do not have access to an OpenStack installation with AAF groups.
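
For reference, a hedged sketch of the per-instance cleanup idea (`_get_server_group_of()` is a hypothetical helper; the actual implementation is in the pr/#577 branch linked above):

def stop_instance(self, instance_id):
    vm = self.nova_client.servers.get(instance_id)
    group = self._get_server_group_of(vm)  # hypothetical lookup helper
    vm.delete()
    # delete the server group once its last member is gone, so each
    # instance cleans up after itself instead of relying on a final
    # "delete all empty groups" pass
    if group is not None and len(group.members) <= 1:
        self.nova_client.server_groups.delete(group.id)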
