[workspace-user] Hypervisor backend - DHCP/ebtables errors; Destroy Errors
tfreeman at mcs.anl.gov
Tue Feb 13 22:13:42 CST 2007
On Tue, 13 Feb 2007 16:42:11 -0800
Duncan Penfold-Brown <dpb at uvic.ca> wrote:
> Hi again -
> After having found a way to get the workspace-service functioning
> properly on my services node, I am running into some problems with the
> workspace-control program backend on my hypervisor node. The problems
> arise when the workspace-control program begins trying to insert DHCP
> and ebtables rules into the dhcpd.conf file and ebtables service. I have
> had a bit of trouble with the dhcp server failing to stop or start
> correctly, but this is the error I usually receive (from the create
> logfile on the hypervisor node):
> Chain DHCP-vif0.0 already exists.
> Chain workspace-5-1 already exists.
> ERROR: could not create workspace-5-1 chain
> ERROR: ebtables addition failed
> I am wondering if the potential bug of 'not rolling back the dhcp and
> ebtables information if a workspace deploy fails' is causing this error.
> If so, how might I go about avoiding it?
I noticed this myself while working on something else and have already put a
fix in: the DHCP/ebtables policies are now rolled back when something else
fails after they have been inserted.
Apologies. During the original coding of the feature I didn't test every error
situation: I tested with a working VM/metadata, so there was never an error
after the DHCP addition, which comes just towards the end of the create process.
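To make the rollback idea concrete, here is a minimal sketch of the pattern.
It is not the actual workspace-control code. The names create_workspace,
add_rules, remove_rules and start_vm are all hypothetical stand-ins for the
DHCP/ebtables insertion, its removal, and the final VM creation step.

```python
class CreateError(Exception):
    """Raised when the (hypothetical) create sequence fails."""


def create_workspace(name, ip, add_rules, remove_rules, start_vm):
    """Insert network rules, then start the VM; undo the rules on failure.

    add_rules, remove_rules and start_vm are hypothetical callables, passed
    in so the sketch stays independent of any real backend.
    """
    add_rules(name, ip)
    try:
        start_vm(name)
    except Exception as exc:
        # Something failed after the rules went in: remove them again so the
        # chain and the IP address are not left behind for the next create.
        remove_rules(name, ip)
        raise CreateError("create of %s failed, rules rolled back" % name) from exc
```

The point of the pattern is that a failed create leaves nothing behind, so a
later create with the same name does not trip over an "already exists" chain.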
> running: '/usr/bin/sudo /usr/sbin/xm destroy workspace-5' ::: returns256
> ::: output: Error: an integer is required
(By the way, this "integer is required" message is just the error you get when
the VM isn't actually running. If it were running, destroy by name would work.)
> 2007-02-13 16:15:06,016 - WARNING - workspace.xen_v2 (902) - cannot
> remove dhcp/ebtables rules without create params
> It looks like because the workspace deploy command never finished, the
> file /opt/workspace/persistence/workspace-N does not exist,
> and so the workspace cannot be correctly destroyed
Yes. It wouldn't matter if it were not for the DHCP/ebtables rules, which is
why the fix in place rolls back those additions when there is a problem in
the brief period between the DHCP/ebtables add and the final creation.
>, and proper clean-up cannot
> occur. It also ties up the networking association that was used to
> instantiate the workspace.
It's an annoying bug.
> The destroy command does not finish until it reaches a java timeout.
This part is surprising: the destroy should just exit with an error return
code. I would like to investigate this. I will try to recreate a failure
at the right time (without the rollback fix in place) to see if it hangs for
me and, if it does, to work out where it is hanging.
Besides the actual code fix, you can do a rollback manually to get that IP
address back (anything manual is very BAD of course, but this should work):
sudo /opt/workspace/bin/dhcp-config.sh rem workspace-5-1 192.168.17.10