Beyond Virtual Machines and Hypervisors: Overview of Bare Metal Provisioning with OpenStack Cloud


http://www.mirantis.com/blog/bare-metal-provisioning-with-openstack-cloud/

Many people refer to 'cloud' and 'virtualization' in the same breath, and from there assume that the cloud is all about managing the virtual machines that run on your hypervisor. Currently, OpenStack supports virtual machine management through a number of hypervisors, the most widespread being KVM and Xen.

As it turns out, in certain circumstances, using virtualization is not optimal—for example, if there are substantial requirements for performance (e.g., I/O and CPU) that are not compatible with the overhead of virtualization. However, it’s still very convenient to utilize OpenStack features such as instance management, image management, authentication services and so forth for IaaS use cases that require provisioning on bare metal. In addressing these cases we implemented a driver for OpenStack compute, Nova, to support bare-metal provisioning.

Review of the status of bare-metal provisioning in OpenStack

When we undertook our first bare-metal provisioning implementation, there was code implemented by USC/ISI to support bare-metal provisioning on Tilera hardware. We weren't going to target Tilera hardware, but the other bits of that bare-metal implementation were quite useful. NTT Docomo also had code to support a more generic scheme using PXE boot and an IPMI-based power manager, but unfortunately it took some time to open source it, so we had to start development of the generic backend before the NTT Docomo code was released.

A blueprint on bare-metal provisioning can be found on the OpenStack Wiki here: General Bare Metal Provisioning Framework.

Bare-metal provisioning framework architecture

Our driver implements the standard interface for an OpenStack hypervisor driver, with the difference that it doesn't actually talk to any hypervisor. Instead it manages a pool of physical nodes. Each physical node can host only one "Virtual" (sorry for the pun) Machine (VM) instance. When a new provisioning request arrives, the driver chooses a physical host from the pool to place the VM on, and the instance stays there until it is destroyed. The operator can add, remove, and modify the physical nodes in the pool.
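As a rough illustration of that contract (this is not the actual driver code; the class shape and method signatures are trimmed-down assumptions), the driver boils down to a node pool plus spawn/destroy operations:

    # Simplified sketch of the bare-metal driver contract described above. The real
    # driver implements the full Nova virt driver interface; names and signatures
    # here are simplified assumptions.
    class BareMetalDriver(object):
        def __init__(self, node_pool):
            # node_pool: node id -> record with IPMI address, MAC, and bound instance
            self.node_pool = node_pool

        def spawn(self, context, instance, image_meta, network_info):
            # Pick any node that is not yet bound to an instance.
            node = next(n for n in self.node_pool.values() if n["instance"] is None)
            node["instance"] = instance["uuid"]
            # From here the driver plugs the node into the service network, queues a
            # deployment task for the agent, and powers the node on over IPMI.
            return node

        def destroy(self, context, instance, network_info):
            # Power the node down and return it to the free pool.
            for node in self.node_pool.values():
                if node["instance"] == instance["uuid"]:
                    node["instance"] = None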


[Figure: bare-metal provisioning architecture]

The main components related to the bare-metal provisioning support are:

  • nova-compute with the bare-metal driver: The bare-metal driver itself consists of several components:
    • The power manager is responsible for operations such as setting boot devices, powering nodes up and down, and so on. It is flexible enough to support several management protocol implementations (we developed two, based on IPMItool and FreeIPMI, to cover a wider range of hardware).
    • The network manager interacts with the rack switch and is responsible for switching nodes back and forth between the service network and the projects' networks (the service network is used to deploy the bare-metal instance via PXE/TFTP). Currently we have an implementation for Juniper switches. More details will be provided in another post devoted to networking support.
    • dnsmasq provides the netboot (DHCP/TFTP) environment for instance provisioning.
  • nova-baremetal-agent: The agent that runs on bootstrap-linux (see the next bullet) and executes the provisioning tasks spawned by the bare-metal driver.
  • bootstrap-linux: A tiny Linux image that is booted over the network and performs basic initialization. It is based on Tiny Core Linux and contains a basic set of packages, such as Python to run nova-baremetal-agent (which is implemented in Python) and curl to download an image from Glance. Additionally, it contains an init script that downloads nova-baremetal-agent using curl and executes it (a minimal sketch of such a script follows this list).
  • nova-baremetal-service: A service that is responsible for orchestration of the provisioning tasks (tasks are applied by nova-baremetal-agent directly to the bare-metal server it is running on).
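The bootstrap init script is tiny; as a hedged sketch (the service URL and file paths are placeholders, not the actual image contents), it amounts to something like:

    #!/bin/sh
    # Illustrative bootstrap init step: pull the agent from the bare-metal service
    # and hand control over to it. The URL is a placeholder.
    SERVICE_URL="http://nova-baremetal-service.example:8080"
    curl -o /tmp/nova-baremetal-agent.py "$SERVICE_URL/agent"
    python /tmp/nova-baremetal-agent.py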

Let's see what each component actually does in the course of provisioning a new VM (i.e., when you call nova boot). I won't go into the details of how this request travels until it reaches nova-compute and the spawn call lands in our bare-metal driver.

The following diagram illustrates this workflow:


[Figure: bare-metal provisioning flow]

  1. The driver chooses a free physical node from the pool.
  2. The node is plugged into the service network (a detailed blog post on networking is forthcoming, so I will skip that for now).
  3. The driver places a spawn task for the agent, which contains all the necessary information, such as what image to boot from.
  4. The driver issues IPMI commands to enable network boot for a node and power it up.
  5. Bootstrap Linux boots over the network from an image served by dnsmasq.
  6. Bootstrap Linux initialization scripts fetch the agent code from nova-baremetal-service (which provides a REST interface for that).
  7. nova-baremetal-agent polls the nova-baremetal-service REST service for tasks.
  8. nova-baremetal-service sees a task for this node and sends a response with the task, which includes a URL for the image from Glance and the authentication token to be able to fetch it.
  9. nova-baremetal-agent fetches the image from the URL specified in the task, 'dd's it to the hard drive, and then informs nova-baremetal-service that it's done with the task (this poll/fetch/write loop is sketched in code after the list).
  10. As soon as nova-baremetal-service is notified about task completion, it informs the driver that it’s time to reboot the node.
  11. The driver sees that provisioning is almost complete, so it switches the node over to the project's network.
  12. The driver sets the node to boot from the hard drive and reboots it.
  13. The node is up.
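Steps 7 through 9 boil down to a simple poll/fetch/write loop on the agent side. The sketch below only illustrates that loop; the service endpoint, REST paths, JSON field names and target disk are assumptions, not the actual nova-baremetal-agent code:

    # Illustrative sketch of the agent-side loop in steps 7-9. All URLs, JSON
    # fields and the target disk are placeholder assumptions.
    import json
    import subprocess
    import time
    import urllib.request

    SERVICE = "http://nova-baremetal-service.example:8080"  # placeholder endpoint
    NODE_ID = "node-01"                                      # placeholder node identity
    TARGET_DISK = "/dev/sda"

    def poll_for_task():
        # Step 7: keep asking the service for work; assume it returns JSON null
        # while no task is pending for this node.
        while True:
            with urllib.request.urlopen("%s/tasks/%s" % (SERVICE, NODE_ID)) as resp:
                task = json.load(resp)
            if task:
                return task
            time.sleep(5)

    def run_task(task):
        # Steps 8-9: stream the Glance image (URL and token come from the task)
        # straight onto the local disk with dd, then report completion.
        req = urllib.request.Request(task["image_url"],
                                     headers={"X-Auth-Token": task["auth_token"]})
        dd = subprocess.Popen(["dd", "of=" + TARGET_DISK, "bs=1M"],
                              stdin=subprocess.PIPE)
        with urllib.request.urlopen(req) as image:
            chunk = image.read(1024 * 1024)
            while chunk:
                dd.stdin.write(chunk)
                chunk = image.read(1024 * 1024)
        dd.stdin.close()
        dd.wait()
        # Step 10 trigger: tell the service we are done so it can power-cycle the node.
        done = urllib.request.Request("%s/tasks/%s/done" % (SERVICE, NODE_ID),
                                      data=b"", method="POST")
        urllib.request.urlopen(done)

    if __name__ == "__main__":
        run_task(poll_for_task())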

Configuration

A typical configuration for the compute will look like this:
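The exact option names varied between Nova releases and our patches, so treat the snippet below as an illustrative assumption rather than the literal flags; the point is that nova-compute is told to load the bare-metal driver and which power and network managers to use:

    # Illustrative nova.conf excerpt; option names are assumptions.
    compute_driver = nova.virt.baremetal.driver.BareMetalDriver
    baremetal_power_manager = ipmitool        # or a FreeIPMI-based manager
    baremetal_network_manager = juniper       # the rack-switch driver described above
    baremetal_tftp_root = /tftpboot           # served by dnsmasq for PXE boot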


But before the system becomes useful, we have to register switches and nodes. Information about them is stored in the database. We have created an extension to the OpenStack REST API to manage these objects, and two CLI clients for it: nova-baremetal-switchmanager and nova-baremetal-nodemanager. Let's use them to show how to add new switches and nodes.

Switches could be added using a command like this:
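A hypothetical invocation (the argument names are assumptions built from the parameters the tool takes, not its documented syntax):

    nova-baremetal-switchmanager add \
        --ip 10.0.0.2 \
        --user manager --password <secret> \
        --driver juniper \
        --description "rack 3 top-of-rack switch"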

You have to specify the IP address of the switch, credentials for the manager user, which switch driver to use, and an optional description.

nova-baremetal-switchmanager also supports other essential commands like list and delete. Once we have at least one switch, we can start adding nodes:
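Again, the argument names below are a hypothetical sketch; what matters is the set of attributes a node record carries:

    nova-baremetal-nodemanager add \
        --ip 10.0.1.21 \
        --mac 00:25:90:aa:bb:cc \
        --cpus 8 --ram 16384 --hdd 500 \
        --ipmi-host 10.0.2.21 --ipmi-user admin --ipmi-password <secret> \
        --switch-id 1 --port ge-0/0/5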

As you can see, it takes a few more options: the IP address of the node, the MAC address of its first network interface (used to identify the node), the number of CPUs, the amount of RAM in MB, the HDD capacity in GB, IPMI information, the ID of the switch it's connected to, and the name of the port on the switch.

After successful execution of this command, the specified node is added to the pool. With nova-baremetal-nodemanager you can also list and remove nodes in the pool, using the list and delete commands respectively.

Summary

Bare-metal provisioning has proved to be a useful and stable feature for our customers. It has other aspects, such as network management and image preparation, that we will cover in upcoming posts.

http://www.mirantis.com/blog/baremetal-provisioning-multi-tenancy-placement-control-isolation/

In a previous post, we introduced the bare-metal use cases for OpenStack Cloud and gave an overview of its bare-metal provisioning capabilities. Here, we're going to talk about how you can apply some of these approaches to a scenario mixing virtualization with isolation of key components.

Isolation requirements are pretty common for OpenStack deployments. In fact, one could simply say: "Without proper resource isolation you can wave goodbye to the public cloud." OpenStack tries to fulfill this need in a number of ways, involving (among many other things):

  • GUI & API authentication with Keystone
  • private images in Glance
  • security groups

However, if we go under the hood of OpenStack, we see a bunch of well-known open source components, such as KVM, iptables, Linux bridges, and iSCSI shares. How does OpenStack treat these components in terms of security? Frankly, it does hardly anything here. It is up to the sysadmin to go to each compute node and harden the underlying components on their own.

At Mirantis, one OpenStack deployment we dealt with had especially heavy security requirements. All the systems had to comply with several governmental standards for processing sensitive data, yet we still had to provide multi-tenancy. To satisfy the standards, we decided that "sensitive" tenants should be given isolated compute nodes with a hardened configuration.

The component responsible for distributing instances across an OpenStack cluster is nova-scheduler. Its most sophisticated scheduler type, FilterScheduler, enforces policies on instance placement based on "filters". For a given user request to spawn an instance, the filters determine the set of compute nodes capable of running it. A number of filters are already provided with the default nova-scheduler installation (they are listed here). However, none of them fully satisfied our requirements, so we decided to implement our own, which we called the "PlacementFilter".

The main goal of the PlacementFilter is to "reserve" a whole compute node for one tenant's instances only, thus isolating them from other tenants' instances at the hardware level. When a tenant is created, it can be marked as isolated; by default it is not. For isolated tenants, only designated compute nodes are used for provisioning VM instances. We define and assign these nodes to specific tenants manually, by creating a number of host aggregates. In short, host aggregates are a way to group compute nodes with similar capabilities or purpose. The job of the PlacementFilter is to pick the proper aggregate (set of compute nodes) for a given tenant, while regular (non-isolated) tenants use "shared" compute nodes for VM provisioning. In this deployment we also used OpenStack to provision bare-metal nodes. Bare-metal nodes are isolated by nature, so there is no need to assign them to a pool of isolated nodes for isolated tenants. (In fact, this post builds a bit on one of my previous posts about bare-metal provisioning.)

Solution architecture

During the initial cloud configuration, all servers dedicated to compute should be split into three pools:

  • servers for multi-tenant VMs
  • servers for single-tenant VMs
  • servers for bare-metal provisioning

Such grouping is required to introduce two types of tenants: "isolated tenants" and "common tenants". For isolated tenants, aggregates are used to create dedicated sets of compute nodes. These aggregates are later taken into account during the scheduling phase by the PlacementFilter.

The PlacementFilter has two missions:

  • schedule the VM on a compute node dedicated to the specific tenant, or on one of the default compute nodes if the tenant is non-isolated
  • schedule the VM on a bare-metal host if a bare-metal instance was requested (no aggregate is required here, as a bare-metal instance is isolated from other instances by nature, at the hardware level)

The PlacementFilter passes only bare-metal hosts if the 'compute_type' parameter in scheduler_hints is set to 'bare_metal'.

NOTE: We can instruct the scheduler to take our provisioning requirements into account by giving it so-called "hints" (the --hint option of the nova command); e.g., to specify the compute node's CPU architecture: --hint arch=i386. In the above case, the hint for bare metal is: nova boot ... --hint compute_type=bare_metal

If a non-bare-metal instance is requested, the filter looks up the aggregate of the project this instance belongs to and passes only hosts from that aggregate. If no aggregate is found for the project, a host from the default aggregate is chosen.
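To make that logic concrete, here is a simplified, standalone sketch of such a filter. The real PlacementFilter subclasses Nova's scheduler filter base class and reads aggregate membership through Nova's APIs; the class shape and data structures below are assumptions made so the decision logic stands on its own:

    # Simplified stand-in for the PlacementFilter decision logic. A real Nova
    # filter would subclass nova.scheduler.filters.BaseHostFilter and look up
    # aggregates through Nova's APIs; here plain dicts and sets are used.
    class PlacementFilter(object):
        def __init__(self, tenant_aggregates, default_hosts, baremetal_hosts):
            self.tenant_aggregates = tenant_aggregates  # project_id -> set of host names
            self.default_hosts = default_hosts          # shared pool for common tenants
            self.baremetal_hosts = baremetal_hosts      # hosts backed by the bare-metal driver

        def host_passes(self, host_name, filter_properties):
            hints = filter_properties.get("scheduler_hints") or {}
            project_id = filter_properties.get("project_id")

            # Bare-metal request: only bare-metal hosts pass, no aggregate needed.
            if hints.get("compute_type") == "bare_metal":
                return host_name in self.baremetal_hosts

            # Isolated tenant: only hosts from its dedicated aggregate pass.
            if project_id in self.tenant_aggregates:
                return host_name in self.tenant_aggregates[project_id]

            # Common tenant: fall back to the shared/default pool.
            return host_name in self.default_hosts

With project 'p1' mapped to hosts {'c1', 'c2'}, for example, a normal request from p1 passes only c1 and c2, while a request hinted with compute_type=bare_metal passes only hosts in the bare-metal pool.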

The following diagram illustrates how the PlacementFilter works for both bare-metal and virtual instances:


(1) A member of project#1 requests an instance on its own isolated set of compute nodes. The instance lands within the project's dedicated host aggregate.
(2) A member of project#1 requests a bare-metal instance. This time no aggregate is needed, as bare-metal nodes are by nature isolated at the hardware level, so the bare-metal node is taken from the general pool.
(3) Instances of tenants not assigned to any host aggregate land in the default "public" aggregate, where compute nodes can be shared among tenants' instances.

PlacementFilter setup

This is the procedure we follow to implement instance placement control:
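In broad strokes, on a stock FilterScheduler such a setup comes down to registering the filter with the scheduler and then creating an aggregate per isolated tenant; the option values, metadata key and names below are illustrative assumptions rather than the exact steps:

    # 1. Enable the custom filter on the scheduler node (nova.conf, illustrative):
    scheduler_available_filters = nova.scheduler.filters.all_filters
    scheduler_available_filters = placement_filter.PlacementFilter
    scheduler_default_filters = AvailabilityZoneFilter,RamFilter,ComputeFilter,PlacementFilter

    # 2. Create a dedicated aggregate for an isolated tenant and attach its hosts:
    nova aggregate-create tenant-a-aggregate
    nova aggregate-add-host <aggregate-id> compute-07
    nova aggregate-set-metadata <aggregate-id> project_id=<tenant-a-uuid>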
