Problem description
- I am trying to provision multiple Windows EC2 instances with Terraform's remote-exec provisioner using null_resource.
$ terraform -v
Terraform v0.12.6
provider.aws v2.23.0
provider.null v2.1.2
- Originally, I was working with three remote-exec provisioners (two of which rebooted the instance), without null_resource and for a single instance, and everything worked fine.
- I then needed to increase the count and, based on several links, ended up using null_resource. I have since reduced the issue to the point where I cannot run even one remote-exec provisioner for more than 2 Windows EC2 instances using null_resource.
Terraform template to reproduce the issue:
//VARIABLES
variable "aws_access_key" {
  default = "AK"
}
variable "aws_secret_key" {
  default = "SAK"
}
variable "instance_count" {
  default = "3"
}
variable "username" {
  default = "Administrator"
}
variable "admin_password" {
  default = "Password"
}
variable "instance_name" {
  default = "Testing"
}
variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count                  = "${var.instance_count}"
  ami                    = "Windows AMI"
  instance_type          = "t2.xlarge"
  key_name               = "ec2_key"
  subnet_id              = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"
  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }
  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
  // provisioner "local-exec" {
  //   command = "powershell.exe Write-Host Instance_No=${count.index}"
  // }
  // provisioner "file" {
  //   source      = "testscript"
  //   destination = "D:/testscript"
  // }
}

resource "aws_security_group" "ec2instance-sg" {
  name   = "${var.instance_name}-sg"
  vpc_id = "${var.vpc_id}"
  // RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }
  // WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }
  tags = {
    Name = "${var.instance_name}-sg"
  }
}

//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}
Observations:
- With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, it is unpredictable whether the provisioner will run on all the instances every time. One thing is certain, though: Terraform never completes and does not show the output variables. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
- For the local-exec provisioner, everything works fine. Tested with count values of 1, 2 and 7.
- For the file provisioner, it works fine for 1, 2 and 3 but does not finish for 7, even though the file was copied to all 7 instances. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
- Also, in every attempt, the remote-exec provisioner is able to connect to the instances irrespective of count's value. It just doesn't trigger the inline command: it randomly skips it and starts showing the "Still creating..." message.
- I have been stuck on this issue for quite some time now and couldn't find anything significant in the debug logs either. I know Terraform is not recommended as a config management tool. However, everything works fine even with complex provisioning scripts if the instance count is just 1 (even without null_resource), which suggests Terraform should easily be able to handle such a basic provisioning requirement.
- TF_DEBUG logs:
  - count=2, TF completes successfully and shows Apply complete!.
  - count=3, TF runs the remote-exec on all three instances but does not complete and does not show the output variables. Stuck at "Still creating..."
  - count=3, TF runs the remote-exec on only two instances and skips nullresource[1], does not complete and does not show the output variables. Stuck at "Still creating..."
- Any pointers will be greatly appreciated!
Recommended answer
Update: what eventually did the trick was downgrading Terraform to v0.11.14, as per this issue comment.
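If you go the downgrade route, one way to keep the configuration from being applied with a newer binary by accident is to pin the version in a terraform block. This is only a sketch: the exact constraint is my assumption (read "v11.14" as 0.11.14), not something stated in the issue comment.

```hcl
# Assumption: pin to the 0.11.14 release that worked after the downgrade.
terraform {
  required_version = "= 0.11.14"
}
```

With this in place, Terraform refuses to run the configuration under any other version instead of silently using whatever is on the PATH.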
There are a few things you can try:
- Inline remote-exec:
resource "aws_instance" "ec2instance" {
  count = "${var.instance_count}"
  # ...
  provisioner "remote-exec" {
    connection {
      # ...
    }
    inline = [
      # ...
    ]
  }
}
Now you can refer to self inside the connection block to get the instance's private IP.
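As a rough sketch of what that could look like, here is the instance resource from the question with the provisioner moved inside it. The connection settings and the inline command are copied from the question's null_resource; only the host line, which uses self.private_ip, is new. I have not tested this against the hanging behaviour.

```hcl
resource "aws_instance" "ec2instance" {
  count                  = "${var.instance_count}"
  ami                    = "Windows AMI"
  instance_type          = "t2.xlarge"
  key_name               = "ec2_key"
  subnet_id              = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]

  provisioner "remote-exec" {
    connection {
      type     = "winrm"
      host     = "${self.private_ip}" # "self" is the instance being created
      user     = "${var.username}"
      password = "${var.admin_password}"
      timeout  = "10m"
    }

    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
}
```

This removes the element(...) lookup across resources, since each instance's provisioner reads its own IP directly.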
- Add triggers to the null_resource:
resource "null_resource" "nullresource" {
  # "triggers" is a map argument, so Terraform 0.12 requires the "="
  triggers = {
    host    = "${element(aws_instance.ec2instance.*.private_ip, count.index)}" # Rerun when IP changes
    version = "${timestamp()}" # ...or rerun every time
  }
  # ...
}
You can use the triggers attribute to recreate null_resource and thus re-execute remote-exec.
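Put together with the rest of the question's null_resource, that might look roughly like the following. Everything here is taken from the question's template except the triggers map. I kept only the IP-based trigger; whether you also want the timestamp() one is up to you.

```hcl
resource "null_resource" "nullresource" {
  count = "${var.instance_count}"

  # Recreate this null_resource (and rerun its provisioner) whenever the
  # matching instance's private IP changes.
  triggers = {
    host = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
  }

  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
}
```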