<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xmlns:tt="http://teletype.in/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Egor Shulga</title><generator>teletype.in</generator><description><![CDATA[Egor Shulga]]></description><image><url>https://img3.teletype.in/files/ec/6f/ec6fae7d-a85e-4050-8f8f-aafa54504265.png</url><title>Egor Shulga</title><link>https://blog.egorshulga.eu.org/</link></image><link>https://blog.egorshulga.eu.org/?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga</link><atom:link rel="self" type="application/rss+xml" href="https://teletype.in/rss/egorshulga?offset=0"></atom:link><atom:link rel="next" type="application/rss+xml" href="https://teletype.in/rss/egorshulga?offset=10"></atom:link><atom:link rel="search" type="application/opensearchdescription+xml" title="Teletype" href="https://teletype.in/opensearch.xml"></atom:link><pubDate>Wed, 06 May 2026 11:19:13 GMT</pubDate><lastBuildDate>Wed, 06 May 2026 11:19:13 GMT</lastBuildDate><item><guid isPermaLink="true">https://blog.egorshulga.eu.org/azure-private-endpoints</guid><link>https://blog.egorshulga.eu.org/azure-private-endpoints?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga</link><comments>https://blog.egorshulga.eu.org/azure-private-endpoints?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga#comments</comments><dc:creator>egorshulga</dc:creator><title>Azure Private Endpoints &amp; DNS: what happens under the hood</title><pubDate>Sun, 29 Jun 2025 11:58:54 GMT</pubDate><description><![CDATA[Private Endpoints allow establishing connectivity, which occurs exclusively over the backbone Azure networks, without the requests ever emerging to the public Internet.]]></description><content:encoded><![CDATA[
  <p id="k2aZ">Private Endpoints establish connectivity that travels exclusively over Azure&#x27;s backbone network, with requests never emerging to the public Internet.</p>
  <p id="JVvi">... or at least that&#x27;s what they say.</p>
  <p id="0osB">There are multiple non-trivial steps required to enable it, and under the hood Azure also performs some implicit configuration. It is easy to make a mistake that breaks connectivity, and there are limitations that can otherwise only be learned the hard way.</p>
  <p id="Bn5Q">So, what is required to configure a Private Endpoint? And what does each step actually imply?</p>
  <nav>
    <ul>
      <li class="m_level_1"><a href="#63mT">Step 1: Create a Private Endpoint</a></li>
      <li class="m_level_1"><a href="#HYu5">Step 2: Create a Private DNS Zone</a></li>
      <li class="m_level_1"><a href="#DU1l">Step 3: Link the Private DNS Zone to the desired VNet</a></li>
      <li class="m_level_1"><a href="#eITF">Step 4: Register the Private DNS Zone under the Private Endpoint</a></li>
      <li class="m_level_1"><a href="#dyAE">Step 4 (alternative): Add A record manually</a></li>
    </ul>
  </nav>
  <hr />
  <h2 id="63mT">Step 1: Create a Private Endpoint</h2>
  <p id="cmib">This automatically creates a Network Interface Card (NIC) and connects it to the desired VNet. The NIC is assigned a private IP address from the VNet&#x27;s range.</p>
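  <p>For illustration, creating a Private Endpoint for a Blob Storage account could look like this with Azure CLI (a sketch only: <code>rg-demo</code>, <code>vnet-demo</code>, <code>snet-demo</code> and <code>stdemo123</code> are hypothetical names):</p>
  <pre data-lang="shell"># Resolve the ID of the target resource (here: a storage account).
STORAGE_ID=$(az storage account show --resource-group rg-demo --name stdemo123 --query id --output tsv)

# Create the Private Endpoint; --group-id selects the sub-resource (&quot;blob&quot; here).
az network private-endpoint create \
  --resource-group rg-demo \
  --name pe-stdemo123-blob \
  --vnet-name vnet-demo \
  --subnet snet-demo \
  --private-connection-resource-id &quot;$STORAGE_ID&quot; \
  --group-id blob \
  --connection-name conn-stdemo123-blob</pre>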
  <p id="rFfa">💥 Important side effect: when the first Private Endpoint is created for a resource, Azure automatically amends the resource&#x27;s records in the public DNS.</p>
  <p id="xxOg">🔗 DNS resolution <em>before</em> creating a Private Endpoint:</p>
  <ol id="gNBw">
    <li id="uCxB">Resource hostname → <code>CNAME</code> record for the Service Endpoint (hostname of the actual hosting server, e.g., <code>blob.ams23prdstr16a.store.core.windows.net</code>).</li>
    <li id="WJ5n">Service Endpoint → <code>A</code> record for the public IP of the hosting server.</li>
  </ol>
  <p id="ASzT">🔗 DNS resolution <em>after</em> creating a Private Endpoint:</p>
  <ol id="MTlS">
    <li id="S9oR">Resource hostname → <code>CNAME</code> record for <code>privatelink.{resource hostname}</code>.</li>
    <li id="jdTY"><code>privatelink.{resource hostname}</code> → <code>CNAME</code> record for the Service Endpoint.</li>
    <li id="NvHx">Service Endpoint → <code>A</code> record for the public IP.</li>
  </ol>
  <p id="ZpDC">⚠️ At this point, the target resource is already reachable via the private IP, but the hostname of the resource does not resolve to it yet.</p>
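  <p>The changed public resolution can be observed with a plain DNS lookup (a sketch; <code>stdemo123</code> is a placeholder account name, and the exact chain depends on the resource type and region):</p>
  <pre data-lang="shell"># From anywhere (no VNet needed) - the answer walks the CNAME chain
# through privatelink.blob.core.windows.net and still ends at a public IP.
nslookup stdemo123.blob.core.windows.net</pre>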
  <h2 id="HYu5">Step 2: Create a Private DNS Zone</h2>
  <p id="AEsi">The name of the Private DNS Zone must match the type of the target resource — check out <a href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns#commercial" target="_blank">the table in the documentation</a> to find out the correct one.</p>
  <p id="AHBl">There can be only one Private DNS Zone of each type in the same Resource Group, as the Zone&#x27;s name is part of its Resource ID.</p>
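  <p>For the Blob Storage example, creating the zone could look like this (a sketch with the hypothetical resource group <code>rg-demo</code>):</p>
  <pre data-lang="shell"># The zone name for blob storage must be exactly this one (see the table in the docs).
az network private-dns zone create \
  --resource-group rg-demo \
  --name privatelink.blob.core.windows.net</pre>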
  <h2 id="DU1l">Step 3: Link the Private DNS Zone to the desired VNet</h2>
  <p id="d8Y7">One Private DNS Zone can be linked to multiple VNets.</p>
  <p id="5yIp">One VNet can be linked only to one Private DNS Zone of each type.</p>
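  <p>A sketch of linking the zone (hypothetical names again; <code>--registration-enabled false</code> is typical here, as auto-registration of VM records is not needed for Private Endpoints):</p>
  <pre data-lang="shell">az network private-dns link vnet create \
  --resource-group rg-demo \
  --zone-name privatelink.blob.core.windows.net \
  --name link-vnet-demo \
  --virtual-network vnet-demo \
  --registration-enabled false</pre>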
  <h2 id="eITF">Step 4: Register the Private DNS Zone under the Private Endpoint</h2>
  <p id="H7Vb">... or, in Azure Portal terms: add the <em>Private DNS Zone</em> into a <em>Private DNS Zone Group </em>under the <em>Private Endpoint.</em></p>
  <p id="bA6u">💥 Important side effect: Azure automatically creates an <code>A</code> record in the <em>Private DNS Zone</em> for the <em>private IP</em> of the <em>NIC</em> of the <em>Private Endpoint</em>.</p>
  <p id="gF4i">⚠️ It is impossible to add multiple <em>Private DNS Zones</em> of the same type into the same <em>Private DNS Zone Group </em>under a <em>Private Endpoint</em> (<a href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns-integration#virtual-network-workloads-without-azure-private-resolver:~:text=Important-,A%20single%20private%20DNS%20zone%20is%20required%20for%20this%20configuration,-.%20Creating%20multiple%20zones" target="_blank">docs</a>)<em>.</em></p>
  <p id="Nzdk">⚠️ It is impossible to add multiple <em>Private DNS Zone Groups </em>under a <em>Private Endpoint </em>(<a href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns-integration#virtual-network-workloads-without-azure-private-resolver:~:text=adding%20multiple%20dns%20zone%20groups%20to%20a%20single%20private%20endpoint%20isn&#x27;t%20supported" target="_blank">docs</a>).</p>
  <p id="zPk1">⚠️ It is possible to add only up to 5 <em>Private DNS Zones</em> into a <em>Private DNS Zone Group </em>under a <em>Private Endpoint </em>(<a href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns-integration#virtual-network-workloads-without-azure-private-resolver:~:text=Each%20DNS%20zone%20group%20can%20support%20up%20to%20five%20DNS%20zones." target="_blank">docs</a>).</p>
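  <p>A sketch of this registration with Azure CLI (hypothetical names; <code>default</code> is a common name for the single allowed group):</p>
  <pre data-lang="shell">ZONE_ID=$(az network private-dns zone show \
  --resource-group rg-demo \
  --name privatelink.blob.core.windows.net \
  --query id --output tsv)

az network private-endpoint dns-zone-group create \
  --resource-group rg-demo \
  --endpoint-name pe-stdemo123-blob \
  --name default \
  --private-dns-zone &quot;$ZONE_ID&quot; \
  --zone-name blob</pre>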
  <h2 id="dyAE">Step 4 (alternative): Add <code>A</code> record manually</h2>
  <p id="FFlR">When any of the limitations above is hit, the only way to enable correct DNS resolution is to manage the Private DNS Zone manually.</p>
  <p id="ZOUc">It is possible to find out which private IP was assigned to the NIC of the Private Endpoint. Then an <code>A</code> record for this IP has to be added manually, at last enabling correct DNS resolution from inside the VNet.</p>
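  <p>A sketch of the manual variant with Azure CLI (hypothetical names; the query path assumes a single-IP endpoint such as Blob Storage):</p>
  <pre data-lang="shell"># Find the private IP assigned to the NIC of the Private Endpoint.
PE_IP=$(az network private-endpoint show \
  --resource-group rg-demo \
  --name pe-stdemo123-blob \
  --query &#x27;customDnsConfigs[0].ipAddresses[0]&#x27; --output tsv)

# Add an A record for the resource into the privatelink zone.
az network private-dns record-set a add-record \
  --resource-group rg-demo \
  --zone-name privatelink.blob.core.windows.net \
  --record-set-name stdemo123 \
  --ipv4-address &quot;$PE_IP&quot;</pre>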
  <hr />
  <p id="RNJs">🔗 DNS resolution after step 4 (see also <a href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-dns-integration#virtual-network-workloads-without-azure-private-resolver:~:text=The%20following%20screenshot%20illustrates%20the%20DNS%20resolution%20sequence%20from%20virtual%20network%20workloads%20using%20the%20private%20DNS%20zone%253A" target="_blank">an image in the docs</a>):</p>
  <ol id="p9ib">
    <li id="LgTW">(public DNS) Resource hostname → <code>CNAME</code> record for <code>privatelink.{resource hostname}</code>.</li>
    <li id="aFoo">(private DNS) <code>privatelink.{resource hostname}</code> → <code>A</code> record for the private IP.</li>
  </ol>
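  <p>The result can be verified from a machine inside the linked VNet (a sketch; <code>stdemo123</code> is a placeholder):</p>
  <pre data-lang="shell"># The hostname now CNAMEs to stdemo123.privatelink.blob.core.windows.net,
# which the linked Private DNS Zone resolves to the private IP of the endpoint.
nslookup stdemo123.blob.core.windows.net</pre>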
  <hr />
  <p id="hKcA">Connectivity through Private Endpoints is tricky, because it requires not only connecting resources to VNets, but also establishing the correct DNS resolution. For some of the steps, Azure implicitly performs configuration in the global public DNS, which is neither trivial nor really expected to happen. Then there are further side effects and limitations of configuring the Private DNS Zone. In the end, it is still possible to link everything together and achieve connectivity that never leaves Azure&#x27;s backbone network.</p>

]]></content:encoded></item><item><guid isPermaLink="true">https://blog.egorshulga.eu.org/terraform-custom-resource</guid><link>https://blog.egorshulga.eu.org/terraform-custom-resource?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga</link><comments>https://blog.egorshulga.eu.org/terraform-custom-resource?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga#comments</comments><dc:creator>egorshulga</dc:creator><title>Terraform: Custom Resources (without Go)</title><pubDate>Sat, 22 Mar 2025 15:00:34 GMT</pubDate><description><![CDATA[The approach bridges the gap between Terraform and other tooling that is not available via some custom provider. It can be used instead of implementing some custom provider, which allows to stay in the technology stack already adopted by the team.  The approach plays nicely in the cases, which the concepts provisioning, deprovisioning and drift mitigation could be applied to. There are some things that one needs to know when using it, but in general some new case can be implemented with the pattern only once, and as long there is no need to change it drastically, it will continue to live (it is even resilient to external impact – which is covered by the drift mitigation).]]></description><content:encoded><![CDATA[
  <nav>
    <ul>
      <li class="m_level_1"><a href="#5q1B">Create</a></li>
      <li class="m_level_1"><a href="#ZdiQ">Destroy</a></li>
      <li class="m_level_1"><a href="#Wzwp">Drift Mitigation</a></li>
    </ul>
  </nav>
  <p id="B4mi">Usually, when we want to cover some infrastructure with Terraform, we try to find an existing provider with the resource we need. For the most common cases (major clouds, widely used resources) we do find the necessary implementations, but sometimes it happens that we need either a not-so-popular resource, or something so specialized that no one bothered to implement it before us.</p>
  <p id="ibGx">The normal way would then be to implement a custom Terraform provider. But what if we don&#x27;t want to introduce Go into our codebase (as that is the only language providers can be written in)? What if we don&#x27;t want to set up the full-blown toolchain for development, including CI/CD? What if we also don&#x27;t want to deal with publishing the provider to some registry (a public one, or self-hosted)?</p>
  <p id="nqXJ">As long as there are SDKs or tools that can create the needed resources for us, we can stay with already familiar technologies, using Terraform only to glue everything together, so that we can manage the lifecycle of all resources (including their interdependencies) in one place.</p>
  <p id="shBk">Important: the new to-be-implemented custom resource should follow the declarative approach, and it should have characteristics of a resource. I.e., it should have clear lifecycle states, it should be possible <em>to provision </em>the resource, <em>to destroy </em>it, <em>to read </em>its current state (for drift detection). The resource may be virtual, or it may represent only a part of another resource. But as long as it complies with the <em>Declarative </em>approach (when we describe some desired state of resources – as opposed to the <em>Imperative </em>approach, when we write down the steps needed to achieve the desired state), it could be a good candidate for a custom resource.</p>
  <p id="Vt0j">So, to create a custom resource, we need to cover with some custom code the usual lifecycle states of a Terraform resource: provisioning (create), deprovisioning (destroy) and drift mitigation (refresh).</p>
  <section style="background-color:hsl(hsl(0,   0%,  var(--autocolor-background-lightness, 95%)), 85%, 85%);">
    <p id="PJTB">The approach expressed below targets Azure, and it also utilizes PowerShell 7+, but it only serves as an example, and the approach could be used in other cases and with other clouds as well.</p>
  </section>
  <p id="qeKf">When using Terraform with Azure, we employ <a href="https://aka.ms/tf/providermessaging" target="_blank">the recommended approach</a>: </p>
  <ul id="s3Mg">
    <li id="lEGn">At first, we try to find a definition for a needed resource in the <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs" target="_blank">AzureRM</a> provider (implemented and maintained by Hashicorp).</li>
    <li id="bmir">Then, if the needed resource is not available, we try to find it in <a href="https://learn.microsoft.com/en-us/azure/developer/terraform/overview-azapi-provider" target="_blank">AzAPI</a> (a thin wrapper over Azure APIs, which is maintained by Microsoft).</li>
    <li id="1IDc">And we fall back to the solution of custom resources only when neither AzureRM nor AzAPI supports what we need.</li>
  </ul>
  <p id="kQ3x">Let&#x27;s take as an example the task of managing the AppSettings of an Azure App Service <em>separately </em>from the App Service itself. For example, the need for it may arise in the following cases:</p>
  <ul id="Z7Y6">
    <li id="RsUu">When the App Service is created in one Terraform project, but its configuration is extended from another Terraform project (and there are some reasons to keep this separation).</li>
    <li id="WNh3">When there is a need to break a cyclic dependency: two App Services depend on each other, because they require some info from one another to complete their configuration.</li>
  </ul>
  <p id="Bq06">Example: App Service 1 wants to call App Service 2, and we want to implement authentication using their SystemAssigned managed identities (which are created automatically during provisioning of the App Services). App Service 1 needs to know the <code>principal_id</code> of App Service 2, so it could request a token for it. App Service 2 in its turn needs to know <code>principal_id</code>s of all callers (in this example, App Service 1), so it could validate them.</p>
  <figure id="QSUH" class="m_original">
    <img src="https://img2.teletype.in/files/d5/a1/d5a15118-1a0b-4a2c-81f2-fb0ec2bf1f68.png" width="773" />
  </figure>
  <p id="v33T">The needed resource is not supported by AzureRM (there is only <a href="https://github.com/hashicorp/terraform-provider-azurerm/issues/1212" target="_blank">a closed feature-request</a>). In AzAPI there is <a href="https://learn.microsoft.com/en-us/azure/templates/microsoft.web/sites/config/appsettings?pivots=deployment-language-terraform" target="_blank">a page</a> on the resource of AppSettings, but it seems to be auto-generated, and it also lacks examples of usage. Anyway, we still need an example to illustrate the approach, so we will proceed with creating a custom resource for it 😉</p>
  <p id="VOQ7">We will use this configuration of App Services (a link to the repo with the complete code can be found at the end of the article).</p>
  <pre id="xUgw" data-lang="hcl"># apps.tf
# Declarations of Resource Group and AppService Plan are skipped.

resource &quot;azurerm_linux_web_app&quot; &quot;appService1&quot; {
  name                = &quot;app-service1&quot;
  location            = azurerm_resource_group.resourceGroup.location
  resource_group_name = azurerm_resource_group.resourceGroup.name
  service_plan_id     = azurerm_service_plan.appServicePlan.id

  site_config {}

  identity {
    type = &quot;SystemAssigned&quot;
  }

  app_settings = {
    EXAMPLE_SETTING_1 = 42
  }
}

resource &quot;azurerm_linux_web_app&quot; &quot;appService2&quot; {
  name                = &quot;app-service2&quot;
  location            = azurerm_resource_group.resourceGroup.location
  resource_group_name = azurerm_resource_group.resourceGroup.name
  service_plan_id     = azurerm_service_plan.appServicePlan.id

  site_config {}

  identity {
    type = &quot;SystemAssigned&quot;
  }

  app_settings = {
    EXAMPLE_SETTING_2 = 43
  }
}</pre>
  <h2 id="5q1B">Create</h2>
  <p id="6V1m">For reusability of the created custom resource, we will introduce a module. First, we will create a script that performs the creation of the necessary App Settings. We will use PowerShell invoking Az CLI, but as said before, any tool or programming language can be used. Terraform will invoke the script the same way it can be invoked from a terminal, so we only need to make sure the necessary tooling is available on the host machine.</p>
  <pre id="PRsW" data-lang="powershell"># additional-app-settings/assets/create.ps1

[CmdletBinding()]
param (
  [Parameter(Mandatory)] [string] ${subscription-id},
  [Parameter(Mandatory)] [string] ${resource-group-name},
  [Parameter(Mandatory)] [string] ${app-service-name},
  [Parameter(Mandatory)] [hashtable] ${app-settings}
)

$settings = (${app-settings}.Keys | ForEach-Object { &quot;$($_)=$(${app-settings}[$_])&quot; }) -join &quot; &quot;

az webapp config appsettings set &#x60;
  --subscription ${subscription-id} &#x60;
  --resource-group ${resource-group-name} &#x60;
  --name ${app-service-name} &#x60;
  --settings $settings</pre>
  <p id="DJoR">To include the script into Terraform lifecycle, we will use the fake built-in resource <a href="https://developer.hashicorp.com/terraform/language/resources/terraform-data" target="_blank">terraform_data</a> with a custom <em>provisioner</em>.</p>
  <pre id="N0u5" data-lang="hcl"># additional-app-settings/main.tf
# See link to the repo below for configuration of the module (providers and inputs).

locals {
  # The property id is marked as &#x27;known after apply&#x27; during initial creation.
  # This avoids deadlocking the implemented custom refresh mechanism.
  # We parse the id to retrieve name and resource group name.
  appService        = provider::azurerm::parse_resource_id(var.appService.id)
  resourceGroupName = local.appService.resource_group_name
  appServiceName    = local.appService.resource_name
}

resource &quot;terraform_data&quot; &quot;appSettings&quot; {
  triggers_replace = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = var.appSettings
  }

  input = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = jsonencode(var.appSettings)
  }

  provisioner &quot;local-exec&quot; {
    when        = create
    interpreter = [&quot;pwsh&quot;, &quot;-Command&quot;]
    command     = &lt;&lt;-EOT
      ${path.module}/assets/create.ps1 &#x60;
        -subscription-id $env:subscriptionId &#x60;
        -resource-group-name $env:resourceGroupName &#x60;
        -app-service-name $env:appServiceName &#x60;
        -app-settings ($env:appSettings | ConvertFrom-Json -AsHashtable)
    EOT
    environment = self.input
    quiet       = true # Silences printing of the invoked command. All other output is not silenced.
  }
}</pre>
  <p id="N9FO">Note the following:</p>
  <ul id="hqIG">
    <li id="J59q">All inputs of the module are added into <code>triggers_replace</code>, which will make sure that any changes of the parameters are noticed and reconciled (although this is achieved via recreation).</li>
    <li id="wBGy">The properties <code>interpreter</code> and <code>command</code> together fulfil the task of invocation of custom code. If your script is written in Bash, you may use <code>[&quot;/bin/bash&quot;, &quot;-c&quot;]</code>.</li>
    <li id="bvRV">Parameters to the invoked script are passed via properties <code>input</code> and <code>environment</code>. This will become relevant for the destroy-time provisioner (explained below).</li>
  </ul>
  <p id="RfcU">Let&#x27;s instantiate the created module and check that it works.</p>
  <pre id="qWzB" data-lang="hcl"># app-settings.tf

module &quot;appService1AppSettings&quot; {
  source = &quot;./additional-app-settings&quot;

  subscriptionId = data.azurerm_client_config.current.subscription_id
  appService     = azurerm_linux_web_app.appService1
  appSettings = {
    CALLEE = azurerm_linux_web_app.appService2.identity[0].principal_id
  }
}

module &quot;appService2AppSettings&quot; {
  source = &quot;./additional-app-settings&quot;

  subscriptionId = data.azurerm_client_config.current.subscription_id
  appService     = azurerm_linux_web_app.appService2
  appSettings = {
    CALLER = azurerm_linux_web_app.appService1.identity[0].principal_id
  }
}</pre>
  <p id="sNNI">When we try to invoke <code>terraform apply</code> at this point, we will see that it works – the necessary app settings are created successfully. But if we try to run it once again, we will see that Terraform detected them as a drift and wants to remove them:</p>
  <pre id="wi7h">  # azurerm_linux_web_app.appService1 will be updated in-place
  ~ resource &quot;azurerm_linux_web_app&quot; &quot;appService1&quot; {
      ~ app_settings                                   = {
          - &quot;CALLEE&quot;           = &quot;808d076e-0d68-45e6-80aa-d7e194ddaed6&quot; -&gt; null
            # (1 unchanged element hidden)
        }
        # (28 unchanged attributes hidden)

        # (2 unchanged blocks hidden)
    }

  # azurerm_linux_web_app.appService2 will be updated in-place
  ~ resource &quot;azurerm_linux_web_app&quot; &quot;appService2&quot; {
      ~ app_settings                                   = {
          - &quot;CALLER&quot;           = &quot;384fe864-f61e-4335-bb1b-65198b89e872&quot; -&gt; null
            # (1 unchanged element hidden)
        }
        # (28 unchanged attributes hidden)

        # (2 unchanged blocks hidden)
    }    </pre>
  <p id="Z4sh">To mitigate that, we need to add the following section to the declarations of App Services:</p>
  <pre id="dn7j" data-lang="hcl"># apps.tf

resource &quot;azurerm_linux_web_app&quot; &quot;appService1&quot; {
  ...
  
  lifecycle {
    ignore_changes = [app_settings[&quot;CALLEE&quot;]]
  }
}

resource &quot;azurerm_linux_web_app&quot; &quot;appService2&quot; {
  ...
  
  lifecycle {
    ignore_changes = [app_settings[&quot;CALLER&quot;]]
  }
}</pre>
  <section style="background-color:hsl(hsl(0,   0%,  var(--autocolor-background-lightness, 95%)), 85%, 85%);">
    <p id="ICX3">That is the most unfortunate disadvantage of this solution. When we need to configure some AppSettings in the resource itself, but others with a separate module, we have to know the names of all additional AppSettings in advance and ignore them in the App Services. Otherwise, Terraform will try to delete them every time.<br /><br />It is also possible to shift management of AppSettings completely out of the App Service resource, and then to ignore the property <code>app_settings</code> entirely. Terraform will then know nothing about the AppSettings, which also means there will be no drift detection for them at all.</p>
  </section>
  <h2 id="ZdiQ">Destroy</h2>
  <p id="7YcQ">The deprovisioning phase is covered by another script, which is invoked by a destroy-time provisioner.</p>
  <pre id="6ZeL" data-lang="powershell"># additional-app-settings/assets/destroy.ps1

[CmdletBinding()]
param (
  [Parameter(Mandatory)] [string] ${subscription-id},
  [Parameter(Mandatory)] [string] ${resource-group-name},
  [Parameter(Mandatory)] [string] ${app-service-name},
  [Parameter(Mandatory)] [hashtable] ${app-settings}
)

$settings = ${app-settings}.Keys -join &quot; &quot;

az webapp config appsettings delete &#x60;
  --subscription ${subscription-id} &#x60;
  --resource-group ${resource-group-name} &#x60;
  --name ${app-service-name} &#x60;
  --setting-names $settings  </pre>
  <pre id="DVtu" data-lang="hcl"># additional-app-settings/main.tf

resource &quot;terraform_data&quot; &quot;appSettings&quot; {
  ...

  input = {
    subscriptionId    = var.subscriptionId
    resourceGroupName = local.resourceGroupName
    appServiceName    = local.appServiceName
    appSettings       = jsonencode(var.appSettings)
  }

  ...

  provisioner &quot;local-exec&quot; {
    when        = destroy
    interpreter = [&quot;pwsh&quot;, &quot;-Command&quot;]
    command     = &lt;&lt;-EOT
      ${path.module}/assets/destroy.ps1 &#x60;
        -subscription-id $env:subscriptionId &#x60;
        -resource-group-name $env:resourceGroupName &#x60;
        -app-service-name $env:appServiceName &#x60;
        -app-settings ($env:appSettings | ConvertFrom-Json -AsHashtable)
    EOT
    environment = self.input
    quiet       = true
    on_failure  = continue
  }
}</pre>
  <p id="MSGp">The destroy-time provisioner imposes some differences as compared to the create-time provisioner:</p>
  <ul id="oND2">
    <li id="se8M">The destroy-time provisioner cannot reference any local variables, input parameters or other resources. Instead, it can only use the captured state of the existing (about-to-be-destroyed) resource. Thus, we capture all the necessary values in the available property <code>input</code>, and then access them in the provisioner block via the special <code>self</code> object.</li>
    <li id="oFzl">We use <code>environment</code> to inject the values into the script. With this, we follow <a href="https://developer.hashicorp.com/terraform/language/resources/provisioners/local-exec#command" target="_blank">the recommendation</a> guarding against code injection attacks.</li>
    <li id="oIor">The property <code>environment</code> expects the type <code>map(string)</code>. When we need to pass some complex object (in our case – a map of key-values pairs of AppSettings), we need to serialize it before passing and deserialize it in the script (thus the invocations of <code>jsonencode()</code> and <code>ConvertFrom-Json</code>).</li>
    <li id="TrbX">And we also don&#x27;t want to be too strict about possible failures during the destruction of the resource. Supporting every case that could go wrong is tricky (maybe the App Service itself has already been deleted, and we don&#x27;t want to deadlock Terraform completely), so we relax the success requirement with <code>on_failure = continue</code>.</li>
  </ul>
  <section style="background-color:hsl(hsl(0,   0%,  var(--autocolor-background-lightness, 95%)), 85%, 85%);">
    <p id="KQcu">Important: if we ever decide to decommission the resource, we need to be extremely careful. Destroy-time provisioners run only when they are present in the code at the time of destruction. A multistep approach should therefore be used: first set <code>count = 0</code> and apply, and only then delete the resource from the code.</p>
  </section>
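  <p>A sketch of that multistep decommissioning (only the relevant parts; the rest of the declaration stays as above):</p>
  <pre data-lang="hcl"># additional-app-settings/main.tf

# Step 1: keep the resource (and its destroy-time provisioner) in the code,
# but set count = 0, and apply - the provisioner runs during the destruction.
resource &quot;terraform_data&quot; &quot;appSettings&quot; {
  count = 0
  ...
}

# Step 2: only after that apply succeeds, delete the block from the code.</pre>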
  <p id="23Yy">We can now test if the destroy-time provisioner works:</p>
  <pre id="Zpz9">terraform apply -replace module.appService1AppSettings.terraform_data.appSettings</pre>
  <h2 id="Wzwp">Drift Mitigation</h2>
  <p id="n2R6">One of the powerful Terraform features is drift mitigation. Every resource of every normal provider implements a special Read method, which is invoked during the refresh phase.</p>
  <p id="6gOC">Unfortunately, provisioners can only be of one of two types: <code>create</code> or <code>destroy</code>. There is no special provisioner type to hook into the refresh phase of <code>terraform apply</code>. For drift mitigation, we will have to employ something else.</p>
  <p id="c60G">The implemented resource already reacts to changes of the input parameters (via <code>triggers_replace</code>). We need to add another &#x27;synchronization pulse&#x27; to mark the resource for recreation based on external changes. Luckily, there is a provider <a href="https://registry.terraform.io/providers/pseudo-dynamic/value/latest" target="_blank">pseudo-dynamic/value</a> with a resource that implements exactly the capability we need.</p>
  <p id="VSUQ">At the first step, we need to read the current AppSettings of the App Service. We can achieve it with a data-resource.</p>
  <pre id="eEI7" data-lang="hcl"># additional-app-settings/refresh.tf

data &quot;azurerm_linux_web_app&quot; &quot;appService&quot; {
  resource_group_name = local.resourceGroupName
  name                = local.appServiceName
}</pre>
  <p id="BpLR">Then we need to find out whether the current AppSettings are in the desired state (whether all the necessary AppSettings are present and their values are what we expect them to be).</p>
  <pre id="0qOr" data-lang="hcl"># additional-app-settings/refresh.tf

locals {
  currentAppSettings = data.azurerm_linux_web_app.appService.app_settings

  areAppSettingsInDesiredState = alltrue([
    for desiredKey, desiredValue in var.appSettings :
    contains(keys(local.currentAppSettings), desiredKey) ?
    local.currentAppSettings[desiredKey] == desiredValue :
    false # We can&#x27;t use the logical operator &#x27;&amp;&amp;&#x27; here due to a bug in short-circuiting.
  ])
}</pre>
  <p id="0qOr">Afterward, we will configure the reaction for when the AppSettings no longer appear to be in the desired state. We use the resource <a href="https://registry.terraform.io/providers/pseudo-dynamic/value/latest/docs/resources/replaced_when" target="_blank">value_replaced_when</a> for that.</p>
  <pre id="S23o" data-lang="hcl"># additional-app-settings/refresh.tf

resource &quot;value_replaced_when&quot; &quot;driftDetected&quot; {
  condition = !local.areAppSettingsInDesiredState
}</pre>
  <p id="ylqH">This resource is special: it expects a single boolean <code>condition</code>, and it produces a new random value every time the <code>condition</code> is <code>false</code> during an invocation of <code>terraform apply</code>. Otherwise, it <em>locks</em> the previously produced value and <em>does not change it</em>. This fancy behavior plays well with the property <code>triggers_replace</code>, which causes recreation of the resource whenever anything inside it changes.</p>
  <pre id="PJFT" data-lang="hcl"># additional-app-settings/main.tf

resource &quot;terraform_data&quot; &quot;appSettings&quot; {
  triggers_replace = {
    ...
    driftDetectionTrigger = value_replaced_when.driftDetected.value
  }

  ...
}</pre>
  <p id="znWw">With this, we made our custom resource detect and react to any drift: be it someone accidentally changing an AppSetting, or even maliciously removing it.</p>
  <section style="background-color:hsl(hsl(0,   0%,  var(--autocolor-background-lightness, 95%)), 85%, 85%);">
    <p id="hWxK">There is a minor inconvenience: although one <code>terraform apply</code> correctly detects and mitigates the drift, it needs to be executed a second time, just so that the resource <code>value_replaced_when.driftDetected</code> can settle its <code>condition</code>.</p>
  </section>
  <p id="Vz80">Now, there is just one feature missing. When Terraform detects drift during the refresh phase, it reports it in the log, so that we can verify and explicitly approve the change. We can achieve something similar with another fancy resource that prints custom warnings to the console log.</p>
  <pre id="mtTD" data-lang="hcl"># additional-app-settings/main.tf

data &quot;validation_warnings&quot; &quot;appSettingsAreNotInDesiredState&quot; {
  dynamic &quot;warning&quot; {
    for_each = var.appSettings
    iterator = each
    content {
      condition = !contains(keys(local.currentAppSettings), each.key)
      summary   = &quot;AppSetting ${each.key} is not present, so it will be added&quot;
    }
  }

  dynamic &quot;warning&quot; {
    for_each = var.appSettings
    iterator = each
    content {
      condition = (
        contains(keys(local.currentAppSettings), each.key) ?
        local.currentAppSettings[each.key] != each.value :
        false
      )
      summary = &quot;AppSetting ${each.key} does not have desired value, so it will be updated&quot;
    }
  }
}</pre>
  <section style="background-color:hsl(0, 0%, var(--autocolor-background-lightness, 95%));">
    <p id="9a8q">Although this approach emulates the refresh phase, it <em>does not</em> entirely follow the regular phases of Terraform. The implementation relies on an additional data resource, meaning that it is read on every run. The flag <code>-refresh=false</code> has no effect – the drift will be detected and mitigated regardless.</p>
  </section>
  <p id="cES7"></p>
  <hr />
  <p id="KxF9"></p>
  <p id="atOZ">The approach described above can be used as a pattern. It bridges the gap between Terraform and tooling that is not covered by any existing provider. Instead of implementing a custom provider, it allows the team to stay within the technology stack it has already adopted.</p>
  <p id="pVe4">The approach works well in cases to which the concepts of <em>provisioning</em>, <em>deprovisioning</em> and <em>drift mitigation</em> can be applied. There are a few caveats to keep in mind, but in general a new case needs to be implemented with the pattern only once, and as long as it does not have to change drastically, it will keep working (it is even resilient to external impact – which is covered by the drift mitigation).</p>
  <p id="YzQs">The complete executable code can be found in this repo: <a href="https://github.com/egorshulga/terraform-custom-resource" target="_blank">https://github.com/egorshulga/terraform-custom-resource</a>.</p>

]]></content:encoded></item><item><guid isPermaLink="true">https://blog.egorshulga.eu.org/k8s-oracle-cloud-always-free</guid><link>https://blog.egorshulga.eu.org/k8s-oracle-cloud-always-free?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga</link><comments>https://blog.egorshulga.eu.org/k8s-oracle-cloud-always-free?utm_source=teletype&amp;utm_medium=feed_rss&amp;utm_campaign=egorshulga#comments</comments><dc:creator>egorshulga</dc:creator><title>K8s in Oracle Cloud Always Free tier (with Terraform)</title><pubDate>Tue, 05 Apr 2022 17:37:16 GMT</pubDate><media:content medium="image" url="https://img1.teletype.in/files/cd/99/cd99c7a7-250b-43a1-888a-9e11c7e0da5f.png"></media:content><description><![CDATA[<img src="https://miro.medium.com/max/700/1*YMBXHVRVjRbG_TirpQ_m3Q.png"></img>Upd. March 2022: I've been banned at the Oracle Cloud for having Belarus as the origin country. All attempts to restore the access were rejected with no explanation. Still, I hold this article as a nice exercise, although now I have to warn readers of possible consequences of using Oracle Cloud.]]></description><content:encoded><![CDATA[
  <p id="D8YC">Upd. March 2022: I&#x27;ve been banned from Oracle Cloud because of my country of origin. All attempts to restore access were rejected with no explanation. Still, I consider this article a nice exercise, although now I have to warn readers of the possible consequences of using Oracle Cloud.</p>
  <p id="xGjL">Upd. 2023: the free domain zone .ga was taken over by the Gabonese government. It seems all access to previously registered domains has been lost.</p>
  <p id="r87W">Upd. 2024: the free registrar Freenom stopped operations. All its top-level domains ceased to exist.</p>
  <hr />
  <p id="AVHa"></p>
  <p id="gOig">Oracle Cloud offers really good terms in the Always Free tier. As of January 2022 it includes 4 CPUs and 24 GB of memory for ARM-based VMs.</p>
  <p id="24ef">There are 2 options: we can use free resources while staying on the Always Free tier, or we can upgrade to the Pay-as-You-Go subscription. The available resource limits differ, which leads to differences in cluster architecture. This post describes an approach for provisioning a cluster in the Always Free tier.</p>
  <blockquote id="XwG2">N.B.: one won’t get charged in the Always Free tier, even after the trial is over. One may get charged after upgrading to the Pay-as-You-Go subscription.</blockquote>
  <p id="5a56">Oracle Cloud has a managed K8s cluster resource, but unfortunately it is not available for Always Free tenancies (the limit is set to 0; all limits in this article are valid as of January 2022).</p>
  <figure id="3brA" class="m_original">
    <img src="https://miro.medium.com/max/700/1*YMBXHVRVjRbG_TirpQ_m3Q.png" width="700" />
  </figure>
  <p id="O9zZ">That means that in the Always Free tier we need to stick to a completely manual process of compute resource provisioning and K8s cluster deployment. So this post presents a way of provisioning a manually managed K8s cluster on Oracle Cloud ARM VMs. It describes architecture considerations, as well as workarounds for issues that appeared along the way.</p>
  <blockquote id="Bqyb">TL;DR: reproducible Terraform scripts and steps to get started can be found <a href="https://github.com/egorshulga/oci-always-free-k8s" target="_blank">here</a>.</blockquote>
  <h2 id="6b2d">Compute resources provisioning</h2>
  <p id="0P7Z">We will be consuming all available compute resources in our cluster. We need a designated node for the K8s control plane (this will be the <em>leader</em> node), and multiple <em>worker </em>nodes. Each node will have 1 OCPU, which means we can provision 1 leader node and 3 worker nodes. In our cluster the leader node will have 3 GB of RAM, and each worker node will have 7 GB of RAM (making a total of 24 GB of memory used).</p>
  <blockquote id="5tVg">N.B.: K8s issues a warning when a node has less than 2 CPUs. Our cluster is not a production one, and the goal is to maximize the number of nodes in the cluster. That is why we silence this warning at K8s deployment.</blockquote>
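  <p>With the OCI Terraform provider, the node sizing can be expressed roughly like this (a sketch – resource names are illustrative, and arguments besides the shape configuration are elided):</p>
  <pre data-lang="hcl">resource &quot;oci_core_instance&quot; &quot;worker&quot; {
  count = 3

  shape = &quot;VM.Standard.A1.Flex&quot; # the Always Free ARM-based flexible shape
  shape_config {
    ocpus         = 1
    memory_in_gbs = 7 # the leader node gets 3 GB instead
  }

  # availability_domain, compartment_id, source_details etc. omitted
  ...
}</pre>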
  <p id="4ff3">The most frequent issue Always Free tenancies face when provisioning VMs is the <em>Out of host capacity</em> error.</p>
  <figure id="Ffm5" class="m_original">
    <img src="https://miro.medium.com/max/700/0*4_shpjm7cj2y21rZ.png" width="700" />
  </figure>
  <p id="aa23">The thing is that free compute resources are limited, and this error means that the capacity has run out. Oracle says that it is constantly adding capacity to its data centers, so we should simply try again in a couple of days.</p>
  <p id="7e8c">Sometimes it helps to switch to another availability domain, if the region has one. Always Free tenancies can provision resources in their home region only, which is why it should be selected carefully. A list of regions with appropriate availability domains can be found <a href="https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm#:~:text=the%20following%20table%20lists%20the%20regions%20in%20the%20oracle%20cloud%20infrastructure%20commercial%20realm" target="_blank">here</a>.</p>
  <p id="88f0">At some point I also noticed that if there are two accounts in the same region, but one of them has a Pay-as-You-Go subscription and the other does not, then the first one gets some priority in provisioning: it was possible to provision ARM VMs with the first account, while the second one could not (it took another week for capacity to become available to the second account).</p>
  <h2 id="ba09">Network architecture</h2>
  <p id="6a2e">We need to provision a <a href="https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/managingVCNs_topic-Overview_of_VCNs_and_Subnets.htm" target="_blank">Virtual Cloud Network</a> (VCN) to allow instances to connect to the internet, as well as become accessible from it. VCNs have subnets, which could be public or private. To open incoming and outgoing connectivity for resources in public subnets, a) VCN must have an <a href="https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/managingIGs.htm" target="_blank">Internet gateway</a>, and b) each resource must have a <a href="https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/managingpublicIPs.htm" target="_blank">public IP</a> assigned. To open outgoing connectivity for resources in private subnets, VCN must have a <a href="https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/NATgateway.htm" target="_blank">NAT gateway</a>. Incoming connectivity is initially unavailable.</p>
  <p id="100f">So, the desired network architecture would be as follows: a VCN with 2 subnets, public and private, with compute resources assigned to the private subnet. The trouble with this approach is that it works for Pay-as-You-Go tenancies only. As of January 2022 Oracle <em>does not</em> allow provisioning of NAT gateways in the Always Free tier, which leads to <em>unavailable outgoing connectivity </em>for nodes in private subnets.</p>
  <p id="58ca">To overcome this limitation, our VCN will have just a single public subnet, and in order to open outgoing connectivity, each compute resource will be assigned a public IP. Luckily, Oracle does not limit the availability of ephemeral public IPs.</p>
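  <p>In Terraform terms, the single public subnet and the per-node public IPs look roughly like this (a sketch; only the connectivity-related arguments are shown, names are illustrative):</p>
  <pre data-lang="hcl">resource &quot;oci_core_subnet&quot; &quot;public&quot; {
  vcn_id                     = oci_core_vcn.vcn.id
  cidr_block                 = &quot;10.0.0.0/24&quot;
  prohibit_public_ip_on_vnic = false # allow public IPs on attached VNICs
  ...
}

resource &quot;oci_core_instance&quot; &quot;node&quot; {
  create_vnic_details {
    subnet_id        = oci_core_subnet.public.id
    assign_public_ip = true # ephemeral public IP opens outgoing connectivity
  }
  ...
}</pre>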
  <p id="8270">Now each node becomes independently accessible from the internet (e.g. we can SSH to all of them). But we want to have a single efficient entry point for apps deployed to the cluster (as pods). We will achieve it by using a load balancer.</p>
  <h2 id="3a3e">Load balancing</h2>
  <p id="4286">Oracle Cloud provides 2 types of load balancers. The first one works on OSI layer 7, which basically makes it a reverse proxy. E.g., it can handle SSL termination. But when creating a load balancer of this type, we need to select its shape. Load balancer shapes specify the available bandwidth, and Always Free tenancies are eligible for a single 10 Mbps load balancer.</p>
  <p id="075f">The other load balancer type is called Network Load Balancer (NLB). It works on OSI layers 3 and 4, and it can balance requests by IP-port pairs only. But this type <em>does not have any bandwidth limit, </em>which is why we’ll use it in our cluster. We will put the NLB into the public subnet and assign it a reserved public IP, so it becomes available from the Internet.</p>
  <p id="88c2">To enable load balancing, we also need to specify the following:</p>
  <ul id="h3OG">
    <li id="7016"><em>Listeners</em>, which represent ports that are available from the Internet.</li>
    <li id="de7e"><em>Backend sets, </em>which represent sets of resources the requests are balanced to.</li>
    <li id="abea">For each backend set we need to add appropriate <em>backends, </em>which are target links to compute resources.</li>
  </ul>
  <p id="af73">In our case we’ll make the NLB listen to the following TCP ports:</p>
  <ul id="SVlX">
    <li id="f087">80 — for HTTP traffic forwarded to worker nodes.</li>
    <li id="40c5">443 — for HTTPS traffic to worker nodes.</li>
    <li id="efba">6443 — for kubectl traffic to the leader node (for remote K8s management and apps deployment).</li>
  </ul>
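  <p>As a sketch, the HTTP listener and its backends could be declared like this (names are illustrative; the backend set, its health checker and other arguments are elided):</p>
  <pre data-lang="hcl">resource &quot;oci_network_load_balancer_listener&quot; &quot;http&quot; {
  network_load_balancer_id = oci_network_load_balancer_network_load_balancer.nlb.id
  name                     = &quot;http&quot;
  default_backend_set_name = oci_network_load_balancer_backend_set.http.name
  port                     = 80
  protocol                 = &quot;TCP&quot;
}

resource &quot;oci_network_load_balancer_backend&quot; &quot;http&quot; {
  count = 3 # one backend per worker node

  network_load_balancer_id = oci_network_load_balancer_network_load_balancer.nlb.id
  backend_set_name         = oci_network_load_balancer_backend_set.http.name
  target_id                = oci_core_instance.worker[count.index].id
  port                     = 30080 # NodePort of the ingress controller (see below)
}</pre>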
  <p id="7e08">We also configure appropriate VCN ingress rules, to allow traffic to reach appropriate nodes.</p>
  <h2 id="c717">K8s deployment</h2>
  <p id="a481">We use <em>kubeadm </em>to make a completely silent installation of K8s components. First we need to deploy a control plane. To support automatic joining of worker nodes, a) each node has a private in-cluster DNS name, and b) we generate a discovery token (<em>kubeadm token generate</em>), which is copied from the leader node to all worker nodes. After that we invoke <em>kubeadm init. </em>After the control plane is up, we can set up worker nodes with <em>kubeadm join. </em>We need to allow TCP port 10250 for in-cluster communication, because that’s the management port of <em>kubelet</em> (the K8s agent running on each node).</p>
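  <p>Scripted with Terraform provisioners, the join flow could be sketched as follows (illustrative only – the real scripts also install the binaries, configure the host and pass more flags):</p>
  <pre data-lang="hcl">resource &quot;null_resource&quot; &quot;worker_join&quot; {
  count = 3

  connection {
    type        = &quot;ssh&quot;
    host        = oci_core_instance.worker[count.index].public_ip
    user        = &quot;ubuntu&quot;
    private_key = var.sshPrivateKey
  }

  provisioner &quot;remote-exec&quot; {
    inline = [
      # the token was pre-generated with &#x27;kubeadm token generate&#x27;
      # and passed to &#x27;kubeadm init&#x27; on the leader node
      &quot;sudo kubeadm join leader:6443 --token ${var.kubeadmToken} --discovery-token-unsafe-skip-ca-verification&quot;,
    ]
  }
}</pre>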
  <p id="f3a3">K8s requires an overlay network plugin for pod communication, and we’ll use Flannel for it. It works on a designated port on each node in the cluster (UDP 8472), which is why we need to open this port in the VCN rules.</p>
  <p id="4a9b">We will also deploy some useful infrastructure in the cluster. First, we need an ingress controller (we’ll use <a href="https://kubernetes.github.io/ingress-nginx/" target="_blank">one based on nginx</a>), which will be used to expose web apps using a route-based approach. We’ll deploy a NodePort Service to listen on ports 30080 and 30443 for HTTP and HTTPS respectively (BTW, these are the ports that are registered as targets in the NLB). With that, we have the complete network architecture of our cluster.</p>
  <figure id="5Bdu" class="m_custom">
    <img src="https://miro.medium.com/max/411/1*NL4fYuCg7G_WLLI6DceZUQ.png" width="411" />
  </figure>
  <p id="74f3">Using this ingress controller, we’ll deploy a dashboard. Once it is available, we can open it in a browser: <em>https://{cluster-public-ip}/dashboard</em>.</p>
  <p id="9407">We’ll also deploy cert-manager, which helps with the issuance of Let’s Encrypt HTTPS certificates. After its deployment is complete, we will deploy a <em>ClusterIssuer</em> for Let’s Encrypt. There is a small peculiarity: it takes some time for cert-manager to become available, and until then attempts to create a ClusterIssuer fail with a cryptic error, and there is no K8s API call to learn about cert-manager readiness. That’s why we retry the creation of the ClusterIssuer until it succeeds (usually it takes a minute or so).</p>
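  <p>One way to express this retry is a <code>local-exec</code> provisioner that loops until <code>kubectl apply</code> succeeds (a sketch; the manifest path and the sleep interval are illustrative):</p>
  <pre data-lang="hcl">resource &quot;null_resource&quot; &quot;letsencrypt_cluster_issuer&quot; {
  provisioner &quot;local-exec&quot; {
    # retry until cert-manager&#x27;s webhook is ready to admit the resource
    command = &quot;until kubectl apply -f cluster-issuer.yaml; do sleep 10; done&quot;
  }
}</pre>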
  <p id="946b">cert-manager works in conjunction with the ingress controller. To enable it, an Ingress resource must be set up with an appropriate public DNS name as a host.</p>
  <h2 id="8030">Bonus: free public domain name</h2>
  <p id="1872">We can register a free domain name at the <a href="http://freenom.com/" target="_blank">Freenom</a> registrar. It is reserved for a year (after that elapses, we should manually prolong it, which can also be done for free).</p>
  <p id="5198">Once we have it, we can configure the domain to point to the reserved public IP of the NLB. Go to <em>Services - My Domains - Manage Domain -</em> <em>Manage Freenom DNS. </em>We can add multiple 3rd-level domains and point them to the same public IP.</p>
  <figure id="QCdT" class="m_custom">
    <img src="https://miro.medium.com/max/700/1*CU54eeLJkl6rnfXM8i5bLw.png" width="700" />
  </figure>
  <p id="3113">The image above shows an example. The intention is to make cluster-specific apps available under the <em>cluster</em> subdomain, while regular apps become available under the domain itself. As we now have a public domain, we can issue a proper Let’s Encrypt HTTPS certificate for the app. These rules are set up using Ingress resources in the cluster. That’s how it can be done for the dashboard:</p>
  <figure>
    <script src="https://gist.github.com/egorshulga/9e0eaf7a2d2afb0184677d14211ca334.js"></script>
  </figure>
  <blockquote id="DleL">N.B.: as it is free, we are left with almost no warranty. I registered a domain in the .ga zone, and to my surprise I found out that it was not available in some locations (precisely, it could not be resolved from the US West coast, from New Zealand, or from Singapore). I wrote to the .ga zone support, and after a couple of days the issue got resolved (I did not get any response though).<br /><br />A domain in another zone did not have such issues.</blockquote>
  <figure id="nfX3" class="m_custom">
    <img src="https://miro.medium.com/max/700/1*A0ZbOK6yiBiXS0GcH2GTZQ.png" width="700" />
    <figcaption>a.ns.ga: that’s how it SHOULD NOT be</figcaption>
  </figure>
  <p id="1874">So that’s how we can provision compute and network resources in Oracle Cloud to deploy a K8s cluster with a public IP and load balancing, while staying in the Always Free tier.</p>

]]></content:encoded></item></channel></rss>