deploymentScript failures producing "A service error occurred, ... Please try again later" #19447

@ejschoen

Bicep version
0.42.1

Describe the bug
Claude wrote a deployment script for me that installs Helm and Helm charts from Bicep. The script runs after the cluster is up and after a plain Kubernetes YAML manifest has been deployed successfully. No variation of the module that Claude tried worked. Every attempt failed with the error below (five correlation IDs are listed at the end of this issue):

{
  "error": {
    "additionalInfo": null,
    "code": "DeploymentScriptBootstrapScriptExecutionFailed",
    "details": null,
    "message": "A service error occurred, the container group resource failed to start script execution. Correlation Id: 915fbe1e-abc8-4ad8-8cb0-ee41883b4231. Please try again later, if the issue persists contact technical support for further investigation.",
    "target": null
  },
  "state": "Failed"
}

To Reproduce
I can't share the entire Bicep package. Here is the module written for Helm chart deployment.

@export()
type HelmChartOptions = {
  @description('Helm release name.')
  releaseName: string
  @description('Chart name. For OCI, appended to repository. For traditional repos, the name in the repo.')
  chart: string
  @description('Repository URL. oci://... for OCI registries, https://... for traditional Helm repos. Omit for an absolute chart reference.')
  repository: string?
  @description('Chart version (semver). Defaults to latest.')
  version: string?
  @description('Target namespace.')
  namespace: string
  @description('Create the namespace if it does not exist. Defaults to true.')
  createNamespace: bool?
  @description('Values overrides merged into the chart values.')
  values: object?
  @description('Wait for resources to become ready. Defaults to true.')
  wait: bool?
  @description('Timeout for wait (e.g. "15m"). Defaults to "15m".')
  timeout: string?
}

param Charts HelmChartOptions[]
param Location string
param ClusterName string
param HelmDeployerIdentityId string
param Project string
param OtherTags object = {}
param ForceUpdateTag string = utcNow()
@description('Name of the storage account used by the deploymentScript for its script input/output share. Must be in the same resource group and have allowSharedKeyAccess=true. Typically the project storage account.')
param StorageAccountName string

resource scriptStorage 'Microsoft.Storage/storageAccounts@2024-01-01' existing = {
  name: StorageAccountName
}

@batchSize(1)
resource helmDeploymentScripts 'Microsoft.Resources/deploymentScripts@2023-08-01' = [
  for (chart, i) in Charts: {
    name: 'helm-${chart.releaseName}'
    location: Location
    kind: 'AzureCLI'
    identity: {
      type: 'UserAssigned'
      userAssignedIdentities: {
        '${HelmDeployerIdentityId}': {}
      }
    }
    tags: union(OtherTags, {
      Project: Project
      Creator: az.deployer().userPrincipalName
    })
    properties: {
      azCliVersion: '2.64.0'
      timeout: 'PT1H'
      retentionInterval: 'P1D'
      cleanupPreference: 'OnSuccess'
      forceUpdateTag: ForceUpdateTag
      storageAccountSettings: {
        storageAccountName: scriptStorage.name
        storageAccountKey: scriptStorage.listKeys().keys[0].value
      }
      environmentVariables: [
        { name: 'RG', value: resourceGroup().name }
        { name: 'CLUSTER', value: ClusterName }
        { name: 'RELEASE', value: chart.releaseName }
        { name: 'CHART', value: chart.chart }
        { name: 'REPOSITORY', value: chart.?repository ?? '' }
        { name: 'VERSION', value: chart.?version ?? '' }
        { name: 'NAMESPACE', value: chart.namespace }
        { name: 'CREATE_NS', value: (chart.?createNamespace ?? true) ? 'true' : 'false' }
        { name: 'VALUES_JSON', value: string(chart.?values ?? {}) }
        { name: 'WAIT', value: (chart.?wait ?? true) ? 'true' : 'false' }
        { name: 'TIMEOUT', value: chart.?timeout ?? '15m' }
      ]
      scriptContent: '''
set -euxo pipefail

# Allow AAD role propagation on just-created UAMI and fresh role assignments.
sleep 45

HELM_VERSION=v3.16.3
echo "[helm-deploy] downloading helm $HELM_VERSION binary"
curl -fsSL --max-time 120 -o /tmp/helm.tgz "https://get.helm.sh/helm-${HELM_VERSION}-linux-amd64.tar.gz"
python3 -c "import tarfile; tarfile.open('/tmp/helm.tgz').extractall('/tmp')"
mv /tmp/linux-amd64/helm /tmp/helm
chmod +x /tmp/helm
export PATH="/tmp:$PATH"
helm version

echo "[helm-deploy] getting AKS admin kubeconfig for $CLUSTER in $RG"
az aks get-credentials -g "$RG" -n "$CLUSTER" --admin --overwrite-existing --file /tmp/kubeconfig
export KUBECONFIG=/tmp/kubeconfig

CHART_REF=""
if [[ "$REPOSITORY" == oci://* ]]; then
  if [[ "$REPOSITORY" == *.azurecr.io* ]]; then
    REGISTRY_HOST=$(printf '%s' "$REPOSITORY" | sed -E 's|oci://([^/]+).*|\1|')
    REGISTRY_NAME=$(printf '%s' "$REGISTRY_HOST" | cut -d. -f1)
    echo "[helm-deploy] logging in to ACR $REGISTRY_NAME"
    TOKEN=$(az acr login -n "$REGISTRY_NAME" --expose-token --output tsv --query accessToken)
    helm registry login "$REGISTRY_HOST" \
      --username 00000000-0000-0000-0000-000000000000 \
      --password "$TOKEN"
  fi
  CHART_REF="${REPOSITORY%/}/$CHART"
elif [ -n "$REPOSITORY" ]; then
  helm repo add "$RELEASE-repo" "$REPOSITORY"
  helm repo update
  CHART_REF="$RELEASE-repo/$CHART"
else
  CHART_REF="$CHART"
fi

printf '%s' "$VALUES_JSON" > /tmp/values.json

CREATE_NS_FLAG=""
if [ "$CREATE_NS" = "true" ]; then
  CREATE_NS_FLAG="--create-namespace"
fi

VERSION_FLAG=""
if [ -n "$VERSION" ]; then
  VERSION_FLAG="--version $VERSION"
fi

WAIT_FLAG=""
if [ "$WAIT" = "true" ]; then
  WAIT_FLAG="--wait --timeout $TIMEOUT"
fi

echo "[helm-deploy] helm upgrade --install $RELEASE $CHART_REF -n $NAMESPACE"
helm upgrade --install "$RELEASE" "$CHART_REF" \
  --namespace "$NAMESPACE" $CREATE_NS_FLAG \
  -f /tmp/values.json \
  $VERSION_FLAG $WAIT_FLAG

echo "[helm-deploy] done"
'''
    }
  }
]
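
For triage, the chart-reference branching in the script above can be exercised locally without Azure. The following is a minimal sketch, assuming bash; the helper names `resolve_chart_ref`, `acr_host`, and `acr_name`, and the example registry and chart names, are hypothetical and not part of the original module. It only mirrors the string handling. It does not reproduce the failure, which, according to the error message, occurs before script execution starts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mirrors the CHART_REF branching in the script above.
# Args: repository (may be empty), chart, release name.
resolve_chart_ref() {
  local repository="$1" chart="$2" release="$3"
  if [[ "$repository" == oci://* ]]; then
    # OCI: the chart name is appended to the repository URL
    # (one trailing slash is trimmed first).
    printf '%s\n' "${repository%/}/$chart"
  elif [[ -n "$repository" ]]; then
    # Traditional repo: the script registers it under "<release>-repo".
    printf '%s\n' "${release}-repo/$chart"
  else
    # No repository: the chart value is used as-is.
    printf '%s\n' "$chart"
  fi
}

# Hypothetical helpers: mirror the sed/cut extraction of the ACR host and name.
acr_host() { printf '%s' "$1" | sed -E 's|oci://([^/]+).*|\1|'; }
acr_name() { acr_host "$1" | cut -d. -f1; }

# Example inputs (all hypothetical).
resolve_chart_ref "oci://myregistry.azurecr.io/helm/" "mychart" "myrelease"
resolve_chart_ref "https://charts.example.com" "mychart" "myrelease"
resolve_chart_ref "" "./charts/mychart" "myrelease"
echo "$(acr_name "oci://myregistry.azurecr.io/helm")"
```

Running this with a few representative `Charts` entries confirms that the values passed through `REPOSITORY`, `CHART`, and `RELEASE` resolve to the intended chart reference.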

Additional context
These are the correlation IDs of the failures:

  36cd8100-ae65-490f-90ae-217cd77c365b
  a460e061-6776-441f-b978-63d4273cc847
  c63195d7-5c6f-4aed-b729-959dc95fe1d3
  c2bf81ff-63fa-42d9-b768-21b319597540
  915fbe1e-abc8-4ad8-8cb0-ee41883b4231

Other items to note from Claude:

  • Subscription ec9c07f9-e5c5-444a-9dfa-ffafd9c85e8b
  • Region: centralus
  • Plain ACI works (with and without the same UAMI attached), so the problem appears to be specific to deploymentScripts
  • Template already uses explicit storageAccountSettings
