
Conversation

RomanBednar
Contributor

Do not merge: this PR is for collaboration purposes only and will be closed later.

Notes

  • Documentation

    • Sections to remove from our documentation:
      - 5.9.5. AWS EFS CSI cross account support
      - 5.9.6. Creating and configuring access to EFS volumes in AWS
      - These two sections should be consolidated into a single comprehensive section, as they are interdependent and cannot function independently
    • A statement should be added indicating that this functionality is fully supported for both STS and non-STS clusters
    • Code blocks with comments should be replaced with asides/alerts/callouts for improved readability
    • Section 5.9.8. Creating static PVs with Amazon Elastic File Storage should mention that, when using the cross-account feature, the volumeHandle has to reference the EFS ID of the filesystem in Account B (see the static PV sketch after this list)
  • Manual testing procedures:

    • Test on a standalone cluster only (no ROSA), both with STS enabled and without it; the docs "branch" into separate steps:
      • 5.9.5.5.1 - for non-STS clusters
      • 5.9.5.5.2 - for STS clusters
    • Follow the guide precisely, copy-pasting commands where applicable
    • Verify successful execution of each command before proceeding to the next step
    • Document any unclear instructions for future refinement
  • CI configuration considerations:

    • Ideally, CI coverage should align with our documentation; however, the current implementation presents some discrepancies.
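
A hedged sketch of the static PV note above (the PV name is hypothetical); the key point is that volumeHandle must be the EFS ID of the filesystem in Account B:

```bash
cat << EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-cross-pv                       # hypothetical name
spec:
  capacity:
    storage: 5Gi                           # ignored by EFS, required by the API
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0b55afffece6c2d6c     # EFS ID of the filesystem in Account B
EOF
```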

Testing

  • Install the EFS operator from OperatorHub as usual; for STS, use ccoctl to create the AWS role (a sketch follows below)
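
A sketch of the STS role-creation step mentioned above (the name, path, and variables are hypothetical); ccoctl reads the operator's CredentialsRequest files and creates an IAM role bound to the cluster's OIDC identity provider:

```bash
ccoctl aws create-iam-roles \
  --name=my-efs-cross \
  --region=${AWS_REGION} \
  --credentials-requests-dir=./credrequests \
  --identity-provider-arn=${OIDC_PROVIDER_ARN}
```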

  • Create testing namespace:

```bash
oc create ns testns
```
  • Create PVC:
```bash
cat << EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-crossaccount-pvc
  namespace: testns
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc-cross
  resources:
    requests:
      storage: 1Gi
EOF
```
  • Create pod to mount the volume:
```bash
cat << EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: example-1
  labels:
    app: httpd
  namespace: testns
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: efs-crossaccount-pvc
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: httpd
      image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
      ports:
        - containerPort: 8080
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
EOF
```
  • Check that there are no error events and that the volume mounted successfully; one way to verify follows below
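
A minimal verification sketch (resource names match the examples above):

```bash
oc get events -n testns --sort-by=.lastTimestamp       # look for ProvisioningFailed / FailedMount
oc get pvc efs-crossaccount-pvc -n testns              # STATUS should be Bound
oc wait --for=condition=Ready pod/example-1 -n testns --timeout=120s
oc exec example-1 -n testns -- df -h /usr/share/nginx/html   # should show an NFS (EFS) mount
```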

  • Create the storage class referenced by the PVC above (the name must match storageClassName: efs-sc-cross; for cross-account, fileSystemId must be the EFS ID of the filesystem in Account B):

```bash
cat << EOF | oc create -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc-cross
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0b55afffece6c2d6c
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"
volumeBindingMode: Immediate
EOF
```
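
The README examples further down reference the file system through a variable rather than a hard-coded ID. A sketch of how it could be captured (the profile name is hypothetical; assumes an AWS CLI profile configured for Account B):

```bash
export CROSS_ACCOUNT_FS_ID=$(aws efs describe-file-systems \
  --profile account-b --region ${AWS_REGION} \
  --query 'FileSystems[0].FileSystemId' --output text)
echo ${CROSS_ACCOUNT_FS_ID}
```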

@openshift-ci openshift-ci bot requested review from gnufied and tsmetana June 23, 2025 09:42

openshift-ci bot commented Jun 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RomanBednar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2025
README-tmp.md Outdated
## 5.9.5.1 Requirements

- The EFS CSI Operator must be installed prior to beginning this procedure. Follow the _Installing the AWS EFS CSI Driver Operator_ procedure.
- Both the Red Hat OpenShift Service on AWS cluster and EFS file system must be located in the same AWS region.
Contributor

Suggested change
- Both the Red Hat OpenShift Service on AWS cluster and EFS file system must be located in the same AWS region.
- Both the OpenShift cluster and EFS file system must be located in the same AWS region.

"Red Hat OpenShift Service on AWS" stands for ROSA; if this will be used as a reference by the docs team, better to correct it.

Contributor Author

Missed it - fixed.


- The EFS CSI Operator must be installed prior to beginning this procedure. Follow the _Installing the AWS EFS CSI Driver Operator_ procedure.
- Both the Red Hat OpenShift Service on AWS cluster and EFS file system must be located in the same AWS region.
- Ensure that the two VPCs referenced in this guide utilize different network CIDR ranges.
Contributor

There's also another scenario where the two accounts use the same shared VPC; not sure it is worth running tests for this.
cc @ropatil010

Contributor Author

I don't think so; similar to the "same availability zone" feature mentioned in another comment here, we don't document it now and probably should not add new features within the scope of this change.

Contributor

Acked, and thanks for the double confirmation!

README-tmp.md Outdated
export AWS_ACCOUNT_B="<ACCOUNT_B_NAME>"
```

5. Ensure that your AWS CLI is configured with JSON output format as the default for both accounts:
Contributor

Suggested change
5. Ensure that your AWS CLI is configured with JSON output format as the default for both accounts:
5. Ensure that your AWS CLI is configured with JSON output format as the default for both accounts (the output format can be overridden with `export AWS_DEFAULT_OUTPUT="json"`):

Maybe better.

Contributor Author

Yes, this is one of the methods of changing the output format; however, I intentionally avoided explaining it here, since we would be documenting AWS itself, which is not desired. For this purpose I've added a link in the paragraph below to the official AWS doc (which also explains the AWS_DEFAULT_OUTPUT variable): https://github.com/openshift/csi-operator/pull/401/files#diff-5c4f1f33bb4c41d6c3e57b67e95dc097ebf581e3fdc2f5bbf848f15135179035R61
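
For reference, a small sketch of checking and overriding the format (assumes AWS CLI v2):

```bash
aws configure get output           # shows the configured default, if any
export AWS_DEFAULT_OUTPUT="json"   # per-shell override, as the linked AWS doc explains
```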

README-tmp.md Outdated
done
```

5. Enable DNS resolution for Account A to access Account B's VPC:
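
For reference, the kind of call this step performs, as a sketch; the variable is hypothetical, and whether the requester or accepter options must be set depends on which account initiated the peering:

```bash
aws ec2 modify-vpc-peering-connection-options \
  --vpc-peering-connection-id ${PEERING_CONNECTION_ID} \
  --requester-peering-connection-options AllowDnsResolutionFromRemoteVpc=true \
  --region ${AWS_REGION}
```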
Contributor

I didn't add this step (an optional step?) in CI. From the upstream doc, it seems it is also needed if we want to ensure that the EFS mount target is in the same availability zone as the node; that also needs extra config in the secret, right?

If you would like to ensure that your EFS Mount Target is in the same availability zone as your EKS Node, then ensure you have completed the prerequisites for cross-account DNS resolution and include the crossaccount key with value true. For example, kubectl create secret generic x-account --namespace=kube-system --from-literal=awsRoleArn='arn:aws:iam::123456789012:role/EFSCrossAccountAccessRole' --from-literal=crossaccount='true' instead.

Contributor Author

Good point. It seems like this would be a prerequisite to using crossaccount='true', which is indeed reflected in a secret, but this is something we do not currently document. So I would remove this step for now unless we get an RFE, just so we don't introduce this change and bloat the testing even more.

README-tmp.md Outdated

```bash
for SUBNET in $(aws ec2 describe-subnets \
--query 'Subnets[*].{SubnetId:SubnetId}' \
Contributor

Suggested change
--query 'Subnets[*].{SubnetId:SubnetId}' \
--filters "Name=vpc-id,Values=${AWS_ACCOUNT_B_VPC_ID}" \
--query 'Subnets[*].{SubnetId:SubnetId}' \

We'd better filter by VPC ID, since the current command queries all subnets in the region.

Contributor Author

Nice catch, corrected.
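
For reference, a sketch of the corrected loop; using --query with --output text also sidesteps the jq parsing issue reported later in this thread:

```bash
for SUBNET in $(aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=${AWS_ACCOUNT_B_VPC_ID}" \
    --region ${AWS_REGION} \
    --query 'Subnets[].SubnetId' --output text); do
  echo "Found subnet: ${SUBNET}"
done
```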

export STORAGE_CLASS_NAME=efs-sc-cross
```

2. Create a secret that references the role ARN in Account B:
Contributor

Maybe we'd better also mention that openshift-cluster-csi-drivers is the namespace the operator is installed in; sometimes customers install it in some other namespace.

Contributor Author

Yes, we should mention it; changing this to a variable.
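
A sketch of the resolved step (the account ID and role name are placeholders; the secret name matches the one that later appears in the provisioning error below):

```bash
export CSI_DRIVER_NAMESPACE="openshift-cluster-csi-drivers"   # change if the operator is installed in a different namespace
oc create secret generic my-efs-cross-account \
  --namespace=${CSI_DRIVER_NAMESPACE} \
  --from-literal=awsRoleArn="arn:aws:iam::<ACCOUNT_B_ID>:role/<CROSS_ACCOUNT_ROLE>"
```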

@RomanBednar RomanBednar changed the title Draft: AWS standalone cross account doc draft Draft: AWS standalone cross account doc Jul 8, 2025
@RomanBednar
Contributor Author

@Phaow Thank you for the comments and review; I've addressed them now. Please take a look and let me know if there's anything else. When done, I suggest we test this manually to make sure there are no mistakes before the handoff to the docs team.

@Phaow
Contributor

Phaow commented Jul 9, 2025

LGTM. @ropatil010, as the feature's main QE owner, could you please cross-check and verify that the doc works as expected? Thanks!

@RomanBednar RomanBednar force-pushed the cross-account-doc-revamp branch from fa7af77 to 276d1f8 Compare July 9, 2025 13:24
@RomanBednar RomanBednar force-pushed the cross-account-doc-revamp branch from 276d1f8 to 95e640e Compare July 9, 2025 13:29
@ropatil010

Hi @RomanBednar, after testing the steps as mentioned in the PR, I observed 2 issues.

  1. The subnet query fails:

     ```bash
     aws ec2 describe-subnets \
       --filters "Name=vpc-id,Values=${AWS_ACCOUNT_B_VPC_ID}" \
       --region ${AWS_REGION} \
       | jq -r '.[].SubnetId'
     ```

     Error: jq: error (at <stdin>:132): Cannot index array with string "SubnetId"

     Needs to be updated to:

     ```bash
     aws ec2 describe-subnets \
       --filters "Name=vpc-id,Values=${AWS_ACCOUNT_B_VPC_ID}" \
       --region ${AWS_REGION} \
       | jq -r '.Subnets[].SubnetId'
     ```

  2. CSI_DRIVER_NAMESPACE must be openshift-cluster-csi-drivers. If another namespace such as default is used, volume provisioning fails:

     ```
     Warning ProvisioningFailed 2s (x6 over 33s) efs.csi.aws.com_ip-10-0-65-108_145f242e-8745-47cf-93c9-2b6c41e94e25 failed to provision volume with StorageClass "efs-sc-cross": error getting secret my-efs-cross-account in namespace default: secrets "my-efs-cross-account" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:aws-efs-csi-driver-controller-sa" cannot get resource "secrets" in API group "" in the namespace "default"
     ```

@Phaow
Contributor

Phaow commented Jul 10, 2025


@ropatil010 So you created the secret in default while installing the CSI driver operator in openshift-cluster-csi-drivers, right? We should only support creating the secret in the same namespace where the operator is installed, and we should declare this in the doc. I guess if you install the operator in the default ns it also works (the SA can get the secret in the same namespace it is located in).

@ropatil010


Agree, @Phaow, thanks for the input.
This was mentioned earlier in this PR and I missed it.

@RomanBednar
Contributor Author

@ropatil010 For the issues:

  1. Has been fixed.
  2. As you already found, there will be a comment visible in the final docs for the variable holding the namespace name:
     export CSI_DRIVER_NAMESPACE="openshift-cluster-csi-drivers" # Change this if your driver is installed in a non-default namespace

Thanks for the feedback and let us know when you're done with the testing so we can proceed.

EOF
```

3. In AWS Account A, attach the AWS-managed policy "AmazonElasticFileSystemClientFullAccess" to the OpenShift Container Platform cluster master role:


Currently tested on OCP standalone and OCP HCP clusters.
Note: if running on an HCP cluster, there is no need to execute this 3rd step, since an HCP cluster has only worker nodes.
I have tested all the mentioned steps on an HCP cluster except this 3rd step, and we can provision the volume and read/write/execute.
We can mark this doc as applying to both OCP standalone and OCP HCP clusters.
cc: @jsafrane @Phaow

Contributor

So did you test whether, without that step, the volume reclaim policy works as expected? I mean, in HCP clusters, deleting the PVC should also delete the PV, consistent with reclaimPolicy: Delete. The step is needed for the driver controller to reclaim volumes.


I guess on HCP clusters we have to add the permission to the worker node roles instead of the master nodes. I tested using the following and it worked fine.
```bash
EFS_POLICY_ARN=arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess
for NODE in $(oc get nodes --selector=node-role.kubernetes.io/worker | tail -n +2 | awk '{print $1}')
do
  INSTANCE_PROFILE=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=${NODE}" --query 'Reservations[].Instances[].IamInstanceProfile.Arn' --output text | awk -F'/' '{print $NF}')
  WORKER_ROLE_ARN=$(aws iam get-instance-profile --instance-profile-name ${INSTANCE_PROFILE} --query 'InstanceProfile.Roles[0].Arn' --output text)
  WORKER_ROLE_NAME=$(echo ${WORKER_ROLE_ARN} | awk -F'/' '{print $NF}')
  echo "Assigning policy ${EFS_POLICY_ARN} to role ${WORKER_ROLE_NAME} for node ${NODE}"
  aws iam attach-role-policy --role-name ${WORKER_ROLE_NAME} --policy-arn ${EFS_POLICY_ARN}
done
```

```console
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-fd9dabc9-a47f-4431-a435-267040c753a6   1Gi        RWX            Delete           Bound    testropatil/mypvc-fs   efs-sc-cross                                    10s

$ oc delete deployment --all -n testropatil
deployment.apps "mydep-fs" deleted
$ oc delete pvc --all -n testropatil
persistentvolumeclaim "mypvc-fs" deleted
$ oc get pv
No resources found
```

Contributor

Yes, it depends on which nodes the driver controller pod is located on (in HCP clusters the driver controller is located on the workers). Thanks for the double-checking!

Contributor Author

Great catch! I've updated the doc to explain which node selector to use in this step: deea616


openshift-ci bot commented Jul 16, 2025

@RomanBednar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure | deea616 | link | true | /test e2e-azure |
| ci/prow/hypershift-e2e-aks | deea616 | link | true | /test hypershift-e2e-aks |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

provisioningMode: efs-ap
fileSystemId: ${CROSS_ACCOUNT_FS_ID}
directoryPerms: "700"
gidRangeStart: "1000"
@ropatil010 commented Jul 17, 2025

Hi @RomanBednar,
Can we remove these two fields, gidRangeStart and gidRangeEnd? I'm not sure whether a customer may hit an issue when creating new volumes beyond "2000", since it may reach gidRangeEnd: "2000".

Otherwise LGTM. Thank you!

Contributor Author

I think you're right; the GID range parameters serve no purpose in this configuration. However, I checked the current standalone documentation, and the storage class has the same parameters in that example, so I'd rather keep it unchanged here for the sake of consistency.

@ropatil010

Added test results here: https://issues.redhat.com/browse/STOR-2378
Based on the testing, adding the label.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@RomanBednar
Contributor Author

/hold

We won't be merging this PR; after the official docs are finished we'll close it.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2025