[🐛 Bug]: Selenium-grid Autoscaling setup #2354

Open
katukna opened this issue Aug 14, 2024 · 8 comments
Labels
I-autoscaling-k8s Issue relates to autoscaling in Kubernetes, or the scaler in KEDA R-awaiting-answer

katukna commented Aug 14, 2024

What happened?

I am setting up Selenium Grid autoscaling on our AKS cluster using my own deployment files. Currently we have selenium-hub and selenium-node-chrome; autoscaling is not enabled yet. We have two selenium-node-chrome pods, and when we run three threads against them, two get executed and the third stays in the queue and fails after some time. That's expected, since autoscaling is not enabled and the number of concurrent sessions per node is set to "1". I am having a hard time understanding how to set up autoscaling using KEDA. Is there any clear documentation on how autoscaling can be set up?
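The capacity arithmetic here is 2 node pods × 1 session slot each = 2 concurrent sessions, so the third request waits in the Grid's session queue until it times out. In the docker-selenium node images, the number of slots per pod comes from the `SE_NODE_MAX_SESSIONS` environment variable, which defaults to `1`. A minimal sketch of setting it explicitly, as a fragment of `spec.template.spec` in the selenium-node-chrome Deployment below (the comments are our reading of the setup, not something stated in the thread):

```yaml
# Fragment of spec.template.spec in the selenium-node-chrome Deployment (illustrative).
# Total concurrent sessions = node replicas x SE_NODE_MAX_SESSIONS.
containers:
- name: selenium-node-chrome
  image: selenium/node-chrome:4.23.1
  env:
  - name: SE_EVENT_BUS_HOST
    value: selenium-hub
  - name: SE_EVENT_BUS_PUBLISH_PORT
    value: "4442"
  - name: SE_EVENT_BUS_SUBSCRIBE_PORT
    value: "4443"
  # One browser session per pod (same as the image default);
  # with KEDA, extra capacity comes from adding pods, not slots.
  - name: SE_NODE_MAX_SESSIONS
    value: "1"
```

Raising `SE_NODE_MAX_SESSIONS` adds slots without any autoscaling, but every Chrome session then shares the pod's limits (`cpu: 500m`, `memory: 1000Mi` here), so keeping one session per pod and letting KEDA add pods is usually the simpler model.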

Command used to start Selenium Grid with Docker (or Kubernetes)

N/A

Relevant log output

N/A

Operating System

Linux

Docker Selenium version (image tag)

4.18.0

Selenium Grid chart version (chart version)

No response


@katukna, thank you for creating this issue. We will troubleshoot it as soon as we can.


VietND96 (Member) commented

Hi, are you deploying on the AKS cluster using your own YAML manifest files? If so, can you compare them with the YAML manifests published with this release for any clues? - https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.23.1-20240813

katukna commented Aug 15, 2024

Yes @VietND96, I was using my own YAML files. I have gone through the README, but it didn't tell me much about what node-chrome autoscaling is based on. Is it based on the number of queued session requests shown in the selenium-hub UI?

katukna commented Aug 15, 2024

I have selenium-hub and selenium-node-chrome Deployments and Services, plus a ScaledObject. The YAML files are below:

selenium-hub.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: selenium-hub
  name: selenium-hub
  namespace: selenium-grid
spec:
  replicas: 1
  selector:
    matchLabels:
      app: selenium-hub
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: selenium-hub
    spec:
      containers:
      - image: selenium/hub:4.23.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /wd/hub/status
            port: 4444
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: selenium-hub
        ports:
        - containerPort: 4444
          protocol: TCP
        - containerPort: 4443
          protocol: TCP
        - containerPort: 4442
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /wd/hub/status
            port: 4444
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

selenium-hub-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: selenium-hub
  name: selenium-hub
  namespace: selenium-grid
spec:
  ports:
  - name: port0
    port: 4444
    protocol: TCP
    targetPort: 4444
  - name: port1
    port: 4443
    protocol: TCP
    targetPort: 4443
  - name: port2
    port: 4442
    protocol: TCP
    targetPort: 4442
  - name: node
    port: 5555
    protocol: TCP
    targetPort: 5555
  - name: port3
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: selenium-hub
  sessionAffinity: None
  type: ClusterIP

selenium-node-chrome-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: selenium-node-chrome
  name: selenium-node-chrome
  namespace: selenium-grid
spec:
  replicas: 2
  selector:
    matchLabels:
      app: selenium-node-chrome
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: selenium-node-chrome
    spec:
      containers:
      - env:
        - name: SE_EVENT_BUS_HOST
          value: selenium-hub
        - name: SE_EVENT_BUS_SUBSCRIBE_PORT
          value: "4443"
        - name: SE_EVENT_BUS_PUBLISH_PORT
          value: "4442"
        image: selenium/node-chrome:4.23.1
        imagePullPolicy: IfNotPresent
        name: selenium-node-chrome
        ports:
        - containerPort: 5555
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory

selenium-node-chrome-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    name: selenium-node-chrome
  name: selenium-node-chrome
  namespace: selenium-grid
spec:
  ports:
  - name: nodeport
    port: 5555
    protocol: TCP
    targetPort: 5555
  - name: node-port-grid
    port: 4444
    protocol: TCP
    targetPort: 4444
  - name: no-vnc
    port: 7900
    protocol: TCP
    targetPort: 7900
  selector:
    app: selenium-node-chrome
  sessionAffinity: None
  type: ClusterIP

Chrome-ScaledObject.yaml

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  namespace: selenium-grid
  name: chrome-scale-deployment
  labels:
    deploymentName: selenium-node-chrome
spec:
  minReplicaCount: 2
  maxReplicaCount: 5
  scaleTargetRef:
    name: selenium-node-chrome
  triggers:
    - type: selenium-grid
      metadata:
        url: 'https://selenium-hub.example.com:4444/graphql'
        browserName: 'chrome'
        unsafeSsl: 'true'

edsherwin commented

@katukna Have you resolved your issue or found a solution? Please share it, as I'm encountering the same issue on my AKS cluster. Thanks

katukna commented Sep 3, 2024

Not yet, @edsherwin. Let me know if you find a way to do it.

VietND96 (Member) commented

In Chrome-ScaledObject.yaml, can you try updating metadata.url to point to the hub Service (e.g. svc_name.namespace) instead of a public DNS name or load balancer IP over HTTPS? For example:

  triggers:
    - type: selenium-grid
      metadata:
        url: 'http://selenium-hub.selenium-grid:4444/graphql'
        browserName: 'chrome'
        unsafeSsl: 'true'
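Combining this suggestion with the Chrome-ScaledObject.yaml attached earlier gives something like the sketch below. KEDA documents its `selenium-grid` trigger as scaling browser nodes based on the number of requests in the Grid's session queue (read from the GraphQL `url`) and the max sessions per node, bounded by `minReplicaCount`/`maxReplicaCount`. Assumptions: the hub Service is `selenium-hub` in namespace `selenium-grid` (as in the attached manifests), the cluster uses the default `cluster.local` DNS domain, and `pollingInterval: 10` is an illustrative value rather than a recommendation from this thread. `unsafeSsl` is dropped because, as far as we know, it only affects TLS certificate verification and the URL here is plain HTTP.

```yaml
# Chrome-ScaledObject.yaml (sketch): trigger pointed at the in-cluster hub Service.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: chrome-scale-deployment
  namespace: selenium-grid
spec:
  scaleTargetRef:
    name: selenium-node-chrome   # the node Deployment in the same namespace
  minReplicaCount: 2
  maxReplicaCount: 5
  pollingInterval: 10            # seconds between polls of the trigger (KEDA default: 30)
  triggers:
  - type: selenium-grid
    metadata:
      # <service>.<namespace>.svc.<cluster-domain>; the short form
      # selenium-hub.selenium-grid also resolves from inside the cluster.
      url: 'http://selenium-hub.selenium-grid.svc.cluster.local:4444/graphql'
      browserName: 'chrome'
```

After applying it, `kubectl get scaledobject -n selenium-grid` should show the object as ready, and KEDA creates an HPA (typically named `keda-hpa-chrome-scale-deployment`) that manages the replica count of `selenium-node-chrome` from then on, overriding the Deployment's `replicas: 2`.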

VietND96 added the I-autoscaling-k8s label (Issue relates to autoscaling in Kubernetes, or the scaler in KEDA) and removed the needs-triaging label on Oct 11, 2024

VietND96 commented Dec 4, 2024

There were a few recent fixes to the scaler logic for autoscaling with KEDA. You can refer to https://github.com/SeleniumHQ/docker-selenium/tree/trunk/.keda to preview the fixes and verify them.

3 participants