Overview

Stefanini Group is seeking a Senior Compute Engineer specialized in Red Hat OpenShift to strengthen our Compute Operations team and provide Level 3 (L3) expert support for enterprise customers running critical workloads on container and virtualization platforms.

This role focuses on day-to-day operations, stability, and continuous improvement of OpenShift-based platforms. The engineer will act as the highest escalation point for complex incidents and problems, support platform lifecycle activities (upgrades, patching, performance tuning), and contribute to platform modernization initiatives, including VMware-to-OpenShift virtualization transformation programs.

The ideal candidate combines strong troubleshooting skills, deep infrastructure understanding, and hands-on OpenShift expertise with the ability to work in a structured operational environment (ITIL / managed services), while also supporting automation and standardization.

Responsibilities

Level 3 Operations & Technical Escalation (Core Responsibility)

- Act as the L3 escalation point for complex technical issues related to:
  - Red Hat OpenShift clusters (control plane, worker nodes, networking, storage, authentication)
  - OpenShift Virtualization (KubeVirt) and VM-based workloads hosted on OpenShift
  - Linux OS-level issues impacting cluster stability or workloads
- Own and drive resolution of:
  - Major Incidents (P1/P2), with deep technical investigation and a rapid recovery focus
  - Recurring incidents through Problem Management (root cause analysis and permanent fixes)
- Lead deep troubleshooting activities:
  - Cluster degradation, node failures, API instability, etcd performance issues
  - Networking issues (ingress, routes, DNS, CNI, service connectivity)
  - Storage issues (persistent volumes, performance bottlenecks, CSI failures)
  - Workload failures (pods, operators, deployments, stateful applications)
- Provide clear technical updates during incidents, including impact assessment, recovery plan/workaround, risks and next steps

Platform Lifecycle Management (Upgrades, Patching, Stability)

- Plan and execute OpenShift lifecycle activities such as version upgrades (cluster upgrades and operator upgrades), patching, security hardening and certificate management
- Validate platform readiness before changes: capacity, compatibility, performance, known issues
- Maintain high availability and resilience: backup/restore strategy support (including etcd backup practices), disaster recovery readiness and operational runbooks
- Ensure operational compliance with defined maintenance windows and change governance

VMware-to-OpenShift Virtualization Transformation Support

- Support enterprise modernization initiatives involving migration from traditional virtualization platforms (VMware) to OpenShift Virtualization
- Contribute to:
  - Migration approach definition and technical design support
  - Workload onboarding, validation, and stabilization on OpenShift
  - Performance tuning and operational model definition for VM-based workloads on OpenShift
- Ensure production-grade operational readiness: monitoring, alerting, backup, patching and a support model aligned with managed services standards

Standardization, Automation & Operational Improvement

- Develop and maintain operational documentation, including troubleshooting guides, standard operating procedures (SOPs), build standards and reference architectures, and operational runbooks for recurring tasks
- Support automation initiatives using tools such as Ansible / Automation Platform (preferred), GitOps practices (ArgoCD) where applicable, and scripting (Bash / Python)
- Proactively identify improvements to increase platform stability, recovery speed (MTTR) and repeatability, and to reduce human error

Monitoring, Observability & Performance Management

- Support and improve observability across the platform, including the OpenShift monitoring stack (Prometheus / Alertmanager / Grafana) and log management (EFK / Loki or enterprise logging platforms)
- Troubleshoot performance issues related to compute resource constraints, scheduling, resource requests/limits, cluster scaling and capacity planning
- Collaborate with customer stakeholders and internal teams to define alert thresholds, reduce noise and false positives, and improve operational dashboards and health reporting

Security & Compliance Support

- Operate the platform securely, aligned with enterprise expectations:
  - RBAC best practices
  - Integration with enterprise identity providers (LDAP / AD / SSO)
  - Secure cluster configuration and segregation
- Support vulnerability remediation and platform hardening initiatives
- Collaborate with Security teams on audits, compliance requests, and evidence collection

Qualifications

Mandatory Technical Skills:

- Strong hands-on experience with Red Hat OpenShift administration and operations
- Strong Linux background (RHEL preferred), including troubleshooting OS performance, services, networking, and storage
- Solid understanding of Kubernetes fundamentals: pods, deployments, services, ingress, namespaces, RBAC, operators
- Experience troubleshooting infrastructure-related issues across compute, network, storage, and platform services
- Experience working in production environments with uptime and SLA commitments

Mandatory Professional Skills:

- Proven ability to operate as Level 3 support, including deep troubleshooting, structured root cause analysis and ownership until resolution
- Ability to communicate clearly with customers (technical and non-technical stakeholders) and internal teams (L1/L2/architects/project teams)
- Strong documentation discipline and operational mindset

Preferred / Nice-to-Have Skills:

- Experience with OpenShift Virtualization (KubeVirt) and VM-based workloads
- Experience supporting VMware environments and understanding of virtualization concepts: vSphere architecture, clusters, HA/DRS, storage/datastores, VM lifecycle
- Experience with automation tools:
  - Ansible / Red Hat Ansible Automation Platform
  - GitOps tools (ArgoCD)
  - Infrastructure as Code practices
- Experience with enterprise storage and CSI integrations
- Experience with enterprise networking topics (DNS, routing, firewall constraints, load balancing)
- Experience with public cloud OpenShift deployments (optional): ROSA / ARO / OCP on AWS/Azure/GCP

Certifications (Preferred):

- Red Hat Certified Specialist in OpenShift Administration (preferred)
- Red Hat Certified Engineer (RHCE) (strong advantage)
- Kubernetes certifications (CKA/CKAD) (nice to have)

Working Model & Operational Expectations

- Work in an operational environment following ITIL practices (Incident / Problem / Change Management), a managed services delivery model and SLA commitments
- Participate in on-call rotation, planned maintenance windows and technical escalation duty as required
- Provide clear handovers and updates to ensure continuity across shifts/regions

Diversity, Inclusion & Closing Notes

We value plurality and equity and encourage candidates from diverse backgrounds. This role is posted by Stefanini Group, a global tech consulting company of Brazilian origin. If you believe you are a fit, please apply through official channels.

Important Advisory

We will never ask for payment during the recruitment process. If you suspect a scam, contact recruitmentEMEA@stefanini.com for verification.

Stefanini EMEA