Site Reliability Engineering Manager
Dell | Bellevue | Update time: May 20, 2020
Job Description

Are you passionate about operational excellence for large-scale platforms? The right person for this role will bring a software engineering perspective to delivering quality operations at scale, driving automation in every aspect of the job. In your work, you will have the opportunity to shape the operational excellence of the VMware Skyline product.

We're seeking a seasoned Site Reliability Engineering (SRE) Manager to join and manage our creative, passionate, and extremely talented SRE team. You will build and maintain highly complex, integrated systems that typically span on-prem and cloud environments. Your efforts will keep the Skyline machine humming smoothly as we scale it to handle a growing customer base.

Key Responsibilities 

  • Own end-to-end availability (SLO/SLA), reliability, and performance of Skyline Platform
  • Ensure operational scalability, stability, and quality of Skyline, including establishing 24x7 pager duty coverage within the team for critical services and escalation workflows
  • Monitor, troubleshoot, and resolve production-grade issues for the SaaS platform and applications
  • Lead the post-mortem process, managing Root Cause Analysis (RCA) and driving lessons learned back for improvement
  • Build great software, including coding to increase manageability of the platform and automate anything and everything
  • Maintain a knowledge base of known issues and solutions
  • Work with cutting-edge cloud technology to build and maintain CI/CD pipelines for build, deploy, and code coverage. Install, configure, update, and troubleshoot cloud microservices.
  • Collaborate with Engineering teams, influencing and contributing to product design, establishing requirements for manageability and operations, and ensuring they are implemented.
  • Establish and drive operational scale and excellence with metrics.
  • Grow and mentor a team of engineers, building their talents and exposing them to new challenges and experiences.

Desired Skills and Experience 

  • Develop tight relationships with software development teams
  • Able to write clear and consumable documentation 
  • Strong communication skills and ability to work effectively across multiple business and technical global teams
  • Foster a healthy and collaborative culture

Qualifications:

  • 10+ years combined experience with both software development and system administration/operations
  • 5+ years leading teams with responsibility for Site Reliability functions of scalable software platforms, with emphasis on a software-driven approach to operations and management
  • Strong experience building, analyzing and troubleshooting scalable distributed systems
  • Hands-on technical experience supporting SaaS-based applications
  • Expertise building and deploying software using cloud services, Cloud Foundry, and the AWS platform, including VPCs, ECS/EKS, etc.
  • Expertise with Docker, Kubernetes (or other orchestration tools), and Jenkins
  • Experience with load and performance of SaaS applications and platforms
  • Experience with CI/CD pipeline configuration, deployment, and support
  • Experience leading and collaborating with globally distributed development teams

Category: Engineering and Technology
Subcategory: Product Dev Management
Experience: Manager and Professional
Full Time/ Part Time: Full Time
Posted Date: 2020-05-15



VMware Company Overview: At VMware, we believe that software has the power to unlock new opportunities for people and our planet. We look beyond the barriers of compromise to engineer new ways to make technologies work together seamlessly. Our cloud, mobility, and security software form a flexible, consistent digital foundation for securely delivering the apps, services and experiences that are transforming business innovation around the globe. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Shape what’s possible today at http://careers.vmware.com.

Equal Employment Opportunity Statement: VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.
