Senior Site Reliability Engineer
Oracle | Singapore | Update time: August 2, 2021
Job Description
Join the Java Management Service (JMS) site reliability team as we grow the service by rolling out to more regions globally, expanding JMS capabilities, and improving JMS's reliability and scalability. The key role is to help the team build, maintain, and operate JMS in the cloud and provide a stable, secure, and performant service in line with the expectations of an enterprise-class cloud service. This is a unique position in which an engineer provides full application lifecycle and deployment support, automates the process, reduces toil, and ensures a smooth delivery of JMS across the globe.
Key Responsibilities
- Engage in and improve the whole JMS lifecycle of application deployment and operation
- Improve the existing continuous deployment pipeline for a wide range of functionalities across geographically separated zones
- Improve the JMS observability platform, security, and incident management to meet the SLAs and SLOs defined for all Oracle cloud services
- Architect highly available and scalable services
- Troubleshoot issues and trace symptoms back to the root cause
- Document and present methodologies to operations, engineering, and executive teams
- Educate the wider engineering organization on design and operational best practices for distributed computing
- Help meet the SLAs for internal and external services and continually improve operational processes (weekly ops meetings, metrics, etc.)
- Build tools and automation to improve the system's observability, availability, reliability, performance/latency, monitoring, and emergency response
Skillset
- Strong track record of implementing OCI/AWS/GCP/Azure services in a variety of distributed computing environments, with a good understanding of Docker and Kubernetes
- Understanding of the CNI/CNCF landscape is good to have
- Strong knowledge of storage, RDBMS, and NoSQL database runtimes
- Experience in implementing multi-cloud networking and deployment architecture
- Good understanding of the L3/4/7 network layers (including SDN)
- Hands-on design and coding experience in any one of Python, Shell, Go, or Java
- Strong debugging/troubleshooting skills
- Experience implementing observability platforms using product suites such as DataDog, NewRelic, ELK, or Prometheus, preferably with Grafana
- Strong experience with infrastructure automation and monitoring tools: Terraform, Helm, Ansible, Puppet, Chef, etc.
- Experience with modern cloud development practices (microservices architectures, REST interfaces, etc.)
- Deep working knowledge of Linux servers and networking, preferably Oracle Linux
As a member of the software engineering division, you will take an active role in the definition and evolution of standard practices and procedures. You will be responsible for defining and developing software for tasks associated with the developing, designing and debugging of software applications or operating systems.
Work is non-routine and very complex, involving the application of advanced technical/business skills in the area of specialization. You will be a leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to the functional area. 7 years of software engineering or related experience.
