Senior Site Reliability Engineer - Jobs in Sg-singapore

Oracle | Singapore | Update time: August 2, 2021

Job Description

Join the Java Management Service (JMS) site reliability team as we grow the service by rolling out to more regions globally, expanding JMS capabilities, and improving its reliability and scalability. The key role is to help the team build, maintain, and operate JMS in the cloud and provide a stable, secure, and performant service in line with the expectations of an enterprise-class cloud service. This is a unique position in which an engineer provides full application lifecycle and deployment support, automates processes, reduces toil, and ensures smooth delivery of JMS across the globe.

Key Responsibilities

Engage in and improve the whole JMS lifecycle of application deployment and operation
Improve the existing continuous deployment pipeline for a wide range of functionalities across geographically separated zones
Improve the JMS observability platform, security, and incident management to meet the SLAs and SLOs defined for all Oracle cloud services
Architect highly available and scalable services
Troubleshoot issues and trace symptoms back to the root cause
Document and present methodologies to operations, engineering, and executive teams
Educate the wider engineering organization on design and operational best practices for distributed computing
Help meet the SLAs for internal and external services and continually improve operational processes (weekly ops meetings, metrics, etc.)
Build tools and automation to improve the system's observability, availability, reliability, performance/latency, monitoring, and emergency response

Skillset

Strong track record of implementing OCI/AWS/GCP/Azure services in a variety of distributed computing environments, with a good understanding of Docker and Kubernetes
Understanding of CNI/CNCF landscape is good to have
Strong knowledge of storage, RDBMS, and NoSQL database runtimes
Experience in implementing multi-cloud networking and deployment architecture
Good understanding of the L3/4/7 network layers (including SDN)
Hands-on design and coding in any one of Python, Shell, Go, or Java
Strong debugging/troubleshooting skills.
Experience implementing observability platforms using any product suite such as DataDog, NewRelic, ELK, or Prometheus, preferably with Grafana
Strong experience with infrastructure automation and monitoring tools: Terraform, Helm, Ansible, Puppet, Chef, etc.
Experience with modern cloud development practices (microservices architectures, REST interfaces, etc.)
Deep working knowledge of Linux servers and networking, preferably Oracle Linux

Design, develop, troubleshoot and debug software programs for databases, applications, tools, networks etc.

As a member of the software engineering division, you will take an active role in the definition and evolution of standard practices and procedures. You will be responsible for defining and developing software for tasks associated with the developing, designing and debugging of software applications or operating systems.

Work is non-routine and very complex, involving the application of advanced technical/business skills in the area of specialization. You will be a leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to the functional area. 7 years of software engineering or related experience.
