Job Description:
Top Skills:
• Cloud Infrastructure & Automation
o Design and manage scalable systems on GCP.
o Use Infrastructure as Code (IaC) tools such as Terraform.
• Performance & Reliability Engineering
o Experience in capacity planning, performance tuning, and predictive analytics.
o Knowledge of distributed systems and high-availability architectures.
• Monitoring & Observability
o Proficiency with APM tools like Dynatrace, New Relic, or AppDynamics.
o Proactive incident detection.
• Programming & Scripting
o Strong coding skills in Python, Go, or Java for automation and reliability improvements.
Experience Required:
Minimum of 4 years of experience in the specific skill set (SRE)
Overall IT experience of 6–8+ years
Job Description
As we expand our customer deployments to build software that improves our customers' experience, we are seeking an experienced SRE to bring fresh ideas and an informed viewpoint to our business. The ideal candidate enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
As an SRE, you will work with leading-edge technologies both on-premise and in the cloud. Your mindset will center on automation, superior software quality and performance, and resiliency. You will be an expert resource on high-performance software and operational design patterns, and you will support development, architecture and operational teams from start to finish to create scalable and resilient solutions.
Responsibilities
• Support development, architecture and operational teams on performance- and capacity-related issues in complex multi-tier distributed platforms, both during the SDLC and post-production.
• Support and coordinate new Build/Run initiatives prior to production and assure product readiness, including infrastructure recommendations, software/script development, load/chaos testing, optimization, SLO definition, capacity planning, and observability/alerting.
• Review services and applications to identify bottlenecks and opportunities to improve performance and scale.
• Run POCs for newer technologies and architectural patterns to help teams make informed decisions.
• Define new SLOs for services and applications to meet non-functional SLA requirements defined by the business.
• Reduce ongoing runtime costs through efficient throttling, queuing, pooling, and autoscaling across application and infrastructure tiers.
• Proactively identify anomalies and opportunities in production platforms to achieve greater performance and scale, and recommend improvements to impacted teams for future planning.
• Define performance quality gates and support teams with performance-focused canary deployment scenarios in CI/CD.
Required Skills and Qualifications
• Experience supporting and troubleshooting large-scale, multi-tier distributed on-premise and cloud applications
• Experience architecting, developing and setting up new infrastructure solutions for GCP (leveraging Terraform) and for on-premise applications
• Experience in Capacity Planning or Performance Engineering and leveraging predictive analytics to determine needed scaling patterns for platforms
• Experience programming in languages such as Java, JavaScript (Node.js), Go, and Python
• Experience in Web Development and/or Web Service creation
• Demonstrable cross-functional knowledge with systems, storage, networking, security, and databases.
• Experience using APM tools such as Dynatrace, New Relic or AppDynamics.
Preferred Qualifications
• Experienced Architect in GCP, Kubernetes, and serverless
• Experience collaborating with development teams to define infrastructure requirements and implement scalable and resilient cloud architecture using Terraform.
• Experience in migrating legacy applications to cloud-native architecture
• Strong understanding of Spring Framework
• Experienced in performance tracing/profiling using Google Developer Tools
• Experience with SQL and database scaling/replication schemes
• Familiarity with tools used for front-end analysis such as Lighthouse, Page Speed Metrics, WebPageTest, GTmetrix and browser developer tools.
• Experience using MongoDB/Atlas, Oracle OCI, Postgres, GCP Cloud SQL
• Experience with AngularJS, React and Vue
• Experience tuning and optimizing runtime environments for Java (JVMs), Node.js and Python for the best performance
• Experience with DevOps/Quality gating concepts, Canary deployments and automation associated with CI/CD deployments.
• Experience in Enterprise Architecture integration patterns and domain model driven design addressing proper separation of concerns for an application/microservices and core web services.
• Experience using observability tools like Dynatrace or any APM tool is a must.
• Experience using cloud profiling tools and JVM tools like JProfiler/Java Flight Recorder.
• Experience with testing methodologies and metrics using tools like JMeter, NeoLoad, LoadRunner, or similar.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and self-drive
• Experience with CI/CD methodologies and an Agile/DevOps mindset
• A passion for automation with a desire to eliminate toil whenever possible
• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
• Experience using Git and industry build tools
