We are looking for a passionate “Software Reliability Engineer” to join us on our mission to empower marketers to create meaningful experiences.
- Providing operational leadership for Netmera services
- Determining operational requirements and system needs based on performance and user experience, and referring them to the relevant teams
- Defining alerts for all our applications and services, taking the measures needed to keep the system running smoothly, and responding to critical problems 24/7
- Leading the deployment process
- Running the production environment by monitoring availability and taking a holistic view of system health
Incident and Problem Management:
- Troubleshooting, evaluating, and resolving application failures and hangs, customer complaints, and commissioning tasks required by the operations team
- Leading postmortems and root cause analysis
- Resolving customer complaints at the L2 support level
- Administration, installation, maintenance, and configuration of Nginx, JBoss, Kafka, MongoDB, Cassandra, ClickHouse, MySQL, Kubernetes clusters, Jenkins, Harbor, ELK, Prometheus, Grafana, Nagios, and Vault
- Designing, developing, and maintaining CI/CD pipelines on Jenkins
- Automating routine work with Ansible or Python scripts
- Performing periodic software and Linux version upgrades
What you’ll need
- 3+ years of experience in system reliability engineering, software operations, or DevOps
- Strong knowledge of Linux (CentOS)
- Experience with SQL and/or NoSQL databases; hands-on MongoDB experience is a MUST
- Experience in/knowledge of Cassandra, ClickHouse, Nginx, MySQL, JBoss, and Kafka
- Experience in/knowledge of design, development, and maintenance of CI/CD pipelines
- Experience with Linux operating systems and with Python and Bash scripting
- Software development experience in Java
- Experience in/knowledge of DevOps tools: Kubernetes, Ansible, ELK, Vault, Harbor, Jenkins
- Experience in/knowledge of monitoring tools: Prometheus, Grafana, Nagios, Glowroot
- Knowledge of scripting languages: Bash, Python, Ruby, Groovy, etc.
- Strong understanding of networking, routing, and security concepts
- Awareness of performance management and CPU/RAM/disk optimization problems
- Good understanding of IT security and data security
- Knowledge of mobile or SaaS environments and large-scale web applications is a plus
- English proficiency sufficient to follow technical literature
- Analytical thinker with attention to detail
- Effective verbal and written communication skills
- Open to learning and development, and willing to take the initiative
What you’ll be doing
- Work with teams to design and implement automated code deployment solutions
- Work with teams to design and implement application environments
- Work with teams to design and implement application monitoring and alerting solutions to get issues to the right people at the right time
- Work with teams to diagnose and isolate issues at all layers of the stack, whether it be code or infrastructure, during development and in production
- Remediate issues that impact the health and performance of our development, testing, and production systems and infrastructure (including on-call duty on a 24/7 support schedule)
What we offer
- Flexible work from anywhere; we meet in the kitchen, play games, and never neglect training!
- Each team member receives a monthly fringe-benefit package based on their level
- Work from home, from the office, or wherever you want!
- Each teammate has an annual training budget!
- Netmera Kitchen
- We play football on PlayStation, and more