Senior SRE

Salary not specified

This vacancy is archived

Quadcode

Moscow

Required work experience

3 to 6 years

Employment type

Full-time employment

Work schedule

Full day

About team

We are Quadcode, a fintech company excelling in financial brokerage activities and delivering advanced financial products to our global clientele. Our flagship product, an internal trading platform, is offered as a Software-as-a-Service (SaaS) solution to other brokers.

We are currently looking for a Senior SRE to join our Service Desk team. The team's main area of responsibility is oversight of ITSM processes within the company. The team also develops monitoring tools and monitors the status of our product.

The team has 5 System Engineers, 2 Technical Support Specialists, and a Team Leader.

We have more than 600 servers and more than 2,000 virtual machines. We run our own infrastructure as well as private and public clouds (OpenStack, AWS, GCP, DO) and bare metal. Our trading platform has more than 80 million users.

We work with Agile, Scrum (1–2-week sprints, grooming, planning, retrospectives), and the SAFe framework.

Daily Scrum standups are held at 12:30 pm EET (Cyprus time zone). We engage in peer code reviews and use collaboration tools like Slack, Google Meet, and Zoom.

You will be responsible for building ITSM processes and applying your experience as a Site Reliability Engineer. You will interact with the product teams and the IT Operations branch (Software Development, Infrastructure, etc.).

Tech stack

  • OS: Linux Ubuntu;
  • Web server: Nginx;
  • Monitoring: Grafana, Prometheus, Graylog, Jaeger;
  • CI/CD: Jenkins, Git, GitLab, Docker;
  • Automation: Python, Bash;
  • SCM: Ansible, Chef;
  • IaC: Terraform, Pulumi;
  • DB: PostgreSQL, Redis, KeyDB, MySQL;
  • Cloud: OpenStack, AWS, GCP, DO.

Examples of first tasks in the role

  • Review processes, platform and infrastructure;
  • Implement Grafana OnCall;
  • Review and rework ITSM processes if needed.

Tasks

  • Identification of bottlenecks and preparation of recommendations to improve the reliability of services;
  • Responding to platform emergencies, localizing and resolving the causes of failures, compiling postmortem reports;
  • Development of monitoring and alerting tools that ensure high availability and quick detection of potential issues (Grafana, Grafana OnCall, Prometheus Alertmanager, etc.);
  • Active participation in change management processes, including assessment and coordination of changes to the infrastructure within Change Advisory Board (CAB) sessions;
  • Implementation and support of ITSM processes to optimize team workflow and enhance service quality;
  • Development and maintenance of documentation in an up-to-date state.

Requirements

  • 3+ years of experience in SRE/DevOps;
  • Understanding of SRE principles, practical experience in implementing SRE practices;
  • Understanding of principles and practical experience in building resilient systems;
  • Experience with monitoring and logging systems (Prometheus, Graylog, Grafana);
  • Experience with automation tools for software build and deployment (CI/CD): GitLab, Jenkins;
  • Understanding of virtualization and containerization principles;
  • Understanding of Infrastructure as Code (IaC) approaches and experience applying them;
  • Proficiency in a programming language for automation script development (Python, Node.js, Golang, etc.), ability to understand service code;
  • Understanding of network protocols, topologies, and network models;
  • Experience with configuration management tools: Ansible, Chef;
  • Basic experience with relational databases, such as PostgreSQL;
  • Experience in administering Linux operating systems;
  • English B1 minimum.

Nice to have

  • Experience in implementing monitoring and logging systems from scratch;
  • Experience with k8s, OpenStack;
  • Advanced programming skills in any language.

We offer

  • Full-time remote work as a Service Provider in the following countries: Bulgaria, Georgia, Belarus, Hungary, Romania, Latvia, Lithuania, Moldova, Azerbaijan, Armenia, Kyrgyzstan, Uzbekistan, Greece, Croatia, Montenegro, Serbia, Kazakhstan, Slovenia, Russia, Cyprus or Estonia (a valid residence permit is required);
  • Competitive remuneration;
  • Professional courses;
  • Friendly, enjoyable, and positive environment.

Currently, over 700 employees and service providers are stationed across our seven global offices located in the UK, Gibraltar, the UAE, the Bahamas, Australia, and the headquarters in Cyprus. By broadening its international presence, Quadcode not only offers a remote or hybrid work model but also presents a myriad of intriguing tasks and challenges for professionals like developers, market research analysts, and PR marketing specialists, among others.

Note: All applications will be treated with strict confidence. We thank all applicants for their interest; however, only those candidates selected for interviews will be contacted.

Key skills

Linux
SRE
Incident management
Grafana
Prometheus
Docker
Kubernetes (k8s), beginner level
Ansible
Terraform
PostgreSQL
Python
Golang

Contact information

Quadcode

Website: quadcode.com

Email: not specified

Vacancy published on 09.08.2024 in Moscow.
