Site Reliability Engineer - Retail & Banking Technology domain @ING Hubs Romania

Discover ING Hubs Romania

ING Hubs Romania offers 130 services in software development, data management, non-financial risk & compliance, audit, and retail operations to 24 ING units worldwide, with the help of over 2000 high-performing engineers, risk, and operations professionals.

We started out in 2015 as ING’s software development hub, then steadily expanded our range to include more services and competencies. Now we provide borderless services with bank-wide capabilities and operate from two locations: Bucharest and Cluj-Napoca.

Our tech capabilities remain the core of our business, with more than 1800 colleagues active in Data and Analytics Tech, Tech Foundation and Channels, Retail Core Banking and Architecture, and Global Products and Technology Services. 

We enjoy a flexible way of working and a highly collaborative environment, where fair and constructive feedback is encouraged.  

For us, impact isn't a perk. It's the driver of our work. We are guided and rewarded by a shared desire to make the world a better place, one innovative solution at a time. Our colleagues make it their job to do impactful things and they love doing it in good company. Do you?  

The Mission

ING’s ambition is to be the number one digital banking brand in Europe, offering customers everywhere the same empowering, personalized and differentiating experience. A collaborative, communicative Site Reliability Engineer will change the way we’re working.  

R&BT SRE team

The R&BT Site Reliability Engineering (SRE) team is a multidisciplinary team of senior engineers with proven track records in development and operations across applications and infrastructure. The primary goal is to continuously and structurally improve the reliability and maintainability of the IT environments involved with the R&BT Platforms, delivered and managed from different (international) ING domains.

  • Objective: Site Reliability Engineering (SRE) enhances the reliability and scalability of BTP platform services through collaborative efforts, prioritizing availability, performance, efficiency, and observability.
  • Measurement: SRE targets increased MTBF, decreased MTTR, and minimized operational toil.
  • Approach: This is facilitated by automation, standardized procedures, and the adoption of SRE best practices.
  • Cultivate a Reliability Mindset: The aim is to foster a culture of reliability throughout the BTP organization, encouraging proactive behaviours and attitudes.
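
To make the measurement above concrete, here is a minimal sketch of how MTBF and MTTR could be derived from incident records. The `Incident` record, the 30-day window, and the sample data are hypothetical and only illustrate the definitions; they do not describe ING's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record: when the service failed and when it was restored."""
    start: datetime
    end: datetime


def total_downtime(incidents: list[Incident]) -> timedelta:
    """Sum of all outage durations."""
    return sum((i.end - i.start for i in incidents), timedelta())


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to repair: average outage duration."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents: list[Incident], window_start: datetime, window_end: datetime) -> timedelta:
    """Mean time between failures: uptime in the window divided by the number of failures."""
    if not incidents:
        raise ValueError("MTBF is undefined without incidents")
    uptime = (window_end - window_start) - total_downtime(incidents)
    return uptime / len(incidents)


# Assumed sample data: a 30-day reporting window with two outages (1 hour and 3 hours).
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 11, 0)),
    Incident(datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 5, 0)),
]

print("MTTR:", mttr(incidents))
print("MTBF:", mtbf(incidents, window_start, window_end))
```

With this sample data the MTTR is 2 hours and the MTBF is 358 hours. Shorter outages lower the MTTR, and fewer outages in the same window raise the MTBF.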

Your day to day

  • Ensure Service Level Objective (SLO) levels are set and met;
  • Optimize our observability tooling, such as Grafana dashboards;
  • Report on GSRE targets and KPIs;
  • Conduct yearly Well-Architected Reviews and Observability Assessments for all critical components;
  • Drive Always Available mindset and behaviour within the R&BT organization. Be able to recognize shortcomings in knowledge and expertise, and deliver the necessary resources, skills, guidance and training to DevOps teams where needed;
  • Define and enhance standards for logging, monitoring, and alerting, and actively monitor end-to-end platform performance through white-box and black-box monitoring tools;
  • Improve incident response practices and be actively engaged in incident response for escalated and critical incidents. On-call duty is currently not part of the job, but should not be an objection if and when required;
  • Participate in Root Cause Analysis. Prioritize and implement the RCA recommendations through improvement plans with the responsible Squads / DevOps teams;
  • Track and trace actions out of post-mortems and Emirs;
  • Drive continuous improvement on all services in the R&BT Platforms through analysis of the current level of service, functional and technical setup, code, dev/ops practices and the underlying causes of incidents, underperformance, etc.;
  • Roll out new resilience features throughout the organization;
  • Set up and maintain automatic reporting and feedback loops;
  • Contribute to automating Build, Test and Deployment practices through the CI/CD pipeline;
  • Contribute to tuning application resources and updating highly available deployment patterns of (mostly) container- and VM-based environments;
  • Initiate and contribute to new SRE initiatives like AI Ops, Chaos Engineering, migrations to Public Cloud, and Error Budgeting;
  • Participate in and initiate experiments with new tools and concepts, and evaluate their value against set goals.

What you’ll bring to the team

Operations expert: 4+ years of experience working with Agile DevOps principles.

Solid understanding of how technology setup and ITSM processes relate to service level objectives like Availability (time based, successful call rate, response times), MTTR, and MTBF.
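
As an illustration of those availability measures, the sketch below computes time-based availability, successful call rate, and the share of an error budget that remains for a given SLO target. The function names, the 99.9% target, and the sample numbers are assumptions chosen for the example, not ING definitions.

```python
def time_based_availability(uptime_seconds: float, total_seconds: float) -> float:
    """Fraction of the measurement window during which the service was up."""
    return uptime_seconds / total_seconds


def successful_call_rate(successful_calls: int, total_calls: int) -> float:
    """Fraction of requests that completed successfully."""
    return successful_calls / total_calls


def error_budget_remaining(slo_target: float, total_calls: int, failed_calls: int) -> float:
    """Fraction of the error budget still unspent; a negative value means the SLO is breached."""
    allowed_failures = (1.0 - slo_target) * total_calls
    return 1.0 - failed_calls / allowed_failures


# Assumed sample data: a 30-day window with 4 hours of downtime,
# and 1,000,000 calls of which 250 failed, measured against a 99.9% SLO.
window_seconds = 30 * 24 * 3600
downtime_seconds = 4 * 3600
availability = time_based_availability(window_seconds - downtime_seconds, window_seconds)
success_rate = successful_call_rate(999_750, 1_000_000)
budget_left = error_budget_remaining(0.999, 1_000_000, 250)

print(f"Time-based availability: {availability:.4%}")
print(f"Successful call rate:    {success_rate:.4%}")
print(f"Error budget remaining:  {budget_left:.1%}")
```

A 99.9% target over one million calls allows roughly 1,000 failures, so 250 failed calls leave about three quarters of the budget unspent.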

Good understanding of microservices architecture and related high availability / resilience patterns, and experience building systems with multiple layers of redundancy to withstand failures in software, hardware, and network infrastructure.

Proven experience:

  • working as a Site Reliability Engineer or DevOps engineer.
  • scripting in at least one of the following: Ruby, Python, Bash, PowerShell.
  • setting up build and deployment pipelines in Azure DevOps (ADO).
  • setting up white-box monitoring and formulating meaningful metrics for monitoring and reporting: Grafana, TraceING.
  • eliminating toil through automation and process optimization.
  • coordinating or leading incident response and post-mortem / root cause analysis activities.
  • Understanding of IT Service Management processes (ING Global Way of Working) and the way they relate to SRE objectives.
  • Good understanding of Public Cloud concepts.

Prior work experience with tools:

  • CI/CD Pipeline: OnePipeline / Azure DevOps / Kingsroad.
  • Cloud computing and container orchestration: Linux VMs and Kubernetes container platforms. Knowledge of OpenShift + AKS and related certifications is a plus.
  • Touchpoint service mesh and SDK/Merak.
  • Logging/monitoring/alerting: Kafka, ELK, Prometheus, and IAT. Experience with black-box monitoring tools like Rigor/Splunk and AI Ops tools like Loom is a plus.
  • Backlog management: Azure Boards
  • ITSM: SNOW

The ideal candidate has:

  • A Bachelor’s or Master’s degree in computer science or a related field.
  • Experience coaching and training DevOps engineers on technical subjects.
  • Previous experience as a consumer of R&BT Platforms, preferably Touchpoint Platform.
  • Understanding of the ING application risk journey.

If you want to take a deep dive into the processing of personal data conducted by ING Hubs Romania during the recruitment process and your rights related to it, read the privacy notices on our website (make sure to scroll until you reach the Data Protection section / Candidates tab).

Questions? Just ask
ING Recruitment team

At ING we want people to be able to give their best. That is why we create an inclusive culture where everyone can grow and make a difference for our customers and society. We always promote diversity, equality and inclusion. We do not tolerate any form of discrimination, whether based on age, gender, gender identity, culture, experience, religion, race, disability, family responsibilities, sexual orientation or anything else. If you need support or assistance during the selection or interview process, contact the recruiter named in the job posting. We will be happy to help make the process fair and accessible. Click here to find out more about our commitment to diversity and inclusion.
