Site Reliability Engineer, Data Platform

Overview
Job Description

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

As a fully remote company, we have Krakenites in 60+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Kraken NFT, and Kraken Futures.

The team:

Join our Data Infrastructure team and play a pivotal role in upholding the reliability, scalability, and efficiency of our robust Data platform. As a Senior Site Reliability Engineer (SRE) specialized in Data Infrastructure, you will collaborate closely with diverse cross-functional teams to conceive, execute, and oversee the foundational data infrastructure that empowers our array of applications and services. As a key member of our Data Infrastructure team, you will be at the forefront of ensuring the unfaltering availability and performance of our platform. Your profound proficiency in cloud technologies, infrastructure as code, automation, monitoring/alerting, logging, user and machine AuthNZ, and certificate management will be instrumental in upholding the exceptional operational standards we set for our services.

This role is intended for candidates based in the Americas.

The Opportunity:

  • Architect and implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts.
  • Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
  • Collaborate with teams to ensure seamless integration of data-related services with existing systems.
  • Develop and maintain bash/shell scripts to automate operational tasks and deployments.
  • Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure.
  • Enable engineering self-service under tight security requirements using ChatOps and GitOps methodologies
  • Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues.
  • Manage user and machine authentication and authorization mechanisms to ensure secure access to data and resources.
  • Evangelize and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
  • Design and deploy MLOps platforms using AWS SageMaker and GitOps methodologies.
  • Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC).
  • Ensure the timely and accurate processing of streaming data, enabling data analysts and engineers to gain insights from up-to-date information.
  • Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration.
  • Implement effective incident response procedures and participate in on-call rotations.
  • Troubleshoot and resolve incidents promptly to minimize downtime and impact.
  • Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions.
  • Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement.
  • Enable environments for ML experimentation
  • Create and manage MLOps flows for training, validation and deployment of models
  • Implement efficient, reproducible production deployment of ML models for inference

Skills you should HODL:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proven experience (5+ years) working as a Site Reliability Engineer, Infrastructure Engineer, or similar roles, with a focus on data infrastructure and security.
  • Experience with real-time data processing technologies, such as Kafka and Debezium
  • Strong expertise in cloud technologies, particularly AWS (HashiCorp is nice to have).
  • Proficiency in Infrastructure as Code tools such as Terraform and Atlantis.
  • Experience with containerization and orchestration tools, particularly Kubernetes.
  • Solid understanding of bash/shell scripting and proficiency in at least one programming language.
  • Familiarity with CI/CD deployment pipelines and related tools.
  • Knowledge of HashiCorp products like Vault, Nomad, and Consul is a plus.
  • Strong problem-solving skills and the ability to troubleshoot complex systems.
  • Expertise in zero-trust architecture and service meshes is a plus
  • Experience with data-related technologies (databases, airflow, data warehousing, data lakes) is a plus.

Kraken is powered by people from around the world and we celebrate all Krakenites for their diverse talents, backgrounds, contributions and unique perspectives. We hire strictly based on merit, meaning we seek out the candidates whose abilities, knowledge, and skills make them the most suitable for the job. We encourage you to apply for roles where you don’t fully meet the listed requirements, especially if you’re passionate or knowledgeable about crypto!

As an equal opportunity employer, we don’t tolerate discrimination or harassment of any kind, whether that’s based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status or any other protected characteristic as outlined by federal, state or local laws.
