Site Reliability Engineer – Platform Foundation

Salary: 6,000,000 - 12,000,000 JPY

Kubernetes, AWS

English: Fluent

Minimum years of experience: 3

Sales Marker

Site Reliability Engineer

Contract Type: Full-time Employee
Salary Range: 6,000,000 ~ 12,000,000 JPY

The Team

Global and Diverse: Product & Engineering team from 24+ countries, with backgrounds at tech leaders including Google, Microsoft, Indeed, Mercari, LINE, Yahoo, and SmartNews.
Recognized Leadership: Co-founders featured on the Forbes 30 Under 30 Asia List (2023).
Engineering Culture:
- Customer Obsession: Solve real problems, exceed expectations
- Ownership: End-to-end responsibility
- 10x Mindset: Aim for bold impact, move fast, disrupt old standards

The Role

Join the Common Foundation team to empower product engineering with scalable, reliable, and reusable systems. Focus areas include reliability, performance, security, and developer productivity across cloud infrastructure and Kubernetes. Responsibilities include designing and operating AWS and Kubernetes environments, leading reliability initiatives, and collaborating to embed best practices.

Responsibilities

Operate and improve the Kubernetes platform (EKS): cluster lifecycle, upgrades, scaling, networking, multi-tenant isolation.
Design, provision, and manage AWS infrastructure (VPC, RDS/Aurora, OpenSearch, S3, SQS, Lambda, API Gateway, Batch, Glue) focusing on security, reliability, and developer experience.
Build infrastructure as code with Terraform and AWS CDK. Establish standards for modules, environments, and change management via GitOps.
Drive observability: metrics, logs, traces, SLOs, error budgets, actionable dashboards and alerts using Datadog.
Partner with backend engineers to improve service reliability, performance, and cost efficiency; champion best practices in testing, rollout, and production readiness.
Automate operations and repetitive tasks using tooling and pipelines. Reduce MTTR with better runbooks, diagnostics, and incident tools.
Lead incident response and post-incident reviews. Raise the operational bar via blameless retros, remediation plans, and reliability roadmaps.
Strengthen platform security with identity/access control, secrets management, network policies, patching, and vulnerability management.
Support data workloads and pipelines with robust, scalable infrastructure and monitoring.
Contribute to documentation, paved paths, and self-service developer workflows.

What We're Looking For

Required

3+ years in SRE, Platform, or Infrastructure Engineering with production ownership of cloud-native systems.
Strong experience running Kubernetes in production (upgrades, scaling, workload reliability).
Deep hands-on expertise with AWS services (networking, compute, storage, databases, messaging) and secure-by-default architectures.
Proficiency with IaC (Terraform and/or AWS CDK), modularization, and environment management.
Solid observability fundamentals: metrics, logging, tracing, SLOs/error budgets, actionable alerting.
Proven track record of improving reliability, performance, and developer experience in partnership with application teams.
Experience in incident response and driving post-incident improvements.

Nice to Haves

Experience with identity and access management patterns, Cognito, JWT, and service-to-service authentication.
Background in multi-tenant architectures, capacity planning, and cost optimization.
History of handling major incidents at scale and building tooling to reduce MTTR/MTTD.
Contributions to internal developer platforms, golden paths, or shared libraries.
Fluency in English or Japanese.

Tech Stack

Front-end

TypeScript, React, Next.js
Testing: Storybook, Jest, Playwright
Hosting: Amplify
Feature flag: Unleash

Server Side / Back-End

Infrastructure: AWS, EKS, Elastic Beanstalk
Databases: Aurora, Elasticsearch, Redis
Languages: Go, TypeScript
Analytics: Athena, Superset
Monitoring: Datadog
Others: AWS Lambda, AWS Batch, AWS API Gateway, AWS Glue, AWS S3

Why Us?

Fast-growing SaaS startup with strong financial growth
Opportunities to develop innovative new products and build from scratch
Leadership and career development opportunities
Hybrid work environment & fully flexible schedules
Global team; English-speaking work environment
Benefits & perks (Resort Worx, purchasing books, free weekly lunch, offsites, etc.)

Working Style

Hybrid Work: Combination of office and remote work; communication via Zoom, Google Meet, Gather.
Flex Work: Customizable working hours; business/client-facing schedules arranged for meetings.
Global Environment: 20+ nationalities; diverse, inclusive, English/Japanese speaking workplace.

Learn More

Apply via reco