Site Reliability Engineer – Platform Foundation
Salary: 6,000,000 - 12,000,000 JPY
AWS, Kubernetes
English: Fluent
Minimum years of experience: 3
Sales Marker: Site Reliability Engineer
Contract Type: Full-time Employee
Salary Range: 6,000,000 ~ 12,000,000 JPY
The Team
- Global and Diverse: Product & Engineering team from 24+ countries, with backgrounds at tech leaders including Google, Microsoft, Indeed, Mercari, LINE, Yahoo, and SmartNews.
- Recognized Leadership: Co-founders featured on the Forbes 30 Under 30 Asia List (2023).
- Engineering Culture:
  - Customer Obsession: Solve real problems, exceed expectations
  - Ownership: End-to-end responsibility
  - 10x Mindset: Aim for bold impact, move fast, disrupt old standards
The Role
Join the Common Foundation team to empower product engineering with scalable, reliable, and reusable systems. Focus areas include reliability, performance, security, and developer productivity across cloud infrastructure and Kubernetes. Responsibilities include designing and operating AWS and Kubernetes environments, leading reliability initiatives, and collaborating to embed best practices.
Responsibilities
- Operate and improve the Kubernetes platform (EKS): cluster lifecycle, upgrades, scaling, networking, multi-tenant isolation.
- Design, provision, and manage AWS infrastructure (VPC, RDS/Aurora, OpenSearch, S3, SQS, Lambda, API Gateway, Batch, Glue) focusing on security, reliability, and developer experience.
- Build infrastructure as code with Terraform and AWS CDK. Establish standards for modules, environments, and change management via GitOps.
- Drive observability: metrics, logs, traces, SLOs, error budgets, actionable dashboards and alerts using Datadog.
- Partner with backend engineers to improve service reliability, performance, and cost efficiency; champion best practices in testing, rollout, and production readiness.
- Automate operations and repetitive tasks using tooling and pipelines. Reduce MTTR with better runbooks, diagnostics, and incident tools.
- Lead incident response and post-incident reviews. Raise the operational bar via blameless retros, remediation plans, and reliability roadmaps.
- Strengthen platform security with identity/access control, secrets management, network policies, patching, and vulnerability management.
- Support data workloads and pipelines with robust, scalable infrastructure and monitoring.
- Contribute to documentation, paved paths, and self-service developer workflows.
What We're Looking For
Required
- 3+ years in SRE, Platform, or Infrastructure Engineering with production ownership of cloud-native systems.
- Strong experience running Kubernetes in production (upgrades, scaling, workload reliability).
- Deep hands-on expertise with AWS services (networking, compute, storage, databases, messaging) and secure-by-default architectures.
- Proficiency with IaC (Terraform and/or AWS CDK), modularization, and environment management.
- Solid observability fundamentals: metrics, logging, tracing, SLOs/error budgets, actionable alerting.
- Proven track record of improving reliability, performance, and developer experience in partnership with application teams.
- Experience in incident response and driving post-incident improvements.
Nice to Haves
- Experience with identity and access management patterns, Cognito, JWT, and service-to-service authentication.
- Background in multi-tenant architectures, capacity planning, and cost optimization.
- History of handling major incidents at scale and building tooling to reduce MTTR/MTTD.
- Contributions to internal developer platforms, golden paths, or shared libraries.
- Fluency in English or Japanese.
Tech Stack
Front-end
- TypeScript, React, Next.js
- Testing: Storybook, Jest, Playwright
- Hosting: Amplify
- Feature flag: Unleash
Server Side / Back-End
- Infrastructure: AWS, EKS, Elastic Beanstalk
- Databases: Aurora, Elasticsearch, Redis
- Languages: Go, TypeScript
- Analytics: Athena, Superset
- Monitoring: Datadog
- Others: AWS Lambda, AWS Batch, AWS API Gateway, AWS Glue, AWS S3
Why Us?
- Fast-growing SaaS startup with strong financial growth
- Opportunity to develop innovative new products and build from scratch
- Leadership and career development opportunities
- Hybrid work environment & fully flexible schedules
- Global team; English-speaking work environment
- Benefits & perks (Resort Worx, book purchases, free weekly lunch, offsites, etc.)
Working Style
- Hybrid Work: Combination of office and remote work; communication via Zoom, Google Meet, and Gather.
- Flex Work: Customizable working hours, with schedules arranged around business and client-facing meetings.
- Global Environment: 20+ nationalities; diverse, inclusive, English/Japanese speaking workplace.