SRE for Large-Scale Products Used by 50 Million Monthly Users
Description
Overview
You will be involved in building and operating the infrastructure for multiple products, such as a recipe video platform and a shopping support app, which together form an economic ecosystem. The team develops features to connect these services, launches new domains using accumulated data, and supports the expansion of this ecosystem.
Background
Our goal is to build a full-cycle development system where SREs enable development teams to own operations within their responsibility areas. This approach aims to improve system reliability, accelerate development speed, and raise the engineering level of the entire organization.
With rapid user growth and increasing service complexity, there is a growing need to operate systems robustly, reduce dependence on specific individuals, and balance feature development with operations. The SRE team is currently driving initiatives such as:
- Establishing operational evaluation criteria
- Implementing SRE rotation among feature developers
- Defining and running the Production Readiness Check (PRC)
If you are passionate about improving large-scale service reliability and enhancing organizational engineering capabilities, join our team during this period of transformation!
Responsibilities
- Build and operate containers and servers (mainly ECS Fargate) on AWS
- Set up and manage databases such as Aurora
- Build and operate search systems using Elasticsearch (on Elastic Cloud)
- Build and operate monitoring with Datadog, New Relic, etc.
- Establish incident response structures, and identify and resolve operational challenges
- Manage and optimize infrastructure costs
What Makes This Job Attractive
- Handle large-scale, high-traffic services used by 50 million monthly users
- Opportunity to polish skills in scalable system design and operation
- Contribute to infrastructure that supports everyday life
- Feel the impact of supporting services people rely on daily
- Tackle technical challenges across multiple products
- Collaborate with different project teams, impacting reliability and infrastructure broadly
- Help shape team culture and systems as part of the Enabling SRE team
- Contribute to building the operational foundation of the entire organization
- Collaborate beyond development and infrastructure boundaries
- Promote full-cycle development and grow with a broad set of skills
Technology Stack
- Infrastructure: AWS (CloudFront, ALB, ECS, EC2, Aurora, DynamoDB, ElastiCache, S3, Lambda, Athena, CodePipeline, CodeBuild, etc.)
- Middleware: MySQL, Memcached, Nginx, Elasticsearch, Fluentd
- OS/Containers: Linux, ECS, Docker
- Configuration Management: Terraform, Ansible, Packer
- Monitoring: Kibana, CloudWatch, Sentry, Datadog, New Relic
- Languages: Mainly Ruby, with Python/JavaScript for Lambda
Requirements
(Requirements not explicitly provided in the original. Please inquire for specifics.)
Preferred Experiences
- Enjoys using SQL to analyze how developed features are used
- Likes to catch up on and apply the latest tech trends
- Likes organizing or participating in study sessions or coding meetups
- Has spoken at conferences such as DroidKaigi
- Has been recognized by Snowflake as one of its DATA SUPERHEROES
We are Looking for People Who
- Are interested in building infrastructure services for everyday life
- Proactively implement the latest technologies into their environment
- Want to grow products using their technical expertise
Working Conditions
Salary
- Annual salary: ¥7,000,000 – ¥12,000,000 (including twice-yearly bonus)
- Paid monthly on the 25th
- Bonus and raise opportunities twice per year (based on company and individual performance)
- Monthly salary: ¥503,449 to ¥813,009
- Base salary: from ¥372,892
- Fixed overtime allowance: from ¥130,557 (covers 45 hours of overtime; any excess is paid additionally; some management roles are excluded)
- Expected annual amount assumes standard performance
Location
- Tokyo Minato-ku 3-1-1, msb Tamachi Station Tower N 23F, 108-0023
Job Type
- Full-time (No fixed term)
Work Hours
Work Style & Office Attendance:
- Flextime system (core time: 10:00–16:00; flexible time: 5:00–10:00 and 16:00–22:00)
- Office attendance for engineers, designers, and PdMs: 3 days/week in office, 2 days remote (Wed, Fri)
- Office attendance for sales, marketing, and corporate staff: typically in office every weekday
- Family-flex and family-remote options available for personal circumstances
- Standard working hours: 8 hours/day on average
- Average overtime: 12 hrs/month (engineers), 17 hrs/month (company-wide)
- Work monitoring system in place
Holidays & Leave:
- 2 days off per week (Sat, Sun) + holidays
- Year-end/new year holiday
- Annual paid leave (3 days at hire + 7 after probation)
- Special leave (bereavement, nursing, menstrual, marriage, etc.)
Probation Period
- 3 months (up to 6 months if extended)
- No change in benefits or conditions
- 10 days paid leave in first year (3 at hire, 7 after probation)
Benefits
Childcare & Family Support
- Maternity/paternity leave
- Parental leave (for any gender, 100% return rate)
- Encouragement for paternity leave
- Nursing leave for children (up to 5 days/year, hour-based leave available)
- Leave for childbirth attendance (up to 2 days)
- Shorter working hours (for childcare/caregiving)
- Babysitter discount tickets
Health & Wellness
- Health checkups
- Vaccine support
- Occupational physician consultations
- Life Bridge system (temporary remote work for family emergencies)
- Wevox for motivation/engagement visibility
- On-site care room
Company Systems
- Full social insurance (employment, workers' comp, health, pension)
- Commuting allowance (up to ¥40,000/month)
- Side jobs possible (with approval)
- Support for engineering/design equipment purchases (up to ¥50,000)
- Company housing benefit
Internal Culture & Communication
- Vision Day (semiannual)
- Year End Party
- Award system (biannual recognition)
- Weekly all-hands meetings
- Division meetings
- AI workshops
- Thanks message card system
- Referral recruiting incentive
- Welcome lunches
- Club activities
Last updated: 2025/04/22