Kubernetes/Container Development/Operation Engineer - Cloud Services Department (CLSD)
Salary not provided
Rakuten

Job Description:

Business Overview
The Technology Platforms Division (TPD) drives the growth of Rakuten's ecosystem by delivering innovative, high-quality technology platforms characterized by integrated control and strategic partnerships. Within TPD, the Cloud Platform Supervisory Department (CPSD) develops and manages Rakuten's state-of-the-art cloud platform, empowering global scalability and accelerating innovation across its diverse business units.

Department Overview
The Cloud Services Department (CLSD) at Rakuten Group provides high-quality cloud infrastructure and platform services to application developers across Rakuten. Our mission is to enable secure, scalable, and efficient digital innovation. We deliver key domain services, including compute, storage, core infrastructure components, databases, container platform, observability, and gateway solutions, empowering Rakuten application teams to focus on their core business objectives.

Position:

Why We Hire
The business of Rakuten Group, Inc. is growing rapidly, and so is our private cloud. To support this growth, many interesting and ambitious projects are ongoing. We are looking for new members who will enjoy working on these projects. We also welcome people who can propose new ideas that further support Rakuten Group, Inc.'s growth, drawing on internal and external technologies and a flexible mindset.

Position Details
We are seeking a highly skilled and motivated Infrastructure Engineer with a strong background in Kubernetes, container technologies, and Linux systems, coupled with proven software development capabilities. In this role, you will be instrumental in designing, developing, and operating our core infrastructure, ensuring high availability, performance, and security. The ideal candidate will thrive in a fast-paced environment, embrace a "Get Things Done" mindset, and contribute to a culture of operational excellence within a large enterprise setting.

Key Responsibilities
1) Operation
- Cluster/Node Provisioning
- Alert/Incident Handling
- OS/Middleware Update
- Security Requirement Achievement
- Midnight Release, Midnight Monitoring
- Operation Manual Creation
- Risk Analysis of Production Environment Operation
2) Development
- Design/Proposal Doc (Diagram, pros/cons comparison)
- Cluster/Node auto provisioning
- OS/Middleware auto upgrade
- Self-healing engineering
3) User Support and Migration Support
- Support special cases that the user support group cannot handle
- Support migration from the legacy platform to the new private cloud

Work Environment
- 16 members
- Language: Go, Python, Groovy, Shell Script
- Infrastructure: Private Cloud (Kubernetes, Baremetal, VM, Container)
- Provisioning/Operation: Ansible, multiple in-house tools written in Go using the operator pattern (Red Hat Operator Framework, etc.), Jenkins
- Monitoring: Prometheus, Cortex, Grafana, Kibana, Datadog, PagerDuty
- CI/CD: Jenkins
- Knowledge Tool: Confluence
- Project Management: JIRA
- Communication Tool: Slack, MS Teams, Viber

Mandatory Qualifications:
- Certified Kubernetes Administrator (CKA) holder (Note: If not currently held, successful candidates are required to obtain CKA certification within 3 months of joining)
- Strong sense of responsibility for keeping the system stable and delivering artifacts by the deadline
- A "Get Things Done" mindset for meeting project deadlines
- Experience leading projects
- Deep understanding of and experience with Kubernetes/container/Linux provisioning and troubleshooting
- Experience designing and developing web services (Go/Python)
- Basic knowledge of networking and TCP/IP
- Basic knowledge of distributed systems and HA architecture
- Experience operating large-scale systems (100+ servers)
- Ability to follow strict rules, such as the document creation and approval processes that are mandatory for infrastructure work

Desired Qualifications:
- Participation in open source activities (OSS contributor)
- Bachelor's/Master's degree in computer science, engineering, or a related field
- Experience automating large-scale system operations
- Experience developing medium- to large-scale applications
- Experience with multiple monitoring tools (Prometheus, Cortex, Grafana, Datadog, New Relic, Elasticsearch, Kibana, etc.)
- Private/public cloud experience

#engineer #infrastructureengineer #technologyplatformdiv

Languages: English (Overall - 4 - Fluent)

In Japanese, Rakuten stands for ‘optimism.’ It means we believe in the future. It’s an understanding that, with the right mind-set, we can make the future better by what we do today. So we challenge ourselves to evolve, innovate and experiment, to create a better, brighter future for everyone. Today, our 70+ businesses span e-commerce, digital content, communications and fintech, bringing the joy of discovery to almost 1.3 billion members across the world.

If you have any trouble logging in, please contact Rakuten Group, Inc.: rakuten-recruiting-info@mail.rakuten.com

Please read the Application Requirements (available in English and Japanese) before applying.

Our Diversity & Inclusion Policy and Application Documents
Rakuten’s corporate mission is to “contribute to society by creating value through innovation and entrepreneurship.” We foster a culture that provides equal opportunities to those who share this founding philosophy and take on the challenge to transform society, regardless of age, gender, nationality, or any other status. Diversity is one of Rakuten's core strategies and a driving force for innovation. Because of this, you are not required to submit any of the following information in order to apply for our job positions.
- Gender
- Age
- Photo
- Nationality
- Information not related to business, such as ideological beliefs, family structure, etc.
* For legal compliance, we may ask you about your work eligibility.