Cloud Operations Engineer

Cloud Infrastructure | Toronto, Ontario

Description

Laserfiche is looking for an experienced Cloud Operations Engineer to take responsibility for fine-grained, system-wide awareness, expedient issue resolution, and targeted improvements to DevOps change flow for our fast-growing SaaS system. Do you want to work in a growing organization with opportunities for rapid advancement, all while delivering software that’s used, and loved, by more than 34,000 organizations around the world?

As part of our 24/7 Cloud Operations team, you will maintain our cloud system, respond to incidents, and serve as a program resource for driving improvements to the availability, scalability, reliability, and security of our SaaS cloud service running on AWS infrastructure. You will help handle incidents and support issues to achieve timely resolution, and assist in post-incident analysis to prevent recurrence and improve future responses.

Cloud Operations Engineers advocate for enhancements to our DevOps continuous integration flow that improve the reliability of new functionality arriving in production. You will establish and maintain contacts with all software development teams that contribute to the product and platform. You will be a key representative for reporting on system health trends, incidents, and planned reliability improvements.

Responsibilities Include: 

  • Perform continuous monitoring and regular execution of maintenance tasks in pursuit of service level objectives
  • Collaborate with engineering teams to refine in-application diagnostics and consolidate data across services
  • Check diagnostics and file issues as appropriate while performing routine system performance, integrity, and availability monitoring
  • Participate in the identification, troubleshooting, mitigation, and documentation of any incidents which may arise during system operation
  • Facilitate timely and relevant communication of incidents to affected stakeholders
  • Engage in post-incident analysis to examine incident response performance, identify areas for service improvement, and track improvements through to resolution
  • Provide justification for, and assist in prioritizing, system monitoring and health improvements
  • Assist with the creation, revision, curation, and documentation of service operation standards

What You'll Need:

  • 4-year degree (BA, BS) in a STEM field
  • 2 to 5 years of experience managing mission-critical, auto-scaling, cloud-based n-tier web applications distributed for high availability
  • AWS experience and certification preferred
  • Experienced in systems troubleshooting, including database systems, TCP/IP-based networking, web applications, and Windows and Linux system administration
  • Skilled at scripting/programming, preferably in one or more of PowerShell, Bash, Python, C#, JavaScript, and Ruby
  • 2 to 5 years of experience interacting with customers in a technical support or helpdesk role
  • Exceptional problem-solving and analytical skills
  • Ability to learn quickly and adapt to changing environments
  • Professional, effective communicator, both in person and in writing, with a collaborative attitude
  • Motivated, with a strong focus on customers and quality
  • Experienced in building cross-functional relationships
  • Must be available after hours based on a rotating schedule to perform periodic system checks and to be on-call for intervention and incident response
  • Expected work schedule: 5 AM to 2 PM EST
