Monitor global infrastructure using Datadog and SolarWinds, triage and resolve L1 incidents, escalate complex issues, participate in incident response and post-incident reviews, maintain SOPs and ServiceNow tickets, support automation with basic scripting, and provide weekend on-call coverage.
JOB DESCRIPTION
- Monitor Sysco’s global infrastructure and systems using tools such as Datadog, SolarWinds, and other enterprise monitoring platforms.
- Proactively detect, triage, and respond to incidents before they affect customers or the business.
- Independently resolve:
  - Server performance issues
  - Monitoring agent issues
  - Basic infrastructure and system alerts
- Escalate major incidents, complex infrastructure issues, and application-related incidents to L2/L3 teams in line with SOPs and SLAs.
- Ensure initial response and resolution targets are met for all priority levels.
- Participate in incident bridge calls and coordinate with internal and external stakeholders.
- Perform initial investigations and document findings to support faster resolution.
- Contribute to post-incident reviews and root cause analysis, including investigation with Datadog Watchdog.
- Follow and execute Standard Operating Procedures (SOPs) for known incidents.
- Maintain accurate documentation and ticket updates in ServiceNow.
- Support initiatives to improve First-Time Resolution (FTR) and reduce MTTR.
- Contribute to project-level operational improvements and initiatives tracked in Jira.
- Apply basic scripting or automation knowledge where applicable to support monitoring improvements and operational efficiency.
- Actively participate in knowledge sharing and continuous learning initiatives.
- Standard shift: Monday to Friday, 10:30 AM to 7:30 PM CST.
- Weekend on-call coverage required (one day per weekend, 10:30 AM – 7:30 PM CST; monthly shift rotation defined based on business needs, with prior notification provided by the team manager).
Qualifications:
- Bachelor’s degree in Information Technology or equivalent experience.
- 2 years of experience in Operations Engineering, NOC, SRE, or similar roles.
- Strong understanding of:
  - Windows Server and/or UNIX/Linux environments
  - Networking fundamentals (LAN/WAN, TCP/IP, DHCP, firewalls, routing)
- Experience with an enterprise ticketing tool (e.g., ServiceNow, Jira).
- Strong communication skills in English and ability to work under pressure.
- Willingness to provide weekend on-call coverage.
- Excellent communication skills in English (B2+ or higher) and ability to collaborate across functions and geographies.
- Experience with Datadog, SolarWinds, or similar monitoring platforms.
- Exposure to AWS, Azure, or GCP.
- Familiarity with Jira for tracking initiatives and projects.
- ITIL certification or hands-on experience with ITIL practices.
- Basic scripting or automation knowledge (e.g., PowerShell, Bash, Python).
Benefits:
- This is a hybrid position based in Ultra Park II, Lagunilla (Heredia). On-site presence is required when necessary, such as for meetings, trainings, or collaborative activities, in alignment with the company’s telework agreement, which currently requires employees to work on-site three (3) days per week.
- Private Medical Insurance
- Asociación Solidarista (employee solidarity association)
- Life Insurance
- Personal Day Off
Note: Only candidates with Costa Rican nationality or valid immigration status in Costa Rica will be considered. Applicants residing outside Costa Rica will not be considered, and relocation is not available.