By Light HQ

Site Reliability Engineer (SRE)

Job Locations US-Remote
Posted Date 1 week ago (11/6/2024 2:34 PM)
ID 2024-9838
# of Openings 1
Clearance Tier 4 - High Risk (Public Trust)

Company Overview

By Light Professional IT Services LLC readies warfighters and federal agencies with technology and systems engineered to connect, protect, and prepare individuals and teams for whatever comes next. Headquartered in McLean, VA, By Light supports defense, civilian, and commercial IT customers worldwide.

Position Overview

We are seeking a Site Reliability Engineer (SRE) to join our team supporting My HealtheVet, the U.S. Department of Veterans Affairs’ (VA) online health portal. This role focuses on ensuring the platform’s performance, scalability, and reliability while enhancing automation and infrastructure-as-code practices. The SRE will collaborate with development, operations, and security teams to deploy and maintain robust systems that meet the unique needs of the healthcare industry, with special attention to compliance, data privacy, and integration with Oracle Health (Cerner) systems.

Responsibilities

  • Maintain System Reliability: Oversee availability, reliability, and performance of the My HealtheVet platform, proactively identifying potential issues and ensuring minimal service disruptions.
  • Automation & Infrastructure-as-Code: Develop and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI/CD, or AWS CodePipeline; automate deployment processes to ensure consistency and scalability.
  • Monitoring & Incident Response: Implement and manage monitoring tools (e.g., Prometheus, Grafana, CloudWatch) to detect and resolve incidents quickly; participate in an on-call rotation for 24/7 support.
  • Performance Tuning & Optimization: Conduct regular performance testing and tuning to ensure the platform meets user demand and operates efficiently.
  • Collaboration: Work closely with development, QA, and security teams to align on deployment strategies and secure development best practices, including healthcare-specific compliance (e.g., HIPAA).
  • Continuous Improvement: Identify and resolve bottlenecks, implement best practices, and contribute to team knowledge sharing to improve overall system stability and team efficiency.
  • Security & Compliance: Enforce security protocols and work with compliance teams to ensure My HealtheVet meets HIPAA, FISMA, and other federal security guidelines.

Required Experience/Qualifications

  • Experience in SRE/DevOps: 5+ years of experience as an SRE or DevOps engineer, preferably within a healthcare or government environment.
  • Proficiency in Cloud Platforms: Expertise with AWS (GovCloud experience preferred); familiarity with core AWS services such as EC2, RDS, Lambda, VPC, and IAM.
  • Infrastructure as Code (IaC): Strong experience with tools like Terraform, AWS CloudFormation, or Ansible for managing infrastructure as code.
  • CI/CD Pipelines: Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab CI/CD, AWS CodePipeline) to streamline deployment and reduce time to market.
  • Monitoring & Logging: Skilled in monitoring and logging platforms such as Prometheus, Grafana, ELK stack, CloudWatch, and Splunk.
  • Automation Scripting: Proficiency in scripting languages such as Python, Bash, or PowerShell for automation and process improvement.
  • Networking & Security: Strong understanding of network protocols, VPNs, firewalls, and security practices; experience with compliance frameworks like HIPAA, FISMA, and FedRAMP is a plus.
  • Containerization & Orchestration: Experience with containerization (Docker) and orchestration platforms (Kubernetes, OpenShift) to manage scalable deployments.
  • Problem-Solving Skills: Ability to troubleshoot complex issues in a high-availability production environment.
  • Soft Skills: Strong communication and collaboration skills, especially in cross-functional team settings.

Preferred Experience/Qualifications

  • Healthcare IT Standards: Familiarity with HL7 and FHIR standards for data interoperability within healthcare applications.
  • Experience with Oracle Health (Cerner): Knowledge of Oracle Health’s (Cerner) systems and experience with integrations in the healthcare IT environment.
  • Security Certifications: Certifications such as CompTIA Security+, CISSP, or AWS Certified Security - Specialty.
  • ITIL Foundation: Knowledge of ITIL practices, especially around incident, change, and problem management.
