SRE - Site Reliability Engineer

Taipei, Taipei City, Taiwan | Contract | Allows remote

About the position

TOTVS Labs is looking for a Site Reliability Engineer to work on Carol - our Data & Machine Learning Platform (www.carol.com).

Carol is already used by thousands of companies in different industries, e.g., Education, Manufacturing, Retail, Healthcare, and Agribusiness. We are bringing Data & Machine Learning to all. Come and join us!

Responsibilities:

  • Manage systems security including keys, VPCs, iptables, and more
  • Ensure failover and reliability via snapshots, multiple availability zones, and ELB
  • Manage robust monitoring and alerting infrastructure
  • Manage our big data clusters
  • Support several Linux servers running our SaaS platform stack in multiple 24x7 data centers

Requirements:

  • Experience automating systems administration tasks (Bash/shell scripting required)
  • Experience with monitoring tools (e.g., Zabbix, Nagios, Prometheus, collectd)
  • Knowledge of the HTTP protocol, SSL, Web Services
  • Knowledge of TCP/IP protocols
  • Knowledge of load balancers and their configuration
  • Knowledge of DDoS and API protection technologies
  • Experience with Cloud infrastructure platforms, such as AWS, Digital Ocean, Google Cloud, etc.

Bonus:

  • You have experience managing large Elasticsearch clusters
  • You have experience managing Couchbase clusters
  • You have experience managing data pipelines

Apply for this opening at http://totvslabs.recruiterbox.com/jobs/fk013lm?apply=true