General Information

Full Name Shivam Khandelwal
Date of Birth 25th July 1994
Languages English, Hindi


  • Programming Languages: Python, Bash, JavaScript, Ruby
  • Orchestration: Kubernetes, Virtuozzo, Docker
  • Technologies: Grafana, Argo, Elasticsearch, Kafka, PyTorch, Flask


  • Oct 2019 – Present
    AIcrowd, CTO
    Worked at a community-driven MLOps startup, with responsibilities ranging from individual contributions and managing the technical team to handling external clients and community members.
    • Infrastructure: Responsible for architecting and managing company-wide deployments and vendor-neutral infrastructure, which scaled from 0 to 60k+ users over time.
      • CI Pipelines: Automated running and evaluating ML model performance using GitLab CI, Argo, and Helm charts. These pipelines were triggered 200k times and hosted 4TB+ of artifacts.
      • Kubernetes: Managed multiple clusters deployed using kOps and the Azure managed Kubernetes service.
      • Observability: Built Grafana dashboards and GitLab status pages using Grafana, Loki, and Prometheus for business use cases, infrastructure debugging, and self-service portals for community members.
    • Benchmark Competitions: Helped host multiple ML benchmark competitions such as MineRL, Airborne Object Tracking, and Music Demixing. The work included developing easy-to-use starter kits to onboard community members, building a dynamic Docker runtime creation service, and managing the community.
    • Products: Developed multiple products and POCs using Flask, Rails, and JS frameworks. Use cases included visualizing ML model performance, human-in-the-loop model evaluations, and hosting ML models.
  • Jul 2018 – Sep 2019
    Tower Research Capital, Core Engineering Software Developer
    Worked on the cloud services team of an HFT firm, focusing on observability: managed the ELK infrastructure and helped other teams gain better visibility into their cloud workloads.
    • ELK Stack: Deployed and managed the infrastructure to collect logs from company-wide cloud devices and selected on-prem devices via Filebeat, Logstash, and syslog. Automated ELK deployments and provided self-service deployment using GitLab CI, the Strimzi Kafka Kubernetes operator, and Helm.
    • Alerts: Implemented in-house aggregation and dashboarding of alerts using Alertmanager and Alerta.
    • Calico: Worked closely with the network team on multiple stability fixes in the on-prem Kubernetes clusters.


  • 2013 – 2018
    (pending thesis)
    B.Tech. (Honors) and M.S. by Research in CSE
    International Institute of Information Technology, Hyderabad
    • CGPA: 8.02/10.0
    • Teaching Assistant:
      Software Engineering (SSAD), IT Workshop
    • Selected Coursework:
      Advanced Computer Networks, Database Systems, Usability Engineering, Software Engineering


  • Jun 2017 – Sep 2017
    Facebook (now Meta), Production Engineering Intern
    Worked with the MySQL Infrastructure team on the continuous backup and restore system, scaling the deployment to multiple pools with different MySQL versions, configurations, etc.
  • Aug 2016 – Jun 2018
    IIIT Hyderabad, Student Systems Administrator
    Moved infrastructure (including Zimbra mail servers) to high-availability setups using DRBD disks and Pacemaker, fail2ban-based applications, an institute-wide web proxy, LDAP, etc.
  • May 2016 – Aug 2016
    Berkman Klein, Harvard University, GSoC
    Built an HTTP framework on top of a TimeGate implementation, adhering to RFC 7089 (HTTP Framework for Time-Based Access to Resource States), for a distributed and resilient version of Amber.
  • May 2016 – Jul 2016
    Tata Research Development and Design Centre
    Conducted a systematic literature review and systematic mapping study on domain-specific languages, analysing over 1000 candidate research papers.
  • Dec 2014 – May 2017
    CodeChef, Software Engineering Intern
    Worked on improving the cache layer over a legacy codebase and scaling the architecture horizontally (involving twemproxy). Developed Ember.js components for the online IDE (Code, Compile & Run) and a commenting system (similar to Disqus).


  • MLOps, Infrastructure, Software Engineering