About

I am a Machine Learning/AI Engineer. I started Scaled Focus to help companies evaluate, optimize and deploy agents.

I have written a book on Natural Language Processing for software engineers.

I have also contributed to Ragas, built FastEmbed and maintained Awesome NLP.

Book

NLP in Python: Quickstart Guide has sold over 5,000 copies.

I wrote this book in 2018 to make Natural Language Processing more accessible for software engineers and programmers. The accompanying code is available on GitHub.

Papers and Open Source Contributions

  1. Ragas - We helped improve the most widely used Agent Metrics layer. My contributions are here.

  2. FastEmbed - I built the first version of FastEmbed, an embedding library built for speed using ONNX. It is used by Qdrant, NVIDIA NeMo Guardrails and 3000+ other repositories.

  3. OpenAI Cookbook: Fine-tuned Retrieval Augmented Generation with Qdrant

  4. Hinglish: GitHub, paper. The paper, which focuses on code-mixed languages, was published at ACL 2019.

  5. Awesome Project Ideas - Curated list of machine learning (mostly deep learning) project ideas with datasets. The ideas span Vision, Text, Forecasting and Recommender Systems.

  6. Awesome NLP - Curated list of Natural Language Processing resources.

    • Recommended by Dr. Andrew Ng's (Stanford) CS 230
    • Featured in GitHub's Official Machine Learning Collection since 2016
  7. 2018: State-of-the-art language modeling in Hindi, plus new datasets. The code is available at hindi2vec.

Recognitions

  1. Analytics Vidhya Magazine named me one of the Top 5 GenAI Scientists in India in 2023.
  2. Won the first-ever NLP-themed Kaggle Kernel Prize (2019): The Hitchhiker's Guide to NLP in spaCy
  3. My Jupyter Notebook best practices were appreciated by Nobel Laureate Dr. Paul Romer in 2018 (link).

Talks

  1. Fifth Elephant MLOps Conf 2021: Slides
  2. PyCon India 2019: Slides and Youtube
  3. inMobi Tech Talks: A Nightmare on the LM Street; Slides
  4. Wingify DevFest: NLP for Indian Languages; Slides, Youtube
  5. PyData Bengaluru Inaugural Talk: Quiz Generation with spaCy; Youtube

More recognitions, competitions, and mentions

  1. Search and Informational Retrieval Ranking Challenge hosted by Bing AI Team (2019)
  2. FactorDaily's piece on [The great rush to data sciences in India](https://factordaily.com/rush-training-data-science-machine-learning-ai-india/) ends with a direct quote from me.
    • FactorDaily is a new-age news company that sits at the intersection of technology with life, culture and society in India.
  3. First Runner-Up at the Future Group Datathon (March 2019)
    • Two-stage machine learning hackathon called [Tathastu](https://www.tathastu.ai/datathon), covering recommendation systems and item information extraction problems.
  4. Opened AI Hackathon (2019)
    • Awesome NCERT won the Best Use of IBM Watson API; [blog](https://medium.com/opened-ai/global-hackweek-winners-2017-a9e5da513270)
    • Idea: Find recent and relevant news articles for any NCERT chapter in sciences and social studies.