Hello, I am Vasanth

Senior Software Engineer | Data Engineer | Specialized in Big Data, PySpark, and Cloud Solutions

I am Vasanth Desai, a Senior Software Engineer specializing in data engineering and an AWS Certified Data Engineer – Associate. Passionate about building scalable, data-driven solutions, I use cloud platforms, big data tools, and ETL pipelines to turn raw data into actionable insights. My previous experience as a Data Analyst at Labcorp gave me skills in predictive analytics, natural language processing (NLP), and data automation. Outside of work, I enjoy reading, sketching, and exploring diverse cuisines and destinations, embracing a balanced and fulfilling lifestyle.

Meet Vasanth

Areas of Interest

Data Engineering & ETL Pipelines

Designing and building scalable ETL pipelines to manage and process large datasets efficiently, ensuring seamless data integration from multiple sources.

Cloud Data Solutions (AWS)

Utilizing AWS services to architect and maintain cloud-based data infrastructure, optimizing data storage, processing, and retrieval for real-time analytics.

PySpark & Databricks

Using PySpark and Databricks for distributed big data processing, enabling parallel computation and advanced analytics on large-scale datasets.

Data Automation

Automating data workflows to increase efficiency and reduce manual effort, improving data accuracy and operational speed.

Data Governance & Integrity

Implementing strong data governance practices to ensure data quality, security, and compliance, safeguarding the integrity of data across various platforms.

Career Highlights

Data Delivery & Automation Specialist

April 2022 - September 2024

Labcorp Laboratories India

  • Advanced Analytics and Forecasting: Led high-stakes forecasting projects using FBProphet and ARIMA models that achieved 80-95% accuracy, significantly improving resource allocation.

  • Document Processing Automation: Engineered an NLP-based system using spaCy and OCR that processes more than 20,000 documents per month, streamlining document handling and improving data accuracy.

  • Data Management and Optimization: Managed the ingestion of large datasets from sources such as Amazon S3 and SFTP, using Python and SQL to improve data flow and operational efficiency.

  • Automated System Implementations:

    • Developed automated systems for user access provisioning and specimen monitoring, substantially reducing manual labor and enhancing real-time reporting capabilities.

    • Led the development of a client billing automation system that reduced errors and improved operational insight.

  • Leadership and Recognition: Mentored team members to strengthen team capabilities and performance; received the Ace Level 4 award for contributions to automation and process improvement.

Data Management Associate

April 2021 - April 2022

Labcorp Laboratories India

  • Client and Interdepartmental Liaison: Acted as the primary contact for Clinical Data Management, resolving client queries and facilitating cross-departmental communication during critical startup phases.

  • Standardization of Data Processes: Developed and implemented standardized data transfer formats using SQL, aligning with client needs and significantly enhancing data integrity and process efficiency.

  • Data Optimization and Efficiency: Led efforts to optimize data management workflows with advanced SQL techniques, improving process efficiency and accuracy, which boosted team productivity and freed the team to focus on strategic priorities.

Junior Research Fellow

August 2018 - March 2021

COE RVCE, Bengaluru

  • Research and Data Analysis: Engaged in multiple data-centric projects, enhancing research data handling efficiency and accuracy through advanced data analysis and automation techniques.

  • Automation of NGS Pipelines: Developed automation tools for next-generation sequencing (NGS) pipelines using Bash and Python, significantly improving data processing speed and reliability.

  • Software Development for Bioinformatics: Contributed to the development of a Python plugin for PyMOL in collaboration with the Protein Data Bank in Europe (PDBe), enhancing molecular structure analysis and visualization capabilities.

  • Collaborative Projects: Collaborated with international teams on various projects, enriching my data science skills and enhancing team collaboration effectiveness.

Highlighted Projects

BioQuery: AI-Powered Biology Assistant

  • Technologies Used: Python, Streamlit, Hugging Face Transformers, LangChain.

  • Key Contributions: Developed BioQuery, which uses the lightweight LaMini-T5-738M model from Hugging Face to answer biology-related queries with high accuracy, and integrated the model into a Streamlit application for a seamless experience when handling complex queries.

  • Impact: Makes biological information more accessible and easier to retrieve accurately, benefiting students, educators, and researchers.

Emotion Recognition Using EEG Brainwave Data

  • Technologies Used: Python, scikit-learn, EEG data.

  • Key Contributions: Engineered a machine learning solution using Random Forest to predict emotions from EEG data, achieving ~98.4% accuracy. Optimized the model through extensive feature engineering and hyperparameter tuning.

  • Impact: Demonstrates applications in neuroscience and mental health, such as more intuitive brain-computer interfaces and improved treatment methodologies.

DataStreamX: Seamless Data Engineering

  • Technologies Used: Apache Airflow, Apache Kafka, Python, Docker, Cassandra, PostgreSQL.

  • Key Contributions: Orchestrated a robust data streaming architecture using Apache Kafka and Airflow to process data efficiently across platforms. Used Docker to containerize the services and Cassandra for scalable, resilient data storage.

  • Impact: Streamlined data workflows, enhancing real-time data processing and reliability, crucial for dynamic data handling and analytics operations.

Get in touch

+91 991 67 991 78
vpdesai2020@gmail.com