Hello, I am Vasanth
Senior Software Engineer | Data Engineer | Specialized in Big Data, PySpark, and Cloud Solutions
I am Vasanth Desai, a Senior Software Engineer specializing in Data Engineering and an AWS Certified Data Engineer Associate. With a passion for building scalable, data-driven solutions, I focus on leveraging cloud platforms, big data, and ETL pipelines to transform raw data into impactful insights. My previous experience as a Data Analyst at Labcorp equipped me with skills in predictive analytics, natural language processing (NLP), and data automation. Outside of my professional work, I enjoy reading, sketching, and exploring diverse cuisines and destinations, embracing a balanced and fulfilling lifestyle.


Areas of Interest
Data Engineering & ETL Pipelines
Designing and building scalable ETL pipelines to manage and process large datasets efficiently, ensuring seamless data integration from multiple sources.


Cloud Data Solutions (AWS)
Utilizing AWS services to architect and maintain cloud-based data infrastructure, optimizing data storage, processing, and retrieval for real-time analytics.


PySpark & Databricks
Using PySpark and Databricks for big data processing, enabling parallel computation and advanced analytics on large-scale datasets.


Data Automation
Automating data workflows to increase efficiency and reduce manual processes, enhancing data accuracy and operational speed.


Data Governance & Integrity
Implementing strong data governance practices to ensure data quality, security, and compliance, safeguarding the integrity of data across various platforms.


Career Highlights
Data Delivery & Automation Specialist
April 2022 - September 2024
Labcorp Laboratories India
Advanced Analytics and Forecasting: Led high-stakes forecasting projects using FBProphet and ARIMA, achieving accuracies of 80-95% and significantly improving resource allocation.
Document Processing Automation: Engineered an NLP-based system that processes over 20,000 documents monthly with spaCy and OCR technologies, streamlining document handling and boosting data accuracy.
Data Management and Optimization: Managed the ingestion of large datasets from sources including S3 and SFTP, utilizing Python and SQL to enhance data flow and operational efficiency.
Automated System Implementations:
Developed automated systems for user access provisioning and specimen monitoring, substantially reducing manual labor and enhancing real-time reporting capabilities.
Spearheaded the creation of a sophisticated client billing automation system, reducing errors and improving operational insights.
Leadership and Recognition: Mentored team members to strengthen team capabilities and performance; recognized with the Ace Level 4 award for contributions to automation and process improvement.
Data Management Associate
April 2021 - April 2022
Labcorp Laboratories India
Client and Interdepartmental Liaison: Acted as the primary contact for Clinical Data Management, resolving client queries and facilitating cross-departmental communication during critical startup phases.
Standardization of Data Processes: Developed and implemented standardized data transfer formats using SQL, aligning with client needs and significantly enhancing data integrity and process efficiency.
Data Optimization and Efficiency: Led efforts to optimize data management workflows using advanced SQL techniques, improving process efficiency and accuracy and boosting team productivity.
Junior Research Fellow
August 2018 - March 2021
COE RVCE, Bengaluru
Research and Data Analysis: Engaged in multiple data-centric projects, enhancing research data handling efficiency and accuracy through advanced data analysis and automation techniques.
Automation of NGS Pipelines: Developed automation tools for NGS pipelines using Bash and Python, significantly improving data processing speed and reliability.
Software Development for Bioinformatics: Contributed to the development of a Python plugin for PyMOL in collaboration with PDB Europe, enhancing molecular structure analysis and visualization capabilities.
Collaborative Projects: Worked with international teams on various projects, strengthening my data science skills and cross-team collaboration.
Highlighted Projects


BioQuery: AI-Powered Biology Assistant
Technologies Used: Python, Streamlit, Hugging Face Transformers, LangChain.
Key Contributions: Developed 'BioQuery', utilizing the efficient LaMini-T5-738M model from Hugging Face to answer biology-related queries with high accuracy. Integrated this model into a Streamlit application, offering a seamless user experience for complex query handling.
Impact: Enhanced accessibility and accuracy in biological information retrieval, significantly benefiting educational and research communities.
Emotion Recognition Using EEG Brainwave Data
Technologies Used: Python, scikit-learn, EEG.
Key Contributions: Engineered a machine learning solution using Random Forest to predict emotions from EEG data, achieving ~98.4% accuracy. Optimized the model through extensive feature engineering and hyperparameter tuning.
Impact: Demonstrated an approach that could support more intuitive interfaces and inform treatment methodologies in neuroscience and mental health.


DataStreamX: Seamless Data Engineering
Technologies Used: Apache Airflow, Kafka, Python, Docker, Cassandra, PostgreSQL.
Key Contributions: Orchestrated a robust data streaming architecture using Apache Kafka and Airflow to process data efficiently across platforms. Employed Docker for containerized deployment and Cassandra for scalable, resilient data storage.
Impact: Streamlined data workflows, enhancing real-time data processing and reliability, crucial for dynamic data handling and analytics operations.


