Stock Market Data ETL Pipeline

  • Tech Stack: Python (Boto3 SDK), Apache Kafka, AWS (S3, EC2, Glue, Glue Crawler, Athena)

- This project designs and implements a scalable ETL pipeline that combines AWS services with real-time stream processing to streamline the ingestion, transformation, and analysis of stock market data.

- I used Apache Kafka to stream data in real time and Python (Boto3) to automate writing the streamed records to Amazon S3 for storage.

- AWS Glue Crawlers infer the schema of the data in S3 and register it as tables in the AWS Glue Data Catalog, which Athena uses to run serverless SQL queries for analysis.

- The pipeline was deployed on Amazon EC2, where instance size and count can be adjusted as data volume changes.

- This solution makes streamed stock market data queryable in near real time through Athena and gives analytics and AI-driven applications a cataloged, centralized data source.
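The Kafka ingestion step can be sketched as a producer that replays rows from a stock-price CSV file into a topic, pausing between sends to simulate a live feed. This is a minimal sketch assuming the kafka-python client; the topic name `stock_market`, the broker address `localhost:9092`, and the file `stock_data.csv` are hypothetical placeholders, not details of the original project.

```python
import csv
import json
import time
from typing import Iterable, Iterator


def read_stock_rows(csv_path: str) -> Iterator[dict]:
    """Yield each row of a stock-price CSV file as a dict keyed by the header row."""
    with open(csv_path, newline="") as f:
        yield from csv.DictReader(f)


def publish_rows(producer, topic: str, rows: Iterable[dict],
                 delay_seconds: float = 0.0, sleep=time.sleep) -> int:
    """Send each row to a Kafka topic, optionally pausing between sends.

    Returns the number of records sent.
    """
    count = 0
    for row in rows:
        producer.send(topic, value=row)
        count += 1
        if delay_seconds:
            sleep(delay_seconds)  # simulate a live feed
    producer.flush()  # block until all buffered records are delivered
    return count


def main() -> None:
    # Requires the kafka-python package and a reachable broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],  # hypothetical broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    sent = publish_rows(producer, "stock_market",          # hypothetical topic
                        read_stock_rows("stock_data.csv"),  # hypothetical file
                        delay_seconds=1.0)
    print(f"sent {sent} records")
```

Calling `main()` needs a running broker; `publish_rows` accepts any object with `send` and `flush` methods, which keeps the loop testable without Kafka.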
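On the consuming side, each Kafka message can be written to S3 with Boto3's `put_object`. The sketch below assumes kafka-python and JSON-encoded messages; the bucket `my-stock-market-bucket`, the `stock_market/` prefix, and the Hive-style `date=YYYY-MM-DD/` folder layout are illustrative choices, not taken from the original project.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_s3_key(prefix: str = "stock_market", now: Optional[datetime] = None) -> str:
    """Return a unique key like stock_market/date=2024-01-31/<hex>.json (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    # Hive-style "date=..." folders let a Glue crawler infer a partition column.
    return f"{prefix}/date={now:%Y-%m-%d}/{uuid.uuid4().hex}.json"


def write_record(s3_client, bucket: str, record: dict, prefix: str = "stock_market") -> str:
    """Upload one record as a JSON object and return the S3 key it was written to."""
    key = build_s3_key(prefix)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def consume_to_s3(consumer, s3_client, bucket: str, max_records: Optional[int] = None) -> int:
    """Copy messages from a Kafka consumer (or any iterable of objects with .value) to S3."""
    written = 0
    for message in consumer:
        write_record(s3_client, bucket, message.value)
        written += 1
        if max_records is not None and written >= max_records:
            break
    return written


def main() -> None:
    # Requires boto3, kafka-python, AWS credentials, and a reachable broker.
    import boto3
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "stock_market",                        # hypothetical topic name
        bootstrap_servers=["localhost:9092"],  # hypothetical broker address
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    consume_to_s3(consumer, boto3.client("s3"), "my-stock-market-bucket")  # hypothetical bucket
```

Writing one object per message keeps the sketch simple but produces many small files; batching several records into each `put_object` call reduces request counts and makes Athena scans cheaper.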
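Cataloging and querying can also be driven from Python: start a Glue crawler, wait for it to finish, then submit an Athena query and poll until it completes. This is a sketch using Boto3's `glue` and `athena` clients; the crawler name `stock_market_crawler`, the database `stock_market_db`, the table `stock_market`, and the results location `s3://my-athena-results/` are hypothetical.

```python
import time
from typing import List, Optional


def run_crawler(glue_client, name: str, poll_seconds: float = 10.0, sleep=time.sleep) -> None:
    """Start a Glue crawler and block until it is READY again.

    Assumes start_crawler moves the crawler out of READY before the first poll.
    """
    glue_client.start_crawler(Name=name)
    while True:
        crawler = glue_client.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":
            break
        sleep(poll_seconds)
    status = crawler.get("LastCrawl", {}).get("Status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"crawler {name!r} finished with status {status}")


def run_athena_query(athena_client, query: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, sleep=time.sleep) -> List[List[Optional[str]]]:
    """Run a query, wait for it to finish, and return rows as lists of strings.

    For SELECT queries the first returned row holds the column names.
    """
    execution_id = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena_client.get_query_execution(
            QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {state}: {status.get('StateChangeReason', '')}")
        sleep(poll_seconds)  # still QUEUED or RUNNING
    # Only the first page (up to 1,000 rows) is read; NULL cells have no VarCharValue.
    result = athena_client.get_query_results(QueryExecutionId=execution_id)
    return [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


def main() -> None:
    # Requires boto3 and AWS credentials with Glue, Athena, and S3 permissions.
    import boto3

    run_crawler(boto3.client("glue"), "stock_market_crawler")  # hypothetical crawler
    rows = run_athena_query(
        boto3.client("athena"),
        "SELECT * FROM stock_market LIMIT 10",      # hypothetical table
        database="stock_market_db",                 # hypothetical Glue database
        output_location="s3://my-athena-results/",  # hypothetical results bucket
    )
    for row in rows:
        print(row)
```

Larger result sets need the `NextToken` returned by `get_query_results`, or a Boto3 paginator for that operation.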