Experienced Data Science Engineer with core knowledge of the information technology and services industry. 3+ years of total experience in software analysis, design, development, and implementation of large-scale applications in a Big Data Hadoop environment. 2+ years of experience building batch and real-time data pipelines with DBT, PySpark, Apache Kafka, Airflow, and Python. 2+ years of experience on the AWS cloud platform, with hands-on experience in Amazon EC2, Amazon S3, Amazon RDS, IAM, EMR, and other AWS services. 1+ years of experience in data warehousing with Snowflake, Fivetran, and DBT. Extensively used ETL, data migration, and data integration methodologies for data extraction, transformation, and loading in a corporate-wide ETL solution built on StreamSets and SSIS. Good knowledge of Big Data technologies including the Hadoop framework, Hive, HBase, Kudu, Sqoop, Impala, Oozie, Spark, and Storm. Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm. Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, integrated with Python. Able to handle large structured, semi-structured, and unstructured datasets using Python, Impala, and Hive. Strong experience with partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance. Hands-on experience writing Hive and Kudu scripts and scheduling jobs with Oozie and Airflow. Migrated data with Sqoop between HDFS and relational database systems and mainframes, in both directions, as requirements dictated. Extended core Hive and Impala functionality with custom user-defined functions (UDFs). Extensive knowledge of SQL queries for backend database analysis. Proficient in converting SQL queries into Spark transformations using Spark RDDs, DataFrames, Datasets, and Python; performed map-side joins on RDDs.
Good understanding of and work experience with NoSQL databases such as HBase and MongoDB and their applications. Involved in database development using SQL and PL/SQL, with experience on MySQL, SQL Server, and PostgreSQL. Worked with BI tools such as Tableau for report creation and further front-end analysis.
Skillset: Big Data, Apache Spark, Apache Hadoop, Apache Kafka, Apache Airflow, Apache Hive, Apache Impala, Hue, StreamSets, Fivetran, DBT, Snowflake, Google Studio, AWS, PostgreSQL, MySQL, SQL Server, Machine Learning, Linux, Python, Django, REST API, and Analytics.