Title: Big Data Hadoop Developer
Posting Date: 08/09/2021
Location: Texas
Job Type: Full Time
Job Description: This role requires a Big Data Hadoop Developer. The candidate must be a detail-oriented, creative technologist who is comfortable working in a fast-paced, highly collaborative, entrepreneurial environment, can manage and prioritize competing demands, and can work with multiple customers while incrementally building the next generation of products.
Roles & Responsibilities:
Develop Big Data applications following the Agile software development life cycle for fast and efficient progress.
Evaluate emerging tools and build proofs of concept for leveraging them in the current system.
Write software to ingest data into Hadoop and build scalable, maintainable Extract, Transform and Load (ETL) jobs.
Create Scala/Spark jobs for data transformation and aggregation with a focus on the functional programming paradigm (a job of this kind is sketched after this list).
Build distributed, reliable and scalable data pipelines to ingest and process data in real time, including impression streams, transaction behavior, clickstream data and other unstructured data.
Load data into Spark RDDs and perform in-memory computation to generate the output response.
Develop shell scripts and automate data management across end-to-end integration work.
Develop Oozie workflows for scheduling and orchestrating the ETL process and for general job scheduling.
Create environments to access loaded data via Spark SQL through JDBC and ODBC (via the Spark Thrift Server).
Develop real-time data ingestion and analysis using Kafka and Spark Streaming (see the streaming sketch after this list).
Design and implement a backup and disaster recovery strategy based on the Cloudera BDR utility for batch applications and Kafka MirrorMaker for real-time streaming applications.
Implement Hadoop security such as Kerberos, Cloudera Key Trustee Server and Key Trustee KMS.
Use Hive join queries to combine multiple tables from a source system and load the results into Elasticsearch.
Work on Cassandra and QueryGrid. Implement shell scripts to move data from relational databases to HDFS (Hadoop Distributed File System) and vice versa.
Optimize and tune cluster performance by adjusting parameters based on benchmarking results.
Run complex queries and work with bucketing, partitioning, joins and sub-queries (see the partitioning and bucketing sketch after this list).
Apply different HDFS file formats and structures such as Parquet and Avro to speed up analytics.
Translate, load and present disparate data sets from various formats and sources such as JSON, text files, Kafka topics and log data.
Write complex workflow jobs using Oozie and set up a multi-program scheduling system to manage Hadoop, Hive, Sqoop and Spark jobs.
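A minimal sketch of the kind of Scala/Spark transformation-and-aggregation job described above; the HDFS paths and the columns (user_id, amount, status) are hypothetical placeholders, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransactionAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transaction-aggregation")
      .getOrCreate()

    // Read raw transaction data from HDFS (hypothetical path and schema).
    val transactions = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/transactions")

    // Functional-style transformation and aggregation: total and average
    // spend plus a transaction count per user, keeping only completed rows.
    val perUser = transactions
      .filter(col("status") === "COMPLETED")
      .groupBy(col("user_id"))
      .agg(
        sum("amount").as("total_spend"),
        avg("amount").as("avg_spend"),
        count(lit(1)).as("txn_count")
      )

    // Write the aggregated output back to HDFS as Parquet for fast analytics.
    perUser.write.mode("overwrite").parquet("hdfs:///data/curated/user_spend")

    spark.stop()
  }
}
```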
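Similarly, a minimal sketch of real-time ingestion with Kafka and Spark Structured Streaming; the broker addresses, topic name and checkpoint path are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-ingest")
      .getOrCreate()

    // Subscribe to a Kafka topic of clickstream events (topic name and
    // broker addresses are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "clickstream")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string and
    // count events per one-minute window as a simple real-time aggregation.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS event", "timestamp")
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    // Stream the running counts to the console; a production job would
    // write to HDFS, Kudu, HBase or another sink instead.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "hdfs:///tmp/checkpoints/clickstream")
      .start()

    query.awaitTermination()
  }
}
```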
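And a sketch of partitioning, bucketing, a join and a sub-query through Spark SQL; the analytics database and the users/active_users tables are assumed to exist and are named only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedBucketedTables {
  def main(args: Array[String]): Unit = {
    // Hive support requires the Hive libraries on the classpath.
    val spark = SparkSession.builder()
      .appName("partitioned-bucketed-tables")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // A Parquet table partitioned by event date and bucketed by user id,
    // so date filters prune partitions and joins on user_id shuffle less.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.events (
        |  user_id BIGINT,
        |  page STRING,
        |  amount DOUBLE,
        |  event_date STRING
        |)
        |USING PARQUET
        |PARTITIONED BY (event_date)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS""".stripMargin)

    // Join, sub-query and partition pruning in one query; analytics.users
    // and analytics.active_users are placeholder tables for illustration.
    val regionCounts = spark.sql(
      """SELECT u.region, COUNT(*) AS events
        |FROM analytics.events e
        |JOIN analytics.users u ON e.user_id = u.user_id
        |WHERE e.event_date = '2021-08-01'
        |  AND e.user_id IN (SELECT user_id FROM analytics.active_users)
        |GROUP BY u.region""".stripMargin)

    regionCounts.show()
    spark.stop()
  }
}
```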
Required Skills:
A minimum of a bachelor's degree in computer science or equivalent.
Cloudera Hadoop (CDH), Cloudera Manager, Informatica Big Data Edition (BDM), HDFS, YARN, MapReduce, Hive, Impala, Kudu, Sqoop, Spark, Kafka, HBase, Teradata Studio Express, Teradata, Tableau, Kerberos, Active Directory, Sentry, TLS/SSL, Linux/RHEL, Unix, Windows, SBT, Maven, Jenkins, Oracle, MS SQL Server, Shell Scripting, Eclipse IDE, Git, SVN
Must have strong problem-solving and analytical skills.
Must have the ability to identify complex problems and review related information to develop and evaluate options and implement solutions.