Question # 1 Problem Scenario 62 : You have been given below code snippet.val a = sc.parallelize(List("dogM, "tiger", "lion", "cat", "panther", "eagle"), 2) val b = a.map(x => (x.length, x)) operation1 Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : b.mapValuesf'x" + _ + "x").collect mapValues [Pair] : Takes the values of a RDD that consists of two-component tuples, and applies the provided function to transform each value. Tlien,.it.forms newtwo-componend tuples using the key and the transformed value and stores them in a new RDD.
Question # 2 Problem Scenario 26 : You need to implement near real time solutions for collecting information when submitted in file with below information. You have been given below directory location (if not available than create it) /tmp/nrtcontent. Assume your departments upstream service is continuously committing data in this directory as a new file (not stream of data, because it is near real time solution). As soon as file committed in this directory that needs to be available in hdfs in /tmp/flume location Data echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt After few mins echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt Write a flume configuration file named flumes.conf and use it to load data in hdfs with following additional properties. 1. Spool /tmp/nrtcontent 2. File prefix in hdfs sholuld be events 3. File suffix should be Jog 4. If file is not commited and in use than it should have as prefix. 5. Data should be written as text to hdfs
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Create directory mkdir /tmp/nrtcontent Step 2 : Create flume configuration file, with below configuration for source, sink and channel and save it in flume6.conf. agent1 .sources = source1 agent1 .sinks = sink1 agent1.channels = channel1 agent1 .sources.source1.channels = channel1 agent1 .sinks.sink1.channel = channel1 agent1 .sources.source1.type = spooldir agent1 .sources.source1.spoolDir = /tmp/nrtcontent agent1 .sinks.sink1 .type = hdfs agent1 .sinks.sink1.hdfs.path = /tmp/flume agent1.sinks.sink1.hdfs.filePrefix = events agent1.sinks.sink1.hdfs.fileSuffix = .log agent1 .sinks.sink1.hdfs.inUsePrefix = _ agent1 .sinks.sink1.hdfs.fileType = Data Stream Step 4 : Run below command which will use this configuration file and append data in hdfs. Start flume service: flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/fIumeconf/fIume6.conf -name agent1 Step 5 : Open another terminal and create a file in /tmp/nrtcontent echo "I am preparing for CCA175 from ABCTechm.com" > /tmp/nrtcontent/.he1.txt mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt After few mins echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
Question # 3 Problem Scenario 70 : Write down a Spark Application using Python, In which it read a file "Content.txt" (On hdfs) with following content. Do the word count and save the results in a directory called "problem85" (On hdfs) Content.txt Hello this is ABCTECH.com This is XYZTECH.com Apache Spark Training This is Spark Learning Session Spark is faster than MapReduce
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Create an application with following code and store it in problem84.py # Import SparkContext and SparkConf from pyspark import SparkContext, SparkConf # Create configuration object and set App name conf = SparkConf().setAppName("CCA 175 Problem 85") sc = sparkContext(conf=conf) #load data from hdfs contentRDD = sc.textFile(MContent.txt") #filter out non-empty lines nonemptyjines = contentRDD.filter(lambda x: len(x) > 0) #Split line based on space words = nonempty_lines.ffatMap(lambda x: x.split(''}} #Do the word count wordcounts = words.map(lambda x: (x, 1)) \ reduceByKey(lambda x, y: x+y) \ map(lambda x: (x[1], x[0]}}.sortByKey(False} for word in wordcounts.collect(): print(word) #Save final data " wordcounts.saveAsTextFile("problem85") step 2 : Submit this application spark-submit -master yarn problem85.py
Question # 4 Problem Scenario 45 : You have been given 2 files , with the content as given Below (spark12/technology.txt) (spark12/salary.txt) (spark12/technology.txt) first,last,technology Amit,Jain,java Lokesh,kumar,unix Mithun,kale,spark Rajni,vekat,hadoop Rahul,Yadav,scala (spark12/salary.txt) first,last,salary Amit,Jain,100000 Lokesh,kumar,95000 Mithun,kale,150000 Rajni,vekat,154000 Rahul,Yadav,120000 Write a Spark program, which will join the data based on first and last name and save the joined results in following format, first Last.technology.salary
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Create 2 files first using Hue in hdfs. Step 2 : Load all file as an RDD val technology = sc.textFile(Msparkl2/technology.txt").map(e => e.splitf',")) val salary = sc.textFile("spark12/salary.txt").map(e => e.split(".")) Step 3 : Now create Key.value pair of data and join them. val joined = technology.map(e=>((e(0),e(1)),e(2))).join(salary.map(e=>((e(0),e(1)),e(2)))) Step 4 : Save the results in a text file as below. joined.repartition(1).saveAsTextFile("spark12/multiColumn Joined.txt")
Question # 5 Problem Scenario 69 : Write down a Spark Application using Python, In which it read a file "Content.txt" (On hdfs) with following content. And filter out the word which is less than 2 characters and ignore all empty lines. Once doen store the filtered data in a directory called "problem84" (On hdfs) Content.txt Hello this is ABCTECH.com This is ABYTECH.com Apache Spark Training This is Spark Learning Session Spark is faster than MapReduce
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Create an application with following code and store it in problem84.py # Import SparkContext and SparkConf from pyspark import SparkContext, SparkConf # Create configuration object and set App name conf = SparkConf().setAppName("CCA 175 Problem 84") sc = sparkContext(conf=conf) #load data from hdfs contentRDD = sc.textFile(MContent.txt") #filter out non-empty lines nonemptyjines = contentRDD.filter(lambda x: len(x) > 0) #Split line based on space words = nonempty_lines.ffatMap(lambda x: x.split(''}} #filter out all 2 letter words finalRDD = words.filter(lambda x: len(x) > 2) for word in finalRDD.collect(): print(word) #Save final data finalRDD.saveAsTextFile("problem84M) step 2 : Submit this application spark-submit -master yarn problem84.py
Question # 6 Problem Scenario 11 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish following. 1. Import departments table in a directory called departments. 2. Once import is done, please insert following 5 records in departments mysql table. Insert into departments(10, physics); Insert into departments(11, Chemistry); Insert into departments(12, Maths); Insert into departments(13, Science); Insert into departments(14, Engineering); 3. Now import only new inserted records and append to existring directory . which has been created in first step.
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Clean already imported data. (In real exam, please make sure you dont delete data generated from previous exercise). hadoop fs -rm -R departments Step 2 : Import data in departments directory. sqoop import \ -connect jdbc:mysql://quickstart:3306/retail_db \ -username=retail_dba \ -password=cloudera \ -table departments \ "target-dir/user/cloudera/departments Step 3 : Insert the five records in departments table. mysql -user=retail_dba -password=cloudera retail_db Insert into departments values(10, "physics"); Insert into departments values(11, "Chemistry"); Insert into departments values(12, "Maths"); Insert into departments values(13, "Science"); Insert into departments values(14, "Engineering"); commit; select' from departments; Step 4 : Get the maximum value of departments from last import, hdfs dfs -cat /user/cloudera/departments/part* that should be 7 Step 5 : Do the incremental import based on last import and append the results. sqoop import \ -connect "jdbc:mysql://quickstart.cloudera:330G/retail_db" \ ~username=retail_dba \ -password=cloudera \ -table departments \ -target-dir /user/cloudera/departments \ -append \ -check-column "department_id" \ -incremental append \ -last-value 7 Step 6 : Now check the result. hdfs dfs -cat /user/cloudera/departments/part"
Question # 7 Problem Scenario 63 : You have been given below code snippet.val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2) val b = a.map(x => (x.length, x)) operation1 Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String}] = Array((4,lion), (3,dogcat), (7,panther), (5,tigereagle))
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : b.reduceByKey(_ + _).collect reduceByKey JPair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide, should be commutative in order to generate reproducible results.
Question # 8 Problem Scenario 18 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Now accomplish following activities. 1. Create mysql table as below. mysql -user=retail_dba -password=cloudera use retail_db CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name varchar(45), avg_salary int); show tables; 2. Now export data from hive table departments_hive01 in departments_hive02. While exporting, please note following. wherever there is a empty string it should be loaded as a null value in mysql. wherever there is -999 value for int field, it should be created as null value.
Answer Description Answer: See the explanation for Step by Step Solution and configuration. Explanation: Solution : Step 1 : Create table in mysql db as well. mysql ~user=retail_dba -password=cloudera use retail_db CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name varchar(45), avg_salary int); show tables; Step 2 : Now export data from hive table to mysql table as per the requirement. sqoop export -connect jdbc:mysql://quickstart:3306/retail_db \ -username retaildba \ -password cloudera \ -table departments_hive02 \ -export-dir /user/hive/warehouse/departments_hive01 \ -input-fields-terminated-by '\001' \ -input-Iines-terminated-by '\n' \ -num-mappers 1 \ -batch \ -Input-null-string "" \ -input-null-non-string -999 step 3 : Now validate the data,select * from departments_hive02;
Up-to-Date
We always provide up-to-date CCA175 exam dumps to our clients. Keep checking website for updates and download.
Excellence
Quality and excellence of our CCA Spark and Hadoop Developer Exam practice questions are above customers expectations. Contact live chat to know more.
Success
Your SUCCESS is assured with the CCA175 exam questions of passin1day.com. Just Buy, Prepare and PASS!
Quality
All our braindumps are verified with their correct answers. Download CCA Spark and Hadoop Developer Practice tests in a printable PDF format.
Basic
$80
Any 3 Exams of Your Choice
3 Exams PDF + Online Test Engine
Buy Now
Premium
$100
Any 4 Exams of Your Choice
4 Exams PDF + Online Test Engine
Buy Now
Gold
$125
Any 5 Exams of Your Choice
5 Exams PDF + Online Test Engine
Buy Now
Passin1Day has a big success story in last 12 years with a long list of satisfied customers.
We are UK based company, selling CCA175 practice test questions answers. We have a team of 34 people in Research, Writing, QA, Sales, Support and Marketing departments and helping people get success in their life.
We dont have a single unsatisfied Cloudera customer in this time. Our customers are our asset and precious to us more than their money.
CCA175 Dumps
We have recently updated Cloudera CCA175 dumps study guide. You can use our CCA Spark and Hadoop Developer braindumps and pass your exam in just 24 hours. Our CCA Spark and Hadoop Developer Exam real exam contains latest questions. We are providing Cloudera CCA175 dumps with updates for 3 months. You can purchase in advance and start studying. Whenever Cloudera update CCA Spark and Hadoop Developer Exam exam, we also update our file with new questions. Passin1day is here to provide real CCA175 exam questions to people who find it difficult to pass exam
CCA Spark and Hadoop Developer can advance your marketability and prove to be a key to differentiating you from those who have no certification and Passin1day is there to help you pass exam with CCA175 dumps. Cloudera Certifications demonstrate your competence and make your discerning employers recognize that CCA Spark and Hadoop Developer Exam certified employees are more valuable to their organizations and customers. We have helped thousands of customers so far in achieving their goals. Our excellent comprehensive Cloudera exam dumps will enable you to pass your certification CCA Spark and Hadoop Developer exam in just a single try. Passin1day is offering CCA175 braindumps which are accurate and of high-quality verified by the IT professionals. Candidates can instantly download CCA Spark and Hadoop Developer dumps and access them at any device after purchase. Online CCA Spark and Hadoop Developer Exam practice tests are planned and designed to prepare you completely for the real Cloudera exam condition. Free CCA175 dumps demos can be available on customer’s demand to check before placing an order.
What Our Customers Say
Jeff Brown
Thanks you so much passin1day.com team for all the help that you have provided me in my Cloudera exam. I will use your dumps for next certification as well.
Mareena Frederick
You guys are awesome. Even 1 day is too much. I prepared my exam in just 3 hours with your CCA175 exam dumps and passed it in first attempt :)
Ralph Donald
I am the fully satisfied customer of passin1day.com. I have passed my exam using your CCA Spark and Hadoop Developer Exam braindumps in first attempt. You guys are the secret behind my success ;)
Lilly Solomon
I was so depressed when I get failed in my Cisco exam but thanks GOD you guys exist and helped me in passing my exams. I am nothing without you.