Tuesday, 23 May 2017

C2090-305 IBM InfoSphere Information Analyzer v9.1

Test information:
Number of questions: 64
Time allowed in minutes: 90
Required passing score: 64%
Languages: English

The IBM Certified Solution Developer - IBM InfoSphere Information Analyzer v9.1 exam certifies that the successful candidate has the knowledge, skills, and abilities necessary to conduct a baseline assessment of the data/information being analyzed. This includes characterizing the data or information condition in terms of completeness, validity, structural integrity, relational integrity, technical standards, and business rule compliance, including quality monitoring and information governance.

Section 1 - Environment (9%)
List key components and configuration steps
Given a scenario, describe how to plan for a large data environment
Identify the minimum requirements for installation

Section 2 - Data Setup (9%)
Identify steps necessary to make data sources available for analysis
Given a scenario, describe project activity security

Section 3 - Profiling Process (25%)
Identify the purposes of each analysis
Interpret system flag icons throughout the system
Identify the common steps that are needed to perform data profiling
Given a scenario, describe the process needed to assess and use data profiling results

Section 4 - Data Rules / Monitoring (36%)
Describe the purpose of data rule validation
Given a scenario, describe the process to implement and deploy the data rule
Identify the steps needed to implement data rule monitoring and baselining
Identify the steps needed to perform data rule execution
Describe the purpose and process of exception management (output tables, Data Quality Console)
Identify the steps necessary to deploy and execute a data rule in an external process

Section 5 - Populating / Reporting / Publishing (14%)
Demonstrate how to populate/report/publish
Demonstrate knowledge of content creation and scheduling
Given a scenario, describe the process to deliver outputs to a broad community

Section 6 - Troubleshooting / Tuning (6%)
Given a scenario, demonstrate troubleshooting techniques
Given a scenario, demonstrate system and process tuning techniques

IBM Certified Solution Developer - InfoSphere Information Analyzer v9.1

Job Role Description / Target Audience
The IBM Certified Solution Developer - InfoSphere Information Analyzer v9.1 is a Source System Analyst, Data Analyst, Business Analyst, Data Quality Analyst, Data Steward, Data Modeler, Data Architect, ETL Developer, or Data Scientist.

QUESTION 1
What feature permits the addition of data from one or more columns into one column allowing column analysis on this newly formed concatenated column?

A. Virtual table
B. Logical table
C. Virtual column
D. Logical column

Answer: C
Reference: https://www.scribd.com/doc/125259952/IBM-infoSphere-information-analyzer-v8-7-User-guide (page 70)


QUESTION 2
When importing published analysis results into a different Information Analyzer project which two contents are included?

A. Median value
B. Frequency values
C. Data Classification
D. Cardinality Percent
E. Uniqueness Code

Answer: C,D


QUESTION 3
You are searching for an employee named Kristie Jones, who has accounts in various applications and has used Kristie, Kristin, and Kristen as her first name.
Which is the correct rule logic for addressing this problem?

A. name like ('Kristie'|'Kristin'|'Kristen')[ ]'Jones'
B. name matches ('Kristie'|'Kristin'|'Kristen')[ ]'Jones'
C. name contains ('Kristie'|'Kristin'|'Kristen')[ ]'Jones'
D. name matches_regex ('Kristie'|'Kristin'|'Kristen')[ ]'Jones'

Answer: D
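The rule in option D works through regular-expression alternation: any one of the listed first-name spellings, followed by the surname, is accepted. As a standalone sketch of the same matching idea in plain Python (not Information Analyzer rule syntax):

```python
import re

# Alternation covers the three observed spellings of the first name,
# anchored and followed by whitespace and the surname.
pattern = re.compile(r"^(Kristie|Kristin|Kristen)\s+Jones$")

assert pattern.match("Kristie Jones")
assert pattern.match("Kristin Jones")
assert not pattern.match("Kristy Jones")  # an unlisted spelling is rejected
```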


QUESTION 4
You are working with a System Administrator who must move an Information Analyzer project to a new Information Server environment using the command line tools. You need to ensure that the project is established with the same metadata and reports as in the original environment.
Which parameter must be included to successfully complete the export?

A. -includeCommonMetadata; -includeCommonReports; -includeAllDataClasses
B. -includeAssignedAssets; -includeProjectReports; -includeAllDataSources
C. -includeCommonMetadata; -includeReports; -includeAllDataClasses
D. -includeProjectMetadata; -includeIAReports; -includeDependentClasses

Answer: D


Saturday, 20 May 2017

C2090-103 Apache Spark 1.6 Developer

Test information:
Number of questions: 60
Time allowed in minutes: 120
Required passing score: 65%
Languages: English

This test will certify that the successful candidate has the necessary skills to work with, transform, and act on data at a very large scale. The candidate will be able to build data pipelines and derive viable insights into the data using Apache Spark. The candidate is proficient in using streaming, machine learning, SQL and graph processing on Spark. This candidate may be a member of a Data Scientist team and/or Analytics team and has applied knowledge with deployment architectures and can assist in tuning, troubleshooting, and optimization.

Section 1 - Architecture (12%)
Compare and contrast Spark with Hadoop MapReduce
Explain memory management in Spark
Explain concepts such as master, drivers, executors, stages and tasks
Explain Spark transformations and actions with respect to lazy evaluation
Configure your application to run on a cluster
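The lazy-evaluation point above is central: Spark transformations only build a plan, and nothing executes until an action is called. As a rough analogy outside Spark, Python generator expressions behave the same way; this is an illustrative sketch, not Spark code:

```python
log = []

def traced(numbers):
    # Records each element at the moment it is actually consumed.
    for n in numbers:
        log.append(n)
        yield n

# Like a Spark transformation: builds a recipe, computes nothing yet.
doubled = (n * 2 for n in traced([1, 2, 3]))
assert log == []            # still lazy, no element processed

result = list(doubled)      # like a Spark action: forces evaluation
assert result == [2, 4, 6]
assert log == [1, 2, 3]     # evaluation happened only at the action
```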

Section 2 - Performance and Troubleshooting (22%)
Manage partitions to improve RDD performance and apply different partition strategies
Identify what operations cause shuffling
Optimize memory usage with serialization options
Use caching, checkpoint, and persistence in appropriate situations
Debug Spark code
Monitor Spark applications
Manage runtime issues and performance bottlenecks in Spark
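Shuffling arises because operations like reduceByKey must bring all records with the same key into the same partition; conceptually, a hash partitioner assigns a key to hash(key) % numPartitions. A plain-Python sketch of that assignment (crc32 is used here only for determinism; real Spark uses the JVM object hash):

```python
from zlib import crc32

def partition_for(key, num_partitions):
    # Deterministic stand-in for Spark's HashPartitioner.
    return crc32(key.encode()) % num_partitions

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
partitions = {p: [] for p in range(3)}
for key, value in pairs:
    partitions[partition_for(key, 3)].append((key, value))

# Every record for a given key lands in the same partition, which is
# what lets per-key aggregation run locally after the shuffle.
apple_home = partition_for("apple", 3)
assert all(k != "apple" or p == apple_home
           for p, records in partitions.items() for k, _ in records)
```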

Section 3 - Core Skills (48%)
Read/write data from multiple data sources and file types
Create and work with RDDs and related APIs
Create and work with DataFrames and related APIs
Create Spark config contexts for different requirements
Work with key value pairs and associated Spark APIs for key value pairs
Work with SparkSQL
Define and work with accumulators
Define and work with broadcast variables
Launch applications with spark-submit
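The key-value-pair skills above center on per-key aggregation; the classic example is word count, which in PySpark is typically written as rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add). A dependency-free sketch of the same reduce-by-key logic (reduce_by_key is a hypothetical local helper, not a Spark API):

```python
from collections import defaultdict
from functools import reduce
from operator import add

def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: merge values per key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return {key: reduce(func, values) for key, values in buckets.items()}

words = "to be or not to be".split()
counts = reduce_by_key(((w, 1) for w in words), add)
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```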

Section 4 - Advanced Skills (18%)
Build a pipeline with Streaming, MLlib, SQL, and Graph on Spark
Work with Spark Streaming APIs
Work with Spark ML and MLlib APIs
Work with GraphX
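Spark Streaming's windowed operations aggregate over a sliding window of the most recent batches. A minimal stdlib sketch of that sliding-window idea (an illustrative analogy only, not DStream code):

```python
from collections import deque

class SlidingWindowSum:
    """Keeps the sum of the last `window` batch totals, loosely like a
    windowed reduce over a DStream (conceptual illustration only)."""
    def __init__(self, window):
        self.batches = deque(maxlen=window)  # old batches fall off automatically

    def add_batch(self, total):
        self.batches.append(total)
        return sum(self.batches)

w = SlidingWindowSum(window=3)
sums = [w.add_batch(t) for t in [5, 1, 2, 7]]
# With a window of 3 batches: [5, 6, 8, 10] — the 4th sum drops the first batch.
```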

IBM Certified Developer - Apache Spark 1.6

Job Role Description / Target Audience
This test will certify that the successful candidate has the necessary skills to work with, transform, and act on data at a very large scale. The candidate will be able to build data pipelines and derive viable insights into the data using Apache Spark. The candidate is proficient in using streaming, machine learning, SQL and graph processing on Spark. This candidate may be a member of a Data Scientist team and/or Analytics team and has applied knowledge with deployment architectures and can assist in tuning, troubleshooting, and optimization.

Recommended Prerequisite Skills
Read and write Python code
Read and write Scala code
Create and work with RDDs and related APIs
Create and work with DataFrames and related APIs
Create and work with DStreams and related APIs
Read/write data from multiple data sources and file types
Read and write SQL statements
Compare and contrast Spark with Hadoop MapReduce
Manage partitions to improve RDD performance and apply different partition strategies
Identify what operations cause shuffling
Optimize memory usage with serialization options
Create Spark config contexts for different requirements
Use caching, checkpoint, and persistence in appropriate situations
Explain memory management in Spark
Work with key value pairs and associated Spark APIs for key value pairs
Define and work with accumulators
Configure your application to run on a cluster
Explain core concepts such as master, drivers, executors, stages and tasks
Debug your Spark code
Define and work with broadcast variables
Explain Spark transformations and actions with respect to lazy evaluation
Monitor Spark applications
Launch applications with spark-submit
Manage runtime issues and performance bottlenecks in Spark
Build a pipeline with Streaming, MLlib, SQL, and Graph on Spark
Work with Spark Streaming APIs
Work with Spark ML and MLlib APIs
Work with GraphX
Work with SparkSQL

Friday, 12 May 2017

C2090-012 IBM SPSS Modeler Data Analysis for Business Partners v2

Test information:
Number of questions: 25
Time allowed in minutes: 60
Required passing score: 68%
Languages: English, Japanese

Business Understanding (5%)
Review the CRISP-DM methodology

General Operations in Modeler (20%)
Build streams
Run streams
Read different types of files into Modeler

Data Understanding (30%)
Extent of missing data
Outliers
Field distribution and summary statistics
Auto checking for missing and out of bounds data
Bivariate relationships between variables
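In Modeler, these checks are done with nodes such as Data Audit and Statistics; for readers who want the arithmetic behind them spelled out, here is a stdlib Python sketch of missing-data extent, summary statistics, and a deliberately crude outlier flag (the field values and the 2-sigma threshold are made up for illustration):

```python
from statistics import mean, stdev

values = [12.0, 14.5, 13.2, 12.8, 13.9, 14.1, 12.5, 13.0, 99.0, None, None]

# Extent of missing data.
present = [v for v in values if v is not None]
missing_pct = 100 * (len(values) - len(present)) / len(values)

# Summary statistics, then a simple outlier rule: beyond 2 standard deviations.
m, s = mean(present), stdev(present)
outliers = [v for v in present if abs(v - m) > 2 * s]

assert outliers == [99.0]
```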

Data Preparation (40%)
Create new variables with the Derive Node
Create new variables with the Reclassify Node
Combine data files
Restructure data
Aggregate data
Remove duplicates
Sampling cases
Balance data
Data caching
Partition data
Missing Value replacement
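In Modeler these preparation steps are node-based (Derive, Distinct, Aggregate); the underlying record operations can be sketched in plain Python, with all field names and values hypothetical:

```python
rows = [
    {"id": 1, "region": "East", "sales": 100},
    {"id": 2, "region": "West", "sales": 150},
    {"id": 2, "region": "West", "sales": 150},  # duplicate record
    {"id": 3, "region": "East", "sales": 200},
]

# Remove duplicates (keep first occurrence), like the Distinct node.
seen, distinct = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        distinct.append(row)

# Derive a new field, like the Derive node.
for row in distinct:
    row["high_value"] = row["sales"] >= 150

# Aggregate sales by region, like the Aggregate node.
totals = {}
for row in distinct:
    totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
# totals == {"East": 300, "West": 150}
```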

Modeling (5%)
Predictive models
Cluster models
Association models

NOTE: Business partners who take this exam are only expected to be familiar with each class of model and the steps necessary to prepare the data prior to creating the models in the software.

IBM Certified Associate - SPSS Modeler Data Analysis v2

Job Role Description / Target Audience
This certification is designed for Business Partners with a beginning knowledge of IBM SPSS Modeler version 14 or higher working in government, academia, or business who use the IBM SPSS Modeler product to perform data mining activities including data preparation, data understanding, and modeling.

The IBM Certified Associate - SPSS Modeler Data Analysis v2 may utilize the IBM SPSS Modeler product for applications such as fraud detection, customer management and churn, risk management, and so forth.

Question 1:
The optimal binning method in the Binning node uses a Supervisor field to determine the binning cut points.

A. True
B. False

Answer: A

Question 2:
Which node is used to read data from a comma delimited text file?

A. Var. File
B. Data Collection
C. Fixed File
D. Statistics File

Answer: A