Thursday, January 3, 2013

Teradata Interview Questions

Why is BTET transaction processing overhead in Teradata? 

- BTET makes all the queries running between BT and ET as single transaction . If any of query fails then the entire set of queries will not be committed.

- BTET also has an overhead with locking , since it holds locks on tables till the ET is occured or all the queries have executed successfully

- DDL statements cannot be used  everywhere in BTET processing , but these statements can be given towards the end of BTET transaction.

- Using large number of BTET caused transient Journal to grow and has tendancy for System Restarts

What are the options that are not available for global temporary tables ? 
GTT  definitions are created in Data dictionary.  These table provide separate instance to each user who refers to them .

The following options are not available for global temporary tables:
- Any kind of constraints like check/referential cannot be applied on table
- Identity columns since data in GTT are materialized only for session
- Permanent Journaling cannot be done as data in tables are instances only to user in that session
- PPI cannot be applied as data does not get stored in PERM , only TEMP space is utilized here.


What are the options not available for volatile tables in teradata ? 
The following options are not available for volatile tables because table definition is not stored in data dictionary 
- Default values for columns  
- Title clause for columns
- Named Indexes for table
- Compression on columns/table level since table data are spooled
- Stats cannot be collected since data is materialized only for session  
- Identity columns as these again would need entry in IDcol tables
- PPI cannot be done on tables
- Any kind of constraints like check/referential cannot be applied on table


What are permanent journals in teradata? 
- Journals are used to capture information about table in Teradata.  In case of Permanent journals they capture details of Journal enabled tables in teradata   with all the pre transaction and post transaction details .  
- Journal tables are assigned PERM space and they reside in same database as of parent or they can reside on different database.
- They are mainly used for protection of data and sometimes  also for disaster recovery ( fallback is better in this case )
- Permanent journal tables can be enabled or disabled by running alter database <databasename> 'no journal' /' journal = <databasename.jrnltbl>'
- Arcmain  utility provides the feature of backing  up Journal tables 
- We can find details about all journal tables present in  teradata  database using DBC.JOURNALS table.
Posted by Vinay SHet at 12:11 AM 1 comment:  
Email ThisBlogThis!Share to TwitterShare to Facebook
Query to find skew factor of a particular table ? 
What is the acceptable range for skew factor in a table? 

There is no particular range for skew factor.  In case of production systems, it is suggested to keep skew factor between 5-10.  
There are various considerations for skew factor
- Number of AMPS
- Size of row in a table
- number of records in a table
- PI of a table
- Frequent access of table (Performance consideration)
- whether table getting loaded daily /monthly or how frequently data is being refreshed
Posted by Vinay SHet at 8:50 AM No comments:  
Email ThisBlogThis!Share to TwitterShare to Facebook
What is multivalued compression in Teradata? 
Multivalued compression or just MVC is a compression technique applied on columns in Teradata .  MVC has a unique feature of compressing up-to 255 distinct values per column in a given table.  

The advantage of compression are

Reduced Storage cost by storing more of a logical data than physical data. 
Performance is greatly improves due to  reduced retrieval of physical data for that column.    
Tables having compression always have an advantage since optimizer considers reduced I/O as one of the factors for generating EXPLAIN plan. 
What are the ways by which we can use zero to replace a null value for a given column ? 
Answer - 

1. By using Teradata SQL supported command as follows
Select Col1, ZEROIFNULL(Col2)  from Table_name;

2. By using ANSI SQL command as follows
a. Coalesce
Select Col1,COALESCE(Col2,0)    from Table_name;

b.Case operator 
Case When Col2 IS NOT NULL 
Then Col2
Else 0
from Table_name;

It is always suggested to use ANSI standard while coding in Teradata , since any changes in Teradata version due to upgrade/patches installation will lead to
- Time for regression testing
- rework of code.

What is a join index ? What are benefits of using Join index? 

It is a index that is maintained in a system .It maintains rows  joined on two or more tables. Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query

Benefits if using join index is
- to eliminate base table access
- Aggregate processing is eliminated by creating aggregate join index
- It reduces redistribution of data since data is materialized by JI.
- Reduces complex join conditions by using covering queries 

Can a macro be called inside a macro? 
The main purpose of run a set of repeated sql queries.   Macro supports only DML queries .
Hence We cant call any-other macro or not even a procedure in a macro.

One trick to have closest functionality of this is to copy all the sql queries from macro2 to macro1 and add parameters if it is necessary as shown below.

replace macro1( val int)
(  sel * from employee_table where empid= :val );

replace macro2( dept_no int)
(  sel * from employee_table where deptno= :dept_no );

so, to call a macro2 inside a is not possible. Hence follow this approach

Replace macro1 (val int, dept_no int)
sel * from employee_table where empid= :val;
sel * from employee_table where deptno= :dept_no ;
How do you find the list of employees named "john" in an employee table without using Like operator?? 
This question seems tricky.. but yes there is another way by which we can find names/patters without using like operator.
By using "BETWEEN" , we can find the lsit of employees named john... 
sel * from employee where name betwee 'J' and 'K';

But at times usage of between is tricky, if there are other employees starting with J, those employees will also be listed by this query.
Posted by Vinay SHet at 10:05 AM 2 comments:  
Email ThisBlogThis!Share to TwitterShare to Facebook

how do you list all the objects available in given database? 
1.)select * from dbc.tables where databasename='<DATABASENAME>';

2.) By running a normal help command on that database as follows.  
help database <DATABASENAME';
What are different types of Spaces available in Teradata ? 
There are 3 types of Spaces available in teradata ,they are

1. Perm space 
-This is disk space used for storing user data rows in any tables located on the database. 
-Both Users & databases can be given perm space.
-This Space is not pre-allocated , it is used up when the  data rows are stored on disk.

2.Spool Space 
-It is a  temporary workspace which is used for processing Rows for given SQL statements.
-Spool space is assigned only to users . -
-Once the SQL processing is complete the spool is freed and given to some other query.
-Unused Perm space is automatically available for Spool .  

3. TEMP space
-It is allocated to any databases/users where Global temporary tables are created and data is stored in them.
-Unused perm space is available for TEMP space
What is hash collision ? 

This occurs when there is same hash value generated for two different Primary Index Values. It is a rare occurance and Has been taken care in future versions of TD.
Posted by Vinay SHet at 5:47 AM No comments:  
Email ThisBlogThis!Share to TwitterShare to Facebook
What is RAID, What are the types of RAID? 

Redundant Array of Inexpensive Disks (RAID)  is a type of protection available in Teradata. RAID  provides Data protection at the disk Drive level.  It ensures data is available even when the disk drive had failed.

Th\ere are around 6 levels of RAID ( RAID0 to RAID5) .  Teradata supports Two levels of RAID protection
RAID 1 - Mirrored copy of data in other disk
RAID 5 - Parity bit (XOR) based Data protection on each disk array.

One of the major overhead's of RAID is Space consumption 
Posted by Vinay SHet at 4:59 AM No comments:  
Email ThisBlogThis!Share to TwitterShare to Facebook
How do you find the No of AMP 's in the teradata Database? 

1.) You can do a SELECT HASHAMP()+1 to get the total number of Amps in the  given teradata system.
2.) Details about amps can also be checked in Configuration management on  teradata PMON tool.

How do you see a DDL for an existing table in Teradata? 

By using show table command  as follows
show table tablename ;

This will display DDL structure of table along with following details
Before/after journal
set/multiset table type
details about column - datatype ,default,identity/primary -foreign key .

How to find duplicates in a table? 
To find duplicates in the table , we can use group by function on those columns which are to be used and then listing them if their count is >1 .

Following sample query can be used to find duplicates in table having  3 columns
select col1, col2,col3, count(*) from table 
group by col1, col2, col3  
having count  (*) > 1 ;

what are different types of journals in teradata? 
There are  3 different types of journals available in Teradata. They are

1. Transient Journal  - This maintains current transaction history. Once the query is successful it deletes entries from its table .  If the current query transaction fails, It rolls back data from its table.

2. Permanent Journal  - This is defined when a table is created.  It can store BEFORE or AFTER image of tables. DUAL copies can also be stored. Permanent Journal maintains Complete history of table.

3.Down AMP recovery Journal (DARJ)  - This journal activates when the AMP which was supposed to process goes down.  This journal will store all entries for AMP which went down.  Once that AMP is up, DARJ copies all entries to that AMP and makes that AMP is sync with current environment.

What are the reasons for product joins ? 
1. Stale or no stats causing optimizer to use product join 
2. Improper usage of aliases in the query.
3. missing where clause ( or Cartesian product join  1=1 )
4. non equality conditions like > ,< , between   example ( date)
5. few join conditions
6.  when or conditions are used.

last but not the least   product joins are not bad always!! sometimes PJ are better compared to other types of joins.

How to rename columns using views? 
Create a view which is a subset of the employee table as follows.
Renaming columns in views will help increase security of sensitive tables  and hiding columns under alias names

Replace  view EmployeeV( number,fullname,addr,phno,depnum,sal, expr) as 
locking row for access  
EmpNo ,                        
Name  ,                        
Phone   ,                      
DeptNo   ,                     
Salary    ,                    
from employee;


how do you you implement Multi valued compression in an existing table? 

MVC can be implemented on following conditions
1. A new column with multi valued compression can be added to an existing table, but cannot modify existing compressed column.
2. Create a new table with column which has MVC and do insert .. select from original table 
  CREATE TABLE... as with column designated MVC.

list Built-in functions used in teradata ? 

The main functionality of built in functions is  that they dont need any arguments or paramaters and can be used directly with select to return system values.
Some of them are

• SESSION: – Returns a number for the session for current user .
• TIME: – this function provides the current time based on a 24-hour day
• USER: –  This one gives the user name of the current user.
• ACCOUNT: – display's your Teradata Account information 
• CURRENT_DATE: – Returns the current system date 
• CURRENT_TIME: - This function returns the current system time and current session ‘Time Zone’ displacement. 
• CURRENT_TIMESTAMP: - Returns the current system timestamp  with TimeZone
• DATABASE: –  It  returns the name of the default database for the current user.
• DATE: – same as Current_DATE   and is teradata built in .

How do you whether table is locked or not? 
Just run the following query on the table.

Lock Table DBNAME.TABLENAME write nowait

If this table is locked currently then , then the above statement would fail as Nowait will not wait for locks to be released on that  table .

How many error tables are there in fload and Mload and what is their significance/use?

Fload uses 2 error tables
ET TABLE 1: where format of data is not correct.
ET TABLE 2: violations of UPI
It maintains only error field name, error code and data parcel only.

Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table
1. ET TABLE - Data error
MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations
MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task

Apart from error tables, it also has work and log tables
Mload loads the selected records in the work table

A log table maintains record of all checkpoints related to the load job, it is essential/mandatory to specify a log table in mload job. This table will be useful in case you have a job abort or restart due to any reason.
How do you find out number of AMP's in the Given system
1.running following query in queryman 
Select HASHAMP () +1;

2. We can find out complete configuration details of nodes and amps in configuration screen of Performance monitor 
What are the difference types of temporary tables in Teradata?

a.       Global temporary tables 
b.       Volatile temporary tables
c.       Derived tables

Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active up to the session ends, and definition will remain there up-to its not dropped using Drop table statement. If dropped from some other session then its should be Drop table all;
4. You can collect stats on GTT.

Volatile Temporary tables (VTT) -
1. Local to a session (deleted automatically when the session terminates)
2. Table Definition is stored in System cache .A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables) .
3. Data is stored in spool space.
4. That’s why; data and table definition both are active only up to session ends.
5. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level (while creating table)
6. Created by the CREATE VOLATILE TABLE sql statement

Derived tables
1 Derived tables are local to an SQL query.
2 Not included in the DBC data dictionary database, the definition is kept in cache.
3 They are specified on a query level with an AS keyword in an sql statement

List types of HASH functions used in Teradata?

The SQL hash functions are:

    * HASHROW (column(s))
    * HASHBUCKET (hashrow)
    * HASHAMP (hashbucket)
    * HASHBAKAMP (hashbucket)

            HASHROW ('Teradata')   AS "Hash Value"
            , HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"
            , HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"
            , HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))  AS "AMP Fallback Num" ;

This is really good, by looking into the result set of above written query you can easily find out the Data Distribution across all AMPs in your system and further you can easily identify un-even data distribution

Detailed Explanation......
They are 4 Type of Hash Function.

Returns the rowhash value of a given sequence of expressions in BYTE(4) data type.
SEL HASHROW(department_number)FROM DEPT

Returns the bucket number that corresponds to a HashRow in INTEGER data type.
Returns the identification number of the virtual AMP for the primary data row’s HashBucket in INTEGER data type.
SELECT Department_Number, 
AS PrimaryAMP
FROM Department 

Department_Number PrimaryAMP
----------------- -----------
501 1
403 3
301 3
402 6
201 9
302 10
600 11
401 13
100 14 
Returns the identification number of the virtual AMP for the fallback data row’s HashBucket in INTEGER data type.

SELECT Department_Number, 
FROM Department 

Department_Number FbAMP
----------------- -----------
501 0
403 2
301 2
402 7
201 8
600 10
302 11
401 12
100 15


What are the advantages and dis-advantages of secondary Indexes?
1. A secondary index might be created and dropped dynamically
2.A table may have up to 32 secondary indexes.
3. Secondary index can be created on any column. .Either Unique or Non-Unique
4. It is used as alternate path or Least frequently used cases.  ex. defining SI on non indexed column can improve the performance, if it is used in  join or filter condition of a given query.
5. Collecting Statistics on SI columns make sure Optimizer choses SI if it is better than doing Full Table Scans

1. Since Sub tables are to be created, there is always an overhead for additional spaces.
2. They require additional I/Os to maintain their sub tables.
3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.
4. If the base table is Fallback, the secondary index sub table is Fallback as well.
5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.
Where does TD store transient journal?

In perm space -> dbc.transientjournal
But that special table can grow over dbc's perm limit until the whole system runs out of perm space.
How to select first N Records in Teradata?

To select N records in Teradata you can use RANK function. Query syntax would be as follows
How to view every column and the columns contained in indexes in Teradata?
Following query describes each column in the Teradata RDBMS

Following query describes columns contained in indexes in the Teradata RDBMS
How Teradata makes sure that there are no duplicate rows being inserted when it’s a SET table?

Teradata will redirect the new inserted row as per its PI to the target AMP (on the basis of its row hash value), and if it find same row hash value in that AMP (hash synonyms) then it start comparing the whole row, and find out if duplicate.
If it’s a duplicate it silently skips it without throwing any error.

What is a clique?

A clique is a set of Teradata nodes that share a common set of disk arrays which are connected in daisy chain network to each disk array controller. 
Cliques provide data accessibility if a node fails for any reason,  Proms are distributed across all nodes in the system. Large multiple node systems will have clique mechanisms associated with them
What is the difference between MultiLoad & Fastload interns of Performance?

If you want to load, empty table then you use the fastload, so it will very useful than the MultiLoad ,because fastload performs the loading of the data in 2phase and its no need a work table for loading the data .
So it is faster as well as it follows the below steps to load the data in the table
Phase1 - It moves all the records to the entire AMP first without any hashing
Phase2 - After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS.

It does the loading in the 5 phases
Phase1 - It will get the import file and checks the script
Phase2 - It reads the record from the base table and store in the work table
Phase3 - In this acquisition phase it locks the table header
Phase4 - In the DML operation will done in the tables
Phase5 - In this table locks will be released and work tables will be dropped.


 How does indexing improve query performance?

Indexing is a way to physically reorganize the records to enable some frequently used queries to run faster.
The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return it back to the user.
The frequently used queries need not hit a large table for data. They can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintenance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.
Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
What is error table? What is the use of error table?

The Error Table contains information concerning:

- Data conversion errors Constraint violations and other error conditions:

* Contains rows which failed to be manipulated due to constraint violations or Translation error
* Captures rows that contain duplicate Values for UPIs.
* It logs errors & exceptions that occurs during the apply phase.
* It logs errors that are occurs during the acquisition phase

How to find out list of indexes in Teradata?

IndexType Description 
P Nonpartitioned Primary 
Q Partitioned Primary 
S Secondary 
J join index 
N hash index 
K primary key 
U unique constraint 
V value ordered secondary 
H hash ordered ALL covering secondary 
O valued ordered ALL covering secondary 
I ordering column of a composite secondary index 
M Multi column statistics 
D Derived column partition statistics 
1 field1 column of a join or hash index 
2 field2 column of a join or hash index 

databasename, tablename, columnname, indextype, indexnumber, indexname
databasename,  tablename, indexnumber;

When should the statistics be collected?

Here are some excellent guidelines on when to collect statistics:
·       All Non-Unique indices
·       Non-index join columns
·       The Primary Index of small tables
·       Primary Index of a Join Index
·       Secondary Indices defined on any join index
·       Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions
·       Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.

How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?

A optimizer would prefer FTS over NUSI, when there are no Statistics defined on NUSI columns.
It would prefer scanning for full table instead of going for Subtables in NUSI  since optimized does not have information about subtables of NUSI 
It is always suggested to collect statistics whenever NUSI columns are defined on the table.
Verify whether index is being used by checking in Explain plan.

What are the basic rules that define how PI is defined in Teradata?

The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:

One Primary Index per table.
A Primary Index value can be unique or non-unique.
The Primary Index value can be NULL.
The Primary Index value can be modified.
The Primary Index of a populated table cannot be modified.
A Primary Index has a limit of 64 columns

What are the basic criteria to select Primary Index column for a given table?

A thumb rule of ADV demographics is followed.
Access Demographics
Identify index candidates that maximize one-AMP operations.
Columns most frequently used for access (Value and Join).

Distribution Demographics
Identify index candidates that optimize parallel processing.
Columns that provide good distribution.

Volatility Demographics
Identify index candidates with low maintenance I/O.

What is explain in teradata?

The EXPLAIN facility is a teradata extension that provides you with an "ENGLISH" translation of the
steps chosen by the optimizer to execute an SQL statement. It may be used on any valid teradata database with a preface called "EXPLAIN".

The following is an example:-
EXPLAIN select last_name first_name FROM employees;

The EXPLAIN parses the SQL statement but does not execute it. 
This provides the designer with an "execution strategy". 
The execution strategy provides what an optimizer does but not why it chooses them. 
The EXPLAIN facility is used to analyze all joins and complex queries.

What are the different return codes(severity errors)  in Teradata utilities?

There are 3 basic return codes (severity errors) in teradata utilities.

0 - success
4 - Warning
8 - User error
12 - System error
16 - system error   

Please note that apart from this there are separate error codes for each of the error  returned from sql queries.

How do you set default date setting in BTEQ?

There are two default date setting in BTEQ.  They have to be set after logging on to the session

set session dateform = ANSIDATE ;  /*format is yyyy-mm-dd */                 
set session dateform = integerdate ; /* format is yy/mm/dd   -teradata date format */

What does DROP table command do? 
What does DROP table table_name command do?

It deletes all data in table_name 
Removes the definition from the data dictionary 
Removes all explicit access rights on the table 

Is Like comparison case-sensitive in Teradata? 

LIKE operator is not case sensitive in Teradata session mode.
Consider the following example   
Select F_name from employee where F_name like '%JO%’; 
The following query will pick values matching with 'JO' and 'jo' as well, since Teradata is not case-sensitive

To overcome this problem, a new function called "CASESPECIFIC" is used in TERADATA as follows
Select F_name from employee where F_name (CASESPECIFIC) like '%JO%’; 

What are advantages of compression on tables? 
- They take less physical space then uncompressed columns hence reducing space cost
- They improve  system performance as less data will be retrieved per row fetched , more data is fetched per data block thus increasing data loading speed
- They reduce overall I/O 

How do you submit bteq script (batch mode)?

1. Start the BTEQ , by typing BTEQ
2. Enter the following command
            .run file = BTEQScript.btq 

1. Bteq < BTEQScript.btq 
BTEQScript.btq contains following
 .logon, dbc;
sel top 10 * from dbc.tables;

What are the benefits of Permanent Journal?

The benefits of Permanent Journal are 
Permits capture of before images for database rollback. 
Permits capture of after images for database roll forward. 
Permits archiving change images during table maintenance. 
Reduces need for full-table backups. 
Provides a means of recovering NO FALLBACK tables. 
Requires additional disk space for change images. 
Requires user intervention for archive and recovery activity 
What is the command in BTEQ to check for session settings ?

The BTEQ .SHOW CONTROL command displays BTEQ settings.
What are the benefits of fallback?

The benefits of fallback are
Protects your data from hardware (disk) failure. 
Protects your data from software (node) failure. 
Automatically recovers with minimum recovery time, after repairs or fixes are complete 
What’s the difference between TIMESTAMP (0) and TIMESTAMP (6)?

TIMESTAMP (0) is CHAR (19) and TIMESTAMP (6) is CHAR (26)
Timestamp(0) is YYYY-MM-DDbHH:MI:SS 
Timestamp(6) is YYYY-MM-DDbHH:MI:SS.ssssss  ( milliseconds extra) 
What is a Dirty-Read or Stale-Read Lock?

This occurs when a access lock is applied on the table which is doing a update.
May produce erroneous results if performed during table maintenance resulting in Dirty Read or stale read , which might  result in inconsistent result set.

Differences between BTEQ and Sql assistant (query man)?

Basic Teradata Query utility 
SQL front-end 
Report writing and formatting features 
Interactive and batch queries 
Import/Export across all platforms 
The default number of sessions, upon login, is 1. 
Teradata Query Manager / Queryman / TeradataSQL Assistant
SQL front-end for ODBC compliant databases 
Historical record of queries including: 
-         Timings
-         Status
-         Row counts
Random sampling feature 
Limit amount of data returned 
Import/Export between database and PC 
Export to EXCEL or ACCESS 
How do you execute the given SQL statement repeatedly in BTEQ?

Select top 1* from database.table1;
Here "=n” is to run the previous sql statement, "n" number of times. 
What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?

1. Partition by CASE
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
PRIMARY INDEX (customer_number)
            order_total < 10000 ,
            order_total < 20000 ,
            order_total < 30000,
NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range  - example using date range
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
PRIMARY INDEX (customer_number)
            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'
            EACH interval '1' month ,
        OR  UNKNOWN);

P.S:If  we use NO RANGE or NO CASE  - then all values not in this range will be in a single partition.
If we specify UNKNOWN, then all null values will be placed in this partition 
Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?

PPI cannot be defined on PI column in Table.  Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column.
In case of PPI , The data stored in AMP's are Partitioned based on PPI column after they are row hashed  (ROW KEY = ROW ID +PPI value )

P.S: If you want to create UPI on table, then PPI column can be added as part of PI . 

How to skip 1st record while using Bteq IMPORT?

By using SKIP=1 ;   , we can skip first record.
 .import infile=<filename>, skip=1; 

What is TENACITY? What is its default value?

TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for Fast Load is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours. 
What does SLEEP function does in Fast load?

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions.
Sleep command can be used with all load utilities not only fastload.
This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration. 

What is Cross Join?

It is a Teradata specified Join, which is used as equivalent to product join.
There is no “On” clause in case of CROSS join
SELECT  EMP.ename , DPT.Dname
FROM     employee EMP
Department DPT
EMp.deptno = DPT.depto ; 
How many types of Index are present in teradata?

There are 5 different indices present in Teradata
1. Primary Index
      a.Unique primary index
      b. non Unique primary index
2. Secondary Index
      a. Unique Secondary index
      b. non Unique Secondary index
3. Partitioned Primary Index
      a. Case partition (ex. age, salary...)
      b. range partition ( ex. date)
4. Join index
      a. Single table join index
      b. multiple table join index
      c. Sparse Join index ( constraint applied on join index in where clause)
5. Hash index 
Difference between Stored Procedure and Macro?

It does not return rows to the user.
It has to use cursors to fetch multiple rows
It used Inout/Out to send values to user
It Contains comprehensive SPL
It is stored in DATABASE or USER PERM
A stored procedure also provides output/Input capabilities

It returns set of rows to the user.
It is stored in DBC SPOOL space
A macro that allows only input values

Hi Vinay.

It is Possible to return the ROWS to user From Procedure.

Syntax is:--

OPEN cur;

What is a Sparse Index?

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse. 

How to handle nulls in Teradata??? How many columns can be there in a table??? 
How to handle nulls in Teradata???
How many columns can be there in a table???

1. Use zeroifnull, nullifzero in select and NULL in insert directly.
2. 256 columns max per table.
How to find average sal with out using avg function????

Without using "avg" we can find the avg salary by using sum (sal)/count (sal);
sel sum(sal)/count(sal) as avgsal from tablename 

What is difference B/w User and database in Teradata?

- User is a database with password but database cannot have password
- Both can contain Tables , views and macros
- Both users and databases may or may not hold privileges
- Only users can login, establish a session with Teradata database and they can submit requests 

How do you create materialized view in Teradata?

There is no such thing as a "materialized view" in Teradata. The equivalent in Teradata would be a join index (or aggregate index) with a corresponding view put on top of it. The command to create one is "CREATE JOIN INDEX...(lots of options)".

Join indices are maintained when the data is maintained. They cannot be accessed directly, but are invoked when the optimizer determines it is beneficial. Even though you cannot invoke a join index directly, you create a view that looks like the join index and in that way, it would be similar to having direct access. However, this approach does not guarantee that the join index will be used. 

What are Differences between Teradata and ANSI Session modes in Teradata?

1. Comparison is not Case sensitive
2. Create table are default to SET tables
3. Each transaction is committed implicitly
4. Supports all Teradata commands 
5. It follows BTET (Begin and End Transaction) Mode

1. Comparison is CASE sensitive
2. Create table are default to MULTISET tables
3. Each transaction has to be committed explicitly
4. Does not support all Teradata commands 
5. It does not follow BTET Mode 
What are the scenarios in which Full Table Scans occurs?

1. The where clause in SELECT statement does not use either primary index or secondary index
2. SQL Statement which uses a partial value (like or not like), in the WHERE statement.
3. SQL Statement which does not contain where clause.
4. SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000) 
How to identify PPI columns?

Select databasename , tablename , columnposition ,columnname from dbc.indices
where indextype ='Q'
order by 1 ,2,3 ; 
How to skip the header row in the fastload script

RECORD 2;   /* this skips first record in the source file */
Explain types of re-distribution of data happening for joining of columns from two tables

Case 1 - P.I = P.I joins
Case 2 - P.I = N.U.P.I joins
Case 3 - N.U.P.I = N.U.P.I joins

Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case2 - data from second table will be re-distributed on all amps since joins are happening on PI vs. NUPI column.  Ideal scenario is when small table is redistributed to be joined with large table records on same amp
case3 - data from both the tables are redistributed on all AMPs.  This is one of the longest processing queries , Care should be taken to see that stats are collected on these columns 

Can you load multiple data files for same target table using Fastload? 
Can you load multiple data files for same target table using Fastload?

Yes, we can Load a table using multiple datafiles in Fastload.

Before giving "end loading" statement user can define file path and use insert sql for multiple source files and give "end loading" statement at the end 
Why does varchar occupy 2 extra bytes?

The two bytes are for the number of bytes for the binary length of the field.
It stores the exact no of characters stored in varchar 
How many types of Skew exist?

If you utilized unequally TD resources (CPU,AMP,IO,Disk and etc) this is called skew exists. Major are 3 types of skews (CPU skew, AMP/Data skew, IO Skew).

-Data skew?
When data is not distributed equally on all the AMPs.
-Cpu skew?
Who is taking/consuming more CPU called cpu skew.
-IO skew?
Who perform more IO Operation? Resulting in IO Skew 

Why Fload doesn’t support multiset table?

Fload does not support Multiset table because of following reason.

Say, the fastload job fails. Till the fastload failed, some number of rows was sent to the AMP's.
Now if you restart FLOAD, it would start loading record from the  last checkpoint and some of the consecutive rows are sent for the second time. These will be caught as duplicate rows are found after sorting of data.
   This restart logic is the reason that Fastload will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic. Fastload support Multiset table but does not support the duplicate rows. Multiset tables are tables that allow duplicate rows. When Fastload finds the duplicate rows it discards it. Fast Load can load data into multiset table but will not load the duplicate rows

What is Global Temporary table? What is the use of this Global Temporary table? 
Can we take collect stats on Derived Tables and Volatile tables and Temporary tables?
What is Global Temporary table?
What is the use of this Global Temporary table?

1. No for volatile and derived tables and yes for global tables.
2. Global tables are temp tables like volatile tables but unlike volatile tables, their definitions are retained in dd.
3. It is used whenever there is a need for a temporary table with same table definition for all users. 

What is the default join strategy in Teradata???
# 1
The Teradata Cost based optimizer will decide the join strategy based on the optimum path. The common strategies followed are from Merge, Hash & Nested Join

# 2 
Three strategies followed by optimizer are:
1. Duplication of rows of one table to every amp
--> This one is opted by optimizer when the non-PI column is on a small table.
2. Redistribution of the non PI rows of the table to the amp containing the matching PI row amp.
--> This one is opted when the tables are large. The non PI column is in one table is redistributed to the amp containing the matching PI.
3. Redistribute both the rows of the table by hash values.
--> This is chosen when the join is on a column that is not the PI in either table. This will require the most spool space. 
What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?

Explain gives the execution strategy means what are the different steps that the query will go through.
HIGH CONFIDENCE:  Statistics are collected.
LOW CONFIDENCE:   Statistics are not collected.  But the where condition is having the condition on indexed column. Then estimations can be based on sampling.
NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column. 
How to Skip or Get first and Last Record from Flat File through MultiLoad?

In .IMPORT command in Mload we have a option to give record no. from which processing should begin. i.e. ‘FROM m’ ‘m’ is a logical record number, as an integer, of the record in the identified data source where processing is to begin. You can mention ’m’ as 2 and processing will  start from second record.

THRU k and FOR n are two options in the same Mload command, functions same towards the end of the processing.

Adding to the above, if from n"start record" and for n "stop record" are not mentioned, mload considers records from   start till the end of the file. 
Which is faster – MultiLoad delete or Delete command?

MultiLoad delete is faster then normal Delete command, since the deletion happens in data blocks of 64Kbytes, where as delete command deletes data row by row.  Transient journal maintains entries only for Delete command since Teradata utilities doesn’t support Transient journal loading. 
What is Teradata Virtual storage?

This concept is introduced in TD12. It does the following tasks
- maintains information on frequency of data access 
- tracks data storage task on physical media
- migrating frequently used data to fast disks and less frequently used data to slower disks
- allocating cyclinders from storage to individual AMPs 
how to start / stop a database in windows?

1. logon to CMD
2. check for state pdestate -d
3.  run the following command to start  "net start recond"
4. check for status  pdestate -d
5.  to STOP the database  - Trusted Parallel Application or TPA
tpareset -x comment
The -x option stops Teradata without stopping the OS. 
What is a role?

A role is a set of access rights which can be assigned to the users.   They indirectly help in performance by reducing the number of rows entered in DBC.accessrights 

what is a profile?

A profile contains set of user parameters  like accounts, default database, spool
space, and temporary space for a group of users

To assign the profile to a user, use the AS PROFILE modified in a CREATE USER or
MODIFY USER statement:
MODIFY USER username AS PROFILE=profilename ;
To remove a profile from a member but retain the profile itself:
Why are   AMPs and PEs called as vprocs ?

AMPs and PEs are implemented as “virtual processors - vprocs”. 
They run under the control of PDE and their number is software configurable.
AMPs are associated with “virtual disks – vdisks” which are associated with logical units (LUNs) within a disk array

Vprocs:Virtual process From PE to AMP (This is the network root via MSP(message passing layer),The processing data will store in Disks(These are Physical disks),Each Amp have too many P.disks,to migrate these P.disks The Bynet Network maintains Virtual disks.These V.disks will responsible for data migration.hence they are called as Virtual Process(VPROCS). 
What is residual condition in explain plan ?

It is a condition which help u to reduce the number used for join condition. Residual condition does not help in locating a row 
How to check if given object is a database  or user ?

To check whether the given object is user or database , we can use following query
sel * from dbc.databases where dbkind  ='U'  or dbkind='D'; 

What are set tables and multiset tables in Teradata?Explain with an appropriate example?
1) Set tables cannot accept duplicate at row level not Index or key level.
Example of rows for set table:
R1 c2 c3 c4
1 2 3 4 ... 9

1 2 3 4 ... 9
Duplicate is Rejected
2 1 2 4 ... 9
3 2 4 4 ... 9
4 3 4 4 ... 9

2) Multi set Tables can accept
duplicate at row level not Index or key level.Exmaple of rows for multi set table:
R1 c2 c3 c4 .. cn
1 2 3 4 ... 9
1 2 3 4 ... 9
Duplicate is Accepted
2 1 2 4 ...9
3 2 4 4 ...9
3 2 4 4 ...9
Duplicate is Accepted 

In a table can we use primary key in one column and in another column both unique and not null constrains.if yes how?

Yes, you can have a column for Primary key and have another column which will have no duplicates or null.e.g.A Salary Table will have employee ID as primary key.
The table also contains TAX-ID which can not be null or duplicate 

What is use of compress in teradata?Explain?

Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table
Conditions:1.Compression can be declared at the time of table creation2.We can compress up to 256 column values(not columns) 3.We can't compress variable length fields (vartext,varchar..)

for 2 condition:create table tab1(::Order_type char(25) compress ('air','sea','road'):)in the above example order type have 3 fields, one should be selected by the user, so one of the field will repeat for every order, like these column values we can use compress statement because these are repeating for
entire table, like these column values TD supports 256 col generally NCR people will provides ready made scripts for these type of compressions However, we can store only one value per column and not 3(air, sea, road). The compressed value is
stored in column header and is to be used as default of that column unless a value is present.
Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column 
What is the process to restart the multiload if it fails?

MULTILOAD will creates 2 error tables, 1 work table When MULTILOAD fails We have to unlock the Main Table, here error tables and work tables are not locked like FLOAD.

To Unlock the Main Table in  case of acquisation Phase :
To release lock in application phase failure :
Can we load a Multi set table using MLOAD?

We can Load SET, MULTISET tables using Mload, But here when loading into MULTISET table using MLOAD duplicate rows will not be rejected, we have to take care of them before loading.But in case of Fload when we are loading into MULTISET duplicate rows are automatically rejected, FLOAD will not load duplicate rows weather table is SET or MULTISET 
Can I use “drop” statement in the utility “fload”?

YES,But you have to declare it out of the FLOAD Block it means it should not come between .begin loading,.end loading FLOAD also supports DELETE,CREATE,DROP statements which we have to declare out of FLOAD blocking the FLOAD Block we can give only INSERT 
Is it possible that there r two primary key will be in a same table?

Primary key
1. A table should have only one primary key
2. More than one column can consist of a primary key – upto 64 columns
3. Can not be NULL values (missing values)
4. Should be unique values (no duplicate value)

Foreign key
1. A table may have zero or more than that up-to 32 keys
2. More than one column can consist a primary key – up to 64 columns
3. Can have NULL values
4. Can have duplicate values
5. There should be a corresponding primary key in the parent table to enforce referential integrity for a foreign key 
Teradata performance tuning and optimization

1. collecting statistics
2. Explain Statements
3. Avoid Product Joins when possible
4. select appropriate primary index to avoid skewness in storage
5. Avoid Redistribution when possible
6. Use sub-selects instead of big "IN" lists
7. Use derived tables
8. Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
9. Use Compression on large tables 
What is the difference between Global temporary tables and Volatile temporary tables?

Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active upto the session ends, and definition will remain there up-to its not dropped using Drop table statement.If dropped from some other session then its should be Drop table all;
4. you can collect stats on GTT.

Volatile Temporary tables (VTT) -
1. Table Definition is stored in System cache
2. Data is stored in spool space.
3. thats why, data and table definition both are active only upto session ends.
4. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level ( while creating table ) 


Join strategies? 
Join Strategies There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes itNow the Question is: By that plan is the optimizer is correct job or not ? Justify Ans2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it, Now is the optimizer is doing best? and How you avoid this situation

Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables. Are they comparable? or one is big as compared to the other?Based on simple logic it decides whether to distribute the smaller table on all the AMPs or to copy.
what I mean is the small table is copied into all the AMPs in the SPOOL space.Remember all always the Join's has to take place on the AMPs SPOOL Space.By redistributing it is making sure that the 100 million rows table gets the feeling that
it is making AMP local JOIN. Remember the basic thing what ever Teradata does.
It does keeping in consideration for Space and Performance and not to forget the Efficiency.

My simple formula:
If the table is small redistribute them to all the AMPs to have the AMP local Join.
Always JOINs are made AMP local if it cannot then you have the high chance of running out of SPOOL space. 
How can you track Login Parameters of users in Teradata?

You can view all these parameters in this Data Dictionary Table DBC.LOGONOFF

How can you find the Teradata Release and Version information from Data Dictionary Table?

To find Release and Version information you can query this Data Dictionary table DBC.DBCINFO

How can you find the Table Space Size of your table across all AMPs?

You can find the Table Space Size of your table from this Data Dictionary Table DBC.TABLESIZE

How can you determine I/O and CPU usage at a user level in Teradata?

You can find out I/O and CPU Usage from this Data Dictionary Table DBC.AMPUSAGE;

Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?

* TPUMP allows near real time updates from Transactional Systems into the Data Warehouse.
* It can perform Insert, Update and Delete operations or a combination from the same source.
* It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
* TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
* TPUMP can have many sessions as it doesn’t have session limit.
* TPUMP uses row hash locks thus allowing concurrent updates on the same table. 
Explain Ferret Utility in Teradata?

Ferret (File Reconfiguration tool) is an utility which is used to display and set Disk Space Utilization parameters within Teradata RDBMS. When you select the Ferret Utility parameters, it dynamically reconfigures the data on disks. We can run this utility through Teradata Manager; to start the Ferret Utility type (START FERRET) in the database window.

Following commands can be used within Ferret Utility:

1. SHOWSPACE – Well this command reports you the amount of Disk Cylinder Space is in use and the amount of Disk Cylinder Space is available in the system. This will give you an information about Permanent Space cylinders, Spool Space cylinders, Temporary Space cylinders, Journaling cylinders, Bad cylinders and Free cylinders. For each of these 5 things it will present you 3 parameters i.e. Average Utilization per cylinder, % of total avaliable cylinders and number of cylinders.
2. SHOWBLOCKS – This command will help you in identifying the Data Block size and the number of Rows per data block. This command displays the Disk Space information for a defined range of Data Blocks and Cylinders. 
Can you recover the password of a user in Teradata?

No, you can’t recover the password of a user in Teradata. Passwords are stored in this data dictionary table (DBC.DBASE) using a one-way encryption method. You can view the encrypted passwords using the following query



What is Reconfiguration Utility in Teradata and What it is used for?

* When we feed Primary Index value to Hashing Algorithm then it gives us Row Hash(32 bit number) value which is used to make entries into Hash Maps.
* Hash Maps are the mechansim for determining which AMP will be getting that row.
* Each Hash Map is an array of 65,536 entries and its size is close to 128KB.

When Teradata is installed on a system then there are some scrpits which we need to execute i.e. DIP Scripts. So it creates a Hash Maps of 65,536 entries for the current configuration. But what if you want to add some more AMPs into your system?

Reconfiguration (Reconfig) is a technique for changing the configuration (i.e. changing the number of AMPs in a system) and is controlled by the Reconfiguration Hash Maps. System builds Reconfiguration Hash Maps by reassigning hash map entries to reflect new configuration of system.

Lets understand this concept with the help of an example; suppose you have a 4 AMPs system which holds 65,536 entries. Each AMP is responsible for holding (65,536/4=16,384) 16,384 entries.

Now you have added 2 more AMPs in your current configuration so you need to reconfigure your system. Now each AMP would be responsible for holding (65,536/6=10922) 10,922 entries. 
What is the difference between Sub-Query & Co-Related Sub-Query?

When queries are written in a nested manner then it is termed as a sub-query. A Sub-Query get executed once for the parent statement whereas Co-Related Sub-Query get executed once for each row of the parent query.

Select Empname, Deptno, Salary from Employee Emp where Salary = (Select Max(Salary) from Employee where Deptno = Emp.Deptno) order by Deptno 

What is the difference between Access Logging and Query Logging in Teradata?

1. Access Logging is concerned with security (i.e. who’s is doing what). In access logging you ask the database to log who’s doing what on a given object. The information stored is based on the object not the SQL fired or the user who fired it.
2. Query Logging (DBQL) is used for debugging (i.e. what’s happening around ?). Incase of DBQL database keep on tracking various parameters i.e. the SQLs, Resource, Spool Usage, Steps and other things which help you understand what’s going on, the information is fruitful to debug a problem. Further DBQL is enabled on a User id rather than an object like say Table or so. 
What is FILLER command in Teradata?

While running Fastload or Multiload if you don’t want to load a particular field from the datafile to the target table then use the FILLER command to achieve this. Syntax for FILLER command would be as following:

.LAYOUT FILE_PRODUCT; /* It is input file layout name */
.FIELD Prod_No * char(11); /* To load data into Prod_No */
.FIELD Prod_Name * char(11); /* To load data into Prod_Name */
.FIELD Location * char(11); /* To load data into Location */
.FILLER Prod_Chars * char(20); /* To skip the value for the next 5 locations */ 
What are TPUMP Utility Limitations?

Following are the limitations of Teradata TPUMP Utility:
* Use of SELECT statement is not allowed.
* Concatenation of Data Files is not supported.
* Exponential & Aggregate Operators are not allowed.
* Arithmatic functions are not supported. 

What are the MultiLoad Utility limitations?

MultiLoad is a very powerful utility; it has following limitations:

* MultiLoad Utility doesn’t support SELECT statement.
* Concatenation of multiple input data files is not allowed.
* MultiLoad doesn’t support Arithmatic Functions i.e. ABS, LOG etc. in Mload Script.
* MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
* MultiLoad doesn’t support USIs (Unique Secondary Indexes), Refrential Integrity, Join Indexes, Hash Indexes and Triggers.
* Import task require use of PI (Primary Index). 

What are the functions of a Teradata DBA?

Following are the different functions which a DBA can perform:
1. User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
2. Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
3. Access of Database Objects – Granting and Revoking Access Rights on different database objects.
4. Security Control – Handling logon and logoff rules for Users.
5. System Maintenance – Specification of system defaults, restart etc.
6. System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
7. Resource Monitoring – Database Query Log(DBQL) and Access Logging.
8. Data Archives, Restores and Recovery – ARC Utility and Permanent Journals. 
What are the 5 phases in a MultiLoad Utility?

* Preliminary Phase – Basic Setup
* DML Phase – Get DML steps down on AMPs
* Acquisition Phase – Send the input data to the AMPs and sort it
* Application Phase – Apply the input data to the appropriate Target Tables
* End Phase – Basic Cleanup
How to eliminate Product Joins in a Teradata SQL query?

1. Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
2. Make sure you are referencing the correct alias.
3. Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
4. Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.
How does Hashing happens in Teradata?


* Hashing is the mechanism through which data is distributed and retrieved to/from AMPs.
* Primary Index (PI) value of a row is the input to the Hashing Algorithm.
* Row Hash (32-bit number) value is the output from this Algorithm.
* Table Id + Row Hash is used to locate Cylinder and Data block.
* Same Primary Index value and data type will always produce same hash value.
* Rows with the same hash value will go to the same AMP.

So data distribution depends directly on the Row Hash uniqueness; be careful while Choosing Indexes in Teradata. 


  1. really good collection of questions

  2. please anyone publish more questions on Teradata utilities

  3. Straight lift from my blog ...
    You are killing creativity ....

    Next time, please credit the original poster///

    You have forgot to remove my name from posts... looks like someone was in hurry...

    Vinay SHet