Data warehousing with Teradata: January 2011

Friday, January 28, 2011

What is PDE (Parallel Database Extension)?

PDE is a software interface layer on top of the operating
system that enables the database to operate in a
parallel environment.

Can you list the different modules in the Parsing Engine?

• Session controller
• Parser
• Optimizer
• Step Generator
• Dispatcher

What is the UPSERT command in Teradata?

In Teradata, we can combine UPDATE and INSERT into
a single statement, called UPSERT: the row is updated if it exists and inserted otherwise.
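A minimal sketch of the UPSERT form, assuming a hypothetical table emp_upsert_demo (the table and its values are illustrative only). In the UPDATE ... ELSE INSERT form, the WHERE clause identifies the row by an equality on the primary index, and the inserted row uses the same primary index value:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_upsert_demo (
    empno   INTEGER NOT NULL,
    empname VARCHAR(40),
    empsal  DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (empno);

-- If a row with empno = 1001 exists, update its salary;
-- otherwise insert a new row with that empno.
UPDATE emp_upsert_demo
SET    empsal = 55000.00
WHERE  empno = 1001
ELSE INSERT INTO emp_upsert_demo
VALUES (1001, 'Suresh', 55000.00);
```

Running the statement twice should leave a single row for empno 1001: the first run inserts it, the second run updates it.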

What is the size of BYTEINT, SMALLINT and INTEGER?

BYTEINT - 1 byte - 8 bits ---> -128 to 127

SMALLINT - 2 bytes - 16 bits ---> -32,768 to 32,767

INTEGER - 4 bytes - 32 bits ---> -2,147,483,648 to 2,147,483,647
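The ranges above follow from signed storage: a type with n bits holds -2^(n-1) to 2^(n-1) - 1. As a small sketch, the casts below use the largest value that fits each type (the column aliases are made up for illustration):

```sql
-- Largest values that fit in each type; a value one higher
-- would not fit and the cast would fail with an overflow error.
SELECT CAST(127        AS BYTEINT)  AS max_byteint,
       CAST(32767      AS SMALLINT) AS max_smallint,
       CAST(2147483647 AS INTEGER)  AS max_integer;
```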

How do you see the Teradata version?

SELECT * FROM DBC.DBCINFO;
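DBC.DBCInfo returns one row per InfoKey with the value in InfoData, so the query can be narrowed to the version-related rows. A small sketch:

```sql
-- Show only the database version and release rows.
SELECT InfoKey, InfoData
FROM   DBC.DBCInfo
WHERE  InfoKey IN ('VERSION', 'RELEASE');
```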

Is there any difference between a user and a database in Teradata?

A database and a user are almost the same in Teradata because both can have perm and spool space, which allows them to hold objects. The only difference between a user and a database is that a user has a password and can log on and run queries.

What are the disadvantages of PPI?

A two-byte partition number is added to the ROW-ID, which is then called a ROW KEY. The two bytes per row add more perm space to the table.

Joins to Non-Partitioned Tables can take longer and become more complicated for Teradata to perform.

Basic select queries utilizing the Primary Index can take longer if the Partition number is not also mentioned in the WHERE clause of the query.

You can't have a Unique Primary Index (UPI) if the Partition Number is not at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.

What are the advantages of PPI?

Range queries don't have to utilize a Full Table Scan.

Deletions of entire partitions are lightning fast.

PPI provides an excellent solution instead of using secondary indexes

Tables that hold yearly information don't have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.

Fastload and Multiload work with PPI tables, but not with all Secondary Indexes

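Can you apply a read lock on a table where a write lock is already applied?

No. The READ lock request has to wait until the WRITE lock is released. Only an ACCESS lock is compatible with a WRITE lock.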

At what levels can Teradata apply a lock?

Teradata can apply four types of locks at three levels:

1. Database
2. Table
3. Row hash

List the different types of locks in Teradata?

Teradata can apply four types of LOCKS

1. EXCLUSIVE LOCK
2. WRITE LOCK
3. READ LOCK
4. ACCESS LOCK
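A lock type can be requested explicitly with the LOCKING modifier in front of a statement. The sketch below assumes a hypothetical table emp_lock_demo. It uses an ACCESS lock, which lets a reader proceed even while another session holds a WRITE lock, at the cost of possibly seeing uncommitted changes:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_lock_demo (
    empno   INTEGER NOT NULL,
    empname VARCHAR(40)
)
UNIQUE PRIMARY INDEX (empno);

-- Table-level ACCESS lock: read without waiting for writers.
LOCKING TABLE emp_lock_demo FOR ACCESS
SELECT empno, empname
FROM   emp_lock_demo;

-- Row-hash-level ACCESS lock for a primary index lookup.
LOCKING ROW FOR ACCESS
SELECT empno, empname
FROM   emp_lock_demo
WHERE  empno = 1001;
```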

What is RANGE_N?

RANGE_N is similar to CASE_N and can be used to define partitions.

RANGE_N can only be applied to INTEGER or DATE fields

ex:
create table employee(
empno integer,
empname varchar(40),
empsal decimal(10,2),
empjoindate date
)
unique primary index(empno,empjoindate)
partition by range_n (
empjoindate between date '2005-01-01' and date '2006-12-31' each interval '1' month,
date '2007-01-01' and date '2008-12-31' each interval '7' day
);

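How many partitions are in the given SQL?

partition by case_n(
empSalary < 10000,
empSalary < 20000,
empSalary < 300000,
empSalary < 4000000,
NO CASE OR UNKNOWN
)

Ans: 5 partitions. The four conditions give one partition each, and NO CASE OR UNKNOWN adds a single shared partition.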

How many partitions are in the given SQL?

Assume this SQL:

partition by case_n(
empSalary < 10000,
empSalary < 20000,
empSalary < 300000,
empSalary < 4000000,
NO CASE, UNKNOWN
)

Teradata creates 6 partitions.

If the value is not NULL and none of the first 4 cases is satisfied, the row goes
to the NO CASE partition,

and rows with NULL values go to the UNKNOWN partition.
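To see where rows actually land, PPI tables expose a system-derived PARTITION column on releases that support it. The sketch below assumes a hypothetical table emp_part_demo with two conditions plus NO CASE and UNKNOWN, and inserts one row aimed at each partition:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_part_demo (
    empno     INTEGER NOT NULL,
    empsalary INTEGER
)
PRIMARY INDEX (empno)
PARTITION BY CASE_N (
    empsalary < 10000,
    empsalary < 20000,
    NO CASE,
    UNKNOWN
);

INSERT INTO emp_part_demo VALUES (1, 5000);   -- matches the first condition
INSERT INTO emp_part_demo VALUES (2, 15000);  -- matches the second condition
INSERT INTO emp_part_demo VALUES (3, 90000);  -- no condition is true: NO CASE
INSERT INTO emp_part_demo VALUES (4, NULL);   -- conditions evaluate to unknown: UNKNOWN

-- Rows per partition number.
SELECT PARTITION AS part_no, COUNT(*) AS row_count
FROM   emp_part_demo
GROUP  BY 1
ORDER  BY 1;
```

With these four rows the result should list four distinct partition numbers, one row in each.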

Can you alter the NO RANGE and UNKNOWN partitions on a table?

Yes, if the table is empty. We can alter the NO RANGE and UNKNOWN partitions on empty tables.

Is this SQL okay now?

create table employee(
empno integer,
ename varchar(20),
empsal decimal(10,2)
)
unique primary index(empno,empsal)
partition by case_n(
empsal<10000,
empsal<20000,
empsal<30000,
NO CASE,
UNKNOWN
);

Yes. Now the partition is defined on empsal,
which is part of the UPI.

Is this SQL okay?

create table employee(
empno integer,
ename varchar(20),
empsal decimal(10,2)
)
unique primary index(empno)
partition by case_n(
empsal<10000,
empsal<20000,
empsal<30000,
NO CASE,
UNKNOWN
);

No, because the unique primary index is on empno only.
With a UPI, the partition must be defined on columns that are part of the UPI.

Can we use CASE_N to define partitions?

Yes. A CASE expression is generally used to handle different cases: the first condition that is true decides the value used in the SQL statement. The same idea can be applied when creating partitions.

For example, one partition for empsal < 10000, one for empsal < 20000, and so on:
create table employee(
empno integer,
ename varchar(20),
empsal decimal(10,2)
)
primary index(empno)
partition by case_n(
empsal<10000,
empsal<20000,
empsal<30000,
NO CASE,
UNKNOWN
);

What is the disadvantage of PPIs?

If we retrieve data based on a column that is not
the partitioning column, the AMP has to search every partition.

The AMP can't apply a single binary search on the ROW ID, because rows are ordered
by partition first and only then by row hash.

Ex: Assume an employee table where the primary index is empno and the partition is created on
deptno.

select * from employee where empno=1000;

The above query goes to one AMP. But within that AMP it can't apply one
binary search (on row hash) to find the row quickly, because the data is ordered
by deptno first, so it has to probe each partition.

So we have to be careful.

To handle these situations:

1. We can create a secondary index on that column (if it seems to be a good choice)

2. Include deptno in the query as well

Ex: select * from employee where empno=1000 and deptno=20;

What is the use of a partition?

If you create a PPI on a table, the data on each AMP is ordered by the partitioning column.

For example, if we partition on deptno, then all dept 10 records are stored together on the AMP,
all dept 20 records are stored together, and so on.

I.e. it creates partitions.

When you query data on deptno, only the corresponding partition is read.

Do the partition column and the Primary Index have to be the same?

No. They can be the same, but they do not have to be.

We can create a Non-Unique Primary Index (NUPI) on one column and choose
another column for the partition.

Ex: create an employee table with the primary index on employee number and
the partition on deptno.

If the table has a Unique Primary Index, then the partition column must be one of the
columns used in the UPI.

I.e. if we create a UPI on empno alone, we can't choose deptno for the partition.

Ex 1: This query is okay
create table employee(eno integer,ename varchar(20),deptno integer)
primary index(eno)
partition by deptno;

Ex 2: This query is not okay
create table employee(eno integer,ename varchar(20),deptno integer)
unique primary index(eno)
partition by deptno;

What is a Partitioned Primary Index (PPI)?

Partitioned Primary Index (PPI) is a concept introduced in the V2R5 release of Teradata.

A PPI can be defined on a table with either a UPI or a NUPI. The primary index still distributes rows to the AMPs in the same way, but each AMP orders its rows differently: without a PPI the AMP orders rows by ROW ID; with a PPI it orders them by partition first and then by ROW ID.

This is very useful for range queries.

For example, users often query the orders table by an order date range:

Ex : select * from orders where order_date between date '2009-01-01' and date '2009-12-31';

For these kinds of queries, a PPI on order_date lets each AMP read only the matching partitions, so performance is
good.
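As a sketch of the idea, the hypothetical table orders_ppi_demo below is partitioned by month on order_date (table name, columns and date range are assumptions for illustration). A date-range query then only needs the matching partitions, and deleting a whole month touches a single partition:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE orders_ppi_demo (
    order_id   INTEGER NOT NULL,
    order_date DATE    NOT NULL,
    amount     DECIMAL(10,2)
)
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2009-01-01' AND DATE '2009-12-31'
               EACH INTERVAL '1' MONTH
);

-- Range query: only the January partition has to be read.
SELECT *
FROM   orders_ppi_demo
WHERE  order_date BETWEEN DATE '2009-01-01' AND DATE '2009-01-31';

-- Removing a whole month affects exactly one partition.
DELETE FROM orders_ppi_demo
WHERE  order_date BETWEEN DATE '2009-12-01' AND DATE '2009-12-31';
```

Prefixing the SELECT with EXPLAIN is a simple way to check whether the optimizer limits the scan to a single partition.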

Can a user who has been assigned no perm space create any objects?

A user with no perm space can't create tables in his own user space, but he can create macros, views and triggers.

These objects take no perm space because their definitions are stored in the data dictionary.

What is the difference between a database and a user in Teradata?

Unlike Oracle, a database and a user are the same in Teradata, except that
a user has a password to log in and a database does not.

Both a database and a user can have perm space and can store
objects.

This is completely different from an Oracle database.

What is the use of spool space?

Spool space is used to store intermediate and final results while AMPs
are running SQL queries.

When does a user query fail because of spool space?

A query may fail because of spool space in two cases:

1. There is no unused perm space left, i.e. the complete system is full

2. The query requires more spool space than the user's spool space limit

List the different types of space in Teradata?

There are three important spaces:

Perm space : Permanent space for storing objects (users, tables, permanent journals, secondary indexes, etc.). Perm space always specifies the upper limit of the space that can be used by a database or user.

Spool space : Space required to run queries. Spool is calculated per user and specifies the upper limit of space that the user's SQL queries can use. It comes from unused permanent space across the entire system, so as long as there is unused perm space in the system, spool space is available.
Each user has a limit that says how much spool space his queries can use.
The limit is also applied on a per-AMP basis.

Temp space : Used for Global Temporary Tables while running queries
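The three limits are set when a user or database is created, and current usage can be read from the DBC.DiskSpace dictionary view. In the sketch below the user name, password and sizes (in bytes) are made up for illustration:

```sql
-- Hypothetical user; sizes are in bytes and purely illustrative.
CREATE USER report_user FROM DBC AS
    PERM      = 100000000,
    SPOOL     = 500000000,
    TEMPORARY = 200000000,
    PASSWORD  = Change_me1;

-- DBC.DiskSpace holds one row per AMP for each database or user,
-- so the values are summed to get system-wide totals.
SELECT DatabaseName,
       SUM(MaxPerm)      AS max_perm,
       SUM(CurrentPerm)  AS current_perm,
       SUM(MaxSpool)     AS max_spool,
       SUM(CurrentSpool) AS current_spool
FROM   DBC.DiskSpace
GROUP  BY 1
ORDER  BY 1;
```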

Who owns the entire space in Teradata when the system is initially delivered?

DBC is the user who owns the entire space.

Thursday, January 27, 2011

How many rows will be returned in case of Primary Index access?

Primary Index access is always a one-AMP operation.

In case of a UPI, you get zero or one row

In case of a NUPI, you get zero or many rows

A table has a UPI on empno. I wrote a query that filters the emp table on deptno. How many AMPs are involved?

Even though the table has a Primary Index, the SQL query filters on
deptno, on which there is no index. So it is a full table scan and all AMPs are involved.

How many AMPs are involved in case of Primary Index access?

It's always one AMP. It doesn't matter whether the table has a UPI or a NUPI.

Accessing data through the Primary Index is always a one-AMP operation.
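EXPLAIN shows which access path the optimizer picks. The sketch below assumes a hypothetical table emp_access_demo with a UPI on empno and no index on deptno. The exact plan wording depends on the release:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_access_demo (
    empno   INTEGER NOT NULL,
    empname VARCHAR(40),
    deptno  INTEGER
)
UNIQUE PRIMARY INDEX (empno);

-- Primary index value supplied: the plan should describe a
-- single-AMP retrieve by way of the unique primary index.
EXPLAIN SELECT * FROM emp_access_demo WHERE empno = 1000;

-- No index on deptno: the plan should describe an all-AMPs
-- retrieve by way of an all-rows scan (a full table scan).
EXPLAIN SELECT * FROM emp_access_demo WHERE deptno = 20;
```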

How many AMPs will store a table's data if the Primary Index is created on Gender? Assume only two genders, M or F.

All table data goes to at most two AMPs.

Male data goes to one AMP and female data to another AMP (or to the same AMP, if both values happen to hash there).
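Row distribution can be checked with the hash functions HASHROW, HASHBUCKET and HASHAMP. The sketch below assumes a hypothetical table person_skew_demo with a deliberately poor primary index on gender:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE person_skew_demo (
    person_id INTEGER NOT NULL,
    gender    CHAR(1)
)
PRIMARY INDEX (gender);   -- deliberately poor choice: only two distinct values

-- Rows per AMP. With a two-valued primary index,
-- at most two AMP numbers can appear in the result.
SELECT HASHAMP(HASHBUCKET(HASHROW(gender))) AS amp_no,
       COUNT(*)                             AS row_count
FROM   person_skew_demo
GROUP  BY 1
ORDER  BY 1;
```

The same query, run against a candidate primary index column of a real table, gives a quick picture of how evenly that column would spread the rows.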

Does Teradata double-check the data once it has the row hash?

Yes. Once the AMP finds a row with the given hash value, it double-checks the data for integrity, because different values can produce the same row hash.

Ex: Select * from student where sname='suresh';

The AMP first looks up the row hash received from the PE, and once it finds the row, it checks
whether that row has sname 'suresh' or not.

Worried about hashing? This check is important for integrity.

Does the AMP do a row-by-row search once it has the hash value to retrieve a row?

The AMP receives from the PE the hash value of the data that is required.

The AMP applies a binary search to find that row. It never does a
linear search.

What is the uniqueness value added by the AMP for a UPI?

It is always 1.

How long is the ROW ID?

64 bits.

The AMP receives a 32-bit hash value and the row data from the PE. The AMP then appends a 32-bit uniqueness value to the row hash to make the 64-bit ROW ID.

The ROW ID is used to sort rows in the AMP.

The AMP first checks whether there is already a row with the same hash value. If not, it appends a 32-bit 1 (0000 0000 0000 0000 0000 0000 0000 0001) to the row hash to make the 64-bit ROW ID. If one more row comes with the same hash value, it appends 2 (0000 0000 0000 0000 0000 0000 0000 0010) to the row hash, and so on.

So how many rows can be inserted with the same value in the column(s) on which the Primary Index is created? About 2^32. After that? I don't know.

How long is the row hash?

32 bits. The PE generates a 32-bit hash by passing the primary index value to a hash formula.
Of these 32 bits, the first 16 bits are the hash bucket number, which is looked up in the hash map to find the AMP number.

Then the PE sends the 32-bit hash value and the row to the AMP.
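The steps above (value to row hash, row hash to hash bucket, hash bucket to AMP) can be followed with the built-in hash functions. In the sketch below the value 1000 is an arbitrary example and the column aliases are made up:

```sql
SELECT HASHROW(1000)                      AS row_hash,     -- 32-bit row hash of the value
       HASHBUCKET(HASHROW(1000))          AS hash_bucket,  -- bucket taken from the row hash
       HASHAMP(HASHBUCKET(HASHROW(1000))) AS amp_no,       -- AMP that owns this bucket
       HASHAMP() + 1                      AS amp_count;    -- HASHAMP() alone returns the highest AMP number
```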

How many columns can be used in a Primary Index?

A Primary Index can be created on a single column or on a maximum of 64 columns in V2R5 and V2R6

Can a table have two Primary Indexes?

No. Every table can have only one Primary Index. It can be either a
Unique Primary Index or a Non-Unique Primary Index.

I.e. UPI or NUPI

Where is the Primary Index important?

The Primary Index is involved in three situations:

1. Data distribution

2. Finding the fastest path to retrieve data (very important)

3. Joins (very, very important)

Is the choice of Primary Index column important?

The success of a Teradata warehouse starts with choosing the correct column
for the primary index.

Try to choose a column that has unique values, so that data is distributed evenly among
all AMPs. Otherwise skew comes into the picture. The primary index is also useful because it gives a direct path to retrieve data.

Can you create a table with no Primary Index in Teradata?

No, not in V2R5, V2R6 or Teradata 12. If you don't specify a primary index explicitly, Teradata creates one for you. (Teradata 13.0 added NoPI tables as an option.)

Teradata uses this Primary Index to distribute data among the AMPs.

How many BYNETs are there in a Teradata system?

BYNET is both hardware and software. The hardware BYNET connects every
PE and AMP in the system. It can be scaled at any time.

Every Teradata system has two BYNETs, so that one is available as a backup if the other fails.

However, both BYNETs can be used at the same time by AMPs or PEs.

Can an AMP connect to two disks?

No. One AMP always connects to one virtual disk.

Who reads and writes data in Teradata?

The AMP is the hard worker. Each AMP reads and writes data on its own assigned
disks.

How many sessions can a PE handle?

A Parsing Engine can handle up to 120 user sessions.

So if a Teradata system has 2 PEs, it can handle 240 sessions at a time.

Does your company have more than 240 users who need Teradata connections
at the same time?

Can you explain how the PE and AMPs communicate?

When a user connects to the Teradata database, he opens a session with a Parsing Engine (PE).

Thereafter, when the user submits a query:

1. First the PE takes the query, checks the syntax and verifies the user's access rights
2. If everything looks okay, the PE prepares an action plan for the AMPs:
which AMP has to respond, which ROW ID to read, etc.
3. Then the PE sends the action plan to the AMP via the BYNET.
4. Then the corresponding AMP takes the action plan, reads the data and sends it back to the PE
5. Then the PE sends the data to the user

Does DBC have any history?

Every Teradata system, when initially shipped to the client, has one user called "DBC" who
is the super DBA and owns all space in the Teradata system.

Teradata first succeeded on the DBC/1012 computer, which was developed and manufactured
by Teradata.

Most databases were born to be OLTP. Teradata was born to be parallel.

Do Teradata warehouse users use star or snowflake schemas?

Interesting: because Teradata supports high-speed parallelism and Teradata recommends
3NF tables, most warehouses use third normal form tables instead of star or snowflake schemas.

Then how do OLAP tools get star schemas, which are at the heart of them?

For OLAP tools, people create star schemas using views.
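As a sketch of that approach, the view below flattens two hypothetical 3NF tables into a customer "dimension" that an OLAP tool can query like a star schema table. All table, column and view names are assumptions for illustration:

```sql
-- Hypothetical 3NF tables; names and columns are illustrative only.
CREATE TABLE customer_3nf (
    customer_id   INTEGER NOT NULL,
    customer_name VARCHAR(60),
    city_id       INTEGER
)
UNIQUE PRIMARY INDEX (customer_id);

CREATE TABLE city_3nf (
    city_id   INTEGER NOT NULL,
    city_name VARCHAR(60),
    country   VARCHAR(60)
)
UNIQUE PRIMARY INDEX (city_id);

-- Denormalized "dimension" presented to the OLAP tool.
-- The data itself stays in the 3NF tables.
CREATE VIEW customer_dim_v AS
SELECT c.customer_id,
       c.customer_name,
       ci.city_name,
       ci.country
FROM   customer_3nf c
JOIN   city_3nf ci
  ON   ci.city_id = c.city_id;
```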

How did Teradata get its name?

The first developers' aim was for the database to support terabytes (10^12 bytes) of data, so they named it Teradata.

They first installed their production copy at Wells Fargo

What is a VPROC in Teradata?

VPROC means virtual processor. It is just a software process in the Teradata system that imitates
a processor by taking instructions from other processes.

Teradata has two types of VPROCs:

1. PE (Parsing Engine), written in C.
2. AMP (Access Module Processor)

Sometimes the PDE (Parallel Database Extension) process is also called a VPROC.

What is Teradata?

Teradata is an RDBMS like Oracle or DB2, but built for parallelism and able to
support terabytes of data. It was mainly developed for data warehousing. Parallel processing is the heart of Teradata, which comes with a shared-nothing architecture.

Teradata loads data in parallel, exports in parallel and reads in parallel.

Reads, writes and everything else are parallel.

Wednesday, January 26, 2011

Can you list different versions of teradata?

TD 13.10
TD 13.0
TD 12.0
TD 8.0
V2R6.1.1
V2R6.1.0
V2R6.0.2
V2R6.0.1
V2R6.0.0
V2R5

Tuesday, January 25, 2011

Can you name two macros in Teradata?

SHOW and HELP are the two usually named here (strictly speaking, both are SQL statements rather than user-defined macros)

What are the default login details after installing Teradata?

For BTEQ:

TDP ID : localtd
user name: tduser
password: tduser

ex: .logon localtd/tduser,tduser; then press Enter

From the Teradata Administrator console,

we can log in using tduser or tdadmin

tduser and tdadmin are DSNs created in the ODBC data source configuration

Thursday, January 20, 2011

What is a Full Table Scan (FTS)?

A Full Table Scan (FTS) is a query that reads every row of a table

How many columns can be used in a secondary index?

Up to 64 columns from V2R5 onward (16 in earlier releases).

How many columns can be used in a Primary Index?

Primary indexes may be defined on one column, or on a set of columns viewed as a composite unit. Up to 64 columns may be defined as a Primary Index from V2R5 onward (16 in earlier releases).

How Many secondary indexes can be created on table?

We can create up to 32 secondary indexes

How many primary indexes can be created on a table?

Only One

Can you create a secondary index after the table is created?

Yes. We can create secondary indexes (USI / NUSI) at any point in time
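A short sketch of adding and dropping secondary indexes on an existing table, assuming a hypothetical table emp_si_demo (the table and column names are illustrative only):

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_si_demo (
    empno  INTEGER NOT NULL,
    ssn    CHAR(9),
    deptno INTEGER
)
UNIQUE PRIMARY INDEX (empno);

-- Added after the table exists, without recreating it.
CREATE UNIQUE INDEX (ssn) ON emp_si_demo;   -- USI
CREATE INDEX (deptno) ON emp_si_demo;       -- NUSI

-- A secondary index can be dropped again at any time.
DROP INDEX (deptno) ON emp_si_demo;
```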

Can you create a Primary Index after a table is created?

No, it's not possible. A primary index has to be defined
at table creation time. If you do not define one,
the system creates a default one.

We are not able to change the primary index once the table is created. To change a
UPI/NUPI you need to recreate the table.

In case of a NUSI, does the AMP have a local subtable?

Yes. In case of a NUSI, each AMP has a subtable
that holds index information only for the rows on that same AMP.

But in case of a USI, the index value of each row is hashed to decide which AMP
stores its entry in the secondary index subtable.

Teradata Join Indexes are equivalent to which concept in Oracle?

Join Indexes in Teradata are similar to materialized views in Oracle

List different types of indexes in Teradata?

Primary Indexes: Unique Primary Index(UPI) , Non Unique Primary Index( NUPI)

Secondary Index : Unique Secondary Index(USI), Non Unique secondary Index(NUSI)

Join Index

1. Each table may have only one Primary Index, and every table must have a Primary Index defined.
2. Primary indexes may be defined on one column, or on a set of columns viewed as a composite unit. Up to 64 columns may be defined as a Primary Index from V2R5 onward (16 in earlier releases).
3. A table can have 32 secondary indexes.
4. Once created, the primary index cannot be dropped or modified; the index must be changed by recreating the table.
5. Unlike the primary index, a secondary index can be added or dropped without recreating the table.

What are the different types of space in Teradata?

There are three types of space with Teradata. They are:

Perm Space,
Spool Space, and
Temp Space

"Perm space" defines the upper limit of space that a database or user can use to hold tables, secondary index sub-tables, and permanent journals

Spool space defines the upper limit of space that a user has to run a query. When a user runs a query, AMPs build the answer set in spool space. Once the query is done, the spool space is released. If the query exceeds the spool space's upper limit, the query aborts. Then, the user is out of spool space.

Temp space defines the upper limit that a user or database can have to hold Global Temporary tables.

What's the difference between a database and a user?

Remember either a database or a user can own space

Unlike other database products, Teradata sees little difference between a user and a database. Both need space to contain or own data. In fact, the only real difference is that a user has a password and he or she can log-on and submit SQL requests.

Both a database and a user can own perm space; therefore both can actually own tables.

what is DBC in System?

Every Teradata system that was ever built has a user called "DBC." The acronym is derived from the first Teradata machine called the "DBC/1012". DBC stands for Database Computer, and 1012 stands for 10 to the 12^th power – or a Terabyte. There is no user with greater privileges than the DBC.

How many table can you join in a single query in teradata?

Teradata can join up to 64 tables in a single query.

What is the difference between a PRIMARY KEY and a PRIMARY INDEX?

A Primary Key is a logical term used to label column(s) that enforce the uniqueness of each row in a table. PKs determine relationships among tables. A Primary Index is a physical term used to label column(s) that is used to store and locate rows of data
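The two can be declared on different columns. The sketch below uses a hypothetical table emp_pk_demo in which empno is the primary key (the logical uniqueness rule) and deptno is the primary index (the physical distribution and access path). The choice of columns is only for illustration:

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE emp_pk_demo (
    empno   INTEGER NOT NULL,
    empname VARCHAR(40),
    deptno  INTEGER,
    PRIMARY KEY (empno)     -- logical: every row is uniquely identified by empno
)
PRIMARY INDEX (deptno);     -- physical: rows are hashed to AMPs by deptno
```

When both are given like this, Teradata enforces the primary key through a unique secondary index, while rows are stored and located by the primary index.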

Does every table have a primary index in Teradata?

The PI is so important to Teradata functionality that every table in the database is required to have one.
The main uses of the PI are:

The Primary Index WILL DETERMINE which rows go to which AMPs; and
The Primary Index is ALWAYS the FASTEST RETRIEVAL method.

If the user doesn't define a PRIMARY INDEX when creating a table, the system will automatically choose one by default. Once it is defined, the PI column cannot be dropped or changed. The table would need to be re-created in order to change the PI.

What is the BYNET in Teradata?

The BYNET is an internal Teradata network over which the PEs and the AMPs communicate

What forms the relationship between the tables in a relational database?

A key that is common to each table forms it. A "Foreign Key (FK)" is a key in a table that is a Primary Key (PK) in another table. The PK and FK relationship allows the two tables to relate to one another.

Tuesday, January 18, 2011

Can you list some vendors in data mining?

SAS, SPSS, IBM Intelligent Miner, Microsoft SQL Server data mining, Oracle, Angoss, KXEN

What is data mining?

Data mining is about analyzing data and finding hidden patterns using automatic or semi-automatic methods.

Data mining is a key member of the Business Intelligence product family.