Introduction to

Sybase and the

Structured Query Language (SQL)

 

September 27 1999

 

 

SQL and the Relational Database Model

 

·        Relational database management system (RDBMS) stores data reliably, with strong integrity checking

 

·        Database is a collection of objects, including Tables, Views, Users, Rules, Procedures (sprocs) and others

 

·        Business logic can be stored within Database as well, in the form of rules, procedures  and triggers

( The application which is run by end users can also contain business rules )

 

·        Users of RDBMS Databases must log in to a named Database, in order to access its data

( Rich security features are tied to users of a Database, using GRANT and REVOKE )

 

·        All data is stored in Tables, like lists or files

( Typical RDBMS contains dozens/ hundreds/ thousands of related Tables )

 

·        Tables are like spreadsheets---data stored in a Table is available as Rows and Columns

              ( Cannot store calculation formulas as in a spreadsheet, however - one salient difference )

 

·        Structured Query Language ( SQL ) is the only way to 'talk' to RDBMS

( There is a standards committee, and a largely faithful implementaion of the SQL standard by vendors such as  Oracle, Sybase, Informix, Microsoft )

 

·        SQL provides command-driven interface.  All data operations done through 4 'verbs' which are:

1.     SELECT        for reporting and nesting queries – all reports are done with SELECT, except for sprocs, which are run with a different command verb, but contain one or more SELECT's within them

2.     INSERT         for adding Rows to Tables

3.     UPDATE        for modifying Rows in Tables

4.     DELETE        for deleting Rows from Tables


SQL Data Structures

 

 

·        Each Table in a Relational Database has a name and one or more data Columns

 

·        Columns have names and store data which must conform to a certain Data Type, which cannot be changed

( See Column Functions )

 

·        Individual 'line items' in the listing of a Table are called Rows --- some call them 'Records'

 

·        A Row often mirrors an individual real-world entity, such as a customer, employee, bond, country

 

·        A Row may represent a line item in a ledger, such as a cost item associated with lot shares

 

·        A Row may represent an event or transaction, such as a purchase of a security instrument

 

·        All data in an RDBMS can be found at the 'intersection' of a Row and Column in a Table

(This imposes requirement to know the data structures )

 

·        Each Row can be found by knowing its unique identifier, or Primary Key

( One, sometimes more, Columns are specified as holding this Key data )

 

·        There are powerful ways of finding Rows by knowing something about their data Columns

( See WHERE Clauses for Selecting Rows )

 


Specifying Tables and Columns

 

 

·        SQL 'Verb' for report sentences is SELECT

 

 

·        Minimal syntax provides complete view of Table

 

SELECT

*

FROM

brokermastertable

 

·        SELECT sentence is associated with one or more Tables, and specifies which Columns to return

 

SELECT

accountname, accountno

FROM

account

 

·        SQL queries are case-sensitive only for Table and Column names; UPPERCASE is used in this material for easy distinction between data names and SQL syntax

 

 

·        SQL queries produce Result Sets which are in Row and Column format

 

 

·        Some queries naturally produce a single Row --- see Aggregate Functions

 

 

·        Columns in a Result Set can consist of  data from Columns in specified Tables, data generated by Column-level functions, or any combination of the above

 

Using Distinct to Eliminate Duplicate Values

 

 

·        Duplication of data can occur easily in a Relational Database

 

1.     Relationships between Rows in Tables often stored in separate Tables

 

2.     Multi-part Keys can easily cause redundancy

 

 

·        Eliminate duplicate rows with keyword DISTINCT

 

SELECT DISTINCT

majoraccountno

FROM

account

 

 

·        DISTINCT applies to complete combination of all Columns specified

 

SELECT DISTINCT

acprodgrp, acbcountry

FROM

account

 


Sorting the Result Set

 

 

·        Control sort order of Result Set, if necessary,  with ORDER BY clause

(Sometimes, due to indexing, data is provided in useful order by default )

 

SELECT

accountno , majoraccountno , minoraccountno , accountname

FROM

account

ORDER BY

accountname

 

 

·        Multi-level sorts are possible with ORDER BY clause

 

SELECT

accountno , majoraccountno , minoraccountno , accountname

FROM

account

ORDER BY

majoraccountno , minoraccountno

 

 

·        Reverse sort order of Result Set with ORDER BY clause and BY DESC option

 

SELECT

accountname

FROM

account

ORDER BY

accountname DESC

 

 ·        See later discussion of Group By

 

Alternate Column Headings

  

·        Columns have 'column heading' consisting of Column name, which can be modified with an Alias

(One way to create alias is to follow Column name with AS, then alias.  Multiple word aliases in Sybase are delimited with single or double quotation marks)

 

SELECT

accountno , accounttype  AS  “Record type” ,   majoraccountno  AS   account_id ,

minoraccountno  AS  “Portfolio #” ,     accountname  AS  “Portfolio Name”,

acbcountry  AS   base_currency_code

FROM

account

ORDER BY

majoraccountno , minoraccountno

  

·        The word AS is optional;  you could create 'accidental aliases' by forgetting the comma

 

·        Yet another method is shown below one the first line, along with other methods on subsequent lines

 

SELECT

accountno , “Record type” = accounttype,  “Account ID”  = majoraccountno ,

minoraccountno  AS  “Portfolio #” ,  accountname AS  “Portfolio Name”,

acbcountry   base_currency_code

FROM

account

ORDER BY

majoraccountno , minoraccountno

 

Suggestion:  Don't obsess too much with these refinements; focus more on selection criteria and other advanced topics; the SQL language does not provide much support for formating, such as fonts, colors, pictures, etc.


Column-level Expressions

 

·        Column-level Expressions can consist of Column contents, constant values, and Column-level Functions in combination

 

·        TIP: Expressions of this kind can be helpful in posing criteria for finding one or more Rows --- see WHERE Clauses later

 

·        Column-level Functions in Sybase are categorized by Data Type, with the CONVERT function straddling Data Types

 

 

·        String (Character ) Functions are useful in manipulating text contents of a database, or column contents which are temporarily CONVERT'ed into text

 

SUBSTRING ( expression , startingbyteposition, length )

SUBSTRING is for when you only know part of the spelling of a word, or want to say 'column contents start with '

 

UPPER ( expression )        

UPPER turns the column or data expression within the parentheses to all uppercase     

 

LOWER ( expression )

LOWER turns the expression within the parentheses to all lowercase (but you knew that, didn't ya?)

 

·        See also the '+' operator for string concatenation ( glueing string expressions together)

 

·        Examples

 

SELECT gldesc , UPPER ( gldesc ) FROM glchart

 

SELECT lastname + ', ' + firstname FROM personnel

 

 

·        Date Functions are very useful in doing 'date arithmetic' and extremely useful in WHERE Clauses

 

DATEADD ( datepart, number, date )

DATEADD can let you do a projection, as in 'what would be the date 200 days hence?', or 'what would be the date 30 days after the position dates of specified entries in the net_costs table'

 

DATEDIFF ( datepart, date, date )

DATEDIFF will provide the interval between two dates in either days, weeks, months, etc.

 

DATEPART ( datepart, date )  returns a portion of a date-bearing column or other such expression

 

datepart in all these functions can be the words: 

year, quarter, month, dayofyear, day, week, weekday, hour, minute, second

 

GETDATE() returns the 'system date'.  This is more useful than may appear, since any SQL query can be stored and run as needed.  Using GETDATE() obviates the need to keep changing the query to plug in today's date as a string.

 

·        Examples

 

SELECT positiondate , DATEPART ( month , positiondate ) , DATEPART ( dayofyear, positiondate ) ,

DATEADD ( month , 4, positiondate ) , DATEDIFF ( GETDATE() , positiondate )

FROM net_costs

 

SELECT positiondate, positionenddate, DATEDIFF ( positionenddate , positiondate )

FROM lot_costs

ORDER BY DATEDIFF ( positionenddate ,  positiondate )

 

  

·        Numeric / Math Functions

 

ROUND ( numeric_expr, integer_expr )

Use to round a figure up to the nearest integer, or round and convert to a named number of decimal positions

 

ABS ( numeric_expr )

If you are not sure whether a numeric value is signed or not (positive vs. negative) or you do not care, or when you want to find values which are a certain distance from a value in either direction, you can specify the ABSolute value

 

Do calculations on-the-fly by adding the SQL equivalent of an arithmetic formula to the query, generating a column of output which can get a column name with the 'alias' feature mentioned earlier. Good for 'what if' projections

 

·        CONVERT – The 'Translator' Function changes data types; often used to convert to and from a string

CONVERT (  desired new data type , expression )

 

·        Examples – (Let Bill 'build' the third query with you, or just watch )

 

SELECT trdamtbase AS “Trade Date Pos'n”, trdamtbase * 1.004  AS “Raw Adjusted Pos'n”,  ROUND(trdamtbase * 1.004 , 2 )  AS “Adjusted Pos'n” 

FROM net_costs

 

SELECT CONVERT( integer, tmesource )

FROM pt_trmast

 

SELECT positiondate, DATEDIFF(  CONVERT(  DATE, '01/01/' + CONVERT( CHAR(2) ,DATEPART( year, positiondate ) ) ), positiondate ) AS “Julian Date”

FROM net_costs

 

SELECT SUBSTRING ( CONVERT (  CHAR( 12 ) , positiondate ) , 1 , 5 ) FROM ledgers

 

WHERE Clauses for Selecting Rows

  

·        To restrict which Rows appear in your Result Set, add a WHERE clause

 

·        Relational Operators are:  =,  <,  >, <=,  >=,  !=     only

 

·        Column contents can be compared to constants, expressions of appropriate Data Type, and other Column contents

 

SELECT

majoraccountno ,  minoraccountno   “Portfolio #” , accountname 

FROM

account

WHERE

accountno = 1

 

SELECT

*

FROM

ledgers

WHERE

gcaccount = '11110'

 

SELECT

accountno ,  majoraccountno ,  minoraccountno   “Portfolio #”

FROM

account

WHERE

accounttype <> 'A'

 

·        Single quotes are suggested around any comparison value which is not numeric; sometimes an implicit conversion will be done by Sybase if you use inappropriate, but compatible/convertible data type; second example above is one instance


Compound WHERE Clauses

 

 

·        WHERE clauses support multiple AND'ed and OR'ed conditions

 

·        WHERE is never specified more than once

 

 

SELECT

accountno, smsecid, effamt

FROM

 netshares

WHERE

           positiondate <= '08/30/99'

AND  positionenddate > '08/30/99'

AND  effamt <> 0  

ORDER BY

 positiondate

 

·        Whenever you use AND to link conditions, they must all be true, or the row will not appear in the result

 

·        Think of each WHERE clause as being applied on a Row-by-Row basis

 

·        AND and OR conjunctions have different precedence --- AND is more 'magnetic' than OR and will automatically 'attach' to the criteria it straddles.  TIP: Try the query below with and without the parentheses

 

SELECT

accountno,  majoraccountno ,  minoraccountno  , accountname 

FROM

account

WHERE

           ( acporttype = 1 OR accporttype = 5 )

AND   acbusunit = 3

 

TIP:  Rather than diving right into a complex WHERE clause, get a sense of what's in the table first with a more straightforward listing.  It is easy to pose 'legal' but incorrect queries


Compound WHERE Clauses - Continued

 

Here is another example of AND and OR in a query

 

SELECT

*

FROM

ledgers

WHERE

        ( accountno = 1 AND gcaccount = '11110' AND positiondate >  '4/30/1999' )

OR  ( accountno = 2 AND SUBSTRING( gcaccount , 1, 1 )  != '1'  )

ORDER BY

accountno, gcaccount, positiondate

 

TIP:  You do not have to type the queries as seen in these examples; they are structured this way to reinforce the different parts of the query

 

 

More about WHERE Clauses

 

 

·        The WHERE clause always precedes any ORDER BY clause

 

·        The word NOT can be used to precede and negate any conditions, especially complex ones

 

SELECT

*

FROM

ledgers

WHERE

                 accountno = 1

AND        NOT ( gcaccount = '11110' AND positiondate <= '3/31/1998' )

ORDER BY

gcaccount, positiondate

 


 

·        Sybase functions can be used in the WHERE Clause

SELECT

*

FROM

ledgers

WHERE

                 accountno = 1

AND        SUBSTRING ( gcaccount , 4, 1 ) = '1'

ORDER BY

gcaccount, positiondate

 


WHERE Clauses for Working with Ranges of Values

 

 

·        Use BETWEEN ...  AND to specify inclusive range of values in a WHERE clause

 

SELECT

accountno ,  majoraccountno ,  minoraccountno   “Portfolio #”

FROM

account

WHERE

accountno BETWEEN 3 AND 16

 

 

·        Use NOT to negate BETWEEN ...  AND and therefore specify any values before or after the range of values

 

SELECT

accountno ,  majoraccountno ,  minoraccountno   “Portfolio #”

FROM

account

WHERE

accountno NOT BETWEEN 3 AND 16

ORDER BY

accountno , majoraccountno , minoraccountno

 

 

SELECT

*

FROM

lotshares

WHERE

         positiondate BETWEEN '1/1/1998 12:00AM' AND '3/31/1998 11:59PM'

AND accountno = 1

ORDER BY

positiondate DESC, accountno, gcaccount

  


WHERE Clauses for Working with Lists of Values

 

 

·        Use IN (  ,  ,  ... )  to specify individual  values in a WHERE clause

 

SELECT

*

FROM

account

WHERE

           accountno IN ( 1, 2, 3, 8, 16 )

 

AND   accounttype = 'A'

ORDER BY

majoraccountno

 

 

SELECT

*

FROM

ledgers

WHERE

gcaccount IN ( '11110' , '11130' , '12110' )

ORDER BY

accountno

 

·        IN ( , , … ) is a shorter alternative to OR'ed equality condition

 

 

SELECT

*

FROM

ledgers

WHERE

gcaccount NOT IN ( '11110' , '11130' , '12110' )

ORDER BY

accountno

 

·        Shorter alternative to AND'ed inequality condition

 

·        IN (  ) only works as a replacement for equals or not equals; it cannot be applied to 'greater than' or other relational operations

 

Missing (NULL) Data Values

 

 

·        In an RDBMS, data values for some Columns may not (yet) have been provided during INSERT operation of a Row, or an explicit NULL value may have been assigned to Columns

 

·        Such missing or unknown values are stored with special data value of NULL, which means “Don't know”, “unassigned”, “N/A”, but does not represent “” (empty string) or a space

 

·        Special syntax test for NULL is provided for your WHERE clause

 

 

SELECT

accountno ,  majoraccountno

FROM

account

WHERE

acprodsubcd IS NULL

 

 

SELECT

accountno ,  majoraccountno ,  minoraccountno   “Portfolio #”

FROM

account

WHERE

acbusdomcd IS NOT NULL

 

 

·        Use special ISNULL( ) test Function for conditional runtime substitution of a data value for NULL Column contents

 

SELECT

accountno ,  majoraccountno ,  ISNULL ( acprodsub , “N/A” )  AS  “Portfolio #”

FROM

account

ORDER BY 

minoraccountno

 

·        Note: DO NOT USE  = NULL; that does not work


Aggregate Functions

 

 

·        Sybase supports five Aggregate Functions

 

·         A Row in a Result Set may therefore represent an aggregate ( grouping of Rows ), but is still available for Column-level operations such as Column aliases

 

·        COUNT() answers the question “How many?”

 

SELECT

COUNT(*)

FROM

netshares

WHERE

positiondate <= '7/1/1998'

 

·        COUNT() permits one to only count DISTINCT occurrences of data values, when used with a Column name:

 

SELECT

COUNT(*), COUNT( actaxid ) , COUNT( DISTINCT actaxid )

FROM

account

 

 

·        AVG() and SUM() do commonly useful math operations on a Column or Column Expression

 

SELECT

AVG( price )

FROM

 price

WHERE

            pdate > '07/01/99'

AND    smcec_id = 10938

 

 

Aggregate Functions

 

 

·        MAX() and MIN() provide end points in ranges of values found in a Column, either for the entire Table, or only those Rows which satisfy the WHERE clause

 

SELECT

MIN( pdate )

FROM

price

WHERE

smsecid = 10938

 

SELECT

MAX( pdate )

FROM

price

WHERE

smsecid = 10938

 

SELECT

MAX(( DATEDIFF( day, positionenddate,  positiondate ) )

FROM

lotshares

WHERE

accountno = 1

 

·        Aggregate Functions ignore NULL values; consider using ISNULL() to adjust AVG() function, for example

 

 

·        Also remember the usefulness of IS NOT NULL in your WHERE clause to ignore Nulls in general

 


Aggregates and GROUP BY

 

 

·        While useful, the Aggregate Functions introduced on the previous page operate only on the entire Table, or a subset derived from a WHERE clause

 

·        Summarization / Aggregation based on a common identifying Column is available by combining Aggregate Functions with the special GROUP BY syntax

 

SELECT

accountno, COUNT(*) AS “Total Entries”, COUNT(DISTINCT gcaccount ) AS “Accounts”

FROM

ledgers

GROUP BY

accountno

WHERE

positiondate = '08/02/99'

 

 

·        GROUP BY queries should group by all non-Aggregate Column expressions but cannot include Aggregates

 

SELECT

accountno, gcaccount, curr , AVG ( current_bal )

FROM

ledgers

GROUP BY

accountno, gcaccount, curr

WHERE

gcaccount = '11100'

 

 

·        GROUP BY and ORDER BY are both supported in the same query, although default sorting may occur

 

SELECT

accountno, COUNT(*)

FROM

ledgers

GROUP BY

accountno

ORDER BY

accountno DESC


Testing Groups with the HAVING Clause

 

 

·        A Group, once generated by the RDBMS, can then be tested with the HAVING clause

 

SELECT

gcaccount, curr , AVG ( current_bal )

FROM

ledgers

WHERE

accountno = 1

GROUP BY

gcaccount, curr

HAVING

AVG ( current_bal ) > $1000

 

·        The HAVING clause is like the WHERE clause of a Group

 

 

·        Generally, in the query shown above, the order of operation is the following:

 

1.     The WHERE clause is applied

  

2.     Those Rows which pass the WHERE test are then grouped

  

3.     Once the groups are assembled, aggregation is done ( AVG, MIN, MAX, COUNT )

  

4.     Only those groups which meet the condition in the HAVING clause actually appear in the result


Nesting Queries

  

·        Sometimes we do not know search values to be used in a WHERE clause until we run a separate query

 

·        SQL permits you to have the results of one query 'hand off' to another, 'outer' query; this is done by forming a completely self-sufficient query and 'nesting' it as seen below; its result is used in place of an explicit value

 

 

SELECT

*

FROM

securitiesmaster

WHERE 

mktcode IN

  ( SELECT mktcode

    FROM securitiesmarket

    WHERE ctrycode IN ( 'US' , 'JP', 'NZ' ) )

 

·        If you are using IN as in example above, you may return more than one value; notice the 'fit' between the Column being generated by the inner query ( in italics ) and the Column in the outer WHERE clause

 

·        With all other relational operators, the only valid return value from the Nested Query must be a single value, as seen in the example below

 

SELECT

accountno , COUNT( * )

FROM

account

GROUP BY

accountno

HAVING 

 COUNT(*)    >

 

   ( SELECT COUNT(*)

       FROM account

 WHERE accountno = 1  )

 


Using Relational Features to 'join' Tables in a Result Set

 

 

·        In a Relational Database, the RDBMS makes it possible to create Result Sets which are composed of 2 or more Tables

 

·        Normal requirement for this 'Join' capability is storage of data in matching Columns in Tables you wish to join

 

·        Such a relationship typically runs from Foreign Key Columns in one Table to Primary Key

 

·        Example:

 

·        The accountno Column in account is the Primary Key which uniquely identifies an account

 

·        The accountno Column in ledgers,  lotshares, netshares, lot_costs and net_costs is:

 

1.      part of the Primary Key

 

2.      a Foreign Key, since it is designed to 'point' to a matching account Row ( it is 'related' )

  

·        Knowledge of such data relationships is crucial to doing Joins


Join Syntax

  

·        Specifying more than one Table in the FROM clause always implies a Join (whether you intend it or not)

 

·        The WHERE clause must contain necessary Join information ( the 'Join Predicate' ) to only find those matchings where a common data value is found

 

·        The WHERE clause contains as a minimum, Join syntax showing the linkage among all joined Tables

 

·        If Column names in multiple Tables are spelled the same, ambiguity would result, and the RDBMS will reject the query; this requires the use of a Qualified Column Name, using 'dot notation' as seen below

 

 

SELECT

smsecid , smtype , instrdesc

FROM

securitiesmaster , instrtype

WHERE

securitiesmaster.instrtype = instrtype.instrtype

ORDER BY

smtype

 

·        Column names in multiple Tables may also be qualified with a table alias as seen below

 

 

SELECT

smsecid , smtype , A.mktcode , securitiesmarket.mktname , ctrycode

FROM

securitiesmaster  A, securitiesmarket  B

WHERE

A.mktcode = B. mktcode

ORDER BY

ctrycode, smsecid

 

 

Join Examples

 

 

SELECT

smsecid , smtype , A.mktcode , securitiesmarket.mktname , ctrycode

FROM

securitiesmaster  A , securitiesmarket B, securities_issuer C

WHERE

            A.mktcode = B. mktcode

AND    A.secisscode = C.secisscode

ORDER BY

ctrycode, smsecid

 

 

·        Joins which find no matching Rows in one of the Tables create an empty set

 

SELECT

smsecid , smtype , A.mktcode , securitiesmarket.mktname , ctrycode

FROM

securitiesmaster  A , securitiesmarket B, securities_issuer C

WHERE

            A.mktcode = B. mktcode

AND    A.secisscode = C.secisscode

AND     C.secisscode = '9999'

ORDER BY

ctrycode, smsecid

 

 

·        Joins which do not specify an appropriate WHERE clause can cause a potentially massive Result Set, since every Row in each Table specified is actually 'multiplied' by every other Row in every other Table ( Unrestricted Join )

 

·        Large unrestricted Joins may crash the RDBMS Server, as potentially millions or more Rows are generated for the result.