|
Introduction to Sybase and the Structured Query Language (SQL)
September 27 1999
SQL and the Relational Database Model
· Relational database management system (RDBMS) stores data reliably, with strong integrity checking
· Database is a collection of objects, including Tables, Views, Users, Rules, Procedures (sprocs) and others
· Business logic can be stored within Database as well, in the form of rules, procedures and triggers ( The application which is run by end users can also contain business rules )
· Users of RDBMS Databases must log in to a named Database, in order to access its data ( Rich security features are tied to users of a Database, using GRANT and REVOKE )
· All data is stored in Tables, like lists or files ( Typical RDBMS contains dozens/ hundreds/ thousands of related Tables )
· Tables are like spreadsheets---data stored in a Table is available as Rows and Columns ( Cannot store calculation formulas as in a spreadsheet, however - one salient difference )
· Structured Query Language ( SQL ) is the only way to 'talk' to RDBMS ( There is a standards committee, and a largely faithful implementaion of the SQL standard by vendors such as Oracle, Sybase, Informix, Microsoft )
· SQL provides command-driven interface. All data operations done through 4 'verbs' which are: 1. SELECT for reporting and nesting queries – all reports are done with SELECT, except for sprocs, which are run with a different command verb, but contain one or more SELECT's within them 2. INSERT for adding Rows to Tables 3. UPDATE for modifying Rows in Tables 4. DELETE for deleting Rows from Tables
· Each Table in a Relational Database has a name and one or more data Columns
· Columns have names and store data which must conform to a certain Data Type, which cannot be changed ( See Column Functions )
· Individual 'line items' in the listing of a Table are called Rows --- some call them 'Records'
· A Row often mirrors an individual real-world entity, such as a customer, employee, bond, country
· A Row may represent a line item in a ledger, such as a cost item associated with lot shares
· A Row may represent an event or transaction, such as a purchase of a security instrument
· All data in an RDBMS can be found at the 'intersection' of a Row and Column in a Table (This imposes requirement to know the data structures )
· Each Row can be found by knowing its unique identifier, or Primary Key ( One, sometimes more, Columns are specified as holding this Key data )
· There are powerful ways of finding Rows by knowing something about their data Columns ( See WHERE Clauses for Selecting Rows )
Specifying Tables and Columns
· SQL 'Verb' for report sentences is SELECT
· Minimal syntax provides complete view of Table
· SELECT sentence is associated with one or more Tables, and specifies which Columns to return
· SQL queries are case-sensitive only for Table and Column names; UPPERCASE is used in this material for easy distinction between data names and SQL syntax
· SQL queries produce Result Sets which are in Row and Column format
· Some queries naturally produce a single Row --- see Aggregate Functions
· Columns in a Result Set can consist of data from Columns in specified Tables, data generated by Column-level functions, or any combination of the above
Using Distinct to Eliminate Duplicate Values
· Duplication of data can occur easily in a Relational Database
1. Relationships between Rows in Tables often stored in separate Tables
2. Multi-part Keys can easily cause redundancy
· Eliminate duplicate rows with keyword DISTINCT
· DISTINCT applies to complete combination of all Columns specified
Sorting the Result Set
· Control sort order of Result Set, if necessary, with ORDER BY clause (Sometimes, due to indexing, data is provided in useful order by default )
· Multi-level sorts are possible with ORDER BY clause
· Reverse sort order of Result Set with ORDER BY clause and BY DESC option
· See later discussion of Group By
Alternate Column Headings
· Columns have 'column heading' consisting of Column name, which can be modified with an Alias (One way to create alias is to follow Column name with AS, then alias. Multiple word aliases in Sybase are delimited with single or double quotation marks)
· The word AS is optional; you could create 'accidental aliases' by forgetting the comma
· Yet another method is shown below one the first line, along with other methods on subsequent lines
Suggestion: Don't obsess too much with these refinements; focus more on selection criteria and other advanced topics; the SQL language does not provide much support for formating, such as fonts, colors, pictures, etc. Column-level Expressions
· Column-level Expressions can consist of Column contents, constant values, and Column-level Functions in combination
· TIP: Expressions of this kind can be helpful in posing criteria for finding one or more Rows --- see WHERE Clauses later
· Column-level Functions in Sybase are categorized by Data Type, with the CONVERT function straddling Data Types
· String (Character ) Functions are useful in manipulating text contents of a database, or column contents which are temporarily CONVERT'ed into text
SUBSTRING ( expression , startingbyteposition, length ) SUBSTRING is for when you only know part of the spelling of a word, or want to say 'column contents start with '
UPPER ( expression ) UPPER turns the column or data expression within the parentheses to all uppercase
LOWER ( expression ) LOWER turns the expression within the parentheses to all lowercase (but you knew that, didn't ya?)
· See also the '+' operator for string concatenation ( glueing string expressions together)
· Examples
SELECT gldesc , UPPER ( gldesc ) FROM glchart
SELECT lastname + ', ' + firstname FROM personnel
· Date Functions are very useful in doing 'date arithmetic' and extremely useful in WHERE Clauses
DATEADD ( datepart, number, date ) DATEADD can let you do a projection, as in 'what would be the date 200 days hence?', or 'what would be the date 30 days after the position dates of specified entries in the net_costs table'
DATEDIFF ( datepart, date, date ) DATEDIFF will provide the interval between two dates in either days, weeks, months, etc.
DATEPART ( datepart, date ) returns a portion of a date-bearing column or other such expression
datepart in all these functions can be the words: year, quarter, month, dayofyear, day, week, weekday, hour, minute, second
GETDATE() returns the 'system date'. This is more useful than may appear, since any SQL query can be stored and run as needed. Using GETDATE() obviates the need to keep changing the query to plug in today's date as a string.
· Examples
SELECT positiondate , DATEPART ( month , positiondate ) , DATEPART ( dayofyear, positiondate ) , DATEADD ( month , 4, positiondate ) , DATEDIFF ( GETDATE() , positiondate ) FROM net_costs
SELECT positiondate, positionenddate, DATEDIFF ( positionenddate , positiondate ) FROM lot_costs ORDER BY DATEDIFF ( positionenddate , positiondate )
· Numeric / Math Functions
ROUND ( numeric_expr, integer_expr ) Use to round a figure up to the nearest integer, or round and convert to a named number of decimal positions
ABS ( numeric_expr ) If you are not sure whether a numeric value is signed or not (positive vs. negative) or you do not care, or when you want to find values which are a certain distance from a value in either direction, you can specify the ABSolute value
Do calculations on-the-fly by adding the SQL equivalent of an arithmetic formula to the query, generating a column of output which can get a column name with the 'alias' feature mentioned earlier. Good for 'what if' projections
· CONVERT – The 'Translator' Function changes data types; often used to convert to and from a string CONVERT ( desired new data type , expression )
· Examples – (Let Bill 'build' the third query with you, or just watch )
SELECT trdamtbase AS “Trade Date Pos'n”, trdamtbase * 1.004 AS “Raw Adjusted Pos'n”, ROUND(trdamtbase * 1.004 , 2 ) AS “Adjusted Pos'n” FROM net_costs
SELECT CONVERT( integer, tmesource ) FROM pt_trmast
SELECT positiondate, DATEDIFF( CONVERT( DATE, '01/01/' + CONVERT( CHAR(2) ,DATEPART( year, positiondate ) ) ), positiondate ) AS “Julian Date” FROM net_costs
SELECT SUBSTRING ( CONVERT ( CHAR( 12 ) , positiondate ) , 1 , 5 ) FROM ledgers
WHERE Clauses for Selecting Rows
· To restrict which Rows appear in your Result Set, add a WHERE clause
· Relational Operators are: =, <, >, <=, >=, != only
· Column contents can be compared to constants, expressions of appropriate Data Type, and other Column contents
· Single quotes are suggested around any comparison value which is not numeric; sometimes an implicit conversion will be done by Sybase if you use inappropriate, but compatible/convertible data type; second example above is one instance Compound WHERE Clauses
· WHERE clauses support multiple AND'ed and OR'ed conditions
· WHERE is never specified more than once
· Whenever you use AND to link conditions, they must all be true, or the row will not appear in the result
· Think of each WHERE clause as being applied on a Row-by-Row basis
· AND and OR conjunctions have different precedence --- AND is more 'magnetic' than OR and will automatically 'attach' to the criteria it straddles. TIP: Try the query below with and without the parentheses
TIP: Rather than diving right into a complex WHERE clause, get a sense of what's in the table first with a more straightforward listing. It is easy to pose 'legal' but incorrect queries
Here is another example of AND and OR in a query
TIP: You do not have to type the queries as seen in these examples; they are structured this way to reinforce the different parts of the query
More about WHERE Clauses
· The WHERE clause always precedes any ORDER BY clause
· The word NOT can be used to precede and negate any conditions, especially complex ones
· Sybase functions can be used in the WHERE Clause
WHERE Clauses for Working with Ranges of Values
· Use BETWEEN ... AND to specify inclusive range of values in a WHERE clause
· Use NOT to negate BETWEEN ... AND and therefore specify any values before or after the range of values
WHERE Clauses for Working with Lists of Values
· Use IN ( , , ... ) to specify individual values in a WHERE clause
· IN ( , , … ) is a shorter alternative to OR'ed equality condition
· Shorter alternative to AND'ed inequality condition
· IN ( ) only works as a replacement for equals or not equals; it cannot be applied to 'greater than' or other relational operations
Missing (NULL) Data Values
· In an RDBMS, data values for some Columns may not (yet) have been provided during INSERT operation of a Row, or an explicit NULL value may have been assigned to Columns
· Such missing or unknown values are stored with special data value of NULL, which means “Don't know”, “unassigned”, “N/A”, but does not represent “” (empty string) or a space
· Special syntax test for NULL is provided for your WHERE clause
· Use special ISNULL( ) test Function for conditional runtime substitution of a data value for NULL Column contents
· Note: DO NOT USE = NULL; that does not work
· Sybase supports five Aggregate Functions
· A Row in a Result Set may therefore represent an aggregate ( grouping of Rows ), but is still available for Column-level operations such as Column aliases
· COUNT() answers the question “How many?”
· COUNT() permits one to only count DISTINCT occurrences of data values, when used with a Column name:
· AVG() and SUM() do commonly useful math operations on a Column or Column Expression
Aggregate Functions
· MAX() and MIN() provide end points in ranges of values found in a Column, either for the entire Table, or only those Rows which satisfy the WHERE clause
· Aggregate Functions ignore NULL values; consider using ISNULL() to adjust AVG() function, for example
· Also remember the usefulness of IS NOT NULL in your WHERE clause to ignore Nulls in general
Aggregates and GROUP BY
· While useful, the Aggregate Functions introduced on the previous page operate only on the entire Table, or a subset derived from a WHERE clause
· Summarization / Aggregation based on a common identifying Column is available by combining Aggregate Functions with the special GROUP BY syntax
· GROUP BY queries should group by all non-Aggregate Column expressions but cannot include Aggregates
· GROUP BY and ORDER BY are both supported in the same query, although default sorting may occur
Testing Groups with the HAVING Clause
· A Group, once generated by the RDBMS, can then be tested with the HAVING clause
· The HAVING clause is like the WHERE clause of a Group
· Generally, in the query shown above, the order of operation is the following:
1. The WHERE clause is applied
2. Those Rows which pass the WHERE test are then grouped
3. Once the groups are assembled, aggregation is done ( AVG, MIN, MAX, COUNT )
4. Only those groups which meet the condition in the HAVING clause actually appear in the result Nesting Queries
· Sometimes we do not know search values to be used in a WHERE clause until we run a separate query
· SQL permits you to have the results of one query 'hand off' to another, 'outer' query; this is done by forming a completely self-sufficient query and 'nesting' it as seen below; its result is used in place of an explicit value
· If you are using IN as in example above, you may return more than one value; notice the 'fit' between the Column being generated by the inner query ( in italics ) and the Column in the outer WHERE clause
· With all other relational operators, the only valid return value from the Nested Query must be a single value, as seen in the example below
Using Relational Features to 'join' Tables in a Result Set
· In a Relational Database, the RDBMS makes it possible to create Result Sets which are composed of 2 or more Tables
· Normal requirement for this 'Join' capability is storage of data in matching Columns in Tables you wish to join
· Such a relationship typically runs from Foreign Key Columns in one Table to Primary Key
· Example:
· The accountno Column in account is the Primary Key which uniquely identifies an account
· The accountno Column in ledgers, lotshares, netshares, lot_costs and net_costs is:
1. part of the Primary Key
2. a Foreign Key, since it is designed to 'point' to a matching account Row ( it is 'related' )
· Knowledge of such data relationships is crucial to doing Joins Join Syntax
· Specifying more than one Table in the FROM clause always implies a Join (whether you intend it or not)
· The WHERE clause must contain necessary Join information ( the 'Join Predicate' ) to only find those matchings where a common data value is found
· The WHERE clause contains as a minimum, Join syntax showing the linkage among all joined Tables
· If Column names in multiple Tables are spelled the same, ambiguity would result, and the RDBMS will reject the query; this requires the use of a Qualified Column Name, using 'dot notation' as seen below
· Column names in multiple Tables may also be qualified with a table alias as seen below
Join Examples
· Joins which find no matching Rows in one of the Tables create an empty set
· Joins which do not specify an appropriate WHERE clause can cause a potentially massive Result Set, since every Row in each Table specified is actually 'multiplied' by every other Row in every other Table ( Unrestricted Join )
· Large unrestricted Joins may crash the RDBMS Server, as potentially millions or more Rows are generated for the result. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||