Monday, March 26, 2012

Python basic tutorial

C:\Python27\ python         #cmd prompt sart
import os                          # import a module
dir(os)                              # list all parameters in os module
os.listdir("C:\users\iss")  # list * files in a folder
help(modulename)           #To get the docs on all the functions at once

import aa                          #import a method 1
from aa import *               #import a method 2

# String Formatting
name = "John"
age = 23
print "%s is %d years old." % (name, age)

mylist = [1,2,3]
print "A list: %s" % mylist       # List print

%s  - String (or any object with a string representation, like numbers)
%d  - Integers
%f   - Floating point numbers
%. f - Floating point numbers with a fixed amount of digits to the right of the dot.
%x/%X - Integers in hex representation (lowercase/uppercase)

s = "Hey there! what should this string be?"
print "Length of s = %d" % len(s)                                                # Length = 38
print "The first occurrence of the letter a = %d" % s.index("a")  # First occurrence of "a" =13
print "a occurs %d times" % s.count("a")                                    # Number of a's =1
# Slicing the string into bits
print "The first five characters are '%s'" % s[:5]                # Start to 5
print "The next five characters are '%s'" % s[5:10]            # 5 to 10
print "The twelfth character is '%s'" % s[12]                     # Just number 12
print "The last five characters are '%s'" % s[-5:]               # 5th-from-last to end

print "String in uppercase: %s" % s.upper()                      # Convert everything to uppercase
print "String in lowercase: %s" % s.lower()                     # Convert everything to lowercase

# Check how a string starts
s = "Str Hey there! what should this string be? some"
if s.startswith("Str"):
    print "String starts with 'Str'. Good!"
# Check how a string ends
if s.endswith("ome!"):
    print "String ends with 'ome!'. Good!"
# Split the string into three separate strings
print "Split the words of the string: %s" % s.split(" ")

#Basic Operators-------------------------------------------------
number = 1 + 2 * 3 / 4.0
remainder = 11 % 3
squared = 7 ** 2
cubed = 2 ** 3
helloworld = "hello" + " " + "world"
lotsofhellos = "hello" * 10                               # multiplying strings to form a string
even_numbers = [2,4,6,8]
odd_numbers = [1,3,5,7]
all_numbers = odd_numbers + even_numbers #Lists can be joined
print [1,2,3] * 3                                                #repeating sequence

x = 2
print x == 2  # prints True
print x == 3  # prints False
print x < 3   # prints True

name = "John"
age = 23
if name == "John" and age == 23:
    print "Your name is John, and you are also 23 years old."
if name == "John" or name == "Rick":
    print "Your name is either John or Rick."

#The "for" loop - Prints out the numbers 0,1,2,3,4
for x in xrange(5):
    print x
# Prints out 3,4,5
for x in xrange(3,6):
    print x

#"while" loops - Prints out 0,1,2,3,4
count = 0
while count < 5:
    print count
    count += 1
#break is used to exit a for loop or a while loop, whereas continue is used to skip the
#current block, and return to the "for" or "while"
count = 0
while True:
    print count
    if count > 5:
# Prints out only odd numbers - 1,3,5,7,9
for x in xrange(10):
    # Check if x is even
    if x % 2 == 0:
    print x

def func2(x, y, z, m):
    print x+y+z+m

#function within fuunction
def wrapper1(func, *args): # with star
def wrapper2(func, args): # without star ..positional args
def func2(x, y, z, m):
    print x+y+z+m

wrapper1(func2, 1, 2, 3, 4)
wrapper2(func2, [1, 2, 3, 4])

#multiple function argument
def foo(xxx, yyy, zzz, *mmm):
    print "First: %s" % xxx
    print "Second: %s" % yyy
    print "Third: %s" % zzz
    print "And all the rest... %s" % mmm

#classes and objects-----------------------------------------------
class MyClass:
      variable = "blah" # define variable withing functtion
      def function(self): # define variable
           print "This is a message inside the class."
myobjectx = MyClass()      # assign class to an object
print myobjectx.variable    # access variable within class
print myobjectx.function()  # access function within class

myobjecty = MyClass()
myobjecty.variable = "yackity"
print myobjecty.variable

#A dictionary is a data type similar to arrays, but works with keys and values instead of
#indexes. Each value stored in a dictionary can be accessed using a key, which is any type of
#object (a string, a number, a list, etc.) instead of using its index to address it

phonebook = {}
phonebook["John"] = 938477566
phonebook["Jack"] = 938377264
phonebook = {
    "John" : 938477566,
    "Jack" : 938377264
#dictionary, unlike a list, does not keep the order of the values stored in it.
for name, number in phonebook.iteritems():
    print "Phone number of %s is %d" % (name, number)

#To remove a specified index, use either one of the following notations:
del phonebook["John"] or

#Generators are simple functions which return an iterable set of items, one at a time, in a special way.
import random
def lottery():
    # returns 6 numbers between 1 and 40
    for i in xrange(6):
        yield random.randint(1, 40)
    # returns a 7th number between 1 and 15
    yield random.randint(1,15)

for random_number in lottery():
    print "And the next number is... %d!" % random_number

#List Comprehensions--------------------------------------------------
#creates a new list based on another list, in a single, readable line
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = [len(word) for word in words if word != "the"]
print words
print word_lengths

#regular expression----------------------------------------------------
#Search and Replace: Some of the most important re methods that use regular expressions is sub.
#re.sub(pattern, repl, string, max=0)
#replace all occurrences of the RE pattern in string with repl, substituting all occurrences unless max provided.
#This method would return modified string.
phone = "2004-959-559          #This is Phone Number"
num = re.sub(r'#.', "", phone)  # replace "This"  = 2004-959-559 #is Phone Number
num = re.sub(r'#.*$', "", phone) # replace All after # mark with "" = 2004-959-559
num = re.sub(r'\D', "", phone)   # replace anything other than digits with ""  = 2004959559
print "Phone Num : ", num

#Pattern Description
^ Matches beginning of line.
$ Matches end of line.
. Matches any single character except newline. Using m option allows it to match newline as well.
[...] Matches any single character in brackets.
[^...] Matches any single character not in brackets
re* Matches 0 or more occurrences of preceding expression.
re+ Matches 1 or more occurrence of preceding expression.
re? Matches 0 or 1 occurrence of preceding expression.
re{ n} Matches exactly n number of occurrences of preceding expression.
re{ n,} Matches n or more occurrences of preceding expression.
re{ n, m} Matches at least n and at most m occurrences of preceding expression.
a| b Matches either a or b.
(re) Groups regular expressions and remembers matched text.
(?imx) Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected.
(?-imx) Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected.
(?: re) Groups regular expressions without remembering matched text.
(?imx: re) Temporarily toggles on i, m, or x options within parentheses.
(?-imx: re) Temporarily toggles off i, m, or x options within parentheses.
(?#...) Comment.
(?= re) Specifies position using a pattern. Doesn't have a range.
(?! re) Specifies position using pattern negation. Doesn't have a range.
(?> re) Matches independent pattern without backtracking.
\w Matches word characters.
\W Matches nonword characters.
\s Matches whitespace. Equivalent to [\t\n\r\f].
\S Matches nonwhitespace.
\d Matches digits. Equivalent to [0-9].
\D Matches nondigits.
\A Matches beginning of string.
\Z Matches end of string. If a newline exists, it matches just before newline.
\z Matches end of string.
\G Matches point where last match finished.
\b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B Matches nonword boundaries.
\n, \t, etc. Matches newlines, carriage returns, tabs, etc.
\1...\9 Matches nth grouped subexpression.
\10 Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.

File handler and line counts
f = open('received/20000402172012MARK0OUT.DAT', 'r')
s = open('received/20000402172012MARK0.NA', 'r')
num_lines = sum(1 for line in f)
num_liness= sum(1 for line in s)
print num_lines + num_liness

Advantage database commands

Permanently delete recodes from a table
execute procedure sp_zaptable 

Select column, data type and status from a table
select name, field_type, field_can_be_null from System.columns
where parent = 'rec_sample_test'
order by 2,1

Merge tables

MERGE web_sample_test p1 USING "//loki/web/secure/82124/cust/US/live/rec/study.add".sample p2 
ON ( p1.sam_no = p2.sam_no and p1.waveno =p2.waveno) 
UPDATE SET p1.surname = p2.surname, p1.forename = p2.forename 
INSERT (sam_no, waveno, forename, surname) VALUES (p2.sam_no, p2.waveno, p2.forename, p2.surname)

Advantage database server index

PostgreSQL 9.2 Volume 2 Part ii

Chapter 11. Indexes
If there are many rows  a tabl and only a few rows (perhaps zero or one) that would be returned by a query, this is clearly an inefficient method. But if the system has been instructed to maintain an index on the id column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree.
CREATE INDEX test1_id_index ON test1 (id);
B-trees indexes can handle equality and range queries on data.involved in a comparison using one of these operators: <, <=, =, >=, >, BETWEEN, IN, IS NULL, IS NOT NULL and patter matching (LIKE). B-tree indexes can also be used to retrieve data in sorted order. This is not always faster than a simple scan and sort, but it is often helpful.
B-tree indexes can also be used to retrieve data in sorted order. This is not always faster than a simple scan and sort, but it is often helpful. Hash indexes can only handle simple equality comparisons. Involved in a comparison using the = operator. 
CREATE INDEX name ON table USING hash (column);
Multi column Indexes 
CREATE INDEX test2_mm_idx ON test2 (major, minor);
Chapter 12. Full Text Search
Chapter 13. Concurrency Control
data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session.
MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data  which guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.

13.2. Transaction Isolation

The SQL standard defines four levels of transaction isolation. The most strict is Serializable.The phenomena which are prohibited are various levels are:
Dirty readA transaction reads data written by a concurrent uncommitted transaction.
Non-repeatable read - A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
phantom read - A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
The four transaction isolation levels and the corresponding behaviors are described in Table 13-1.
Transaction Isolation Levels

Isolation LevelDirty ReadNonrepeatable ReadPhantom Read
Read uncommittedPossiblePossiblePossible
Read committedNot possiblePossiblePossible
Repeatable readNot possibleNot possiblePossible
SerializableNot possibleNot possibleNot possible
13.3. Explicit Locking 
Chapter 14. Performance Tips
14.1. Using EXPLAIN

PostgreSQL devises a query plan for each query it receives. You can use the EXPLAIN command to see what query plan the planner creates for any query. 

The structure of a query plan is a tree of plan nodes. The first line (topmost node) has the estimated total execution cost for the plan; it is this number that the planner seeks to minimize.

                         QUERY PLAN
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
The numbers that are quoted by EXPLAIN are (left to right): 
  • Estimated start-up cost (time expended before the output scan can start, e.g., time to do the sorting in a sort node) 
  • Estimated total cost (if all rows are retrieved, though they might not be; e.g., a query with a LIMIT clause will stop short of paying the total cost of the Limit plan node's input node) 
  • Estimated number of rows output by this plan node (again, only if executed to completion) 
  • Estimated average width (in bytes) of rows output by this plan node 

14.4. Populating a Database

One might need to insert a large amount of data when first populating a database.

** Disable Auto commit

When using multiple INSERTs, turn off autocommit and just do one commit at the end. An additional benefit of doing all insertions in one transaction is that if the insertion of one row were to fail then the insertion of all rows inserted up to that point would be rolled back, so you won't be stuck with partially loaded data.

** Use COPY
Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs significantly less overhead for large data loads. Since COPY is a single command, there is no need to disable autocommit if you use this method to populate a table.
COPY is fastest when used within the same transaction as an earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway. However, this consideration only applies when wal_level is minimal as all commands must write WAL otherwise.
** Remove Indexes
If you are loading a freshly created table, the fastest method is to create the table, bulk load the table's data using COPY, then create any indexes needed for the table. Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
If you are adding large amounts of data to an existing table, it might be a win to drop the indexes, load the table, and then recreate the indexes. Of course, the database performance for other users might suffer during the time the indexes are missing. One should also think twice before dropping a unique index, since the error checking afforded by the unique constraint will be lost while the index is missing.
** Remove Foreign Key Constraints
Just as with indexes, a foreign key constraint can be checked "in bulk" more efficiently than row-by-row. So it might be useful to drop foreign key constraints, load data, and re-create the constraints. Again.
when you load data into a table with existing foreign key constraints, each new row requires an entry in the server's list of pending trigger events (since it is the firing of a trigger that checks the row's foreign key constraint). Loading many millions of rows can cause the trigger event queue to overflow available memory, leading to intolerable swapping or even outright failure of the command.Alternative method is to split up the load operation into smaller transactions

** Increase maintenance_work_mem

Temporarily increasing the maintenance_work_mem configuration variable. This will help to speed up CREATE INDEX commands and ALTER TABLE ADD FOREIGN KEY commands. It won't do much for COPY itself, so this advice is only useful when you are using one or both of the above techniques.

** Increase checkpoint_segments

Temporarily increasing the "checkpoint_segments" configuration variable. This is because loading a large amount of data into PostgreSQL will cause checkpoints to occur more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration variable). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. By increasing checkpoint_segments temporarily during bulk data loads, the number of checkpoints that are required can be reduced.
** Run ANALYZE Afterwards
Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended. which ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might make poor decisions during query planning. Note that if the autovacuum daemon is enabled, it might run ANALYZE automatically