CAP Theorem

January 25, 2013

The CAP theorem, also known as Brewer’s theorem, states that a distributed system can guarantee at most two of the following three properties:

  • Consistency – every read sees the result of the most recent completed write, no matter which node serves it
  • Availability – every request received by a non-failing node gets a response, whether success or failure
  • Partition tolerance – the system continues to operate even when messages between nodes are lost or delayed

This is similar to the three constraints of project management (time, cost and scope), where choosing two of them determines the third.

Henry Robinson has given a very good explanation of the concept on Quora.

An informal proof from his answer, which helps build the intuition:

    “The intuition behind this result is as follows: to be consistent, all nodes have to see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partition before a client reads from the out-of-date partition *after* having read from the up-to-date one. The only thing you can do to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available.”

    “Since, until there is a failure, it is relatively easy to guarantee availability and consistency in well-behaved executions, so some systems gracefully degrade their consistency or availability guarantees only at the point of failure.”

With this understanding in place, some notable points from the InfoQ article on CAP:

  • CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.
  • ACID properties focus on consistency
  • BASE properties focus on availability

Relevant links

ACID is a set of properties that apply specifically to database transactions.
The CAP theorem describes a trade-off among three properties of any distributed system (not just storage/database systems).
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed


Architecture related posts on Quora

January 25, 2013

1) http://www.quora.com/What-is-LinkedIn-s-database-architecture-like
2) http://www.quora.com/eBay/What-is-eBays-architecture
3) http://www.quora.com/Facebook-Engineering/What-is-Facebooks-architecture


Revising awk

January 15, 2013

Awk basics :

• NR: Keeps a running count of the number of input records (lines, by default) read so far.

• NF: Keeps a count of the number of fields in the current input line. The last field in the input line can be designated by $NF.

• FILENAME: Contains the name of the current input file.

• FS: Contains the field separator character. The default is “white space”, meaning space and tab characters. FS can be reassigned to another character to change the field separator.

• RS: Stores the current “record separator” character. Since, by default, an input line is the input record, the default record separator character is a “newline”.

• OFS: Stores the “output field separator”, which separates the fields when Awk prints them. The default is a “space” character.

• ORS: Stores the “output record separator”, which separates the output lines when Awk prints them. The default is a “newline” character.

• OFMT: Stores the format for numeric output. The default format is “%.6g”, a printf-style format that prints up to six significant digits.
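
The variables above can be seen working together in a small sketch. The sample records and the " | " output separator are made up for illustration; only FS, OFS, NR, NF and $NF come from the list above.

```shell
# Hypothetical sample data: three colon-separated records.
sample='alice:30:london
bob:25:paris
carol:41:tokyo'

# FS splits each input line on ":", OFS joins the printed fields with " | ".
# NR is the current record number, NF the field count, $NF the last field.
report=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

printf '%s\n' "$report"
# First line printed: 1 | 3 | alice | london
```

Setting FS and OFS in the BEGIN block means they are in place before the first record is read.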

Awk examples :

  1.  Write a command to find the total size in bytes of all files in a
    directory.
    ls -l | awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'
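
A slightly stricter variant of the command above, as a sketch: it counts only regular files, so sub-directory entries and the leading "total" line of ls -l are ignored. It assumes the usual ls -l layout in which the size is the fifth field; the temporary directory and file names are made up for the demonstration.

```shell
# Build a throwaway directory with files of known size (hypothetical names).
dir=$(mktemp -d)
printf 'hello' > "$dir/a.txt"      # 5 bytes
printf 'world!!' > "$dir/b.txt"    # 7 bytes
mkdir "$dir/subdir"                # directory entries are skipped below

# /^-/ keeps only lines whose permission string starts with "-" (regular files),
# which also drops the "total" line. "sum + 0" prints 0 for an empty directory.
total=$(ls -l "$dir" | awk '/^-/ { sum += $5 } END { print sum + 0 }')

echo "$total"   # 12
rm -rf "$dir"
```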

References :

  1. http://oreilly.com/catalog/unixnut3/chapter/ch11.html
  2. http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_17.html
  3. http://www.catonmat.net/blog/awk-one-liners-explained-part-one/


Learnings this week

January 11, 2013

http://www.ibm.com/developerworks/xml/tutorials/x-schematron/

Feels good to be back to RoR


FIX protocol review

January 11, 2013

Important FIX messages :
1) D : New Order Single
2) G : Order Cancel/Replace Request
3) F : Order Cancel Request
4) 8 : Execution Report
5) 9 : Order Cancel Reject
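
The message type is carried in tag 35 of each FIX message, so it can be pulled out of a log line with awk. This is a rough sketch under a few assumptions: the sample message is hypothetical, its body length and checksum values are placeholders rather than computed ones, and "|" stands in for the SOH (0x01) delimiter used on the wire, as is common in printed logs.

```shell
# Hypothetical log line; "|" replaces the SOH (0x01) field delimiter.
# Tags 9 (BodyLength) and 10 (CheckSum) hold placeholder values here.
msg='8=FIX.4.2|9=65|35=D|49=BUYER|56=SELLER|11=ORD1|55=IBM|54=1|38=100|40=1|10=000'

msgtype=$(printf '%s\n' "$msg" | awk -F'|' '
  BEGIN {
    # Lookup table for the message types reviewed in this post.
    name["D"] = "New Order Single"
    name["G"] = "Order Cancel/Replace Request"
    name["F"] = "Order Cancel Request"
    name["8"] = "Execution Report"
    name["9"] = "Order Cancel Reject"
  }
  {
    # Each field is a tag=value pair; tag 35 carries the message type.
    for (i = 1; i <= NF; i++) {
      split($i, kv, "=")
      if (kv[1] == "35")
        print kv[2] ": " ((kv[2] in name) ? name[kv[2]] : "unknown")
    }
  }')

echo "$msgtype"   # D: New Order Single
```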


Introduction to R

January 7, 2013

1) How to read a csv file in R ?

data<-read.csv(filename,header=TRUE)

2) How to display the first n lines of the file ?

head(data,n) : The default value of n is 6.

3) How to display the last n lines of the file ?

tail(data,n) 

4) Calculate missing values in all the columns in the data set ?

colSums(is.na(data))

Other functions that can be used for this purpose are sapply and apply.

5) Calculate the mean of a column without the missing values ?

colMeans(data,na.rm=TRUE)
     Ozone    Solar.R       Wind       Temp      Month        Day 
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922 
 colMeans(data)
    Ozone   Solar.R      Wind      Temp     Month       Day 
       NA        NA  9.957516 77.882353  6.993464 15.803922 
 colMeans(data["Ozone"],na.rm=TRUE)
   Ozone 
42.12931 

6) Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of Solar.R in this subset?

colMeans(subset(data,(Ozone>31 & Temp>90)))
 Ozone Solar.R    Wind    Temp   Month     Day 
 89.5   212.8     5.6    93.4     8.2    14.5

Additional info on Subset

7) Find the mean temperature in month n ? (The output below is for n = 6.)

colMeans(subset(data,Month==n))
    Ozone   Solar.R      Wind      Temp     Month       Day 
    NA 190.16667  10.26667  79.10000   6.00000  15.50000 

Additional Resources :
1) Filling in nas with column medians in R

2) Apply function and its variants