Spread your wings

April 24, 2014

The very first time a baby bird tumbles out of its nest, it has one of two options. It can give in to the fear, forget what it was designed for, and plummet to the earth. Or it can spread its wings, draw forth all its courage, and leap into that great big sky with the intention of soaring across it. You are now faced with a similar choice. You are no baby bird, but you are just as sensitive and at times as vulnerable. But if you make up your mind, there is nothing you can't do. Spread your wings.

How do you find the current URL of a document in PhantomJS?

February 24, 2014

If you simply do the following in PhantomJS:

console.log( "- current url is " + document.URL );

then you will see the name of the JavaScript file you are running with PhantomJS.

If you want to see the URL of the currently loaded page, however, then you have to do it within the loaded page’s sandbox:

// page.evaluate() runs the function inside the loaded page and returns its result
var url = page.evaluate(
    function () {
        return document.URL;
    }
);
console.log( "- current url is " + url );

Difference between text-based, headless and normal browsers

February 24, 2014


How to find the memory consumption of a particular process in Linux every 5 seconds

December 19, 2013

top -d 5 -p PID
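
If you want to log the readings instead of watching them interactively, the same idea can be sketched in Python. This is only a rough sketch under some assumptions: it assumes a Linux system where /proc/PID/status exposes a VmRSS line (resident memory in kB), and the function names parse_rss_kib and sample_memory are made up for this example.

```python
import time
from pathlib import Path


def parse_rss_kib(status_text):
    """Return the VmRSS value (in kB) found in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      1234 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def sample_memory(pid, interval=5.0, samples=3):
    """Yield the resident memory of `pid`, one reading every `interval` seconds."""
    status_path = Path("/proc") / str(pid) / "status"  # assumes Linux /proc
    for i in range(samples):
        yield parse_rss_kib(status_path.read_text())
        if i < samples - 1:
            time.sleep(interval)
```

For example, `for rss in sample_memory(1234): print(rss, "kB")` would print three readings for PID 1234, five seconds apart.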

What does the US Government Shutdown mean for the economy

October 1, 2013

The government shutdown means that, from tomorrow, government employees will not be paid, and in some cases will have to work without pay, for an indefinite time.

This means a lot of hardship for their families, although President Obama has signed a law confirming that citizens on military duty will stay on duty and will continue to be paid their salaries.

Nevertheless, what this means for the economy is that both consumer spending power and investor confidence are going to go down. The real impact will depend on how long the shutdown lasts: the longer it lasts, the worse it is going to get. As consumer spending power goes down, businesses are going to shrink, companies will put new investment and hiring plans on hold, and a downward spiral will begin. This could potentially cut some points off GDP.

What makes it scarier is the timing, because of the looming debt ceiling issue. When the debt ceiling will be reached is a difficult question to answer. However, according to predictions, it is going to be somewhere between October 18 and November 5.

If both these issues are not handled with care, they could have a huge recessionary impact on the US economy.

What it means for various US services :


Methods of Data Collection and Biases

September 30, 2013

Methods :

  1. Simple Random Sampling
  2. Stratified Sampling – divide the population into non-overlapping subgroups called strata and take a simple random sample (SRS) within each subgroup. When the members of each stratum are similar to one another, the variance within each subgroup is less than the overall population variance.
  3. Cluster Sampling
  4. Systematic Sampling – select every kth item; hidden patterns in the ordering of the data can bias the sample
  5. Convenience or Volunteer Sampling – select the first n points
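
A few of the methods above can be sketched in plain Python. This is an illustration only: the function names, the seed parameter and the key function used to assign items to strata are assumptions made for the example, not part of any particular library.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n items, every item equally likely, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, key, n_per_stratum, seed=None):
    """Split the population into strata with key(item), then take an SRS in each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        # Never ask for more items than the stratum contains
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def systematic_sample(population, k, start=0):
    """Take every kth item, beginning at index start."""
    return population[start::k]


def convenience_sample(population, n):
    """Take the first n items, which is easy but often biased."""
    return population[:n]
```

For example, `systematic_sample(list(range(10)), 3)` returns `[0, 3, 6, 9]`. If the data repeats with a period that is a multiple of k, every selected item falls at the same point of the cycle, which is the hidden pattern problem noted in the list.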


Bias : 

  1. Selection Bias – the sample is not representative of the population. For example : predicting polls from Twitter data, since Twitter users are not a representative sample of all voters.
  2. Measurement or Response Bias – the way the questions are worded or asked systematically pushes the recorded answers away from the true values. For example : a leading question.
  3. Non-response Bias – the individuals responding differ systematically from the people who are not responding. For example : a mandatory survey in Canada which was sent to 1/5th of the people was changed to optional and was sent to 1/3rd of the people. Since the response was voluntary and not mandatory, new immigrants were much less likely to respond to this survey.

Descriptive Statistics – starting with the data

September 27, 2013

These are the kinds of analysis that you can do when you start with any data set. This may be the starting point of all data science projects and it will give insights about the data. This is essential both for statisticians and for consumers of statistical reports.

For quantitative variables :

  1. minimum, maximum
  2. median, quartiles, interquartile range
  3. box plots
  4. mean
  5. spread of the data – standard deviation. Sometimes there are gaps in the data when we plot it as a histogram; the isolated values beyond the gaps are outliers. When there are underlying special rules in the way the data is generated, there will be outliers in the data. For example : some football clubs can pay foreign players salaries above the salary cap, which produces outlier salaries for those players. Another example : if the deals on an ecommerce site are ranked, the top deal or product gets the highest clicks by virtue of its position, which creates an outlier when click-through rate (CTR) is considered. Cleaning the data is an important first step in any statistical analysis, and it is important to understand the reasons behind the outliers. In some cases it is good to remove the outliers, and in some cases it is not, as we might lose valuable signals in the data. It is not unusual to report findings both with and without outliers.
  6. shape of the data – histograms
  7. skewed vs non-skewed, symmetric vs non-symmetric
  8. left skewed or negatively skewed – where it has a long left tail – typically mean < median < mode – the difference between the 3rd quartile and the median is smaller than the difference between the 1st quartile and the median
  9. right skewed or positively skewed – where it has a long right tail
  10. extreme values or outliers – sometimes the data has a much better uniform shape when the outliers are removed
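
Most of the numeric summaries in this list can be computed with Python's standard statistics module. The sketch below is only one way to do it: the describe function and its output keys are made up for this example, and flagging values more than 1.5 interquartile ranges beyond the quartiles is a common convention assumed here, not a rule stated above.

```python
import statistics


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    data = sorted(values)
    # Quartiles; method="inclusive" treats the data as the whole population
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: outliers lie more than 1.5 * IQR beyond the quartiles
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "stdev": statistics.stdev(data),  # sample standard deviation
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }
```

For the data 1, 2, 3, 4, 5, 6, 7, 8, 100 the median is 5 while the mean is about 15.1, and 100 is flagged as an outlier. A mean pulled well above the median like this is a sign of a long right tail.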

For categorical variables

  1. bar charts
  2. pie charts
  3. Examining the relationship between a quantitative variable and a categorical variable involves comparing the values of the quantitative variable among the groups defined by the categorical variable.
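
The counts behind a bar chart and the group-wise comparison from item 3 can be sketched as follows. The function names and the (category, value) row layout are assumptions made for this example.

```python
import statistics
from collections import Counter, defaultdict


def category_counts(labels):
    """Count how often each category occurs (the heights of a bar chart)."""
    return Counter(labels)


def summarize_by_group(rows):
    """Summarize a quantitative value within each category.

    rows is an iterable of (category, value) pairs.
    """
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {
        category: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for category, values in groups.items()
    }
```

Comparing the per-group means and medians side by side is a quick first check of whether the quantitative variable differs across the categories.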

Missing Values

We must understand why the data for some of the variables is missing, because the fact that it is missing might bias the result of our work.
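
One simple check is whether values are missing more often in some groups than in others. The sketch below assumes each record is a dict and that a missing value is stored as None; the function name and field names are made up for this example.

```python
from collections import defaultdict


def missing_rate_by_group(rows, group_field, value_field):
    """Return the fraction of rows in each group where value_field is missing."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        if row.get(value_field) is None:  # assumption: None marks a missing value
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}
```

If the rate is much higher in one group than in the others, the values are probably not missing at random, and a summary computed only from the rows that are present will under-represent that group.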