October 1, 2013
The government shutdown means that, from tomorrow, government employees will not be paid or, in some cases, will have to work without pay for an indefinite time.
This means a lot of hardship for their families, though President Obama has signed a law confirming that citizens on military duty will stay on duty and will be paid their salaries.
Nevertheless, what this means for the economy is that consumer spending power is going to go down, and so is investor confidence. The real impact, though, will depend on how long the shutdown lasts. The longer it lasts, the worse it is going to get. Businesses are going to shrink as consumer spending power goes down, companies will put new investment and hiring plans on hold, and a downward spiral will begin. This could potentially shave some points off GDP.
Now, what makes it scarier is the timing, because of the looming debt ceiling issue. When the debt ceiling will be reached is a difficult question to answer. However, according to predictions, it's going to be somewhere between October 18 and November 5.
If both these issues are not handled with care, it could have huge recessionary impacts on the US economy.
What it means for various US services:
September 27, 2013
These are the kinds of analysis that you can do when you start with any data set. This may be the starting point of any data science project, and it will give insights about the data. This is essential both for statisticians and for consumers of statistical reports.
For quantitative variables:
- minimum, maximum
- median, quartiles, interquartile range
- box plots
- spread of the data – standard deviation. Sometimes there are gaps in the data when we plot it as a histogram; these point to outliers. When special rules govern the way the data is generated, there will be outliers in the data. For example, some football clubs can pay foreign players salaries above the salary cap, which produces outlier salaries for those players. Another example: the top deal or product on an e-commerce site gets the most clicks by virtue of its position, so if the deals are ranked and CTR (click-through rate) is the measure, the top slot shows up as an outlier. Cleaning the data is an important first step in any statistical analysis, and it is important to understand the reasons behind the outliers. In some cases it is good to remove them; in others it is not, as we might lose valuable signals in the data. It is not unusual to report findings both with and without outliers.
- shape of the data – histograms
- skewed vs non-skewed, symmetric vs non-symmetric
- left skewed or negatively skewed – where it has a long left tail – mean < median < mode – the difference between the 3rd quartile and the median is smaller than the difference between the 1st quartile and median
- right skewed or positively skewed – where it has a long right tail
- extreme values or outliers – sometimes the data has a much better uniform shape when the outliers are removed
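To make the checklist above concrete, here is a minimal sketch using only Python's standard `statistics` module. The salary numbers are made up, and the 1.5 × IQR fence used to flag outliers is my assumption (Tukey's rule), not the only way to do it. The skew check is the rough mean-versus-median comparison from the list, not a formal skewness statistic.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Assumption (not from the post): Tukey's 1.5 * IQR rule for flagging outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]

    mean = statistics.mean(data)
    # Rough skew check: a long right tail pulls the mean above the median.
    if mean > median:
        skew = "right"
    elif mean < median:
        skew = "left"
    else:
        skew = "symmetric"

    return {
        "min": data[0],
        "max": data[-1],
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "stdev": statistics.stdev(data),
        "outliers": outliers,
        "skew": skew,
    }


# Hypothetical salaries (in thousands); one player is paid above the cap.
salaries = [40, 42, 45, 47, 50, 52, 55, 300]
with_outliers = summarize(salaries)
trimmed = [x for x in salaries if x not in with_outliers["outliers"]]
without_outliers = summarize(trimmed)

# Report both, as suggested above.
print(with_outliers)
print(without_outliers)
```

Comparing the two printed summaries shows how much a single extreme value moves the mean and the standard deviation, while the median barely changes.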
For categorical variables:
- bar charts
- pie charts
- Examining the relationship between a quantitative variable and a categorical variable involves comparing the values of the quantitative variable among the groups defined by the categorical variable.
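As a rough sketch of the last point, the snippet below counts how often each category occurs (the numbers a bar chart would show) and compares the median of a quantitative variable across the groups. The field names `slot` and `clicks` and the sample rows are hypothetical, chosen only for illustration.

```python
import statistics
from collections import Counter, defaultdict


def category_counts(records, category_key):
    """Count the rows in each category (the heights of a bar chart)."""
    return Counter(row[category_key] for row in records)


def group_medians(records, category_key, value_key):
    """Median of the quantitative variable within each category."""
    groups = defaultdict(list)
    for row in records:
        groups[row[category_key]].append(row[value_key])
    return {name: statistics.median(values) for name, values in groups.items()}


# Hypothetical deals: which slot they were shown in and how many clicks they got.
deals = [
    {"slot": "top", "clicks": 30},
    {"slot": "top", "clicks": 20},
    {"slot": "other", "clicks": 5},
    {"slot": "other", "clicks": 7},
    {"slot": "other", "clicks": 3},
]

print(category_counts(deals, "slot"))
print(group_medians(deals, "slot", "clicks"))
```

I used the median here instead of the mean so that a single extreme row does not dominate the comparison between groups.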
We must understand why data for some of the variables are missing, because the fact that they are missing might bias the results of our work.
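A simple first check is to count how many values are missing for each variable. The sketch below assumes the data is a list of dictionaries and that a missing value is stored as `None` (or the key is absent); both are assumptions of mine, and real data sets often encode missing values differently.

```python
def missing_report(records, fields):
    """Return {field: (missing_count, missing_fraction)} for each field."""
    total = len(records)
    report = {}
    for field in fields:
        # row.get() treats an absent key the same as an explicit None.
        missing = sum(1 for row in records if row.get(field) is None)
        fraction = missing / total if total else 0.0
        report[field] = (missing, fraction)
    return report


# Hypothetical survey rows with some gaps.
survey = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45},
    {"age": 29, "income": 48000},
]

print(missing_report(survey, ["age", "income"]))
```

The counts only say where the gaps are. Why they are there still has to be worked out from how the data was collected.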
September 25, 2013
Dependent variable: a variable that represents the aspect of the world that the experimenter predicts will be affected by the independent variable.
Descriptive statistics: procedures used to summarize, organize, and simplify data.
Double blind experiment: an experiment in which neither the experimenter nor the subject knows whether the treatment is experimental or control.
Independent variable: a variable manipulated by the experimenter.
Inferential statistics: procedures that allow for generalizations about population parameters based on sample statistics.
Parameter: a numerical measure that describes a characteristic of a population.
Population: the entire collection of cases to which one attempts to generalize.
Sample: a subset of the population.
Statistic: a numerical measure that describes a characteristic of a sample.
Quasi-independent variable: a variable that resembles an independent variable but is not manipulated by the experimenter.
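A small sketch of the population, sample, parameter and statistic definitions above: the population here is just the numbers 1 to 100 (a made-up example), the parameter is its mean, and the statistic is the mean of a random sample drawn from it. The seed is fixed only so that the run is repeatable.

```python
import random
import statistics

# Hypothetical population: every case we want to generalize to.
population = list(range(1, 101))

# Parameter: a numerical measure of the population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn without replacement.
rng = random.Random(2013)
sample = rng.sample(population, 10)

# Statistic: the same measure, computed on the sample.
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

Inferential statistics is about how far a statistic like `sample_mean` can be trusted as an estimate of a parameter like `population_mean`.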
September 25, 2013
Today I started reading Moneyball. Back to Michael Lewis after almost two years. And guess what, the current buzzword in the valley is “Data Science”.
I was having a discussion with my manager about hiring a candidate for an open position. During the review meeting, we came to the conclusion that the candidate was not so OK on machine learning and not so OK on programming. So somebody in the room cracked a joke: “sounds like a data scientist”.
But jokes apart, statistics, machine learning and programming put together make a formidable skill set in the industry today. So I have decided to start a series of blog posts as a statistics refresher for myself.
And guess what, 2013 is also the International Year of Statistics. Sounds coincidental.
September 25, 2013
Michael Lewis has a gripping writing style. He writes about different industries. Two years ago, before I joined investment banking, I read Liar's Poker. At that point, I had only a rough idea of banking, gathered from various sources on the internet. What I found in Liar's Poker was that Michael Lewis made me feel like a part of the industry. Two years later, now that I am reading Moneyball, I am going through the same feeling again. He uses terms the scouts used, like “a soft tosser”, which means “not worth my time”, and that makes me feel that I am a part of the industry. His way of engaging the reader is emphatic.
Michael Lewis builds his character in front of the reader and then names the character. The reader goes through the character's transformation quickly and relates to it easily. He introduces the character David Beck and illustrates how his hand might twist and turn in different directions. The reader can almost see it in front of him, and then Lewis names David Beck “The Creature”. It's as if the reader sees his arm's movement, then hears his name, and agrees that he should be called “The Creature”.