Thursday, July 13, 2006

 

Post the Eighteenth

Wherein your Host Announces that Phase II is Almost Complete

Life Plan
Phase I: Graduate School
Phase II: ICPSR
Phase III: ????
Phase IV: PROFIT

Only a week and some change left. I'm coming home soon, everyone!



Monday, July 10, 2006

 

Post the Seventeenth

Wherein your Host Praises the Gamma Distribution

Your host is a stupid man. But he is a stupid man who recently discovered one of his own mistakes and so is feeling pretty good right now!

I have a project that I’ve been working on for a year or more now and it just wasn't going anywhere. I think the underlying ideas are interesting and important, but the analysis wasn't holding up to statistical scrutiny.

One of the problems I have had with quantitative methods is learning to think statistically. This is hard to do when there is just so much out there that I simply haven’t learned yet. How could I think of framing a research question on differences in variation across some category when I didn’t know that heteroscedastic regression existed? How could I come up with a topic that examines different effects across various levels an observation is nested in when I didn’t know about hierarchical models?

It is also hard for me to think statistically because, as I have related previously, I am math phobic. So today in class we were covering Generalized Linear Models. This is something a very able professor at UVA had attempted to teach me previously, but we only covered the binomial and Poisson distributions. She had informed us that other probability distributions were available for analysis but, in my math-stupidity, I didn’t fully grasp what this meant.

Now I have discovered the glorious gamma distribution which far better fits my data than the normal, binomial or Poisson distributions. I get it now!

This:

Looks more like this (the red line):


Than it does like any of these:


After very quickly running some new analyses, it appears that, indeed, my hypothesized relationships do hold up to scrutiny (given a gamma distribution) and I might have a good conference paper (cross my fingers and pray it is publishable) on my hands now.
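For readers who want to try the same comparison, here is a minimal sketch in R. It assumes simulated data: the variable names, the log link, and the shape parameter are illustrative choices, not anything taken from the project described above.

```r
# Simulated, right-skewed positive outcome; NOT the data from the post.
set.seed(42)
n <- 500
x <- runif(n)
mu <- exp(0.5 + 1.2 * x)                          # mean of y given x
shape <- 2                                        # assumed gamma shape
y <- rgamma(n, shape = shape, rate = shape / mu)  # so that E[y] = mu

# Gamma GLM with a log link versus an ordinary normal-errors model.
fit_gamma  <- glm(y ~ x, family = Gamma(link = "log"))
fit_normal <- glm(y ~ x, family = gaussian())

print(coef(fit_gamma))             # should land near 0.5 and 1.2
print(AIC(fit_gamma, fit_normal))  # lower AIC means a better fit to these data
```

On skewed, strictly positive data like this, the gamma model would be expected to show the lower AIC. Real data may of course behave differently.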

Oh the things one learns at math camp! Angels, saints, ministers of grace and methodologists pray for the humble student Nathan of modest mind who tries so hard yet has so far to go.

Credo ut intelligam.



 

Post the Sixteenth

Wherein your Host Displays "Non-Quantitative" Tables

Today in class, a professor referred to “quantitative tables” – certainly a redundancy which he recognized – but it made me wonder what “non-quantitative tables” would look like...

... so I made up two sample "non-quantitative" tables for the qualitative methods folks and historians to use. Knock yourselves out guys:



Thursday, July 06, 2006

 

Post the Fourteenth

Wherein your Host Demonstrates Why R is Superior to STATA

Suppose we want to run a heteroskedastic regression in STATA. The dependent variable is the vote gap between Republican and Democratic candidates (votegap). The independent variables are the partisan polls of the two major party candidates (dempoll, reppoll), the gap between these two polls (pollgap), and dummy variables showing whether an incumbent is running (deminc, repinc). We also want to see whether the variance changes with days before the election (days2go) and with who conducted the poll (dempoll, reppoll). We would type:

ml model lf hetreg (slopes:votegap=dempoll reppoll pollgap deminc repinc) (variance: days2go dempoll reppoll)

-----------------------------------------------------------------------------------------------

In R, to run the same model, we would type:

# Heteroskedastic linear regression by maximum likelihood.
hetreg<-function(y,X,Z,method='BFGS',Xnames=colnames(X),Znames=colnames(Z)){
  X<-cbind(1,X)
  colnames(X)[1]<-"Constant"
  nx<-ncol(X)
  Z<-cbind(1,Z)
  colnames(Z)[1]<-"Z Constant"
  nz<-ncol(Z)

  # Negative log-likelihood: b holds the slopes, g the log-variance parameters.
  neglnl<-function(theta,X,Z,y){
    b<-theta[1:ncol(X)]
    g<-theta[ncol(X)+1:ncol(Z)]
    lnl<-as.vector(-.5*(Z%*%g)-(.5/exp(Z%*%g))*(y-X%*%b)^2)
    -sum(lnl)}

  result<-c(optim(c(mean(y),rep(0,ncol(X)-1),log(var(y)),rep(0,ncol(Z)-1)),
    neglnl, hessian=TRUE, method=method, X=X, Z=Z, y=y),
    list(varnames=c(Xnames,Znames),nx=nx,nz=nz))
  class(result)<-"hetreg"
  return(result)}

print.hetreg<-function(object){
  coef<-object$par
  names(coef)<-object$varnames
  print(coef)
  if(object$convergence==0) cat('\n hetreg converged\n')
  if(object$convergence!=0) cat('\n *** hetreg failed to converge *** \n')
  invisible(object)}

summary.hetreg<-function(object, cover=FALSE){
  coef<-object$par
  names(coef)<-object$varnames
  nx<-object$nx
  nz<-object$nz
  maxl<-object$value
  vc<-solve(object$hessian)
  colnames(vc)<-names(coef)
  rownames(vc)<-names(coef)
  se<-sqrt(diag(vc))
  zscore<-coef/se
  pz<-2*pnorm(-abs(coef/se))   # two-sided p-value
  dn<-c("Estimate", "Std.Error")
  coef.table<-cbind(coef,se,zscore,pz)
  dimnames(coef.table)<-list(names(coef),c(dn,"z-value", "Pr(>|z|)"))

  cat("\n Heteroskedastic Linear Regression by Nathan A. Jones, Esq. of the Mad R Skillz \n")
  cat("\n Estimated Parameters \n")
  print(coef.table)
  cat("\n Log-Likelihood: ",-object$value, "\n")

  if(cover){
    cat("\n Variance-Covariance Matrix for Parameters \n")
    print(vc)}

  # Wald test that all variance parameters except the constant are zero.
  ghat<-coef[(nx+2):length(coef)]
  gvc<-vc[(nx+2):length(coef),(nx+2):length(coef)]
  wald<-t(ghat)%*%solve(gvc)%*%ghat
  pwald<-1-pchisq(wald,nz-1)
  cat("\n Wald Statistic: ",wald,"with", nz-1, "degrees of freedom\n")
  cat(" p=",pwald,"\n")}

hreg1<-hetreg(votegap,cbind(dempoll,reppoll,deminc,repinc,pollgap), cbind(days2go,dempoll,reppoll))

summary(hreg1)

-----------------------------------------------------------------------------------------------
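The poll data are not included here, so the following is only a rough sketch of the same likelihood on simulated data. The variable names (x, z) and the true parameter values are made up for illustration. The variance is modeled as exp(Z g), as in the code above.

```r
# Simulated data; x and z are placeholders, not the poll variables.
set.seed(1)
n <- 2000
x <- rnorm(n)
z <- rnorm(n)
sigma2 <- exp(0.3 + 0.8 * z)               # variance depends on z
y <- 1 + 2 * x + rnorm(n, sd = sqrt(sigma2))

X <- cbind(1, x)                           # mean-equation design matrix
Z <- cbind(1, z)                           # variance-equation design matrix

# Negative log-likelihood, dropping the additive constant.
negll <- function(theta) {
  b <- theta[1:ncol(X)]
  g <- theta[ncol(X) + 1:ncol(Z)]
  lv <- as.vector(Z %*% g)                 # log variance for each observation
  res <- as.vector(y - X %*% b)            # residuals from the mean equation
  -sum(-0.5 * lv - 0.5 * res^2 / exp(lv))
}

start <- c(mean(y), 0, log(var(y)), 0)
fit <- optim(start, negll, method = "BFGS", hessian = TRUE)

est <- fit$par
se <- sqrt(diag(solve(fit$hessian)))       # from the inverse Hessian
print(round(cbind(estimate = est, std.error = se), 3))
```

The four estimates should come out close to the simulated values 1, 2, 0.3 and 0.8. That is a quick way to check that the likelihood is coded correctly before pointing it at real data.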

The question here is: why code myself in R when someone far smarter than I am (Charles Franklin) has already coded the same formula in STATA for me?

One obvious answer is that by coding myself (see above), I can make the printout say "Heteroskedastic Linear Regression by Nathan A. Jones, Esq. of the Mad R Skillz" at the top of my computer screen. That, in and of itself, must be worth SOMETHING because the STATA print out just says "Results." Bo-RING.

I think I can see now why R is so much better than STATA.



 

Post the Thirteenth

Wherein Your Host Presents A Comic





 

Post the Twelfth

Wherein your Host Recounts a Math Joke

A herpetologist grew frustrated while trying to mate two endangered snakes. After months of work she threw up her hands and exclaimed, “Nothing I’ve tried will get these snakes to breed!” One of the snakes looked up and said to her, “you could try dimming the lights.” The herpetologist was surprised at the talking snake but turned down the lights anyway.

A few weeks later, the snakes had still not yet mated and the herpetologist asked them: “I turned down the lights, is there anything else you need?” The second snake said, “Dimming the lights helped, but it still isn’t very romantic – could you put on some good music?” So the herpetologist got a Barry White album from her car and played some sweet soulful tunes near their cage.

A few weeks later, the snakes had still not yet mated and the herpetologist asked: “I turned down the lights and put on some romantic music, why aren’t you breeding?” The first snake said: “Well, it might seem silly, but back in our native jungle we had a coffee table made of wood that we really liked. If you built a table just like that in our cage, it would probably help.”

So the herpetologist got some logs and built the table and left it in the cage. A few weeks later she came back and there were hundreds of baby snakes. This story just goes to show that “with a log table even an adder can multiply.”

log(ab)=log(a)+log(b)

exp(log(a)+log(b))=ab
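A quick check of the punchline in R, using two arbitrary positive numbers chosen for illustration:

```r
a <- 6
b <- 7

# Adding the logs and exponentiating recovers the product, up to rounding error.
log_sum <- log(a) + log(b)
print(all.equal(log(a * b), log_sum))
print(all.equal(exp(log_sum), a * b))
```

Note that all.equal() compares within a small numerical tolerance, which is the right tool here because floating-point logs are not exact.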



Sunday, July 02, 2006

 

Post the Tenth

Wherein your Host Submits a T-Shirt Design

So there is a contest to design the t-shirt for summer methods camp and your host intends to win. My “real” entry will be a rather tame shirt that says “ICPSR Summer Methods Camp 2006” on the front and then has the R code to program that display on the back.

(Match THAT shirt with a pair of shorts and some black socks and you will really turn on the ladies, methods boys.)

My “other” design is “Chuck Norris versus the Quantitative Methodologists” and is two columns on the back:
* Chuck Norris does not sleep – he waits.
* Methodologists do not sleep – we do problem sets.

* Outer space exists because it is afraid to be in the same place with Chuck Norris.
* Residuals exist because data is afraid to be in the same place as our predictions.

* There is no evolution, just animals Chuck Norris allows to live.
* There is no population, just samples methodologists allow to represent it.

* Chuck Norris is the reason Waldo is hiding.
* Methodologists are the reason undergraduates don’t come to class.

* Chuck Norris counted to infinity – twice.
* Methodologists approach a limit of infinity – every day.

* The chief export of Chuck Norris is pain.
* The chief export of methodologists is journal articles you can’t understand.

* Oscar Wilde is the Chuck Norris of words.
* R.A. Fisher is the Chuck Norris of regression.
Which shirt would you rather wear?

More Chuck Norris FACTS



Wednesday, June 28, 2006

 

Post the Ninth

Wherein your Host Renounces TeX (with Latin goodness)

Abrenuntias LaTeX?
I renounce thee, LaTeX.

Et omnibus operibus eius?
I renounce thee, LaTeX, since ye will not work on my laptop.

Omnibus pompis eius?
I renounce thee, LaTeX, and all pompous statements about the inferiority of off-the-shelf products.

Exorcizo te, omnis spiritus immunde. Adaperire!
I will humble myself to Microsoft and repent – LaTeX begone from my laptop!

(posted after a failed epic 8-hour battle just to INSTALL TeX)



Tuesday, June 27, 2006

 

Post the Eighth

Wherein your Host Witnesses His First Methods Debate

The quantitative methods field is rife with debate and since I am a (lightly-salted) peanut in the Land of Elephants, I have decided to keep a low profile when discussions of preferred methods or software come up.

Personally, your host is of the opinion that the various techniques we learn are like tools in a tool-box. Just as one wouldn’t use a chainsaw to hang a picture on the wall, there are situations when we might prefer OLS to MLE to Bayesian analysis. I would further say that canned statistical software is probably appropriate for most researchers. The extra effort required to learn R is wasted when STATA does 90% of what we need and add-ons like Zelig or R-Commander are available for free.

Your host has been told that publicly admitting to these opinions would cause me to be labeled a methods “goober.”

So I keep quiet and observe. It is interesting to hear dismissive snorts about how such-and-such is a “frequentist” or so-and-so “STILL uses STATA.” But today I witnessed my first bona-fide methods skirmish, however brief, between two methods elephants.

Elephant 1 made the simple statement that OLS should be the preferred method for analysis if a thorough pre- and post-regression analysis has been conducted and more complicated models have confirmed the results. In your host’s opinion, this is a rather non-controversial statement since OLS is the easiest mode of analysis for the reader (and writer) to interpret, and this was precisely Elephant 1’s point.

He was immediately challenged by Elephant 2, apparently a breed of the Bayesian pachyderm family, who preferred a method with “no assumptions.” (This peanut didn’t want to ask about priors.) Elephant 1 was too kind in response (he is Canadian-bred) and simply reiterated his point that, if other methods confirm the results, OLS was a reliable and time-tested approach. Elephant 2 stormed away muttering “garbage” – since it was, after all, Elephant 1’s class. Why Elephant 2 was in the class, I will never know…

And so, gentle reader, if you ever want to go to Math Camp, you have two options: be a methods chauvinist or be quiet. Some might be lucky enough to be born Canadian and, hence, able to express reasonable opinions but the rest of us have to choose.



Monday, June 26, 2006

 

Post the Seventh

Wherein your Host Displays His First Feeble R Efforts

A. The Old Standby:



B. Another Good Friend (Improved with 95% BOOTSTRAPPED confidence interval goodness):



C. Ye Olde Box Plots (Note the Outliers -- In R you can click on them to identify -- woot!!!):



D. Regular and "Jittered" Scatterplots (with a random element added to "stacked observations")
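As a rough sketch of the ideas behind items B and D, here is a percentile bootstrap interval and a jittered variable in R. The data are made up, and the choice of a mean and of 2000 resamples is an assumption for illustration, not a reconstruction of the original plots.

```r
set.seed(2006)
scores <- rnorm(60, mean = 70, sd = 10)    # made-up sample

# Percentile bootstrap: resample with replacement, recompute the mean each time,
# then take the 2.5% and 97.5% quantiles of those resampled means.
boot_means <- replicate(2000, mean(sample(scores, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))
print(ci)

# Jittering: add a little random noise so "stacked" points become visible.
rating <- sample(1:5, 60, replace = TRUE)  # made-up discrete variable
jittered <- jitter(rating)

# Draw the scatterplot only in an interactive session.
if (interactive()) {
  plot(jittered, scores, xlab = "rating (jittered)", ylab = "score")
}
```

With the default settings, jitter() moves each point by only a fraction of the smallest gap between distinct values, so the categories stay visually separate.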




Sunday, June 25, 2006

 

Post the Sixth

Wherein your Host Arrives in Ann Arbor

Arrived in Ann Arbor today for Math Camp.

I have decided to stay in a student-run co-op since it is half the price of all other living options. For the uninitiated, a co-op is a run-down building where students live in squalor for roughly half the price of all other living options.

Appropriately, the co-op is called "Jones House." One shudders to imagine the dastardly crime one of my ancestors must have committed in order to have a co-op named after him...

... I missed this when I signed up for the place. Apparently members of Jones House are on probation for "deliberate destruction of property."

Should be a fun month...



Monday, May 22, 2006

 

Post the Third

Wherein your Host Discusses His Occupation

Ye might ask: if our Host is not very good with numbers, then why is he going to Math Camp?

The background story goes that, after graduating from Cal, like all good political science majors, your Host wanted to be a lawyer. Unfortunately I was poor and I graduated in the fall semester (3.5 years in college) so I had to get a job. Attempting to kill two birds with one stone, I worked as a paralegal in a law firm.

And I hated it.

I hated it so much that I quit within six months. Working in a law firm taught me one thing: I didn't want to be a lawyer. Learning this one thing, however, deprived me of the only reason I went to college (to be a lawyer). So I banged around for a bit (5 years) doing different jobs and eventually I decided that the NEW one-thing-I-really-wanted-to-do was school. I applied to graduate programs in political science (ending up at WW-DOP) and decided that I really liked the SCIENCE part.

That means I need to study more math. That means I need to go to Math Camp. That means I'm nervous.

Now you get it.



 

Post the Second

Wherein your Host Discusses Math Camp

My math background is weak and so I find the prospect of Math Camp quite frightening.

A quick story from high school will let you know just how poor my math skillz are: once during a trig test, the teacher allowed us students "one page of notes." Rather than study, the 16-year-old-version-of-your-Host, who has always been creative in a pinch and terminally lazy, secured a giant piece of graph paper about 3' x 5' and transcribed the entire text book chapter upon it. When the test was about to start, I produced said 3' x 5' "one page of notes" to much mirth and amusement. The teacher rolled her eyes and said: "it won't help you anyway." It didn't.

The moral of the story is that future tests specified 'one 8.5" x 11" sheet of paper' and I didn't learn or care to learn much math as a young fellow. If only the 30-year-old me had appeared to the 16-year-old me and ordered myself to study. I wonder what the Math Camp Instructors will do if I use 3' x 5' note paper in class?



 

Post the First

Wherein your Host Starts a Blog

At the suggestion of some friends, I will start a blog to keep track of my exploits in Michigan at Math Camp in July. It is also true that I have never been very good at keeping in touch with family and friends and so I hope that those who have an interest in my (mis)adventures might see the posts here and know that I am alive and well. Welcome all.


