Nathan's Notes: Post the Fourteenth

Thursday, July 06, 2006

Post the Fourteenth

Wherein your Host Demonstrates Why R is Superior to STATA

To run a heteroskedastic regression in STATA where the independent variable is the vote gap between Republican and Democratic candidates (votegap) and the dependent variables are the partisan polls of the two major party candidates (dempoll, reppoll), the gap between these two polls (pollgap), dummy variables showing whether an incumbent is running (deminc, repinc) and we wanted to see if variance changed by days before the election (days2go) and depending on who was conducting the poll (dempoll, repopll) we would type:

ml model lf hetreg (slopes:votegap=dempoll reppoll pollgap deminc repinc) (variance: days2go dempoll reppoll)

-----------------------------------------------------------------------------------------------

In R, to run the same model, we would type:

hetreg<-function(y,X,Z,method=’BFGS’,Xnames=colnames(X),Znames=colnames(Z)) X<-cbind(1,X) colnames(X)[1]<-“Constant” nx<-ncol(X) Z<-cbind(1,Z) colnames(Z)[1]<-“Z Constant” nz<-ncol(Z)

negln<-function(theta,X,Z,y){
b<-theta[1:ncol(x)]
g<-theta[ncol(X)+1:ncol(Z)] lnl<-as.vector(-.5*(Z%*%g)-(.5/exp(Z%*%g))*(y-X%*%b)^2) -sum(ln)}

result<-c(optim(c(mean(y),rep(0,ncol(X)-1),log(var(y)),
rep(0,ncol(Z=neglnl, hessian=T, method=method, X=X, Z=Z, y=y),
list(varnames=c(Xnames,Znames),nx=nx,nz=nz))
class(result)<-“hatreg” return(result)
print.hetreg<-function(object{
coef<-object$par
names(coef)<-object$varnames print(coef)
if(object$convergence==0) cat(‘\n hetreg converged\n’)
if(!object$convergence==0) cat(‘n\ *** hetreg failed to converge *** \n’) invisible(object)}

summary.hetreg<-function(object, cover=FALSE){
coef<-object$par names(coef)<-object$varnames
nx<-object$nx nz<-object$nz
maxl<-object$value
vc<-solve(object$hessian)
colnames(vc)<-names(coef)
rownames(vc)<-names(coef)
se<-sqrt(diag(vc))
zscore<-coef/se
pz<- pnorm*-2(-abs(coef/se))
dn<-c(“Estimate”, “Std.Error”)
coef.table<-cbind(coef,se,zscore,pz) dimnames(coef.table)<-list(names(coef),c(dn,”z-value”, “Pr(>|z|)”))}

cat(“\n Heteroskedastic Linear Regression by Nathan A. Jones, Esq. of the Mad R Skillz \n”)
cat(“\n Estimated Parameters \n”)
print(coef.table)
cat(“\n Log-Likelihood: “,-object$value, “\n”)

if(cover{
cat(“\n Variance-Covariance Matrix for Parameters \n”
print(vc)}

ghat<-coef[(nx+2):length(coef)]
gvc<-vc[(nx+2):length(coef),(nx+2):length(coef)] wald<-t(ghat)%*%solve(gvc)%*%ghat
pwald<- -1-pchisq(wald,nz-1)
cat(“\n Wald Statistic: “,wald,”with”, nz-1, “degrees of freedom\n”) cat(“ p=”,pwald,”\n”)}

hregl<-hetreg(votegap,cbind(dempoll,reppoll,deminc,repinc,pollgap), cbind(days2go,dempoll,reppoll))
summary(hreg1)

-----------------------------------------------------------------------------------------------

The question here is: why code myself in R when someone far smarter than I am (Charles Franklin) has already coded the same formula in STATA for me?

One obvious answer is that by coding myself (see above), I can make the printout say "Heteroskedastic Linear Regression by Nathan A. Jones, Esq. of the Mad R Skillz" at the top of my computer screen. That, in and of itself, must be worth SOMETHING because the STATA print out just says "Results." Bo-RING.

I think I can see now why R is so much better than STATA.

Labels: ICPSR

# posted by Nathan @ 10:27 AM

Comments: Post a Comment

<< Home

Nathan's Notes

Thursday, July 06, 2006

Post the Fourteenth

About Me

Links

archives