Importing presidential approval poll results

Download the PDF from SSRN.
Examples: approval.

Abstract

The American Presidency Project provides presidential job approval poll results. These data are available for each U.S. president since President Franklin D. Roosevelt and cover all the job approval polls conducted. The proposed Stata command, approval, downloads these poll results in their original format, an HTML table. It then parses the HTML table and prepares the data as a usable Stata dataset.
Keywords: Presidential job approval, presidential popularity, U.S. presidents, parse HTML

Introduction

The American Presidency Project provides a wide range of valuable data related to the U.S. presidents. Among these publicly available data, presidential job approval poll results are compiled by Gerhard Peters using the Gallup Polls. These data are available for each U.S. president since President Franklin D. Roosevelt and cover all the presidential job approval polls conducted. The approval data are available through The American Presidency Project web page in HTML format. Without further tooling, the poll results must be copied and pasted into a text editor and edited before they become usable in Stata.

The proposed Stata command, approval, automates the process of accessing and parsing the presidential approval data, which are provided for each president separately. The command accesses the poll results, downloads them as HTML and then parses them. The end result is a dataset of presidential job approval poll results usable in Stata. With approval, poll results may be processed either for an individual U.S. president or for multiple presidents. If multiple presidents are requested, the data are appended, and the presidency number may be used as the panel variable.

The approval command

The presidential job approval poll results are provided at the following web address: http://www.presidency.ucsb.edu/data/popularity.php?pres=44. Changing the value of the pres field within the URL returns the results for the corresponding U.S. president. The page at this URL contains a list of HTML tables. All but one of these tables hold page content other than the presidential job approval poll results.

As its first step, approval fetches the page at the above URL into a string. Each table enclosed within the table (“<table>”) HTML tags is then parsed into an element of a string vector. The table whose first cell (first row, first column) equals “President:” holds the presidential job approval poll results, so the corresponding vector element is kept and the others are discarded. The kept element is assigned to a string, and all end-of-row HTML tags (“</tr>”) are replaced with a carriage return (char(13)). Up to this point, approval uses Mata code. The resulting string is tokenized by carriage returns, transposed and transferred to Stata as a string variable.

The final processing in Stata splits each observation (each table row of the data) using the end-of-column HTML tags (“</td>”). The resulting data have the columns of the original table as variables and the rows of the original table as observations. Two additional variables are generated: 1) president, which contains the name of the president, and 2) president_num, which contains the presidency number. All variables are stored in formats appropriate to their content: string for president, float for president_num, byte for approving, disapproving and unsure_or_no_data, and float for start_date and end_date.
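The row and column splitting described above can be sketched as follows. This is a minimal illustration rather than the code shipped with approval: the function name html_rows, the variable names myvar and col, and the two-row sample table are all assumptions made for the example, and the sample cells are placeholders, not real poll results.

```stata
clear
mata:
// html_rows() is a hypothetical helper, not part of approval.
// It returns one element per table row of the HTML passed in.
string colvector html_rows(string scalar html)
{
	string scalar    body
	string rowvector parts

	// Replace every end-of-row tag with a carriage return, char(13)
	body  = subinstr(html, "</tr>", char(13))
	// tokens() also returns each delimiter as its own token; drop those
	parts = tokens(body, char(13))
	return(select(parts, parts :!= char(13))')
}

// Made-up sample standing in for the table kept after the "President:" test
page = "<tr><td>President:</td><td>Start</td></tr>"
page = page + "<tr><td>Name</td><td>01/20/2009</td></tr>"

// Transfer the rows to Stata as the string variable myvar
satirlar = html_rows(page)
st_addvar("str244", "myvar")
st_addobs(rows(satirlar))
st_sstore(., "myvar", satirlar)
end

* Stata side: one new variable per table column
split myvar, parse("</td>") generate(col)
list col*
```

Each resulting variable still carries the opening row and column tags, which is where a function such as strip_tags comes in.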

Important Mata functions used in the approval code

The following paragraphs provide the Mata functions and code used in the approval code. These functions are for general parsing purposes and can be used in creating other Stata commands that parse HTML code.

*
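// Get HTML source code from WWW
string scalar file_get_contents (string scalar url)
{
     // Open the URL (or a local file) for reading
     fh = fopen(url, "r")
     // Collect the content in a local string so the caller's argument is not overwritten
     raw = ""
     // fget() returns J(0,0,"") at end of file
     while ((line=fget(fh))!=J(0,0,"")) {
          raw = raw + line
     }
     fclose(fh)
     return (raw)
}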

// Strip common HTML tags (closing row and column tags are not in the list, so they are kept)
string scalar strip_tags (string scalar raw)
{
     tags = ("tr", "TR", "td", "TD", "strong", "STRONG", 
          "/strong", "/STRONG", "span", "SPAN", "/span", 
          "/SPAN", "img", "IMG", "/img", "/IMG", "br", 
          "BR", "!-", "table", "TABLE", "/table", "/TABLE")
     for (j=1; j<=cols(tags); j++) {
          tag = tags[j]
          // Remove every occurrence of the tag, from "<tag" through the next ">"
          while (strpos(raw, "<" + tag)) {
               bas_pos = strpos(raw, "<" + tag)        // start of the tag
               bas_txt = substr (raw, 1, bas_pos - 1)  // text before the tag
               son_txt = substr (raw, bas_pos, .)      // text from the tag onward
               bas_pos2 = strpos(son_txt, ">")         // end of the tag
               son_txt = substr (son_txt, bas_pos2 + 1, .)
               raw = bas_txt + son_txt
          }
     }
     return (raw)
}

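// Strip a specific HTML element, from its opening tag through its closing tag
string scalar remove_tags (string scalar raw, string scalar tag)
{
     // tag is expected in lowercase, e.g. "script"
     closing = "</" + tag + ">"
     while (strpos(strlower(raw), "<" + tag)) {
          bas_pos = strpos(strlower(raw), "<" + tag)
          bas_txt = substr (raw, 1, bas_pos - 1)
          son_txt = substr (raw, bas_pos, .)
          bas_pos2 = strpos(strlower(son_txt), closing)
          // Stop if the closing tag is missing, to avoid an endless loop
          if (bas_pos2 == 0) break
          // Keep only the text after the closing tag
          son_txt = substr (son_txt, bas_pos2 + strlen(closing), .)
          raw = bas_txt + son_txt
     }
     return (raw)
}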

// Remove unnecessary white space
string scalar remove_space (string scalar raw)
{
     while (strpos(raw, "  ")) {
          raw = subinstr(raw, "  ", " ")
     }
     return (raw)
}
*


Syntax
*
approval numlist
*


Options
  • numlist is the list of U.S. presidents’ presidency numbers; each must be an integer greater than 31. The list may contain one president or several, as in approval 44 or approval 42 43 44. The name of each president, determined by the presidency number provided, becomes the content of the variable president. The presidency number becomes the content of the variable president_num. Presidency numbers are as follows: Franklin D. Roosevelt is the 32nd president, Harry S. Truman is the 33rd president, Dwight D. Eisenhower is the 34th president, John F. Kennedy is the 35th president, Lyndon B. Johnson is the 36th president, Richard Nixon is the 37th president, Gerald R. Ford is the 38th president, Jimmy Carter is the 39th president, Ronald Reagan is the 40th president, George Bush is the 41st president, William J. Clinton is the 42nd president, George W. Bush is the 43rd president and Barack Obama is the 44th president.


How to install
*
net from "http://researchbythenumbers.com/stata/103/"
*

Then, click on the approval link and then “click here to install”.
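As an alternative to clicking through the links, the package can usually be installed in one step with Stata's net install command. The sketch below assumes the package is still served from the address given above under the name approval, the name used in the approval.pkg listing.

```stata
* Install approval directly from the authors' site (assumes the URL is still live)
net install approval, from("http://researchbythenumbers.com/stata/103/")

* Confirm where the command was installed
which approval
```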

Example #1: Single U.S. president’s job approval poll results

This example downloads and parses the presidential job approval poll results for President Barack Obama, the 44th U.S. president.

*
approval 44
summarize
*

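Several presidents can be requested in one call. The following sketch assumes approval is installed and the data source is reachable. It uses variable names created in the approval.ado listing (president, approving, disapproving) and a standard Stata command to compare presidents.

```stata
* Download and append the results for the 42nd, 43rd and 44th presidents
approval 42 43 44

* Compare approval and disapproval ratings across presidents
tabstat approving disapproving, by(president) statistics(n mean min max)
```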

approval.ado
*
program define approval, rclass
	
	version 10.0
	
	* One or more presidency numbers, e.g. "approval 42 43 44"
	syntax anything(name=presidents)
	
	* Accumulate results in a temporary file, which Stata erases automatically.
	* This avoids picking up a stale file left behind by an interrupted run.
	tempfile results
	
	qui: {

		foreach name in `presidents' {
			clear 
			* Download the poll results into the string variable myvar
			mata: get_profile("`name'")
			drop if myvar=="@"
			* Split each row into the columns mfd1, mfd2, ...
			capture: split myvar, parse("#") gen(mfd)

			* Process downloaded data
			if (_rc==0) {				
				drop myvar
				drop if _n==1
				rename mfd1 president
				gen start_date=date(mfd2,"MDY",2050)
				format %td start_date
				gen end_date=date(mfd3,"MDY",2050)
				format %td end_date
				rename mfd5 approving
				destring approving, replace
				rename mfd6 disapproving
				destring disapproving, replace
				rename mfd7 unsure_or_no_data
				destring unsure_or_no_data, replace
				capture: drop mfd*
				replace president = proper(trim(president))
				replace president = president[_n-1] if (president == "")
				gen president_num = `name'
				order president president_num start_date end_date
			}
			* Append the presidents processed so far (fails harmlessly on the first pass)
			capture: append using `results'
			save `results', replace
			noi: di "`name' is downloaded."
		}
	}
	sort president_num start_date
end



mata:
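	// Download the poll results for one president and load them into Stata
	void get_profile (string scalar president)
	{
		// The page is expected to return rows separated by "@";
		// the columns within a row are split on "#" later, in Stata
		icerik = file_get_contents("http://researchforprofit.com/posts/stata_approval.php?president=" + president)

		// Split into rows; tokens() also returns each "@" as its own token
		satir = tokens(icerik, "@")

		// Store one token per observation in the string variable myvar
		sutun=satir'
		st_addvar("str244", "myvar")
		st_addobs(rows(sutun))
		st_sstore(.,"myvar",sutun)

	}

	// Read the whole content of a URL (or a local file) into one string
	string scalar file_get_contents (string scalar url)
	{
		fh = fopen(url, "r")
		raw=""
		// fget() returns J(0,0,"") at end of file
		while ((line=fget(fh))!=J(0,0,"")) {
			raw=raw+line
		}
		fclose(fh)
		return (raw)
	}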

end
*


approval.hlp
*
{smcl}
{* 14aug2016}{...}
{cmd:help approval}
{hline}


{title:Title}

{p2colset 5 22 24 2}{...}
{p2col:{hi: approval} {hline 2}} downloads the presidential approval poll results from "The American Presidency Project" {p_end}
{p2colreset}{...}



{title:Syntax}

{p 8 18 2}
{cmdab:approval} {it:numlist}


{synoptset 16 tabbed}{...}
{synopthdr}
{synoptline}
{synopt:{it:numlist}} presidency number(s), integers greater than 31, of the president(s) whose approval poll results are downloaded. {p_end}

{synoptline}
{p2colreset}{...}


{title:Description}

{pstd}
{cmd:approval} downloads the presidential approval poll results from "The American Presidency Project"
available at: http://www.presidency.ucsb.edu. The programmers and the program have no association with the data source.
The poll results data are not available as a downloadable delimited file. Thus, approval is used to parse the
poll results from the HTML table available at the data source web site.

Note:
Presidency numbers are as follows:
Franklin D. Roosevelt is the 32nd president
Harry S. Truman is the 33rd president
Dwight D. Eisenhower is the 34th president
John F. Kennedy is the 35th president
Lyndon B. Johnson is the 36th president
Richard Nixon is the 37th president
Gerald R. Ford is the 38th president
Jimmy Carter is the 39th president
Ronald Reagan is the 40th president
George Bush is the 41st president
William J. Clinton is the 42nd president
George W. Bush is the 43rd president
Barack Obama is the 44th president


{title:Example}

{cmd: . approval 42}
{cmd: . approval 42 43 44}


{title:Web support}
{p 5 5 2}
{browse "http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1927957":{it:Download via SSRN}.}{break} 
{browse "http://researchbtn.com/?p=105":{it:More examples can be found at www.researchbtn.com}.}


{title:Authors}
{p 5 5 2}
{hi:Mehmet F. Dicle}, Loyola University New Orleans, USA ({hi:mfdicle@gmail.com}){break} 
{hi:Betul Dicle}, Research by the Numbers, LLC, USA{break} 
{browse "http://researchbtn.com":{it:www.researchbtn.com}}

*


approval.pkg
*
d approval -- Downloads the presidential approval poll results from "The American Presidency Project"
d 
d Program by 
d Betul Dicle, Louisiana State University
d Mehmet F. Dicle, Loyola University New Orleans
d
d approval downloads the presidential approval poll results from "The American Presidency Project"
d 
d
d Created: 11feb2011
d Updated: 14aug2016

f approval.ado
f approval.hlp
*


stata.toc
*
d
d
p approval Downloads the presidential approval poll results from "The American Presidency Project"
*