Pollutantmean Assignment Help

I am taking the R Programming course from the Data Science Specialization offered by Johns Hopkins University on Coursera. This blog post is a set of personal notes where you can follow my reasoning through the exercises.

Today I am trying to complete Assignment 1, “Air Pollution”, Part 1. We are given a .zip file containing 332 .csv files of pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the monitor's ID number is contained in the file name. Here is my walkthrough.

Part 1: pollutantmean()

Part 1 is about writing the function pollutantmean(directory, pollutant, id = 1:332), which returns the mean of a specified pollutant across one or more CSV files (selected by id) in the specified directory.
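One detail worth making explicit: as I read the task, the mean is taken over all observations pooled together from the selected monitors, not as an average of per-monitor means, which is also what the functions in this post compute since they rbind the files first. The two differ when monitors have different numbers of non-missing readings. The monitors and numbers below are made up purely for illustration.

```r
# Two made-up monitors with different numbers of valid sulfate readings
monitor_a <- data.frame(sulfate = c(1, 2, 3))   # mean 2
monitor_b <- data.frame(sulfate = c(10, NA))    # mean 10 once the NA is ignored

# Pooled mean: stack all rows first, then average every non-NA value
pooled <- mean(rbind(monitor_a, monitor_b)$sulfate, na.rm = TRUE)

# Mean of per-monitor means: each monitor gets the same weight
mean_of_means <- mean(c(mean(monitor_a$sulfate, na.rm = TRUE),
                        mean(monitor_b$sulfate, na.rm = TRUE)))

print(pooled)         # (1 + 2 + 3 + 10) / 4 = 4
print(mean_of_means)  # (2 + 10) / 2 = 6
```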

The results should be:

My attempt:

There are two cases: the ID refers to a single monitor, or the ID refers to a consecutive range of monitors.

pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  if (length(files[id]) == 1) {
    # Case where id indicates 1 file
    mean(read.csv(files[id])[, pollutant], na.rm = TRUE)
  } else {
    # Case where id indicates many files in a row
    datas <- data.frame()
    for (i in 1:length(files[id])) {
      datas <- rbind(datas, read.csv(files[i]))
    }
    mean(datas[, pollutant], na.rm = TRUE)
  }
}

Results are:

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 0.8599547
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833

The first and third calls work, but the second one does not. The mistake is that the loop runs over 1:length(files[id]), so it always starts at i = 1 instead of at the requested IDs. That is why 1:10 returns the right answer, whereas 70:72 actually returns the result for monitors 1:3 (length(files[70:72]) is 3). After fixing the loop to iterate over id directly, all the results are right:

## Fixed loop
for (i in id) {
  datas <- rbind(datas, read.csv(files[i]))
}

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706047
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833
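To see exactly which files each version of the loop reads, the sketch below only collects file paths instead of reading them. The files vector is a hypothetical stand-in for list.files("specdata", full.names = TRUE), assuming the files are named 001.csv to 332.csv.

```r
# Hypothetical stand-in for list.files("specdata", full.names = TRUE)
files <- sprintf("specdata/%03d.csv", 1:332)
id <- 70:72

# Buggy loop: 1:length(files[id]) is 1:3, so it visits files 1, 2 and 3
buggy <- character(0)
for (i in 1:length(files[id])) buggy <- c(buggy, files[i])

# Fixed loop: iterate over the requested IDs themselves
fixed <- character(0)
for (i in id) fixed <- c(fixed, files[i])

print(buggy)  # "specdata/001.csv" "specdata/002.csv" "specdata/003.csv"
print(fixed)  # "specdata/070.csv" "specdata/071.csv" "specdata/072.csv"
```

As a side note, 1:length(x) also misbehaves when x is empty (1:0 is c(1, 0)), which is why seq_along(x) is usually preferred when positions are really what you need. Here, though, iterating over id itself is the right fix.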

Next, I want to make the function work with non-consecutive IDs. Here is what I do:
– Read the list of monitor files into the files vector, then bind files 23 and 26 into the bind23_26 data frame (rbind appends the rows of file 26 right after those of file 23 in a single data.frame).
– Create a vector containing the IDs 23 and 26 and pass it to pollutantmean().

> files <- list.files("specdata", full.names=1)
> bind23_26 <- read.csv(files[23])
> bind23_26 <- rbind(bind23_26, read.csv(files[26]))
> mean(bind23_26[,"nitrate"], na.rm=1)
[1] 4.169054
> v <- c(23,26)
> pollutantmean("specdata", "nitrate", v)
[1] 4.169054

Surprisingly, it works without any further change to the loop: I learned that a for loop can iterate over any vector, for example for (i in c(1, 4, 5, …)), not just over a consecutive range.
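The same pattern can be reproduced without the real data. In this sketch, a list of small made-up data frames stands in for the monitor files (monitors[[i]] plays the role of read.csv(files[i])), and the loop runs over the non-consecutive IDs 23 and 26 while rbind stacks their rows.

```r
# Hypothetical stand-in for the monitor files: monitors[[i]] ~ read.csv(files[i])
monitors <- lapply(1:30, function(k) data.frame(ID = k, nitrate = c(k, NA)))

v <- c(23, 26)            # non-consecutive IDs
datas <- data.frame()
for (i in v) {            # i takes the value 23, then 26
  datas <- rbind(datas, monitors[[i]])
}

print(datas$ID)                           # 23 23 26 26
print(mean(datas$nitrate, na.rm = TRUE))  # (23 + 26) / 2 = 24.5
```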

Next, I guessed I would have to display the results to three decimal places (10^-3 precision), like the example, but the assignment asks not to round the values…
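If the goal is only to match the three-decimal display of the example without altering the value the function returns, the rounding can happen at print time. A small sketch; 4.064128 is simply the sulfate result printed earlier in this post.

```r
x <- 4.064128  # sulfate mean for monitors 1:10, as printed earlier

# Change only how the number is displayed; x keeps full precision
print(x, digits = 4)  # [1] 4.064
sprintf("%.3f", x)    # "4.064" (a character string, for display only)

# round() would instead change the value itself
round(x, 3)
```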

Finally, I can remove the special case for a single ID, since a for loop iterates over a vector of length 1 just as well as over a longer one.

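## pollutantmean.R
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Full paths to every monitor file in the directory
  files <- list.files(directory, full.names = TRUE)
  datas <- data.frame()
  # Stack the rows of each requested monitor file
  for (i in id) {
    datas <- rbind(datas, read.csv(files[i]))
  }
  # Mean over all pooled observations, ignoring missing values
  mean(datas[, pollutant], na.rm = TRUE)
}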
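For comparison, here is a sketch of an alternative version, not the one submitted above. It makes two assumptions of mine: the files are named 001.csv to 332.csv, and each one has columns like Date, sulfate, nitrate and ID (only sulfate and nitrate are confirmed by this post). pollutantmean2 is a hypothetical name, and the check at the end runs on two made-up files in a temporary directory.

```r
# Alternative sketch: build each path from the ID and bind all files at once
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  datas <- do.call(rbind, lapply(paths, read.csv))  # one rbind call
  mean(datas[[pollutant]], na.rm = TRUE)
}

# Self-contained check with two made-up monitor files
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(1, NA), nitrate = c(2, 4), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = 3, nitrate = NA, ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutantmean2(demo_dir, "sulfate", 1:2)  # mean of 1 and 3 = 2
pollutantmean2(demo_dir, "nitrate", 1:2)  # mean of 2 and 4 = 3
```

Binding once with do.call(rbind, ...) avoids growing a data frame inside the loop, and building paths from the IDs avoids relying on list.files() returning the files in ID order. Both are design choices for this sketch, not requirements of the assignment.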

Part 2: complete()
Part 3: corr()


This is the first time I'm trying to import multiple CSV files into R. I need to do it for this part of the assignment, which uses some of the CSV files to calculate the mean of sulfate and nitrate. I searched for answers here on Stack Overflow and on other sites, but I wasn't able to fix the issue based on the answers to existing questions on the topic. I'm also new to R programming.

If it's useful: R version 3.2.1, Mac OS X 10.7.5.

I have a Coursera assignment with 332 CSV files from which I have to calculate the mean of pollutants.

Link to download the file: https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip

Assignment Part 1:

Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Given a vector of monitor ID numbers, 'pollutantmean' reads those monitors' particulate matter data from the directory specified in the 'directory' argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA.
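"Ignoring any missing values coded as NA" corresponds to the na.rm argument of mean(). A minimal illustration on a made-up vector:

```r
x <- c(1, NA, 3)

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # 2: missing values are dropped before averaging
mean(x, na.rm = 1)     # also 2: na.rm = 1 is treated as TRUE
```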

Prototype of the function:

My output should be:

I have already set up my working directory, and this is where I got stuck.

Whenever I try F1 <- read.csv("name of the file", header=TRUE), I get:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'nameoffile.csv': No such file or directory

When I use read.table(file.choose(), header=TRUE), it works for all the files except the first one (001.csv), which gives:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 7 elements

sapply(filelist, read.csv) gives the same error. When I use read.csv, sapply or lapply on "specdata", the error is:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input

even though all 332 .csv files are in the "specdata" folder.
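The "cannot open file ... No such file or directory" message means read.csv() did not find the file at the given path relative to the current working directory (see getwd()). Also, read.csv() expects a single file, not a folder such as "specdata". A hedged sketch of reading every file through full paths; the temporary folder and its three files are made up so the example runs on its own.

```r
# Hypothetical stand-in for the unzipped "specdata" folder
specdata <- file.path(tempdir(), "specdata_question")
dir.create(specdata, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(data.frame(sulfate = k, nitrate = k * 2),
            file.path(specdata, sprintf("%03d.csv", k)), row.names = FALSE)
}

# By default list.files() returns bare file names, which read.csv()
# can only open if they are in the working directory
print(list.files(specdata))  # "001.csv" "002.csv" "003.csv"

# full.names = TRUE returns paths that work from any working directory
paths <- list.files(specdata, pattern = "\\.csv$", full.names = TRUE)
all_data <- lapply(paths, read.csv)  # one data frame per file
print(length(all_data))              # 3
```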

I hope I posted everything needed for a reproducible example. If anything else is needed, just let me know.

Thanks!

