R-Code Yahoo Finance Data Loading

Posted on 17th January 2012 in r-bloggers, Research

Here is an R script that downloads Yahoo Finance data without needing any additional packages or libraries.

The .zip file contains the code together with an example of how to use it.

Download the code here: R Code - Yahoo Data Loading

You can also find it under the “Downloads” Section of our site.

# Function and example code for loading finance data from Yahoo
# without the need of any additional package.
#
# Written by Fotis Papailias & Dimitrios Thomakos on Dec. 31, 2011
# Contact Details: papailias@quantf.com
#                  dimitrios.thomakos@gmail.com, thomakos@quantf.com
#
# All material is provided for use as is, with no guarantees, either expressed or implied.
# Copyright (C) under the authors' names Papailias, Fotis and Thomakos, Dimitrios for both
#
#-------------------------------------------------------------------------#
#             Quantitative Finance & Technical Trading                    #
#                     http://www.quantf.com                               #
#-------------------------------------------------------------------------#
#
# PLEASE MAINTAIN THIS HEADER IN ALL COPIES OF THIS FILE THAT YOU USE

###############################################################################################
# Main Function
#
# Input
# -----
#   tickers (text strings)
#   start.date (dates)
#   end.date (dates)
#
# Output
# -------
# A list of 6 double matrices: open, high, low, close, volume, adj.close
###############################################################################################

data.loading <- function(tickers, start.date, end.date)
{
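  # Switch the time locale to "C" so that weekdays() returns English day names
  # on every platform, and restore the previous setting when the function exits
  old.locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old.locale), add=TRUE)
  Sys.setlocale("LC_TIME", "C")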

  # Create the universe of dates
  all.dates <- seq(as.Date(start.date), as.Date(end.date), by="day")
  all.dates <- subset(all.dates,weekdays(all.dates) != "Sunday" & weekdays(all.dates) != "Saturday")
  all.dates.char <- as.matrix(as.character(all.dates))

  # Create NA-filled matrices (one row per weekday, one column per ticker)
  open <- matrix(NA, NROW(all.dates.char), length(tickers))
  hi <- open
  low <- open
  close <- open
  volume <- open
  adj.close <- open

  # Name the rows correctly
  rownames(open) <- all.dates.char
  rownames(hi) <- all.dates.char
  rownames(low) <- all.dates.char
  rownames(close) <- all.dates.char
  rownames(volume) <- all.dates.char
  rownames(adj.close) <- all.dates.char

  # Split the start and end dates to be used in the URL later on
  # (as.character() lets Date objects work too; months are sent zero-based)
  splt <- unlist(strsplit(as.character(start.date), "-"))
  a <- as.character(as.numeric(splt[2])-1)
  b <- splt[3]
  c <- splt[1]

  splt <- unlist(strsplit(as.character(end.date), "-"))
  d <- as.character(as.numeric(splt[2])-1)
  e <- splt[3]
  f <- splt[1]

  # Create two of the three basic URL components (the ticker goes between them)
  str1 <- "http://ichart.finance.yahoo.com/table.csv?s="
  str3 <- paste("&a=", a, "&b=", b, "&c=", c, "&d=", d, "&e=", e, "&f=", f, "&g=d&ignore=.csv", sep="")

  # Main loop for all assets
  # seq_along() also copes with an empty ticker vector
  for (i in seq_along(tickers))
  {
      str2 <- tickers[i]
      strx <- paste(str1,str2,str3,sep="")
      x <- read.csv(strx)

      datess <- as.matrix(x[1])

      replacing <- match(datess, all.dates.char)
      open[replacing,i] <- as.matrix(x[2])
      hi[replacing,i] <- as.matrix(x[3])
      low[replacing,i] <- as.matrix(x[4])
      close[replacing,i] <- as.matrix(x[5])
      volume[replacing,i] <- as.matrix(x[6])
      adj.close[replacing,i] <- as.matrix(x[7])
  }

  # Name the cols correctly
  colnames(open) <- tickers
  colnames(hi) <- tickers
  colnames(low) <- tickers
  colnames(close) <- tickers
  colnames(volume) <- tickers
  colnames(adj.close) <- tickers

  # Return the output
  return(list(open=open, high=hi, low=low, close=close, volume=volume, adj.close=adj.close))
}
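
The date handling in the function can be checked offline. The sketch below uses a hypothetical helper, build.query (not part of the original script), to rebuild the a-f parameters for a given date range. It assumes, as the "-1" in the code above implies, that the service expects zero-based months.

```r
# Hypothetical helper (not in the original script): builds the query
# string that data.loading() appends after the ticker symbol.
build.query <- function(start.date, end.date)
{
  # Accept both Date objects and "YYYY-MM-DD" strings
  s <- unlist(strsplit(as.character(start.date), "-"))
  e <- unlist(strsplit(as.character(end.date), "-"))

  # Months are shifted to be zero-based (January = 0), mirroring the
  # "-1" in the original code; day and year are passed through unchanged
  paste("&a=", as.numeric(s[2]) - 1, "&b=", s[3], "&c=", s[1],
        "&d=", as.numeric(e[2]) - 1, "&e=", e[3], "&f=", e[1],
        "&g=d&ignore=.csv", sep="")
}

# One string input and one Date input, to show that both are handled
query <- build.query("2011-01-03", as.Date("2011-12-30"))
print(query)
```

Printing the string before requesting anything makes it easy to spot an off-by-one month.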

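The alignment step inside the main loop can also be tried without a network connection. This is a minimal sketch that uses a made-up data frame (fake.csv, with invented prices and an invented ticker name) in place of the downloaded CSV:

```r
# Weekday "universe" for 3-7 January 2011 (Monday to Friday), as strings
all.dates.char <- as.character(seq(as.Date("2011-01-03"),
                                   as.Date("2011-01-07"), by="day"))

# Made-up stand-in for a downloaded CSV: newest row first, two days missing
fake.csv <- data.frame(Date  = c("2011-01-07", "2011-01-05", "2011-01-03"),
                       Close = c(10.3, 10.2, 10.1),
                       stringsAsFactors = FALSE)

# NA-filled matrix with one row per date and one column per ticker
close <- matrix(NA_real_, length(all.dates.char), 1,
                dimnames = list(all.dates.char, "FAKE"))

# match() gives the row position of each downloaded date in the universe,
# so the row order of the CSV does not matter
rows <- match(fake.csv$Date, all.dates.char)
close[rows, 1] <- fake.csv$Close

print(close)
```

Dates that are absent from the download (holidays, for instance) simply stay NA.
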
10 Responses to “R-Code Yahoo Finance Data Loading”

  1. Carlos Azevedo says:

    I would like to leave here my sincere thanks and appreciation for the assistance and courtesy that you have extended to me whenever I asked.

    Fantastic

    and the CODE: Works like a CHARM!

    Thanks

    Carlos

  2. Pat Burns says:

    I object to the line:

    rm(list=ls(all=TRUE))

    People could easily miss that that line is there and destroy the data they have in their global environment.

  3. Fotis says:

    Hi Josh,

    Thanks for your comment.

    This is for those who do not want to load additional packages just for one function (the data loading).

    I only agree about the list of multiple returns because in its current form it returns everything (open, hi, lo, close, vol, adj. close). In a future updated version I will make this user dependent.

    Apart from the above, the objects are at most 6 double matrices (if someone’s using open, hi, lo, close, vol, adj. close) and each column will be a specific asset.

    The above makes it even easier to be used in loops where indicator “i” will always be the same asset/same column across different matrices.

    At least, it was easier for me and hence I’ve decided to share.

    Cheers

    • Looping like that can be slow. You could use getSymbols() to load all the instruments into an environment, then use eapply() to apply an indicator to every symbol in the environment.

      You should also consider how you adjust the OHLC data. Using the adjusted close column will not be accurate over long time spans because it only has 2 decimal places of precision, so you need to re-calculate the split / dividend ratios yourself from the raw split and dividend data.

      A lot of people have spent a lot of time and frustration sorting this stuff out, so I would encourage you to ensure you understand why the functions you’re replacing are written the way they are before replacing them.

      • Fotis says:

        Hi Josh,

        Thanks for your useful comments. In a future update of the code I will try to take this into account.

        Of course, I am not replacing anything. There is no intention (nor the knowledge or the time) to create a package of tools to replace existing packages (or to compete in any way).

        I have just uploaded what helped me, in case someone else wants to use it.

  4. Download speed is dependent on your internet connection, not any specific R package. All the Yahoo download functions do basically the same thing, which you have re-done yet again. So, what is the benefit of not using additional packages / libraries?

    The structure you return also seems like it would be very hard to work with, since all the columns of a given security are spread across multiple list elements. That will require much more subsetting later on, which will create copies of objects and therefore be slow.
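
The concern about the return structure can be addressed after the fact. This is a minimal sketch, assuming a list shaped like the function's return value (filled here with made-up numbers) and a hypothetical helper, by.ticker, that is not part of the original script. It regroups the six matrices into one data frame per ticker:

```r
# Made-up stand-in for the list returned by data.loading(): 2 dates x 2 tickers
dates   <- c("2011-01-03", "2011-01-04")
tickers <- c("AAA", "BBB")
mk <- function(v) matrix(as.numeric(v), 2, 2, dimnames = list(dates, tickers))
res <- list(open = mk(1:4), high = mk(5:8), low = mk(9:12),
            close = mk(13:16), volume = mk(17:20), adj.close = mk(21:24))

# Hypothetical helper: one data frame per ticker, one column per field
by.ticker <- function(res)
{
  tk <- colnames(res[[1]])
  out <- lapply(tk, function(s) {
    # Pull the column for ticker s out of every matrix in the list
    cols <- lapply(res, function(m) unname(m[, s]))
    data.frame(cols, row.names = rownames(res[[1]]))
  })
  names(out) <- tk
  out
}

per.ticker <- by.ticker(res)
print(per.ticker$AAA)
```

Each element of per.ticker then holds all fields of a single security, with dates as row names.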
