Hooligans Sportsbook

MrX

Baseball Hacks is pretty technical and most of the scripts don't work anymore, so unless you are pretty familiar with programming I don't think it will help that much. There are a couple of places that show how to get the Retrosheet database into MySQL, let me look around.
 
Just looked at the Table of Contents for Baseball Hacks and I think you are right, it would be a great place to start. The offseason would probably be a good time to do the learning and try to create something of use for next season.

Yep, now's the perfect time to start. I guarantee you won't finish too early!

My number 1 piece of advice: Understand how to back-test out-of-sample against historical lines. Always have a plan for your back-testing and be vigilant about keeping your out-of-sample data out of your model development. Going after major markets isn't like small-market stuff or betting off-numbers where you can just go forward with reasonable confidence that you have an edge just because of solid methodology.
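To make the out-of-sample point concrete, here is a minimal sketch of one way to set aside whole seasons before any modeling starts. The function name, the `season` key, and the placeholder rows are all my own assumptions for illustration, not anything from Retrosheet or a particular library.

```python
from typing import Dict, Iterable, List, Set, Tuple

Game = Dict[str, object]


def split_by_season(
    games: Iterable[Game], holdout_seasons: Set[int]
) -> Tuple[List[Game], List[Game]]:
    """Separate games into a development set and a held-out set by season.

    Holding out entire seasons (rather than random games) keeps the
    out-of-sample data fully apart from anything used to build the model.
    """
    development: List[Game] = []
    holdout: List[Game] = []
    for game in games:
        if game["season"] in holdout_seasons:
            holdout.append(game)
        else:
            development.append(game)
    return development, holdout


# Placeholder rows only; real rows would come from your own database.
games = [
    {"game_id": "GAME_A", "season": 2008},
    {"game_id": "GAME_B", "season": 2009},
    {"game_id": "GAME_C", "season": 2010},
    {"game_id": "GAME_D", "season": 2010},
]

development, holdout = split_by_season(games, holdout_seasons={2010})
```

The idea is to do this split once, up front, and only touch `holdout` when you are ready for a final test against the historical lines.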
 
Getting Retrosheet data into MySQL should be your first step, IMO. You can probably figure it out on your own with some Google searching and some help from this forum when you get stuck.

Here's the basic idea.

1) Download the event files from Retrosheet for whatever seasons you're going to work with (6 seasons is probably a decent number).
2) Convert the event files into a CSV file. Retrosheet provides a command-line tool (BEVENT.exe) to do this, but a third-party tool (CWEVENT.exe), available here: http://sourceforge.net/projects/chadwick/files/, is better and generates some extra fields that you'll probably want at some point.
3) Create a MySQL database. Create a table to hold the play-by-play (pbp) data.
4) Load the CSV file into the MySQL table.
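Steps 3 and 4 can be sketched roughly like this. It only builds the SQL text, so you can paste the result into the mysql client or run it through whatever connector you use. The table name `events`, the file name `events.csv`, and the five columns are hypothetical placeholders I picked for illustration; the real CSV has far more fields, and your table needs one column per field in the same order as the file.

```python
from typing import List, Tuple

# Hypothetical subset of play-by-play columns (name, MySQL type).
# The real table needs every field in the CSV, in file order.
COLUMNS: List[Tuple[str, str]] = [
    ("game_id", "VARCHAR(12)"),
    ("inn_ct", "TINYINT"),
    ("outs_ct", "TINYINT"),
    ("bat_id", "VARCHAR(8)"),
    ("event_cd", "TINYINT"),
]


def create_table_sql(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (name, type) pairs."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"


def load_csv_sql(table: str, csv_path: str) -> str:
    """Build a LOAD DATA statement for a comma-separated file."""
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n';"
    )


create_stmt = create_table_sql("events", COLUMNS)
load_stmt = load_csv_sql("events", "events.csv")
print(create_stmt)
print(load_stmt)
```

Writing the column list out by hand is tedious, but it is also how you end up learning what each pbp field means.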
 
There are already some resources online with scripts that will load the Retrosheet data into a MySQL db just by running a Python script, although it may be in your interest to load it into MySQL yourself to get more familiar with databases, etc.
 

Exactly. I think it's a better idea to do it yourself at this stage. Creating the table yourself will also familiarize you with the pbp fields (there are a lot of them).
 
I'm not sure what you're doing wrong, I can take a look a little later, but the baseball-databank data isn't really what you want. It can be useful for some things, but you really want the data from retrosheet.org.

For any useful modeling, you're going to need data broken down to at least the individual-game level. You won't get that with the baseball-databank.org database.
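Once the play-by-play is loaded, getting to game-level numbers is just a roll-up by game. Here is a rough sketch, assuming each event row carries a game identifier and a runs-scored count; the keys `game_id` and `runs_on_play` and the rows themselves are made-up placeholders, so substitute whatever your own table uses.

```python
from collections import defaultdict
from typing import Dict, Iterable


def game_totals(events: Iterable[Dict[str, object]]) -> Dict[str, Dict[str, int]]:
    """Roll play-by-play rows up to one summary per game."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"plays": 0, "runs": 0}
    )
    for event in events:
        summary = totals[str(event["game_id"])]
        summary["plays"] += 1
        summary["runs"] += int(event["runs_on_play"])
    return dict(totals)


# Placeholder rows only; real rows would come from the pbp table.
events = [
    {"game_id": "GAME_A", "runs_on_play": 0},
    {"game_id": "GAME_A", "runs_on_play": 2},
    {"game_id": "GAME_B", "runs_on_play": 1},
]

per_game = game_totals(events)
```

In practice you would probably do the same thing with a GROUP BY in SQL, but the shape of the result is the same: one row per game.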