FSU Sports Analytics Series

Slide Link: fsusac.sportsdataverse.org

Saiem Gilani


Our conversation will center on how to get access to sports data and why you should know your way around the data generated by the sports you are interested in.

About me

Saiem Gilani - Lead engineer and founder of the SportsDataverse
@saiemgilani @saiemgilani


Born and raised a Seminole and a proud Tallahassee native. I am an FSU mathematics alumnus and went to graduate school at Georgia Tech for analytics. My general domain of work is machine learning and data science, with a current focus on sports.

How did you get started?

I attended the FSU Sports Analytics Summit 2020!

I wrote up some of my thoughts and observations on incoming coach Mike Norvell’s presentation on a handful of analytics-related topics. This became my first article for Tomahawk Nation, the SBNation blog covering the Seminoles.

Further reading

Started meeting great folks online

At the same time, I started working with the {cfbscrapR} package (now archived) to help write analytics-driven articles.

I would not be here without my collaborators from the cfbscrapR team:

I quickly started contributing to my first open-source package on GitHub and eventually became a co-author. I then developed its successor, {cfbfastR}.

Went to another conference

  • I used my experience at the FSU Sports Analytics Symposium to sharpen my networking and communication skills

  • A couple weeks later, I went to the 2020 MIT Sloan Sports Analytics Conference

  • Got to meet sports analytics celebrities like Seth Partnow, John Hollinger, and Alok Pattani

  • Competed in the Hackathon, an exceptional opportunity to work and chat with very talented people about shared research ideas and next steps for our projects

Everything came to a screeching halt

Then, I had an idea 💡

I had a thought I am sure many long-standing members of the sports analytics community have had.

  • What if getting sports data for analysis was easy?

  • What if we worked together to build the data infrastructure for research?

  • How much further would we get?

The SportsDataverse

  • An organization trying to make the sports data and analytics industry more diverse, inclusive, and accessible by providing high-quality resources for end-users and opportunities for practical coding skill development for those who join the effort
    💡 + 💻 + 📈

  • A set of packages for loading and scraping sports data in R, Python, and Node.js with focus placed on play-by-play data
    R + python + nodejs

The strength of the SportsDataverse

  • A community of developers committed to developing and maintaining open-source sports data packages and pipelines as on-going public utilities
    👥 + 💬 + 👨‍💻 + 📦

  • A set of corresponding data repositories that allow fast loading of the data for users and collectively form one of the largest open-source sports data resources, with over 250 GB of data produced by the packages I contribute to
    🔑 + 👑

  • Our organization helps build a bench of developers from diverse backgrounds who can spearhead projects and make contributions

Our progress so far: R

20+ R packages with over a dozen sports leagues covered.

Pro Leagues

  • NBA
  • WNBA
  • NBA G-League
  • MLB
  • NHL
  • Premier Hockey Federation
  • NWSL
  • A boatload of soccer leagues

Collegiate Leagues

  • College Football
  • Men’s College Basketball
  • Women’s College Basketball
  • College Baseball
  • College Softball
  • College Football Recruiting
  • College Basketball Recruiting

Our progress so far: Python + Node.js

The sportsdataverse Python module provides loaders for SDV-provided data as well as access to ESPN endpoints. Additional modules include sportypy, collegebaseball, nwslpy, and recruitR-py.

The sportsdataverse Node.js module provides access to ESPN endpoints (among other websites) for easy web application development.

Why use the SportsDataverse?

The first public conversation on the SportsDataverse projects happened at the Carnegie Mellon Sports Analytics Conference. The paper I wrote for the conference was selected as the winner of the Data and Software Contribution, Open Track, in the conference's reproducible research competition.

  • It was built for enthusiasts like you and for soon-to-be entrants into the field
  • Users can quickly access seasons' worth of data (updated nightly via automated GitHub Actions) with simple loading functions, which takes the burden of maintaining web scraping scripts off of users
  • This, in turn, makes reproducible research and reporting significantly easier
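To make the "loading functions instead of scrapers" idea concrete, here is a minimal sketch of the pattern. It is not the sportsdataverse API: the URL template, `load_seasons`, and `fetch_csv` are hypothetical names for illustration, and the real modules' loader names and signatures should be checked in their documentation. The idea is that a scheduled job (for example, a nightly GitHub Actions workflow) publishes one pre-built file per season, and the user's code only downloads and stacks those files.

```python
import csv
import io
import urllib.request
from typing import Callable, Dict, Iterable, List

# Hypothetical URL template: one pre-built file per season, refreshed by a
# scheduled pipeline. This is NOT a real SportsDataverse path.
URL_TEMPLATE = "https://example.com/sports-data/pbp/play_by_play_{season}.csv"

Row = Dict[str, str]


def fetch_csv(url: str) -> List[Row]:
    """Download one CSV file and parse it into a list of row dictionaries."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def load_seasons(
    seasons: Iterable[int],
    fetch: Callable[[str], List[Row]] = fetch_csv,
    url_template: str = URL_TEMPLATE,
) -> List[Row]:
    """Load several seasons of pre-built data with one call (hypothetical helper).

    The caller never scrapes the source websites; it only reads files that a
    scheduled pipeline has already produced, so everyone loads the same data.
    """
    rows: List[Row] = []
    for season in seasons:
        for row in fetch(url_template.format(season=season)):
            # Tag each row with its season so stacked seasons stay distinguishable.
            rows.append({**row, "season": str(season)})
    return rows
```

Passing `fetch` as a parameter keeps the download step swappable, so the same loader can read from a local cache or a test stub instead of the network.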

Further reading

Great… but how does this affect me?

Well, there is a fairly direct pipeline from…

  • contributing to open-source projects
  • using open-source resources to create your own open-source sports analytics projects and portfolio
  • being an active member of the open-source sports analytics community

…to getting a job in Sports Analytics

More on this topic

Using interview projects

  • As more teams and organizations adopt these principles, you may get opportunities to share projects created during the interview process on GitHub and in your portfolio

  • I am particularly grateful to the Brooklyn Nets for giving me the opportunity to add a project I built in under two weeks to my portfolio

Blazing the Nets

Get on GitHub!

  • Sign up for an account on GitHub.com

  • If you’re a student, sign up for the extremely generous GitHub Student Developer Pack

  • Start sharing your code and projects online so that people may see them

  • Build out a portfolio of interesting research topics, data visualizations, and web applications

Then share on social media!

  • There are many examples of people getting hired on the strength of analysis they shared on Twitter

  • Build a following to increase your network reach

  • Be prepared for people to say unkind things about your work

  • Take feedback constructively and incrementally improve your projects

Who knows what can happen?

So, what do you really do?

I make data things work together to create models, reports, and applications for other stakeholders to use in their decision-making (or perhaps for upstream/downstream processes).

It’s about knowledge and not just code

Tasks can include:

  • Creating a useful model for player and team evaluation
  • Producing a nightly report of games and player boxscores using internal metrics and methods
  • Developing APIs and database methods for evolving data service provider offerings
  • … doing it better, communicating it better

Some Inspirations and Heroes

My beautiful and brilliant wife, Madiha, and my family

My collaborators from the cfbfastR team:

  • Akshay Easwaran
  • Jared Lee
  • Eric Hess

The creator of CollegeFootballData.com:

  • Bill Radjewski

The nflverse team:

  • Sebastian Carl
  • Ben Baldwin
  • Tan Ho

Thank you

  • FSU Sports Analytics Club for creating a wonderful speaker series
  • The seriously awesome community of developers that helps build and maintain resources
  • All y’all for listening in

Learn more

@saiemgilani @saiemgilani
