Slide Link: fsusac.sportsdataverse.org
Saiem Gilani
This talk covers how to get access to sports data and why you should know your way around the data generated by the sports that interest you.
Saiem Gilani - Lead engineer and founder of the SportsDataverse
Born and raised a Seminole and a proud Tallahassee native. I am an FSU alumnus in mathematics and went to graduate school at Georgia Tech for analytics. My general domain of work is machine learning and data science with a current focus on sports.
I attended the FSU Sports Analytics Summit 2020!
I wrote up some of my thoughts and observations on incoming coach Mike Norvell’s presentation on a handful of analytics-related topics. This became my first article for Tomahawk Nation, the SB Nation blog covering the Seminoles.
Simultaneously, I started working with the {cfbscrapR} package (now archived) to help write analytics-driven articles.
I would not be here without my collaborators from the cfbscrapR team.
I quickly became involved with contributing to my first open-source package on GitHub, eventually becoming a co-author. I then developed the successor to the package, {cfbfastR}.
I used my experience from going to the FSU Sports Analytics Symposium to sharpen my networking and communication skills
A couple of weeks later, I went to the 2020 MIT Sloan Sports Analytics Conference
Got to meet and see some sports analytics celebrities like Seth Partnow, John Hollinger, and Alok Pattani
Competed in the Hackathon, an exceptional opportunity to work with and chat with very talented individuals about shared research ideas and further steps we could take with our projects
I had a thought that I am sure many long-standing members of the sports analytics community have had.
What if getting sports data for analysis was easy?
What if we worked together to build the data infrastructure for research?
How much further would we get?
An organization trying to make the sports data and analytics industry more diverse, inclusive, and accessible by providing high-quality resources for end-users and opportunities for practical code skill development for those that join the effort
💡 + 💻 + 📈
A set of packages for loading and scraping sports data in R, Python, and Node.js with focus placed on play-by-play data
A community of developers committed to developing and maintaining open-source sports data packages and pipelines as ongoing public utilities
👥 + 💬 + 👨‍💻 + 📦
A set of corresponding data repositories that allow fast loading of the data for users and collectively form one of the largest open-source sports data resources, with over 250 GB of data produced by the packages I contribute to
🔑 + 👑
Our organization helps establish the bench of developers from diverse backgrounds to spearhead projects and make contributions
20+ R packages with over a dozen sports leagues covered.
Pro Leagues
Collegiate Leagues
Access to loadable SDV-provided data and functions in the sportsdataverse Python module and access to ESPN endpoints. Additional modules include: sportypy, collegebaseball, nwslpy, and recruitR-py
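To give a feel for what play-by-play data makes possible once it is loaded, here is a minimal sketch in plain Python. It does not call the sportsdataverse module; the rows, the column names (offense, play_type, yards_gained), and the helper yards_per_play are all made up for illustration and are assumptions rather than the real schema or API.

```python
from collections import defaultdict

# Hypothetical play-by-play rows. The column names and values are invented
# for illustration and are NOT the sportsdataverse schema.
plays = [
    {"offense": "Team A", "play_type": "rush", "yards_gained": 4},
    {"offense": "Team A", "play_type": "pass", "yards_gained": 12},
    {"offense": "Team B", "play_type": "rush", "yards_gained": -1},
    {"offense": "Team B", "play_type": "pass", "yards_gained": 7},
    {"offense": "Team A", "play_type": "rush", "yards_gained": 2},
]


def yards_per_play(rows):
    """Return the average yards gained per play for each offense."""
    totals = defaultdict(lambda: [0, 0])  # offense -> [total yards, play count]
    for row in rows:
        totals[row["offense"]][0] += row["yards_gained"]
        totals[row["offense"]][1] += 1
    return {team: yards / count for team, (yards, count) in totals.items()}


summary = yards_per_play(plays)
for team, ypp in sorted(summary.items()):
    print(f"{team}: {ypp:.2f} yards per play")
```

With real data, the same one-row-per-play shape is what you would group and summarize, just with many more columns and rows.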
Access to ESPN endpoints (among other websites) via the sportsdataverse Node.js module for easy web application development.
The first public conversation on the SportsDataverse projects happened at the Carnegie Mellon Sports Analytics Conference. The paper I wrote for the conference was selected as the winner for the Data and Software contribution in the Open Track of the conference's reproducible research competition.
As more teams and organizations adopt these principles, you may get opportunities to share projects created during the interview process on GitHub and in your portfolio
I am particularly grateful to the Brooklyn Nets for giving me the opportunity to add a project I built in under a couple of weeks to my portfolio
Sign up for an account on GitHub.com
If you’re a student, sign up for the extremely generous GitHub Student Developer Pack
Start sharing your code and projects online so that people may see them
Build out a portfolio of interesting research topics, data visualizations, web applications
There are countless examples of people getting hired straight off their analysis on Twitter
Build a following to increase your network reach
Be prepared for people to say unkind things about your work
Take feedback constructively and incrementally improve your projects
I make data things work together to create models, reports, and applications for other stakeholders to use in their decision-making (or perhaps for upstream/downstream processes).
It’s about knowledge and not just code
Tasks can include:
cfbfastR team
nflverse team
Game on Paper - for a look at the sportsdataverse Python package serving live advanced stats with expected points and win probability metrics.
| Source code | Author: Saiem Gilani