The Python packages you must learn for hedge fund jobs
As we have written here before, hedge funds love people who can code in Python. Having started life as a leader in web frameworks, Python has transformed into the leading language for data scientists. And hedge funds derive a lot of their alpha from data.
Few people know more about data science in hedge funds than Jeff Reback, a managing director at quant hedge fund Two Sigma. Reback, who is a computer science graduate of MIT, has worked at Two Sigma since 2017 and is an expert in big data and electronic trading systems. But there's one thing that Reback knows better than anything else: Pandas, the open source Python library that's used for data structures and the manipulation of numerical tables. Reback is Mr Pandas: he's managed the project since 2013.
In a webinar a couple of months ago, Reback presented the following chart, reflecting the massive growth in Pandas among Python libraries since he took over. Based on questions asked on Stack Overflow, it reflects Pandas' preeminence among data packages across all industries, not just in finance.
Source: Two Sigma
However, Pandas' growth has plateaued since 2020. Pandas is great, but not infallible: it's very easy to debug and very easy to test, but it struggles once you get to more than about 10 gigabytes of data. At that size and above, Pandas becomes less efficient and runs into memory constraints.
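To make the memory point concrete, here is a minimal sketch of how you might measure a table's in-memory footprint in Pandas and, as one common workaround, aggregate a file in chunks instead of loading it all at once. The toy data, the column names (`symbol`, `qty`) and the chunk size are assumptions for illustration; this is not a technique attributed to Reback or Two Sigma.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a much larger trade file.
csv_text = "symbol,qty\n" + "\n".join(
    f"{sym},{qty}" for qty in range(1, 1001) for sym in ("AAA", "BBB")
)

# Load everything at once: the whole table has to fit in memory.
df = pd.read_csv(io.StringIO(csv_text))
full_bytes = int(df.memory_usage(deep=True).sum())
full_totals = df.groupby("symbol")["qty"].sum()

# Stream the same data in fixed-size chunks and combine the partial sums,
# so only one chunk is held in memory at a time.
partials = [
    chunk.groupby("symbol")["qty"].sum()
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250)
]
chunked_totals = pd.concat(partials).groupby(level=0).sum()

print(f"in-memory size of the full frame: {full_bytes} bytes")
print(chunked_totals)
```

Chunking works for simple aggregations like this one, but it means restructuring your code by hand, which is part of the appeal of a tool that handles large datasets for you.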
At this point, therefore, Reback says Two Sigma switches seamlessly to something else: Ibis, another open source Python package designed for very large datasets. Ibis isn't on the chart above. Like Pandas, Ibis was designed by Wes McKinney, a former quant researcher at hedge fund AQR. In 2017, McKinney himself detailed Pandas' flaws and his reasons for inventing Ibis.
These days, therefore, you don't just need to know Pandas. You need to know Pandas and Ibis. Reback says Two Sigma has built a tech stack, "Bamboo," that uses Pandas at its core for smaller datasets and uses Ibis to translate the same code into Apache Spark for larger datasets. "This is super nice, write the code once, test it out, get it to work and then scale it up flawlessly," says Reback.
For the moment, Pandas is by far the more widely used of the two libraries: it has 35,000 stars on GitHub compared to Ibis's 2,000. But as data proliferates, Ibis is the future. Data scientists who want to work in hedge funds need to know both.