And we're back. Welcome to Data Science Some Days, a series where I go through some of the things I’ve learned in the field of Data Science.
For those who are new… I write in Python, these aren’t tutorials, and I’m bad at this.
Regarding the title – I’m kidding, I haven’t been anywhere. I’m just bad at writing. However, I recently completed a Hackathon. Let’s talk about it.
Public Health Hackathon
This crept up on me (I forgot about it…). Given it was over the course of a weekend, it threw all the plans I didn’t have out of the window.
Our task was to tackle a health problem with public datasets. Many of the datasets supplied were about Covid-19 and I really didn’t need to study it while living in my first pandemonium. Not having it.
So we picked air quality and respiratory illness instead. Why? Hell if I know. I think it was my idea too. However, there was a lot of data for the US so it proved helpful for us.
I think we started this problem backwards. We thought about how we’d like to present the information and then thought about what we wanted to do with the data. It wasn’t a problem, just peculiar on reflection.
We decided to present our information as a Streamlit dashboard. This, when it worked, was brilliant. We then did some exploratory data analysis, developed a time series forecasting model and let people see projected future changes in respiratory conditions.
Here is the website we worked on.
Here is the GitHub repo in case you’re interested in the code itself.
What did I learn from the Hackathon?
1. How to use Streamlit
Streamlit is a fast way to build and deploy web apps without needing to use a more complex framework or require front-end development experience.
This was probably the most useful part because I’ve been interested in trying out Streamlit for a while. It seems to be relatively powerful and I’m confident that the people behind it will continue to improve its functionality.
- The main benefit for me was its fast feedback loop. As soon as you update your script, you can quickly refresh the application locally to see your changes. If you make an error, it shows it on screen rather than crashing.
- The second benefit is its integration with popular data visualisation and manipulation packages. It’s easy to insert code from Pandas and Plotly without much change. It also contains native data visualisation functions which are helpful but not nearly as interactive as specialised packages.
- The third benefit is that it’s a great way to deploy machine learning results to the public. One thing I’ve been stuck on in my Data Science journey is how to show findings to others without having to just share notebooks. Not everyone wants to read bad code.
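To give a feel for those three points, here is a minimal sketch of a Streamlit script. It is not the code from our hackathon project: the readings are made up, and the names `READINGS`, `rolling_mean` and `render` are my own placeholders. It assumes Streamlit and Pandas are installed, and you would save it as something like `app.py` and start it with `streamlit run app.py`.

```python
# app.py - a toy dashboard, started with: streamlit run app.py
from statistics import mean

try:
    import pandas as pd
    import streamlit as st
except ImportError:  # the helper below still works without Streamlit installed
    pd = st = None

# Made-up monthly air quality readings, purely for illustration.
READINGS = [
    ("2021-01", 42.0),
    ("2021-02", 55.0),
    ("2021-03", 48.0),
    ("2021-04", 61.0),
    ("2021-05", 39.0),
    ("2021-06", 44.0),
]


def rolling_mean(values, window):
    """Trailing moving average; early points average whatever is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def render():
    st.title("Air quality (toy data)")

    # Widgets return their current value; moving the slider reruns the script.
    window = st.slider("Smoothing window (months)", min_value=1, max_value=6, value=3)

    # An ordinary Pandas DataFrame can be handed straight to Streamlit.
    df = pd.DataFrame(READINGS, columns=["month", "aqi"]).set_index("month")
    df["smoothed"] = rolling_mean(df["aqi"].tolist(), window)

    st.line_chart(df)  # native chart, one line per column
    st.dataframe(df)   # scrollable table of the same data


if st is not None:
    render()
```

Editing the script and refreshing the page is the feedback loop I mentioned above. I wrote the smoothing by hand only to keep the sketch small; in a real project Pandas' own `rolling` method would be the more usual choice.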
I’ll likely want to go into more depth on Streamlit at some point. But not today. I like it though.
2. Working with others
The only other time I’ve worked collaboratively with code is in my first Hackathon!
It’s a valuable experience being able to quickly learn the strengths and weaknesses of your teammates, decide on a task and delegate. It’s as much of a challenge as it is fun – especially when you are fortunate enough to have good teammates.
We didn’t utilise git much though, and it showed: version control through Google Drive becomes unwieldy fast.
3. General practice
It’s always good to practise. I’m bad at coding, which will never change. But the aim is to become less bad. I think I became less bad as a result of the hackathon.
Other things…
I’m halfway through my second round of #66daysofdata started by Ken Jee and I’ve been more consistent than the previous attempt. Definitely taken advantage of the “minimum 5 minutes” rule! There have been many days where the best I’ve done is just watch a video.
At the moment, I’m learning data science and Python without much direction. Mainly trying to work on projects and quickly getting discouraged by things not working. For example, in my last post about working on more projects, I was working on a movie recommendation system. It failed at multiple points and eventually I stopped working on it. Then I didn’t pick anything else up.
My next Data Science Some Days post will hopefully contain a structured learning plan. Unless I finish my movie recommendation system.