What I’m currently learning in Data Science | Data Science Somedays

Black and white keyboard with red space invader icon

It is 26 September as I write this meaning that I’m on day 26 of #66daysofdata.

If this is unfamiliar to you, it’s a small journey started by a Data Scientist named Ken Jee. He decided to “restart” his data science journey is invited us all to come along for the ride.

I’m not a data scientist, I’ve just always found the young field interesting. I thought, for this instance of Data Science Somedays, I’ll go through some of the things I’ve learned (in non-technical detail).


Data Ethics

I’m starting with this because I actually think it’s one of the most important, yet overlooked parts of Data Science. Just because you can do something, doesn’t mean you should. Not everything is good simply because it can be completed with an algorithm.

One of the problems with Data Science, at least in the commercial sphere, is that there’s a lot of value in having plenty of data. Sometimes, this value is taken as a priority versus privacy. In addition, many adversaries understand the value of data and as a result, aim to muddy the waters with large disinformation campaigns or steal personal data. What does the average citizen do in this scenario?

Where am I learning this? Fast.ai’s Pratical Data Ethics course.


Coding

How do I even start?

Quite easily because I’m not that good at programming so I haven’t learned all that much. Some of the main things that come to mind are:

  1. Object Oriented Programming (this took me forever to wrap my head around… it’s still difficult).
  2. Python decorators
  3. Functions

All of this stuff has helped me create:

None of them are impressive. But they exist and I was really happy when I fixed my bugs (if there are more, don’t tell me).

Where am I learning this? 2020 Complete Python Bootcamp: From Zero to Hero in Python.

(I said earlier I haven’t learned much – that’s just me being self-deprecating. It’s a good course – I’m just not good at programming… yet.

I also bought this for £12. Udemy is on sale all the time (literally))


Data visualisation and predictions

Pandas

After a while, I wanted to direct my coding practice to more data work rather than gaining a general understanding of Python.

To do this, I started learning Pandas which is a library (a bunch of code that helps you quickly do other things), that focuses on data manipulation. In short, I can now use Excel files with python. It included things such as:

  • How to rename columns
  • How to find averages, reorganise information, and then create a new table
  • How to answer basic data analysis questions

Pandas is definitely more powerful than the minor things I mentioned above. It’s still quite difficult to remember how to use all of the syntax so I still have to Google a lot of basic information but I’ll get there.

Where am I learning this? Kaggle – Pandas

Bokeh and Seaborn

When I could mess around with excel files and data sets, I took my talents to data visualisation.

Data visualisation will always be important because looking at tables are 1) boring, 2) slow, and 3) boring. How could I make my data sets at least look interesting?

Seaborn is another library that makes data visualisation much simpler (e.g. “creating a bar chart in one line of code” simpler).

Bokeh is another library that seems to be slightly more powerful in the sense that I can then make my visualisations interactive which is helpful. Especially when you have a lot of information to display at once.

I knew that going through tutorials will have their limit as my hand is always being held so I found a data set on ramen and created Kaggle notebooks. My aim was to practice and show others what my thought process was.

Where am I learning this? Seaborn | Bokeh


Machine learning

This is my most recent venture. How can I begin to make predictions using code, computers and coffee?

So for all of the above, I still find quite difficult and there will be a little while until I can say “I know Python” but this topic seemed like the one with the biggest black box.

If I say

filepath = “hello.csv”

“pandas.read_csv(filepath)”

I understand that I’m taking a function from the Pandas library, and that function will allow me to interact with the .csv file I’ve called.

If I say sklearn.predict(X_new_data) – honestly what is even happening? Half the time, I feel like it’s just luck that I get a good outcome.

Where am I learning this? Kaggle – Intro to machine learning


What is next?

I’m going to continue learning about data manipulation with Pandas and Bokeh as those were the modules I found the most interesting to learn about. However, that could very easily change.

My approach to learning all of this is to go into practice as soon as I can even if it’s a bit scary. It exposes my mistakes and reminds me that working through tutorials often leaves me feeling as though I’ve learned more than I have.

There’s also a second problem – I’m not a Computer Science student so I don’t have the benefit of learning the theory behind all of this stuff. Part of me wants to dive in, the other part is asking that I stay on course and keep learning the practical work so I can utilise it in my work.

Quite frequently, I get frustrated by not understanding and remembering what I’m learning “straight away”. However, this stuff isn’t easy by any stretch of the imagination. So it might take some time.

And that’s alright. Because we’re improving slowly.

Let’s write an email | Data Science Some Days

Ladies and gentlemen, this has been a long time coming.

So, I’ve finally started programming. Technically, I started months ago but I’ve made such a piss-poor effort at being consistent, that I’ve done next to nothing.

I was growing frustrated – often having nightmares asking the question – will I ever be able to say “Hello World”?

Evidently, simply thinking about it would never work. I can buy as many Udemy courses as I want – that won’t turn me into a data scientist. My wonderful solution to this is to go straight into a project and learn as I go. I’m learning python 3.6. But first…

print("Hello World")

I have joined the elites.


Ok, the project goes as follows:

I need to send emails to specific groups of people a week before the event starts. The email should also include attachments specific to the person I’m sending the email to and it will have HTML elements to it.

To break it down…

  • I need to send an email
  • I need to send a HTML email
  • I need to send a HTML email with attachments
  • I need to send a HTML email with attachments to certain people

There’s more but I’ve only managed the first two so far. This programming stuff is difficult and the only reason why I’m not computer illiterate is because I was born in the 90s.

I started off by opening these tutorials:

They’re both great and use slightly different methods to achieve the same result. Maybe you’ll notice that my final solution ends up being a desperate cry for help combination of them both.

LET US BEGIN.


Here is the first iteration of the code:

import smtplib
#sets up simple mail transer protocol

smtpObj = smtplib.SMTP('smtp-mail.outlook.com', 587)
type(smtpObj)
#Connects to the outlok SMTP server

smtpObj.ehlo()
#Says "hello" to the server

smtpObj.starttls()
#Puts SMTP connection in TLS mode.
#I didn't get any confirmation when I ran the program though...

smtpObj.login('email1@email.com', input("Please enter password: ")
#Calls an argument to log into the server and input password.

smtpObj.sendmail('email1@email.com', 'email2@gmail.com', 'Subject: Hello mate \nLet\'s hope you get this mail')
#Email it's coming from, email it's going to, the message

smtpObj.quit()
#ends the session

print('Session ended')
#Tells me in the terminal it's now complete


Boom, pretty simple right? I mainly took everything from Automate the Boring stuff and just swapped in my details. Well, of course not. I kept on getting an error – nothing was happening.

Well – I was somehow using the WRONG EMAIL. FUCK. It took me an hour to realise that.


Next step… let’s send an email with bold and italics.

This was frustrating because nothing worked. All it really requires is for you to put in the message in HTML format. Because I’m not learning HTML, I decided to just use this nifty HTML converter to make this part less painful.

Here are my errors…

#regularly get "Syntax error" with smtpObj.sendmail - I was missing a fucking bracket

This literally made me to go bed angry.


'''everything in the HTML goes into the subject line -
smtpObj.sendmail('email@email', 'email2@gmail.com', "Subject: " f"Hello mate \n {html}")'''

This was a surprise but I figured out to stop it…

'''Now nothing shows up in the body of the message:
smtpObj.sendmail('email1@.ac.uk', 'email2@gmail.com', f"Subject: \n Hello mate {html}")
Solved by putting {html} next to \n'''

…then it somehow got worse…

'''Now... the email doesn't actually show up in html format
smtpObj.sendmail('email1@.ac.uk', 'email2@gmail.com', f"Subject: Hello mate\n{html}")'''

…and even worse.


At this point, I changed tactic and, in the process, the entirety of my code. I won’t show you everything here otherwise this post will look too technical to those who have no experience with coding but I’ve uploaded my progress so far onto Github.

To summarise, rather than trying to send HTML with the technique I used earlier, I used a module that was essentially created to make sending emails with python much easier (email.mime). This essentially means, it contains prewritten code that you can then use to create other programs.

But… I was successful in sending myself a HTML email. Now the next step is to add a bloody attachment without putting my head through my computer.


Dear reader, you may be wondering, “why is he putting himself through so much suffering? He sounds incredibly angry.

I’m not, I promise. This has actually been the most fun I’ve had in my free time in a while. It was a good challenge and I could sense myself improving after every mistake.

Granted, I probably should have just completed an online course or something before trying to jump into this project but that would have been less fun.

Onto the next one…

Thanks for reading!