Fill your Pocket with Reddit

Learning to code can be daunting, but setting a sweet yet achievable goal can make it awesome. Today’s goal: have top internet content from Reddit delivered automatically to your phone, via Pocket. We’ll use our trusty scripting friend, Python, and an awesome library called requests. Let’s go!

Step 1: Get to know Reddit

Reddit is a popular site that aggregates web links, which makes it a good source to feed into Pocket. It is called the front page of the internet, and that is not too far from true. Some quick googling reveals that adding .json to an ordinary Reddit URL returns the same content as JSON, which a script can easily consume. Nice.

Reddit is broken up into subreddits: smaller forums about specific topics. Posts about Python, for example, appear in the python subreddit. By clicking around the website and watching how the URL changed, I began to notice a pattern in the parameters on the end. I settled on “https://www.reddit.com/r/python/top/.json?sort=top&t=day&limit=1”. In plain terms, this asks Reddit for the single most popular post in the python subreddit from the last day.
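To make the pieces of that URL explicit, here is a small sketch that assembles it from its parts. The helper name top_posts_url is hypothetical. The parameters are the ones described above: sort=top, t=day for the last day’s window, and limit=1 for a single post.

```python
from urllib.parse import urlencode


def top_posts_url(subreddit, period="day", limit=1):
    """Build a Reddit listing URL of the same shape as the one above.

    sort=top asks for the most popular posts, t= sets the time window,
    and limit= caps how many posts come back.
    """
    query = urlencode({"sort": "top", "t": period, "limit": limit})
    return "https://www.reddit.com/r/{0}/top/.json?{1}".format(subreddit, query)


print(top_posts_url("python"))
# https://www.reddit.com/r/python/top/.json?sort=top&t=day&limit=1
```

Building the query with urlencode instead of by hand is just a convenience. The resulting string is identical to the hand-written URL.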

Step 2: Requesting Reddit

The first thing we’ll need to do is use the Reddit URL we came up with to get some data back into our script. We’ll use the Python requests library because it’s easy. And it’s awesome. With requests, fetching a URL and printing the result takes only a few lines.

We import the library and store our URL in a string variable. We call on requests to perform a GET, which is what your browser normally does when you visit a page. After the response comes back, we print out its contents. OK, let’s run that. Oh wait. I’ve been lurching forward so excitedly that I haven’t even set up my project directory or my virtualenv, let alone installed requests into it. Let’s do that first, then run.
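Here is a minimal sketch of that first script, reconstructed from the description above. The function name fetch and the timeout value are my own additions. As Step 2a explains, Reddit may answer this version with a 429 error, because it sends no custom User-Agent.

```python
import requests

# The URL settled on in Step 1: the top post of the last day in r/python.
url = "https://www.reddit.com/r/python/top/.json?sort=top&t=day&limit=1"


def fetch(page_url):
    """Perform a plain GET, as a browser would, and return the body as text."""
    # The timeout (an addition) keeps a stalled connection from hanging the script.
    response = requests.get(page_url, timeout=10)
    return response.text
```

Running `print(fetch(url))` from inside the activated virtualenv prints whatever Reddit sends back.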

Step 2a: This can’t be right

Sad trombone: I’m getting {“error”: 429} as a response. So what does that mean? HTTP code 429 means “Too Many Requests”, but this is our first one, so what gives? Google tells me that Reddit expects a descriptive user agent in the request, and it tends to throttle generic ones such as the default that requests sends. Chrome, for example, puts metadata in all its communications that says, “Hey guys, it’s me, Chrome.” We need to do the same, so we update our code to send a User-Agent header with a name of our own.

This time we had a User-Agent included in the request, and Reddit was so impressed with the name we chose, it spit out a bunch of json. Trust me, in the computer world, spitting out json is a compliment.
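A sketch of the updated request follows. The User-Agent string is only an example, since any unique, descriptive name should do. The raise_for_status call is an addition that turns a 429 (or any other HTTP error) into an exception instead of silently printing the error body.

```python
import requests

# Example User-Agent; the exact wording is an assumption, just make it descriptive.
HEADERS = {"User-Agent": "reddit-to-pocket script (personal use)"}


def fetch_json(page_url):
    """GET the URL while identifying ourselves, and return the parsed JSON."""
    response = requests.get(page_url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on a 429 and friends
    return response.json()
```

Because response.json() already returns Python dictionaries and lists, the next step can work with the data directly.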

Step 3: But I don’t speak json

Now, because requests can parse the JSON for us, we can treat the data like any other Python dictionary. To understand the hierarchy, I copy that big chunk of JSON into a JSON pretty printer on the web. This doesn’t change the content at all. It only adds newlines and indents so the data is easier for humans to read. With my puny human eyes, I see that the structure goes first ‘data’, then ‘children’. Each child has its own ‘data’ dictionary, and the link I’m looking for lives under its ‘url’ key. So we update the code to loop over the children and print out only the URLs we were looking for.
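Here is a sketch of that loop, run against a tiny hand-written stand-in for Reddit’s response. The sample data and the name extract_links are illustrative, not real Reddit output. With a live response you would pass in response.json() instead.

```python
def extract_links(listing):
    """Return the 'url' of every post in a Reddit listing dictionary.

    Shape: {"data": {"children": [{"data": {"url": ...}}, ...]}}.
    Each child wraps its own 'data' dictionary, which holds the post's link.
    """
    return [child["data"]["url"] for child in listing["data"]["children"]]


# A made-up, trimmed-down listing with the same nesting as Reddit's response.
sample = {
    "data": {
        "children": [
            {"data": {"title": "An example post", "url": "https://example.com/post"}},
        ]
    }
}

for link in extract_links(sample):
    print(link)  # prints https://example.com/post
```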

Step 4: MOAR DATA

In our code so far, we’re only asking for one article from one subreddit. If we just wanted more articles from the python subreddit, we could raise the limit parameter in the URL. But hypothetically, let’s imagine that we wanted to read about more than just Python. Some people are like that. That means the URL has to become dynamic, with a separate request for each subreddit we want. So we’ll make a list of subreddits and loop over them, putting our ‘data’/‘children’ loop inside that one.

Let’s recap. First we made a list of strings naming the subreddits we were interested in. Then we turned the URL string into a template (note the {0}). Finally, we put all our original requests code inside the loop, changing the URL each time with Python’s str.format.
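A sketch of those nested loops follows. The subreddit list is hypothetical. Passing the fetching function in as the fetch_json parameter is my own design choice. In the real script that function would be the requests call with the User-Agent header, and here it lets the loop be tried without touching the network.

```python
# Hypothetical list of subreddits; swap in whatever you like to read.
SUBREDDITS = ["python", "programming", "science"]

# The Step 1 URL turned into a template: {0} is replaced by each subreddit name.
URL_TEMPLATE = "https://www.reddit.com/r/{0}/top/.json?sort=top&t=day&limit=1"


def collect_links(subreddits, fetch_json):
    """Request each subreddit's top post(s) and gather every post URL.

    fetch_json is any function that takes a URL and returns the parsed
    JSON listing (for example, a requests-based helper).
    """
    links = []
    for name in subreddits:
        listing = fetch_json(URL_TEMPLATE.format(name))
        # The same 'data' -> 'children' loop as before, once per subreddit.
        for child in listing["data"]["children"]:
            links.append(child["data"]["url"])
    return links
```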

Step 5: Admire, then Pocket

This is pretty sweet so far, I’d argue. We basically have a fountain of internet content. It can be tailored to your interests, and there is virtually no limit to how much comes out. With the URLs as output, you have many ways to consume them, but we’re going to play with Pocket. We could just go to its web page and enter the links one by one, but I’m much lazier than that. Pocket has an API, and we have a computer to do this for us.

On the Pocket developer page, we can fill out the form to “Create a new app”, and at the end they give us an API key. The key tells Pocket who is using its functions and gives it the power to stop us if we were doing something shady. We aren’t, so let’s continue. For simplicity, I chose v2 of their API rather than the newest one, v3. The old version lets you specify the user’s account with a username and password, while v3 requires an app authorization flow, which complicates things.

Looking at the docs for the add endpoint, we see that we need the API key, username, password, and the URL we want to add. We should have all those parts now, so we’re ready to build another request.

For each link in our list, we set the POST data to contain the API key we got for our app, the username and password of the Pocket user, and the link itself. Then we send the request to Pocket’s add endpoint with the data we built, and lean back. The leaning back helps for aerodynamics.
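A sketch of that Pocket block follows. The form fields (apikey, username, password, url) are the ones listed above. The endpoint address is an assumption to check against Pocket’s own documentation, since v2 is the older API and may no longer be available. The helper names are hypothetical.

```python
import requests

# ASSUMPTION: the address of Pocket's v2 "add" endpoint. Confirm it in Pocket's
# API docs; v2 is the older API and may have been retired since this was written.
POCKET_ADD_URL = "https://getpocket.com/v2/add"


def pocket_payload(link, api_key, username, password):
    """Build the POST data the v2 add endpoint asks for: key, credentials, link."""
    return {
        "apikey": api_key,
        "username": username,
        "password": password,
        "url": link,
    }


def send_to_pocket(links, api_key, username, password):
    """Send each link to Pocket, printing it so you can watch progress."""
    for link in links:
        print(link)
        response = requests.post(
            POCKET_ADD_URL,
            data=pocket_payload(link, api_key, username, password),
            timeout=10,
        )
        response.raise_for_status()  # stop loudly if Pocket rejects the request
```

Call send_to_pocket with the list of links gathered from Reddit, plus your own API key and Pocket credentials. Reading the password from an environment variable rather than hard-coding it in the script is a sensible variation.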

Jump over to your terminal and launch that script. You should see the links print out as you are sending them to Pocket. Open up Pocket on your phone or web browser and see that your links are showing up.

Step 6: Cron that!

Ok, so we have a finely crafted script to pull links from Reddit and put them into Pocket. You might say that is pretty lazy, but I’d argue, not lazy enough. I want this to happen automatically, every day, so that stuff just appears for me to read. Right now I’d have to open up my computer and run the script.

To make your computer run commands at a specified time, we use cron. We write a line of configuration with five schedule fields, followed by the command. The five fields, in order, are: minute, hour, day of month, month, and day of week. Specify a number for each, or a star to say “all of them.” I’m going to run this at 5 am, like some kind of crazy internet newspaper delivery, so my cron fields will be 0 5 * * *: minute 0, hour 5, every day of every month, no matter what day of the week it is. Then I enter the path to the script as the last part, so I’ll have “0 5 * * * /home/chris/projects/redditpocket/reddit_to_pocket.py”. Because this line runs the script file directly, the script needs execute permission (chmod +x) and a shebang line pointing at the virtualenv’s Python, since cron won’t activate the virtualenv for you.
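To make the field order concrete, here is a tiny hypothetical helper that assembles a crontab line from its parts. It only builds the string and does not touch your crontab, so you would still paste the result in with crontab -e.

```python
def cron_line(command, minute="*", hour="*", day="*", month="*", weekday="*"):
    """Join the five cron schedule fields and a command into one crontab line.

    Field order: minute, hour, day of month, month, day of week.
    A "*" in a field means "every value".
    """
    fields = [minute, hour, day, month, weekday]
    return " ".join(str(field) for field in fields) + " " + command


print(cron_line("/home/chris/projects/redditpocket/reddit_to_pocket.py", minute=0, hour=5))
# 0 5 * * * /home/chris/projects/redditpocket/reddit_to_pocket.py
```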

Now, to enter this into the crontab, I use the command ‘crontab -e’ to open the file and add this line. Save, and we’re done.

Feel free to strike a pose of triumph as you stand victorious over the machines, having bent them to your will.
