Scripting web sessions in PowerShell


This quick tutorial explores scripting web sessions with PowerShell: gathering form information from a web site, submitting a login through POST, and then performing actions as the logged-in user by keeping track of session information. The only requirements are PowerShell 3.0 or above, which ships with Windows 8 and can be downloaded for Windows 7, and a basic familiarity with PowerShell and HTML.

Gathering form information

Before we can send a login or other information to a web site, we need to know which form fields the server expects. To create a connection, we will use the Invoke-WebRequest cmdlet. For this experiment, we will use the Reddit site, logging in as a valid user and then gathering inbox messages, but the same approach should work with most web apps that use a form-based login.

Let's fetch the home page and see what forms are available:

PS> $result = Invoke-WebRequest -Uri https://reddit.com -SessionVariable session

This puts the page's content in $result and the session information in $session. Notice that there is no $ in the -SessionVariable argument: the parameter takes the name of the variable to create, not the variable itself.
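
The object that -SessionVariable creates is a Microsoft.PowerShell.Commands.WebRequestSession. As a minimal sketch, the same kind of object can also be built by hand, which can be handy for presetting something such as the user agent before the first request. The variable name and the user agent string below are illustrative assumptions, not values from this tutorial.

```powershell
# Create an empty session object, the same type that -SessionVariable produces.
$manualSession = New-Object Microsoft.PowerShell.Commands.WebRequestSession

# Illustrative value only; any user agent string can be assigned here.
$manualSession.UserAgent = 'MyScript/1.0'

# The session starts with an empty cookie container that later requests fill in.
$manualSession.Cookies.Count
```

A session built this way is passed to Invoke-WebRequest with -WebSession, just like one created by -SessionVariable.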

The nice thing about Invoke-WebRequest in Windows PowerShell is that it parses the HTML for us. We can access links with $result.Links and images with $result.Images, but what we want are the forms:

PS> $result.forms
Id                            Method                        Action                        Fields
--                            ------                        ------                        ------
search                        get                           https://www.reddit.com/search {[q, ]}
login_login-main              post                          https://www.reddit.com/pos... {[op, login-main], [user, ...
PS>

Here we can see two forms. The login one is obviously the second, so we will explore that one. Remember that arrays start at 0, so we actually want form number 1:

PS> $result.forms[1].Fields
Key                                                         Value
---                                                         -----
op                                                          login-main
user
passwd
rem-login-main                                              on
PS>

Now that we have our field names and values, the last two things we need before we can craft our login are where the form is sent, called the action, and how the information is sent, called the method:

PS> $result.forms[1].Action
https://www.reddit.com/post/login

PS> $result.forms[1].Method
post

PS>
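
Indexing by position works, but it depends on the order of the forms on the page. As a sketch, the same form can be looked up by its Id instead. The $forms array below is a stand-in that mimics the two forms listed earlier so the snippet runs without a network connection; with a live response, $result.Forms would take its place.

```powershell
# Stand-in objects that mimic the two forms shown earlier.
# With a live response, use $result.Forms instead of $forms.
$forms = @(
    [pscustomobject]@{ Id = 'search';           Method = 'get';  Action = 'https://www.reddit.com/search' }
    [pscustomobject]@{ Id = 'login_login-main'; Method = 'post'; Action = 'https://www.reddit.com/post/login' }
)

# Pick the form by its Id rather than by its position on the page.
$loginForm = $forms | Where-Object { $_.Id -eq 'login_login-main' }
$loginForm.Action
$loginForm.Method
```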

Excellent, let's craft a call to log in.

Logging into Reddit

We know what we require in order to craft a login session: the method is POST, the action is https://www.reddit.com/post/login, and we have four fields to fill in. Here is the crafted login call:

PS> $result = Invoke-WebRequest -Uri https://www.reddit.com/post/login -WebSession $session -Method POST -Body @{'op'='login-main';'user'='xxxxxxx';'passwd'='zzzzzz';'rem-login-main'='on'}

The two fields that already had values are left as is. For the user name and password, replace the x and z placeholders with valid values. Also notice that we pass our $session variable with -WebSession, this time with the $, so that the cookies returned by the server are stored in it. If all went well, we should now have valid credentials stored as cookies.
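
A long one-liner like this can be hard to read and edit. One possible way to tidy it up, shown here only as a sketch, is to collect the arguments in a hashtable and splat them onto the cmdlet. New-LoginRequest is a hypothetical helper written for this example, not a built-in command.

```powershell
# Hypothetical helper (not part of PowerShell): collects the arguments for the
# login POST in a hashtable so they can be splatted onto Invoke-WebRequest.
function New-LoginRequest {
    param(
        [Parameter(Mandatory)] [string] $User,
        [Parameter(Mandatory)] [string] $Password
    )
    @{
        Uri    = 'https://www.reddit.com/post/login'
        Method = 'Post'
        Body   = @{
            'op'             = 'login-main'
            'user'           = $User
            'passwd'         = $Password
            'rem-login-main' = 'on'
        }
    }
}

# Placeholder credentials, as in the call above.
$login = New-LoginRequest -User 'xxxxxxx' -Password 'zzzzzz'
$login.Body.Keys
```

With the session from the first request, the call then becomes Invoke-WebRequest @login -WebSession $session.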

Let's see what the session contains:

PS> $session
Headers               : {}
Cookies               : System.Net.CookieContainer
UseDefaultCredentials : False
Credentials           :
Certificates          :
UserAgent             : Mozilla/5.0 (Windows NT; Windows NT 6.3; en-US) WindowsPowerShell/5.0.9883.0
Proxy                 :
MaximumRedirection    : -1
PS>

As you can see, we can access a lot of information here, including the connection headers, proxy information, and so on. What we're interested in, however, are the cookies:

PS> $session.Cookies.GetCookies('https://reddit.com') | Select Name,Value
Name                                                        Value
----                                                        -----
__cfduid                                                    d9588ee1db4b3822b9c1660a284cf676a1426790508
reddit_session                                              29049489%2C2015-03-19T11%3A49%3A12%2C0feefb5636d457652a9...
PS>

We logged in successfully, and have cookies assigned to our session!
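
In a script, this check can be automated rather than read off the screen. The sketch below assumes that the presence of a reddit_session cookie is a good enough sign of a successful login. Test-SessionCookie is a hypothetical helper, and the cookie container is filled by hand with a placeholder value so the example runs offline; with a real session, $session.Cookies would be passed instead.

```powershell
# Hypothetical helper: returns $true if the container holds a cookie with the
# given name for the given URI.
function Test-SessionCookie {
    param(
        [Parameter(Mandatory)] [System.Net.CookieContainer] $Cookies,
        [Parameter(Mandatory)] [uri] $Uri,
        [Parameter(Mandatory)] [string] $Name
    )
    $cookie = $Cookies.GetCookies($Uri) | Where-Object { $_.Name -eq $Name }
    return [bool]$cookie
}

# Stand-in cookie container with a placeholder cookie value.
$jar = New-Object System.Net.CookieContainer
$jar.Add([uri]'https://www.reddit.com', (New-Object System.Net.Cookie('reddit_session', 'placeholder')))

$loggedIn = Test-SessionCookie -Cookies $jar -Uri 'https://www.reddit.com' -Name 'reddit_session'
$loggedIn
```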

Reading our messages

Now that we have a valid session, we can use it to interact with the web app. Here we will simply look at our inbox:

PS> $result = Invoke-WebRequest -Uri https://www.reddit.com/message/inbox -WebSession $session

This will fetch the page as a logged-in user. However, if you look at the actual content, it's a bit overwhelming:

PS> $result.Content

<!doctype html><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><title>messages: inbox</title><
meta name="keywords" content=" reddit, reddit.com, vote, comment, submit " /><meta name="description" content="reddit:
the front page of the internet" /><meta name="referrer" content="origin"><meta http-equiv="Content-Type" content="text/
html; charset=UTF-8" /><link rel="alternate" media="only screen and (max-width: 640px)" href="http://i.reddit.com/messa
ge/inbox" /><meta name="viewport" content="width=1024"><link rel='icon' href="//www.redditstatic.com/icon.png" sizes="2
56x256" type="image/png" /><link rel='shortcut icon' href="//www.redditstatic.com/favicon.ico" type="image/x-icon" /><l
ink rel='apple-touch-icon-precomposed' href="//www.redditstatic.com/icon-touch.png" /><link rel="alternate" type="appli
cation/rss+xml" title="RSS" href="https://www.reddit.com/message/inbox/.rss" /><link rel="stylesheet" type="text/css" h
ref="//www.redditstatic.com/reddit.FYXxHSxJ9C0.css" media="all"><!--[if gte IE 8]><!--><link rel="stylesheet" href="htt
ps://b.thumbs.redditmedia.com/ba_fm376ctS_mGlZGabqddmkhth3jqnccUyhKW7iGBo.css" title="applied_subreddit_stylesheet" typ
e="text/css"><!--<![endif]--><!--[if gte IE 9]><!--><script type="text/javascript" src="//www.redditstatic.com/reddit-i
nit.en.6NpjmlZTFqs.js"></script><!--<![endif]--><!--[if lt IE 9]><script type="text/javascript" src="//www.redditstatic
.com/reddit-init-legacy.en.wUkgXC56HLI.js"></script><![endif]--><script type="text/javascript" id="config">r.setup({"aj
ax_domain": "www.reddit.com", "server_time": 1416790190.0, "post_site": "", "clicktracker_url": "//pixel.redditmedia.co
m/click", "renderstyle": "html", "modhash": "hnx5r4y7t33fd6219bddab446318c3b694669f85bd1158cce4", "stats_domain": "http
s://stats.redditmedia.com", "store_visits": false, "anon_eventtracker_url": "//pixel.redditmedia.com/pixel/of_diversity
.png", "cur_domain": "reddit.com", "comment_embed_scripts": ["//www.redditstatic.com/comment-embed.js"], "send_logs": t
rue, "gold": true, "is_sponsor": false, "pageInfo": {"actionName": "message.GET_listing", "verification": "edc31ee93e17

...

You could parse this content, and PowerShell would assist you with that, but for now we can simply save the content to a file and open it in a browser to make sure the messages are there:

PS> $result.Content | Out-File messages.html
PS> .\messages.html
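
As a small sketch of what such parsing can look like, the -match operator can pull a single value out of the raw HTML. The string below is a shortened stand-in for $result.Content, based on the start of the page shown earlier.

```powershell
# Shortened stand-in for $result.Content.
$content = '<!doctype html><html lang="en"><head><title>messages: inbox</title></head></html>'

# -match fills the automatic $Matches variable; index 1 is the first capture group.
if ($content -match '<title>(.*?)</title>') {
    $pageTitle = $Matches[1]
}
$pageTitle
```

Regular expressions are fine for one-off values like a page title, but they get fragile quickly on full HTML documents.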

Parsing a feed

If this were the only way to interact with the site, you would have to do some parsing of that content. Fortunately, Reddit can also deliver this data as XML, in the form of RSS feeds. If you go to https://www.reddit.com/prefs/feeds while logged in, it will show you what URL to use to see your messages as a feed. So let's do that instead:

PS> [xml]$messages = Invoke-WebRequest -Uri "http://www.reddit.com/message/inbox/.rss?feed=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&user=myuser"

Here it's important to cast the variable as [xml], which tells PowerShell to parse the response text into an XML document, yet another useful feature of the language. If you take a look at the result, you will see it contains a standard RSS feed:

PS> $messages
xml                                                         rss
---                                                         ---
version="1.0" encoding="UTF-8"                              rss
PS>
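
The cast works on any well-formed XML text, not only on web responses. A minimal sketch with a made-up feed shows what it does: the string becomes a System.Xml.XmlDocument whose elements can be reached with dot notation. The feed content and the $sampleFeed name are invented for the example.

```powershell
# Made-up sample feed; a real one would come from the request above.
[xml]$sampleFeed = @'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>messages: inbox</title>
    <item><title>first message</title></item>
    <item><title>second message</title></item>
  </channel>
</rss>
'@

$sampleFeed.GetType().FullName      # System.Xml.XmlDocument
$sampleFeed.rss.channel.title       # messages: inbox
```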

From here, you can drill into it easily:

PS> $messages.rss
version : 2.0
dc      : http://purl.org/dc/elements/1.1/
media   : http://search.yahoo.com/mrss/
atom    : http://www.w3.org/2005/Atom
channel : channel
PS> $messages.rss.channel
title                         link                          description                   image
-----                         ----                          -----------                   -----
messages: inbox               {http://www.reddit.com/, a...                               image
PS>

You can get a list of message titles like this:

PS> $messages.rss.channel.item | Select title
title
-----
comment reply : from dejurka via sysadmin sent 42 minutes ago
comment reply : from dejurka via sysadmin sent 1 hour ago
comment reply : from kdorsey0718 via sysadmin sent 1 hour ago
comment reply : from kdorsey0718 via sysadmin sent 1 hour ago
Your comment has been gilded! : from reddit sent 2 hours ago
comment reply : from Pyr0AWLB via 24hoursupport sent 3 hours ago
PS>
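
Since each item is an ordinary object, the usual pipeline cmdlets apply. As a sketch, the following keeps only the comment replies and counts them. The items are made up and only shaped like the titles above; with the real feed, $messages would be used in place of $sampleInbox.

```powershell
# Made-up items shaped like the feed entries above.
[xml]$sampleInbox = @'
<rss version="2.0"><channel>
  <item><title>comment reply : from userA via sysadmin sent 1 hour ago</title></item>
  <item><title>Your comment has been gilded! : from reddit sent 2 hours ago</title></item>
  <item><title>comment reply : from userB via 24hoursupport sent 3 hours ago</title></item>
</channel></rss>
'@

# Keep only the comment replies, then count them.
$replies = $sampleInbox.rss.channel.item | Where-Object { $_.title -like 'comment reply*' }
@($replies).Count
```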

Conclusion

Most people would think of using specialized applications for this type of web automation, or perhaps writing a Python or Perl script, but PowerShell is surprisingly useful for this kind of task. Hopefully you've learned a thing or two and had fun!



© 2008-2017 Patrick Lambert - All resources on this site are provided under the MIT License - You can contact me at: dendory@live.ca