Relationships are built on trust.
Take our relationship for example. There’s a good chance you have no idea who I am. An even better chance that we’ve never met face to face. And an even better chance that we’ve never had a full conversation.
Still, there was a part of you that decided to stay for this portion of the talk. Maybe you’ve seen my name pop up on Twitter. Maybe you know that I work as Director of Web Engineering for the amazing team at 10up. Maybe you’ve just decided to camp out in the developer track all day.
Maybe nobody told you that classes ended yesterday and you’re surprised to see all of us.
Whatever the reason, something about our separate relationships with WordCamp Seattle provided enough trust for you to give me a chance to talk about data.
The responsibility to maintain that trust is now mine.
I have you in the room and I’m going to talk about a topic I think is interesting. If you don’t find it accurate or convincing, or if I go off on an obviously out of place rant and show you big, hardly legible red slides, I might lose that trust.
And once I lose your trust, the chance that you’ll sit through a future talk of mine is extremely low. This doesn’t just affect me. Your trust in WordCamps may be diminished as a whole and the community loses.
The same thing applies to websites.
The visitors to your site or the sites of your clients go in with a level of trust. They trust that if they fill out a form or load a page nothing bad will happen. If something bad does happen, you lose that trust forever.
The users of your plugin or theme (both site owners and developers that depend on its functionality in their projects) are the same way. Maintain their trust by taking care with their data and taking care of their site. If you don’t, that trust will disappear immediately.
The level of freedom you are given as a WordPress developer is mind-boggling.
Through the use of actions, filters, remote calls, database queries, and—at some level—file system access, you have almost complete freedom when your plugin or theme is installed. This trust has been established for you by WordPress.
One accidental misuse of a function can bring down a site entirely, leaving the owner scrambling to find an answer.
The mishandling of data can cause severe security issues, from stolen cookies leading to unauthorized access of an account to embedded spam links that become a nightmare to get rid of.
It’s not just a case of “it’s free, deal with it” or “pull requests accepted”.
The responsibility is yours, the developer.
Maintain it. Exceed it.
Let people know that when they use a solution developed by you, it is a quality product created with care.
There are several aspects to maintaining that trust. Almost everything centers around data. Making sure that the data flowing in and out of a system matches intent is crucial.
Because this is a lightning talk and I’ve already spent a few minutes talking development philosophy, we’re going to cover one topic – escaping data – with a few specific examples. At the end and in my posted write up I’ll leave you with a bunch of resources to help make maintaining trust part of your daily routine.
There are a lot of ways to escape data. In its most basic form, escaping data is using the right characters or transforming the current characters into something that will not conflict with the expectations that exist where the data will be output.
If you are tasked with outputting a string between a pair of double quotes – “this is my string” – you want to be careful when the string you receive is “this is “my” string”. All of a sudden you have two strings – “this is ” and ” string”. “my” has caused a syntax error – or worse, a serious security issue.
If this data were escaped – “this is \”my\” string” – it becomes something that can be dealt with, as it follows an expected syntax.
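The quote example above can be sketched in a few lines. This is only an illustration in Python, using the standard library's `json.dumps` as a stand-in escaper; the variable names are made up for the example and nothing here is WordPress-specific.

```python
import json

raw = 'this is "my" string'

# Naive output: wrapping the raw value in double quotes breaks the syntax,
# because the inner quotes end the string early.
naive = '"' + raw + '"'

# Escaped output: json.dumps adds the surrounding quotes and
# backslash-escapes the quotes inside the value.
escaped = json.dumps(raw)

print(naive)    # "this is "my" string"
print(escaped)  # "this is \"my\" string"

# The escaped form follows an expected syntax, so it can be read back
# to recover exactly the value we started with.
assert json.loads(escaped) == raw
```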
The requirements for escaping change across the board. The quotes displayed here work in this example, but they aren’t always the culprit.
Certain characters will trip you up when outputting HTML, others will get in the way when used as an element attribute, and some will only show their face when dynamically outputting JavaScript or XML. In any of these cases you should be thinking about where the data came from and where it is going, and then prepare it accordingly.
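To make the "where is it going" point concrete, here is a rough sketch of one value prepared for four different destinations. It uses Python's standard library purely as an illustration; in WordPress the corresponding work is done by its own PHP helpers, and the sample value and variable names are assumptions for the example.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <b>'

# HTML element content: &, < and > must become entities.
as_html_text = html.escape(value, quote=False)

# HTML attribute value: quotes have to be escaped as well.
as_attribute = html.escape(value, quote=True)

# String literal in JavaScript or JSON: quotes are backslash-escaped.
# Note: this alone is not enough inside an inline <script> block,
# where sequences such as </script> need extra care.
as_js_string = json.dumps(value)

# URL query component: reserved characters are percent-encoded.
as_url_part = quote(value, safe="")

print(as_html_text)  # Tom &amp; "Jerry" &lt;b&gt;
print(as_attribute)  # Tom &amp; &quot;Jerry&quot; &lt;b&gt;
print(as_js_string)  # "Tom & \"Jerry\" <b>"
print(as_url_part)   # Tom%20%26%20%22Jerry%22%20%3Cb%3E
```

The same input produces four different outputs, which is why a single "escape everything" step early in the code cannot be right for every destination.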
Cross Site Scripting (XSS) enables attackers to inject client side scripts into pages. Once a script from an outside source is injected, it is considered trusted by the browser and can perform any actions that would normally be handled by the original site. This can include accessing and manipulating authentication cookies or causing general disruptions in the display of the site.
Avoiding Cross Site Scripting attacks is the primary reason for doing all of this. This is something you should be focused on throughout development when working with anything that involves interaction with an untrusted source of data:
* User Input
* Database Data
* Outside Data – API calls
Non persistent XSS is when some source of inputted data is the cause of the attack. It comes in from user input, but is displayed directly on the screen instead of being stored in the database. The data is not persistent.
Non persistent XSS attacks are somewhat less dangerous because they often require a couple of steps before the user is in trouble. This could be a shortened link in an email that goes to a site’s wp-admin path where, because a plugin does not properly escape user input, a $_GET[] parameter is interpreted on the front end to load an external, malicious script.
Persistent XSS is scarier. The malicious data has worked its way into the database and will now be output on every related page view for every user. Rather than a targeted attack, this can provide data to an attacker on many users at once and cause an extremely serious security issue.
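A minimal sketch of that scenario, written in Python rather than PHP and with made-up names: `render_search_heading`, the `s` parameter, and both URLs are hypothetical. The function echoes a query parameter back into the page, which is exactly where a crafted link would try to smuggle in a script tag.

```python
import html
from urllib.parse import urlencode, urlsplit, parse_qs


def render_search_heading(url: str) -> str:
    """Build a heading that echoes the `s` query parameter back to the visitor."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("s", [""])[0]
    # Escape on output: the parameter is user input and cannot be trusted.
    return "<h1>Results for " + html.escape(term) + "</h1>"


# A crafted link, as it might arrive behind a shortened URL in an email.
payload = '<script src="https://evil.example/x.js"></script>'
malicious_url = "https://example.com/?" + urlencode({"s": payload})

heading = render_search_heading(malicious_url)
print(heading)

# The script tag arrives as inert text instead of markup.
assert "<script" not in heading
```

Without the `html.escape` call, the payload would be written into the page verbatim and the browser would load the external script as if the site itself had asked for it.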
Now that we’ve described Cross Site Scripting, how do you escape data to prevent it?
Output with care so that others trust you. Data security is not an afterthought; make it happen as you go.
Take this first line of code. This didn’t start off like the second or third line before I caught it and decided to escape the data. It started off as the first and last lines, escaped from the beginning as it was written.
We knew that the variable contained information that had just been pulled from the database and could not be trusted for output to a user’s screen until it had either been escaped or validated.
This stays in your brain throughout. It happens as you go.
Be Late
Escape the data as late as possible. On output is the ideal place to take care of this. Keeping track of what has been escaped in previous sections of code can be confusing, and once you lose track of the data, mistakes will happen.
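One way to see why losing track hurts is double escaping. The sketch below, again plain Python with hypothetical variable names, escapes a value early and then passes it through an output step that cannot tell it was already handled.

```python
import html

stored_title = "Tom & Jerry"  # raw value, as it would sit in the database

# Escaping early: the value is escaped right after loading, then escaped
# again by output code that has no way to know that already happened.
early = html.escape(stored_title)
double_escaped = "<h1>" + html.escape(early) + "</h1>"

# Escaping late: the raw value travels through the code untouched and is
# escaped exactly once, at the point of output.
late = "<h1>" + html.escape(stored_title) + "</h1>"

print(double_escaped)  # <h1>Tom &amp;amp; Jerry</h1>
print(late)            # <h1>Tom &amp; Jerry</h1>
```

The opposite mistake is worse: assuming an earlier step escaped the value when it did not leaves the output unescaped. Doing it at the last moment removes the guesswork in both directions.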
Assume the worst
Just because it’s likely that a user will never enter something bad into an input field doesn’t mean you shouldn’t protect against it. Escaping is not expensive.
But. Be aware.
Don’t walk around only assuming the worst. Instead, be aware of the situation. Know that:
I wrote this, I can trust that it won’t change as it displays on the user’s screen.
This example looks ridiculous and I can trust it. x and y have been assigned values that I can see do not cause issues inside an H1 element. I’m outputting those variables immediately in that H1 element without additional processing.
In this case, we have no idea what is going to be assigned to the site_title variable from the API. We’d like to think that the API has our best interests in mind, but what if they get hacked and start producing dirty data?
The same applies to data coming in via $_SERVER, $_GET, $_POST, $_REQUEST, $_FILES and $_COOKIE.
Just because data comes from a core function doesn’t mean you can trust it. And this isn’t a bad thing. Core cannot decide how data should be handled. The onus is on developers to keep track of data as it works through the system and to make sure it is in a format ready for display as late as possible before displaying.
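The contrast between those two cases might look something like the following. This is not the code from the talk, only a hypothetical reconstruction in Python: the values of `x` and `y`, the shape of `api_response`, and the payload are all assumptions made for the example.

```python
import html

# Case 1: values written directly in the code. I can see that they are
# harmless, and they are output immediately with no processing in between.
x = "Hello"
y = "World"
trusted_heading = "<h1>" + x + " " + y + "</h1>"

# Case 2: a value from an outside API (hypothetical response). Nothing in
# our code controls what ends up in site_title, so it is escaped on output.
api_response = {"site_title": '<img src=x onerror="alert(1)">'}
site_title = api_response["site_title"]
untrusted_heading = "<h1>" + html.escape(site_title) + "</h1>"

print(trusted_heading)    # <h1>Hello World</h1>
print(untrusted_heading)  # <h1>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</h1>
```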
When you do manage data with purpose, you’re telling anybody reading your code—including your future self—that the data being output has been managed well and can be trusted. A reviewer or contributor does not have to worry if the data has been properly handled and they can trust that things are being done right.
Data that is handled with care can be trusted.
- esc_html()
- esc_attr()
- esc_url()
- esc_textarea()
- esc_js()
- wp_kses_post()
Resources for Learning about XSS
- http://ha.ckers.org/xsscalc.html
- http://homakov.blogspot.com/
- https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
Responses and reactions
Replies
[…] last of the Lightning Talks was by Jeremy Felt on the subject of “Trust… and data.” Jeremy made a few very important points that I really think need to be […]