Category: Web Development

The Tweet, The Whole Tweet and Nothing But the Tweet So Help Me Twitter

I used the Twitter Search API to collect tweet content for a project and kept getting truncated (incomplete) tweets. I asked for help and Twitter answered.

Problem

If you are doing analysis with text data (e.g. sentiment analysis), the completeness of data matters. For example:

BigCo CEO fired...

has a dramatically different meaning than:

BigCo CEO fired a gun in McMansion and puts a hole in the ceiling

The documentation does not directly address truncated tweets, and reading documentation to get unstuck is like reading a medical dictionary during a heart attack. I scoured the Internet, then posted on the Twitter Developers forum, and an actual Twitter staff member responded.

The GET Call

First, I’ll try to explain the process of the Twitter API call. In Node, two main libraries exist for consuming the Twitter API: one aptly named ‘Twitter’, and the other named ‘twit’. They essentially work the same way; I chose ‘Twitter’ because it was the first one I found. This tutorial is a good introduction to the library.

I’ll skip the part on how to install Node modules and how to get access tokens from Twitter Application Management as they are not central to the problem.

Once you have the tokens, store them in a config.js file in the project directory:
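A minimal sketch of what such a file might contain, not the author's original. It assumes the four credential names accepted by the twitter package's constructor, and the environment variable names are hypothetical:

```javascript
// config.js: a sketch, not the author's original file.
// The TWITTER_* environment variable names are hypothetical; export your own
// tokens from Twitter Application Management under whatever names you prefer.
const config = {
  consumer_key: process.env.TWITTER_CONSUMER_KEY || '',
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET || '',
  access_token_key: process.env.TWITTER_ACCESS_TOKEN_KEY || '',
  access_token_secret: process.env.TWITTER_ACCESS_TOKEN_SECRET || '',
};

module.exports = config;
```

Reading the values from the environment is one way to keep the tokens out of version control.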

Load the tokens and initiate a new Twitter client:

Each API call contains a set of parameters, for example: the search term, the number of tweets returned, and the geolocation the tweet originated from. A sample parameter that searches for “10 recent English-language tweets originating from New York City containing the hashtag #DonaldTrump” would look like this:
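A sketch of such a parameter object, not the author's original snippet. The keys are standard search parameters; the New York City coordinates and the 10-mile radius are assumptions chosen for illustration:

```javascript
// Search parameters: a sketch, not the author's original snippet.
// The coordinates and radius for New York City are assumed values.
const params = {
  q: '#DonaldTrump',                 // search term
  count: 10,                         // number of tweets to return
  lang: 'en',                        // English-language tweets only
  result_type: 'recent',             // newest tweets rather than popular ones
  geocode: '40.7128,-74.0060,10mi',  // latitude,longitude,radius
};

console.log(Object.keys(params).join(', '));
```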

I thought the truncated: false on line 7 meant that the tweets wouldn’t come in truncated, but they came in truncated even with this setting. I used this code to make the GET request:
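A minimal sketch of such a request, assuming a client that exposes the get(path, params, callback) method of the twitter package. The searchTweets helper and the stub client are hypothetical; the stub is there so the sketch runs without credentials or network access:

```javascript
// Hypothetical helper: runs a search and hands the tweets to a callback.
// `client` is assumed to expose get(path, params, callback), as the
// twitter package's client does.
function searchTweets(client, params, onDone) {
  client.get('search/tweets', params, (error, data) => {
    if (error) {
      onDone(error, []);
      return;
    }
    // The search endpoint wraps the tweet objects in a `statuses` array.
    onDone(null, data.statuses);
  });
}

// Stub standing in for a real client, so no network call is made.
const stubClient = {
  get(path, params, callback) {
    callback(null, { statuses: [{ text: 'An example tweet that is cut off…' }] });
  },
};

searchTweets(stubClient, { q: '#DonaldTrump', count: 1 }, (error, tweets) => {
  if (error) throw error;
  tweets.forEach((tweet) => console.log(tweet.text));
});
```

With a real client, pass it in place of stubClient; the helper itself does not change.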

This is a sample of the data object returned by the GET call, and it contains one single tweet object. The text property on line 5 is the actual tweet content:

@RepMaxineWaters on GOP colleagues: They cannot credibly come before the American public and defend #DonaldTrump. They’re a…

They’re a… what? This isn’t a severe case of misinformation, as the negative tone is clear in the first sentence. Nevertheless, I wanted to know the definitive way of retrieving a complete tweet.

As seen in this answer from Twitter, the key is to add tweet_mode: extended (line 8) and retweeted_status: { truncated: false } (line 9) to the parameters, as seen in the code snippet below.
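As a sketch of that change, with placeholder base parameters rather than the author's own values, the two additions named above could be merged in like this:

```javascript
// Sketch only: the base parameters are placeholders, not the author's values.
const baseParams = { q: '#DonaldTrump', count: 10 };

// The two additions named in the text. tweet_mode is the setting that asks
// for untruncated text; retweeted_status is included because the text names
// it, and its effect is not verified in this sketch.
const extendedParams = Object.assign({}, baseParams, {
  tweet_mode: 'extended',
  retweeted_status: { truncated: false },
});

console.log(extendedParams.tweet_mode); // extended
```

Object.assign copies into a fresh object, so baseParams stays usable for a side-by-side comparison of the two responses.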

A GET call with these additional parameters returns a different data object which contains a full_text property as seen on line 5 below. Note that the data object in the previous GET call does not contain a full_text property. In this current GET call, if the data object is an original tweet, it would look like this:

If the data object is a retweet, its own full_text property is truncated, and it would contain a retweeted_status property to hold the original tweet it is citing, as seen on line 21 in the code snippet below. Note that the data object of an original tweet in the previous code snippet does not contain a retweeted_status property.

If you want the text of the tweet cited by this retweet, read the retweeted_status.full_text property. That property is not logged by the console, and therefore isn’t visible in the code snippet above, but it does exist in the object and I have tested it.
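The lookup can be wrapped in a small function. This is a sketch under the assumption that the tweets were fetched with tweet_mode set to extended; getFullText and the sample objects are hypothetical and mimic only the fields discussed here:

```javascript
// Hypothetical helper: returns the most complete text available for a tweet
// fetched with tweet_mode set to 'extended'.
function getFullText(tweet) {
  // A retweet carries the original tweet, untruncated, in retweeted_status.
  if (tweet.retweeted_status && tweet.retweeted_status.full_text) {
    return tweet.retweeted_status.full_text;
  }
  // Original tweets expose full_text; fall back to text if it is absent.
  return tweet.full_text || tweet.text || '';
}

// Made-up objects that mimic only the fields used above.
const original = { full_text: 'A complete original tweet' };
const retweet = {
  full_text: 'RT @someone: A complete orig…',
  retweeted_status: { full_text: 'A complete original tweet' },
};

console.log(getFullText(original)); // A complete original tweet
console.log(getFullText(retweet));  // A complete original tweet
```

Note that for a retweet this returns the cited tweet's text, so the "RT @someone:" prefix is dropped; keep the outer full_text as well if the retweeter matters to your analysis.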

Conclusion

In any given Twitter API search call:

  1. Original tweets always have complete full_text properties.
  2. Retweets always have truncated full_text properties but complete
    retweeted_status.full_text properties.

From a developer’s perspective, the fact that a tweet should ever be truncated is inconvenient. However, this is an acceptable outcome, and I’m grateful to Twitter for responding quickly. The full script for the API call is available here, and the relevant discussion is available here.

Unpacking Values in Python and JavaScript

While reading some TensorFlow code in a Stanford tutorial, I noticed a type of multiple-variable assignment I wasn’t familiar with:

x_train, x_test, y_train, y_test = cross_validation.train_test_split(
 iris.data, iris.target, test_size=0.2, random_state=42)

This turns out to be a technique called “unpacking,” typically done with tuples in Python and arrays in JavaScript. In JavaScript the technique is called “destructuring” and was introduced in ES6.

Python Examples

Input:

def return4values():
    return [1,2,3,4]

ONE, TWO, *THREE = return4values()
print("One:{}".format(ONE))
print("Two:{}".format(TWO))
print("Three:{}".format(THREE))

Output:

>> One:1
>> Two:2
>> Three:[3, 4]

In the assignment, the variable preceded by an asterisk collects, as a list, the “remaining” values not picked up by the other variables. The starred variable does not have to come last:

*ONE, TWO, THREE = return4values()
print("One:{}".format(ONE))
print("Two:{}".format(TWO))
print("Three:{}".format(THREE))

Output:

>> One:[1, 2]
>> Two:3
>> Three:4

JavaScript Examples

Sourced from MDN:

var a, b, rest;
[a, b] = [10, 20];
console.log(a); // 10
console.log(b); // 20

[a, b, ...rest] = [10, 20, 30, 40, 50];
console.log(a); // 10
console.log(b); // 20
console.log(rest); // [30, 40, 50]

({ a, b } = { a: 10, b: 20 });
console.log(a); // 10
console.log(b); // 20

// Object rest properties (standardized in ES2018)
({a, b, ...rest} = {a: 10, b: 20, c: 30, d: 40});
console.log(a); // 10
console.log(b); // 20
console.log(rest); // {c: 30, d: 40}
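One difference from Python is worth a sketch: in JavaScript the rest element must come last in the pattern, so there is no direct counterpart to *ONE, TWO, THREE. One possible workaround uses slice; the variable names are illustrative:

```javascript
const values = [1, 2, 3, 4];

// Writing [...leading, two, three] = values would be a SyntaxError:
// a rest element must be the last element of the pattern.
const leading = values.slice(0, -2);    // everything except the last two
const [two, three] = values.slice(-2);  // the last two values

console.log(leading); // [ 1, 2 ]
console.log(two);     // 3
console.log(three);   // 4
```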

Related

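In the return statement of train_test_split in scikit-learn’s cross_validation module, a related tool from Python’s itertools is used. chain.from_iterable flattens the (train, test) pairs into a single list, which is why the result can be unpacked into four variables: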

return list(chain.from_iterable((safe_indexing(a, train),
                                     safe_indexing(a, test)) for a in arrays))
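To see why flattening makes the four-way unpacking possible, here is a rough JavaScript analogue rather than a translation of scikit-learn's code. The pick helper and the sample data are hypothetical, and Array.prototype.flatMap stands in for chain.from_iterable:

```javascript
// Hypothetical stand-in for safe_indexing: select elements by index.
const pick = (array, indices) => indices.map((i) => array[i]);

const x = ['x0', 'x1', 'x2', 'x3'];
const y = ['y0', 'y1', 'y2', 'y3'];
const trainIdx = [0, 2, 3];
const testIdx = [1];

// Each input array yields a [train, test] pair; flatMap flattens the pairs
// by one level into a single array of four parts, ready for destructuring.
const [xTrain, xTest, yTrain, yTest] = [x, y].flatMap((a) => [
  pick(a, trainIdx),
  pick(a, testIdx),
]);

console.log(xTrain); // [ 'x0', 'x2', 'x3' ]
console.log(yTest);  // [ 'y1' ]
```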

The Web Can Live

Work with the system we have or build the system we want?

Mike Hearn, a former Google employee and Bitcoin developer, proposed killing the Web and building a new platform for developing and delivering applications, arguing that the unmanageable complexity of the Web and its security flaws warrant its death. The piece pretty much reads like a marketing manifesto for a product that doesn’t even exist yet. I’m not convinced by the message, but it reminds me of the subway problem in New York.

The Web is like the New York subway system. The Web was born in 1989, and the subway in 1904. When they were conceived, they were not expected to perform at today’s scale.

The original City Hall subway station in New York City. (Untapped Cities)

As New York’s population grew, the subway’s capacity was added incrementally whenever needs arose. Routes were added, new tracks and stations were built, and old trains were replaced by new trains. Incrementally. All trains depend on an old signaling system that has not been thoroughly updated. The increasing loads are putting pressure on the system to increase the supply of rides, which it fails to do, or at least is perceived to fail to do.

The Web was designed to display, interlink, share, and browse documents. It was not designed to serve up sophisticated applications to enable business transactions and personal activities. The Web became popular when users discovered they could conduct business and personal activities with an efficiency an order of magnitude better than the way they had been conducting them.

Tim Berners-Lee, inventor of the World Wide Web. (CERN)

To fix the subway, you must disrupt people’s lives in order to make meaningful changes; there is no alternative. The trains and roadways are already saturated. To overhaul the subway signal system, it might not be possible to selectively halt several lines and leave other lines open for service. There will come a time when there has to be a large scale outage to test the signal system. Because failure has potentially grave consequences, the scale and magnitude of the testing has to be considerable.

To fix the Web, you don’t need to kill it. Just offer an alternative and see if it proves to be a worthy replacement. Besides, the Web is not broken at all. It works, and it gives businesses and consumers what they want. The main problem with the Web is that it is simple for users to use it, and complex for developers to make things that users want. It’s not impossible to develop for the Web, just difficult. In that sense, the system is not sufficiently faulty to warrant an imminent and complete overhaul. Users aren’t complaining; developers are.

An obstacle to introducing a new Web application platform will be the politics. Bitcoin is technically viable and popular, but its rise was derailed when power became concentrated, according to Hearn. A core issue for any attempt at a new Web app platform would probably be handling the power structure and negotiating government regulations.

Open source software products like Python and JavaScript have enjoyed enormous success in providing programming tools for free. The same goes for React and Ruby on Rails among Web development frameworks. I don’t see why there shouldn’t be an open source Web development platform that offers a new set of protocols as an alternative to HTML, HTTP, and so on; I could even imagine this innovation on a hardware level. But you really have to prove the efficacy of the alternative before advocating the death of the incumbent. This isn’t a presidential election.

© 2018
