Class 33: Tools, Workflow & more NLP

Review

sklearn estimators have: fit, predict and score methods

Split, train, and test are the steps we carry out, but they are not the names of methods on sklearn objects.
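
As a minimal sketch of how those steps map onto method names, the example below uses the iris data bundled with sklearn and a GaussianNB classifier. Both are assumptions chosen for illustration and are not taken from these notes. The split is done by the `train_test_split` function, training by `fit`, and testing by `predict` and `score`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Example data (an assumption for this sketch): the iris dataset ships with sklearn
X, y = load_iris(return_X_y=True)

# "split" is a function, not an estimator method
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)        # "train" is spelled fit
y_pred = clf.predict(X_test)     # "test" starts with predict ...
acc = clf.score(X_test, y_test)  # ... and score, which is accuracy for a classifier
print(acc)
```

For a classifier, `score` returns the fraction of test samples whose prediction matches the true label.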

matplotlib vs seaborn:

We’ve learned two plotting libraries. Why would you use seaborn over matplotlib?

  • seaborn works better when the data is in a dataframe while matplotlib can work better with data in array form

  • matplotlib provides more basic plotting, whereas seaborn needs less syntax and offers more built-in styling and themes

  • seaborn uses simpler syntax and makes it easier to build complex, intuitive plots with short commands

  • seaborn is a higher-level library that uses matplotlib “under the hood”. It is basically an easier way to graph; using matplotlib directly gives you more freedom but forces you to make all of the decisions.

  • seaborn helps you make “good” types of plots: ones that are common and that people, on average, find easy to read. matplotlib lets you do anything you want, even things that are likely to be confusing.

  • seaborn can show some statistical calculations while plotting

  • seaborn makes categories easier to show, for example with the hue, row, and col parameters
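
To make the comparison concrete, here is a small sketch that draws the same grouped scatter plot both ways. The DataFrame and its column names (`x`, `y`, `group`) are made up for illustration, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not from the class
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 4, 6, 7],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# matplotlib: we loop over the categories and label everything ourselves
fig_mpl, ax_mpl = plt.subplots()
for name, sub in df.groupby("group"):
    ax_mpl.scatter(sub["x"], sub["y"], label=name)
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")
ax_mpl.legend(title="group")

# seaborn: one call; hue handles the categories, axis labels, and legend
fig_sns, ax_sns = plt.subplots()
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax_sns)
```

The seaborn call returns a matplotlib Axes, so any matplotlib customization is still available afterwards.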

Vector representation review

Given this Vocabulary:

['and', 'are', 'cat', 'cats', 'dogs', 'pets', 'popular', 'videos']

represent: Cats and dogs are pets

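Since we have the vocabulary, we can go word by word through the vocabulary and record how many times each word appears in the sentence: [1,1,0,1,1,1,0,0]

If we instead build the vocabulary from the sentence itself and then transform “Cats and dogs are pets”, every word in that vocabulary appears once: [1,1,1,1,1]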
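
Both answers can be checked with `CountVectorizer`. This is a sketch assuming the default settings (lowercasing and the default tokenizer); the variable names are only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

vocab = ['and', 'are', 'cat', 'cats', 'dogs', 'pets', 'popular', 'videos']
sentence = ['Cats and dogs are pets']

# Case 1: use the given vocabulary; no fitting is needed
fixed = CountVectorizer(vocabulary=vocab)
fixed_vec = fixed.transform(sentence).toarray()
print(fixed_vec)      # [[1 1 0 1 1 1 0 0]]

# Case 2: learn the vocabulary from the sentence itself, then transform
learned = CountVectorizer()
learned_vec = learned.fit_transform(sentence).toarray()
learned_words = sorted(learned.vocabulary_)  # features are in alphabetical order
print(learned_words)  # ['and', 'are', 'cats', 'dogs', 'pets']
print(learned_vec)    # [[1 1 1 1 1]]
```

The two vectors differ in length because the vector always has one entry per vocabulary word, not one entry per word in the sentence.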

Classification with text

# %load http://drsmb.co/310
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
from sklearn import datasets
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
ng_X, ng_y = datasets.fetch_20newsgroups(categories=['comp.graphics', 'sci.crypt'],
                                         return_X_y=True)
ng_X[0], ng_y[0]
("From: robert@cpuserver.acsc.com (Robert Grant)\nSubject: Virtual Reality for X on the CHEAP!\nOrganization: USCACSC, Los Angeles\nLines: 187\nDistribution: world\nReply-To: robert@cpuserver.acsc.com (Robert Grant)\nNNTP-Posting-Host: cpuserver.acsc.com\n\nHi everyone,\n\nI thought that some people may be interested in my VR\nsoftware on these groups:\n\n*******Announcing the release of Multiverse-1.0.2*******\n\nMultiverse is a multi-user, non-immersive, X-Windows based Virtual Reality\nsystem, primarily focused on entertainment/research.\n\nFeatures:\n\n   Client-Server based model, using Berkeley Sockets.\n   No limit to the number of users (apart from performance).\n   Generic clients.\n   Customizable servers.\n   Hierachical Objects (allowing attachment of cameras and light sources).\n   Multiple light sources (ambient, point and spot).\n   Objects can have extension code, to handle unique functionality, easily\n        attached.\n\nFunctionality:\n\n  Client:\n   The client is built around a 'fast' render loop. Basically it changes things\n   when told to by the server and then renders an image from the user's\n   viewpoint. It also provides the server with information about the user's\n   actions - which can then be communicated to other clients and therefore to\n   other users.\n\n   The client is designed to be generic - in other words you don't need to\n   develop a new client when you want to enter a new world. This means that\n   resources can be spent on enhancing the client software rather than adapting\n   it. 
The adaptations, as will be explained in a moment, occur in the servers.\n\n   This release of the client software supports the following functionality:\n\n    o Hierarchical Objects (with associated addressing)\n\n    o Multiple Light Sources and Types (Ambient, Point and Spot)\n\n    o User Interface Panels\n\n    o Colour Polygonal Rendering with Phong Shading (optional wireframe for\n\tfaster frame rates)\n\n    o Mouse and Keyboard Input\n\n   (Some people may be disappointed that this software doesn't support the\n   PowerGlove as an input device - this is not because it can't, but because\n   I don't have one! This will, however, be one of the first enhancements!)\n\n  Server(s):\n   This is where customization can take place. The following basic support is\n   provided in this release for potential world server developers:\n\n    o Transparent Client Management\n\n    o Client Message Handling\n\n   This may not sound like much, but it takes away the headache of\naccepting and\n   terminating clients and receiving messages from them - the\napplication writer\n   can work with the assumption that things are happening locally.\n\n   Things get more interesting in the object extension functionality. This is\n   what is provided to allow you to animate your objects:\n\n    o Server Selectable Extension Installation:\n        What this means is that you can decide which objects have extended\n        functionality in your world. 
Basically you call the extension\n        initialisers you want.\n\n    o Event Handler Registration:\n        When you develop extensions for an object you basically write callback\n        functions for the events that you want the object to respond to.\n        (Current events supported: INIT, MOVE, CHANGE, COLLIDE & TERMINATE)\n\n    o Collision Detection Registration:\n        If you want your object to respond to collision events just provide\n        some basic information to the collision detection management software.\n        Your callback will be activated when a collision occurs.\n\n    This software is kept separate from the worldServer applications because\n    the application developer wants to build a library of extended objects\n    from which to choose.\n\n    The following is all you need to make a World Server application:\n\n    o Provide an initWorld function:\n        This is where you choose what object extensions will be supported, plus\n        any initialization you want to do.\n\n    o Provide a positionObject function:\n        This is where you determine where to place a new client.\n\n    o Provide an installWorldObjects function:\n        This is where you load the world (.wld) file for a new client.\n\n    o Provide a getWorldType function:\n        This is where you tell a new client what persona they should have.\n\n    o Provide an animateWorld function:\n        This is where you can go wild! At a minimum you should let the objects\n        move (by calling a move function) and let the server sleep for a bit\n        (to avoid outrunning the clients).\n\n    That's all there is to it! And to prove it here are the line counts for the\n    three world servers I've provided:\n\n        generic - 81 lines\n        dactyl - 270 lines (more complicated collision detection due to the\n                           stairs! 
Will probably be improved with future\n                           versions)\n        dogfight - 72 lines\n\nLocation:\n\n   This software is located at the following site:\n   ftp.u.washington.edu\n\n   Directory:\n   pub/virtual-worlds\n\n   File:\n   multiverse-1.0.2.tar.Z\n\nFutures:\n\n   Client:\n\n    o Texture mapping.\n\n    o More realistic rendering: i.e. Z-Buffering (or similar), Gouraud shading\n\n    o HMD support.\n\n    o Etc, etc....\n\n   Server:\n\n    o Physical Modelling (gravity, friction etc).\n\n    o Enhanced Object Management/Interaction\n\n    o Etc, etc....\n\n   Both:\n\n    o Improved Comms!!!\n\nI hope this provides people with a good understanding of the Multiverse\nsoftware,\nunfortunately it comes with practically zero documentation, and I'm not sure\nwhether that will ever be able to be rectified! :-(\n\nI hope people enjoy this software and that it is useful in our explorations of\nthe Virtual Universe - I've certainly found fascinating developing it, and I\nwould *LOVE* to add support for the PowerGlove...and an HMD :-)!!\n\nFinally one major disclaimer:\n\nThis is totally amateur code. By that I mean there is no support for this code\nother than what I, out the kindness of my heart, or you, out of pure\ndesperation, provide. I cannot be held responsible for anything good or bad\nthat may happen through the use of this code - USE IT AT YOUR OWN RISK!\n\nDisclaimer over!\n\nOf course if you love it, I would like to here from you. And anyone with\nPOSITIVE contributions/criticisms is also encouraged to contact me. Anyone who\nhates it: > /dev/null!\n\n************************************************************************\n*********\nAnd if anyone wants to let me do this for a living: you know where to\nwrite :-)!\n************************************************************************\n*********\n\nThanks,\n\nRobert.\n\nrobert@acsc.com\n^^^^^^^^^^^^^^^\n",
 0)
# learn the vocabulary from all of the posts and count the words in each one
counts = CountVectorizer()
ng_vec = counts.fit_transform(ng_X)
ng_vec
<1179x24257 sparse matrix of type '<class 'numpy.int64'>'
	with 188291 stored elements in Compressed Sparse Row format>
ng_vec[:1].toarray()
array([[0, 0, 0, ..., 0, 0, 0]])
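clf = MultinomialNB()
# no random_state is set, so the split (and the score below) changes from run to run
ng_vec_train, ng_vec_test, ng_y_train, ng_y_test = train_test_split(ng_vec, ng_y)
# fit returns the estimator itself, so we can chain score onto it
clf.fit(ng_vec_train, ng_y_train).score(ng_vec_test, ng_y_test)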
0.9932203389830508
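
To see the same workflow at a scale that is easy to inspect, here is a sketch on a tiny made-up corpus. The documents, labels, and variable names are hypothetical and stand in for the two newsgroups. The point is that new text must be transformed with the same fitted vectorizer before it is passed to the classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 0 = graphics-like, 1 = crypt-like
train_docs = [
    "render the image with polygon shading",
    "texture mapping for 3d graphics image",
    "encryption key and cipher security",
    "the cipher uses a secret key",
]
train_labels = [0, 0, 1, 1]

# Learn the vocabulary and the word counts from the training documents
toy_counts = CountVectorizer()
train_vec = toy_counts.fit_transform(train_docs)

toy_clf = MultinomialNB()
toy_clf.fit(train_vec, train_labels)

# New text is only transformed (not fit), so it uses the same vocabulary
new_docs = ["image shading graphics", "secret encryption key"]
new_vec = toy_counts.transform(new_docs)
toy_pred = toy_clf.predict(new_vec)
print(toy_pred)  # [0 1]
```

Each new document contains only words seen in one class, so the classifier assigns it to that class.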