Alex Thomo's Blog

Posts

Restaurants Map for Victoria and Vancouver

Here is a great site for finding restaurants in Victoria and Vancouver using a very responsive map. It also comes with quick links to the restaurants' Yelp pages. Here is the link to FoodAdvisor: www.foodadvisor.xyz

Some time back I wrote some notes on the theme of Visibly Pushdown Automata (VPAs) vs. bottom-up Tree Automata (TAs) for XML. Both VPAs and TAs are great tools, and the bottom line is that, depending on the case, sometimes VPAs come in handy and other times TAs do a better job. In theory, representing XML schemas in the form of extended DTDs (EDTDs) works equally well with either: both VPAs and TAs fully capture EDTDs, and the complexity of important decision problems (such as inclusion) is the same for both. However, one reason for possibly preferring VPAs over TAs for XML is that VPAs are often more natural, and exponentially more succinct, than TAs when it comes to "semi-formally" specifying documents using pattern-based conditions on the global linear order of XML. If you would like to read more about this and other problems benefiting from the use of VPAs or TAs, you can read the following document, which comes with an example a...

SQLite Musicbrainz database

Here is a rar file with a stripped-down SQLite Musicbrainz database. The compressed file is about 700 MB, so it will take around 10 minutes to download with a good connection. This database is for educational purposes. I have not deleted tuples from the main tables; I only removed some columns and tables that I thought weren't necessary to experience and understand the data. Here is a database schema diagram and a file with create table statements and explanatory comments. Also, here is a Toad data model. The raw data used for the database was downloaded from Musicbrainz on Oct 14, 2012.

New book on MapReduce

Andrei Lopatenko sent me a note about this new book on large-scale text processing with MapReduce: http://www.umiacs.umd.edu/~jimmylin/book.html

Latent Semantic Analysis Tutorial

I wrote a tutorial on Latent Semantic Analysis (LSA). It can be accessed by following this link. I believe LSA is a very interesting method for ranking documents in response to a query. LSA is a method for discovering hidden concepts in document data. Each document and term (word) is expressed as a vector with elements corresponding to these concepts. Each element in a vector gives the degree of participation of the document or term in the corresponding concept. The goal is not to describe the concepts verbally, but to be able to represent the documents and terms in a unified way for exposing document-document, document-term, and term-term similarities or semantic relationships which are otherwise hidden.

An Example

Suppose we have the following set of five documents

d1 : Romeo and Juliet.
d2 : Juliet: O happy dagger!
d3 : Romeo died by dagger.
d4 : “Live free or die”, that’s the New-Hampshire’s motto.
d5 : Did you know, New-Hampshire is in New-England.

and search query: dies, dagger ...
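To make the idea above concrete, here is a minimal sketch in Python with NumPy, not necessarily the procedure the tutorial itself follows. It builds a term-document matrix for the five documents, takes a truncated SVD to get two "concepts", and ranks the documents against the query by cosine similarity in concept space. The vocabulary `VOCAB`, the word-variant table `VARIANTS`, the choice of k = 2, and the way the query is folded in (projecting it onto the top-k left singular vectors) are all assumptions made for illustration.

```python
import re
import numpy as np

DOCS = {
    "d1": "Romeo and Juliet.",
    "d2": "Juliet: O happy dagger!",
    "d3": "Romeo died by dagger.",
    "d4": "“Live free or die”, that’s the New-Hampshire’s motto.",
    "d5": "Did you know, New-Hampshire is in New-England.",
}

# Assumption: a small hand-picked vocabulary and a crude variant table,
# standing in for real stop-word removal and stemming.
VOCAB = ["romeo", "juliet", "happy", "dagger",
         "live", "die", "free", "new-hampshire"]
VARIANTS = {"died": "die", "dies": "die"}


def terms(text):
    """Lowercase, split into words (keeping internal hyphens), keep VOCAB terms."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    words = [VARIANTS.get(w, w) for w in words]
    return [w for w in words if w in VOCAB]


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    A = np.zeros((len(VOCAB), len(docs)))
    for j, text in enumerate(docs.values()):
        for t in terms(text):
            A[VOCAB.index(t), j] += 1
    return A


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsa_rank(docs, query, k=2):
    """Rank documents against the query in a k-dimensional concept space."""
    A = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                 # term -> concept directions
    doc_vecs = A.T @ U_k           # each document projected onto the k concepts
    q = np.zeros(len(VOCAB))
    for t in terms(query):
        q[VOCAB.index(t)] += 1
    q_vec = q @ U_k                # query projected the same way as the documents
    scores = {name: cosine(q_vec, doc_vecs[j]) for j, name in enumerate(docs)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in lsa_rank(DOCS, "dies, dagger"):
        print(f"{name}: {score:.3f}")
```

Because documents and the query live in the same low-dimensional space, a document can score well even when it shares no literal term with the query, which is the kind of hidden similarity LSA is meant to expose.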

MapReduce

Google's MapReduce is a new parallelism framework for processing large amounts of data. Some recommended links are: Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman ( http://www.mmds.org/ ). Wu-Jun Li's course at Shanghai Jiao Tong University: http://cs.nju.edu.cn/lwj/course/mmds.html
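For readers new to the model, the classic word-count example can be sketched in a few lines of Python. This is only a single-process simulation of the map, shuffle, and reduce steps, not Google's implementation or the API of any real framework; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine the values of one key into a single result."""
    return word, sum(counts)


def word_count(docs):
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count({"a": "to be or not to be", "b": "to do"}))
```

In a real deployment the map and reduce calls would run in parallel on many machines, with the framework handling the shuffle; the sketch only shows the shape of the computation.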