Tuesday, July 31, 2012

Natural Language in Voice Search



On July 26 and 27, we held our eighth annual Computer Science Faculty Summit on our Mountain View Campus. During the event, we brought you a series of blog posts dedicated to sharing the Summit's talks, panels and sessions, and we continue with this glimpse into natural language in voice search. --Ed

At this year’s Faculty Summit, I had the opportunity to showcase the newest version of Google Voice Search. This version hints at how Google Search, in particular on mobile devices and by voice, will become increasingly capable of responding to natural language queries.

I first outlined the trajectory of Google Voice Search, which was initially released in 2007. Voice actions, launched in 2010 for Android devices, made it possible to control your device by speaking to it. For example, if you wanted to set your device alarm for 10:00 AM, you could say “set alarm for 10:00 AM. Label: meeting on voice actions.” To indicate the subject of the alarm, a meeting about voice actions, you would have to use the keyword “label”! Certainly not everyone would think to frame the requested action this way. What if you could speak to your device in a more natural way and have it understand you?

At last month’s Google I/O 2012, we announced a version of voice actions that supports much more natural commands. For instance, your device will now set an alarm if you say “my meeting is at 10:00 AM, remind me”. This makes even previously existing functionality, such as sending a text message or calling someone, more discoverable on the device -- that is, if you express a voice command in whatever way feels natural to you, whether it be “let David know I’ll be late via text” or “make sure I buy milk by 3 pm”, there is now a good chance that your device will respond as you anticipated.

I then discussed some of the possibly unexpected decisions we made when designing the system we now use for interpreting natural language queries or requests. For example, as you would expect from Google, our approach to interpreting natural language queries is data-driven and relies heavily on machine learning. In complex machine learning systems, however, it is often difficult to figure out the underlying cause of an error: after supplying them with training and test data, you merely obtain a set of metrics that hopefully give a reasonable indication of the system’s quality, but they fail to explain why a certain input led to a given, possibly wrong, output.

As a result, even understanding why some mistakes were made requires experts in the field and detailed analysis, rendering it nearly impossible to harness non-experts in analyzing and improving such systems. To avoid this, we aim to make every partial decision of the system as interpretable as possible. In many cases, any random speaker of English could look at its possibly erroneous behavior in response to some input and quickly identify the underlying issue - and in some cases even fix it!

We are especially interested in working with our academic colleagues on some of the many fascinating research and engineering challenges in building large-scale, yet interpretable natural language understanding systems and devising the machine learning algorithms this requires.

Using WinForms from a Console application

It is perfectly possible to use WinForms from a Console application.  You just need to add a reference to System.Windows.Forms.  Thereafter, you can add any code that uses WinForms.  The only pitfall is that you need to add the [STAThread] attribute to your Main method.  Otherwise, the OpenFileDialog instantiation will block/deadlock when running on the Microsoft .NET Framework.  On Mono, everything runs fine without the [STAThread] attribute, but to eliminate portability issues, you probably want to add it.

using System.Windows.Forms;

class Program
{
    // [STAThread] puts the main thread in a single-threaded apartment,
    // which OpenFileDialog requires on the Microsoft .NET Framework.
    [STAThread]
    public static void Main (string[] args)
    {
        OpenFileDialog openDialog = new OpenFileDialog ();
        openDialog.ShowDialog ();
    }
}
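
To try it out, compile with a reference to the WinForms assembly (a sketch; the Mono compiler name varies by version, with older releases using gmcs or dmcs instead of mcs):

csc /r:System.Windows.Forms.dll Program.cs     (Microsoft .NET)
mcs -r:System.Windows.Forms.dll Program.cs     (Mono)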

Friday, July 27, 2012

New Challenges in Computer Science Research



Yesterday afternoon at the 2012 Computer Science Faculty Summit, there was a round of lightning talks addressing some of the research problems faced by Google across several domains. The talks pointed out some of the biggest challenges emerging from increasing digital interaction, which is this year’s Faculty Summit theme.

Research Scientist Vivek Kwatra kicked things off with a talk about video stabilization on YouTube. The popularity of mobile devices with cameras has led to an explosion in the amount of video people capture, which can often be shaky. Vivek and his team have found algorithmic approaches to make casual videos look more professional by simulating professional camera moves. Their stabilization technology vastly improves the quality of amateur footage.

Next, Ed Chi (Research Scientist) talked about social media, focusing on the experimental circle model that characterizes Google+. Ed is particularly interested in how social interaction on the web can be designed to mimic live communication. Circles on Google+ allow users to manage their audience and share content in a targeted fashion, which reflects face-to-face interaction. Ed discussed how, from an HCI perspective, the challenge going forward is the need to consider the trinity of social media: context, audience, and content.

John Wilkes, Principal Software Engineer, talked about cluster management at Google and the challenges of building a new cluster manager-- that is, an operating system for a fleet of machines. Everything at Google is big, and a consequence of operating at such tremendous scale is that machines are bound to fail. John’s team is working to make things easier for internal users, improving our ability to respond to more system requests. There are several hard problems in this domain, such as issues with configuration, making it as easy as possible to run a binary, increasing failure tolerance, and helping internal users understand their own needs as well as the behavior and performance of their system in our complicated distributed environment.

Research Scientist and coffee connoisseur Alon Halevy took to the podium to confirm that he did indeed author an empirical book on coffee, and also talked with attendees about structured data on the web. Structured data on the web comprises hundreds of millions of (relatively small) tables of data, and Alon’s work is focused on enabling data enthusiasts to discover and visualize those data sets. Great possibilities open up when people start combining data sets in meaningful ways, which inspired the creation of Fusion Tables. An example is a map made in the aftermath of the 2011 earthquake and tsunami in Japan that shows natural disaster data alongside the locations of the world’s nuclear plants. Moving forward, Alon’s team will continue to think about interesting things that can be done with data, and the techniques needed to distinguish good data from bad data.

To wrap up the session, Praveen Paritosh did a brief but deep dive into the Knowledge Graph, an intelligent model that understands real-world entities and their relationships to one another-- things, not strings-- which launched earlier this year.

The Google Faculty Summit continued today with more talks, and breakout sessions centered on our theme of digital interaction. Check back for additional blog posts in the coming days.

Education in the Cloud



In the last 10 years, we’ve seen a major transition from stand-alone applications that run on desktop computers to applications running in the cloud. Unfortunately, many computer science students don’t have the opportunity to learn and work in the cloud, because traditional undergrad programs lack the necessary infrastructure; students are limited to whatever resources their school can provide.

So today, we’re announcing a new award program: the Google App Engine Education Awards. We are excited because Google App Engine can teach students how to build sophisticated large-scale systems in the cloud without needing access to a large physical network.

Google App Engine can be used to build mobile or social applications, traditional browser-based applications, or stand-alone web services that scale to millions of users with ease. The Google App Engine infrastructure and storage tools are useful for collecting and analyzing educational data, building a learning management system to organize courses, or implementing a teacher forum for exchanging ideas and practices. All of these adaptations of the Google App Engine platform will use the same infrastructure that powers Google.

We invite teachers at universities across the United States to submit a proposal describing how to use Google App Engine for their course development, educational research or tools, or for student projects. Selected proposals will receive $1,000 in App Engine credits.

If you teach at an accredited college, university or community college in the US, we encourage you to apply. You can submit a proposal by filling out this form. The application deadline is midnight PST August 31, 2012.

Thursday, July 26, 2012

Big Pictures with Big Messages



Google’s Eighth Annual Computer Science Faculty Summit opened today in Mountain View with a fascinating talk by Fernanda Viégas and Martin Wattenberg, leaders of the data visualization group at our Cambridge office. They provided insight into their design process in visualizing big data, by highlighting Google+ Ripples and a map of the wind they created.

To preface his explanation of the design process, Martin shared that his team “wants visualization to be ‘G-rated,’ showing the full detail of the data - there’s no need to simplify it, if complexity is done right.” Martin discussed how their wind map started as a personal art project, but has attracted particular interest from groups that care about the wind (sailors, surfers, firefighters). The map displays surface wind data from the US National Digital Forecast Database and updates hourly. You can zoom around the United States looking for where the winds are fastest - often around lakes or just offshore - or check out the gallery to see snapshots of the wind from days past.


Fernanda discussed the development of Google+ Ripples, a visualization that shows how news spreads on Google+. The visualization shows spheres of influence and different patterns of spread. For example, someone might post a video to their Google+ page and if it goes viral, we’ll see several circles in the visualization. This depicts the influence of different individuals sharing content, both in terms of the number of their followers and the re-shares of the video, and has revealed that individuals are at times more influential than organizations in the social media domain.


Martin and Fernanda closed with two important lessons in data visualization: first, don’t “dumb down” the data. If complexity is handled correctly and in interesting ways, our users find the details appealing and find their own ways to interact with and expand upon the data. Second, users like to see their personal world in a visualization. Being able to see the spread of a Google+ post, or zoom in to see the wind around one’s town is what makes a visualization personal and compelling-- we call this the “I can see my house from here” feature.

The Faculty Summit will continue through Friday, July 27 with talks by Googlers and faculty guests as well as breakout sessions on specific topics related to this year’s theme of digital interactions. We will be looking closely at how computation and bits have permeated our everyday experiences via smart phones, wearable computing, social interactions, and education.

We will be posting here throughout the summit with updates and news as it happens.

Wednesday, July 25, 2012

Site Reliability Engineers: “solving the most interesting problems”



I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent move from Software Engineer (SWE) on the Maps team to Site Reliability Engineering (SRE). In the interview, Ben explains why he transitioned from a pure development role to a role in production, and how his work has changed:

Chris: Tell us about your path to Google.
Ben: Before I joined Google I didn’t consider myself a “software engineer”. I went to the University of Queensland and graduated with a Bachelor’s Degree in Electrical Engineering and Mathematics, before going on to complete a Ph.D. My field of research was image segmentation, extending graph cuts to continuous space for analyzing X-rays and MRIs. At a conference in France I met a friend of my Ph.D. advisor’s, and he raved about Google, commenting that they were one of the only companies that really understood technology. I’d already decided academia wasn’t for me, so I interviewed for a general Software Engineering role at Google. I enjoyed the interviews, met some really smart people, and learned about some interesting stuff they were working on. I joined the Maps team in Sydney in 2005 and spent the next 6 years working on the Maps API.

Chris: Tell us about some of the coolest work you did for Google Maps, and how you applied your research background.
Ben: My background in algorithms and computational geometry was really useful. We were basically making browsers do stuff they’re not designed to do, such as rendering millions of vectors or warping images, inventing techniques as we went. On the server-side we focused on content distribution, pushing tiles or vectors from Google servers down through caches to the user’s browser, optimizing for load and latency at every stage. On the client-side, we had to make the most of limited processors with new geometric algorithms and clever prefetching to hide network latency. It was really interesting work.

Chris: I understand you received company-wide recognition when you were managing the Maps API team. Tell us more about what that entailed.
Ben: In September 2008, when managing the Maps API, my team received an award that was recognized Google-wide, which is a big honor. My main contributions were latency optimizations, stability, enterprise support, and Street View integration. The award was in recognition of strong sustained growth of the Maps API, in relation to the number of sites using it, and total views per day. Currently the Google Maps API is serving more than 600,000 websites.

Chris: So what prompted the move to Site Reliability Engineering (SRE)?
Ben: In my experience, a lot of software engineers don’t understand what SREs do. I’d worked closely with SREs, particularly those in Sydney supporting Maps, and had formed a high opinion of them. They’re a very strong team - they’re smart and they get things done. After 6 years working on the Maps API I felt it was time for a change. In Sydney there are SWE teams covering most of the product areas, including Chrome and Apps, Social and Blogger, Infrastructure Networking and the Go programming language, as well as Maps and GeoCommerce. I talked to all of them, but chose SRE because in my opinion, they’re solving the most interesting problems.

Chris: How would you describe SRE?
Ben: It really depends on the individual. At one end are the Systems Administrator types, sustaining ridiculously large systems. But at the other end are the Software Engineers like me. As SREs get more experienced, this distinction tends to blur. The best SREs think programmatically even if they don’t do the programming. For me, I don’t see a difference in my day-to-day role. When I was working on the Maps API I was the primary on-call one week in three, whereas in SRE the typical on-call roster is one week in six. Being primary on-call just means you’re the go-to person for the team, responsible when something breaks and for pushing new code into production. I was spending 50% of my time on coding and development work, and as an SRE this has increased to 80%.

Chris: Wow! So as an SRE in Production, you’re spending less time on-call and more time writing code than you were as a SWE on the Maps team?
Ben: Yes! I’m not managing a team now, but I’m definitely spending more time coding than I was before. I guess the average SRE spends 50% of their time doing development work, but as I said, it depends on the person and it ranges from 20-80%.

Chris: What does your team do?
Ben: In Sydney there are SRE teams supporting Maps, Blogger, App Engine, as well as various parts of the infrastructure and storage systems. I’m working on Blobstore, an infrastructure storage service based on Bigtable which simplifies building and deploying applications that store users' binary data (BLOBs, or "Binary Large OBjects"). Example BLOBs include images, videos, or email attachments - any data objects that are immutable and long-lived. The fact that we're storing user data means that Blobstore must be highly available for reads and writes, be extremely reliable (so that we never lose data), and be efficient in terms of storage usage (so that we can provide large amounts of storage to users at low cost).

Chris: Tell us more about some of the problems you’re solving, and how they differ from those you faced as a SWE in a development role.
Ben: With the massive expansion in online data storage, we’re solving problems at a scale never before seen. Due to the global nature of our infrastructure, we think in terms of load balancing at many levels: across regions, across data centers within a region, and across machines within a data center. The problems we’re facing in SRE are much closer to the metal. We’re constantly optimizing resource allocation and efficiency and scalability of Google’s massive computer systems, as opposed to developing new features for a product like Maps. So the nature of the work is very similar to SWE, but the problems are bigger and there is a strong focus on scale.

Chris: Are you planning on staying in SRE for a while?
Ben: Yeah. I signed up for a six month rotation program called “Mission Control,” the goal of which is to teach engineers to understand the challenges of building and operating a high reliability service at Google scale. In other words, it’s an SRE training program. In my first three months of Mission Control I’ve been on-call twice, and always during office hours so there were SREs to help me when I got into trouble...which I did. I’ve got no intention of going back to SWE at the end of the six months and plan to stay in SRE for at least a few years. Right now the problems seem more interesting. For example, last year’s storage solutions are facing additional strain from the growth of Gmail, Google+ and Google Drive. So you’re constantly reinventing.

Chris: What advice do you have for Software Engineers contemplating a role in SRE?
Ben: SRE gives you the opportunity to work on infrastructure at a really big scale in a way you don’t get to in SWE. Whereas SWE is more about developing new features, SRE is dealing with bigger problems and more complex engineering due to the sheer scale. SRE is a great way to learn how systems really work in order to become a great engineer.

If you’re interested in applying for a Site Reliability Engineering role, please note that we advertise the roles in several different ways to reflect the diversity of the team. The two main roles are “Software Engineer, Google.com” and “Systems Engineer, Google.com”. We use the term “Google.com” to signify that the roles are in Production, as opposed to R&D. You can find all the openings listed on the Google jobs site. We’re currently hiring across many regions, including Sydney in Australia, and of course Mountain View in California.

Monday, July 23, 2012

FxGqlC: DailyRollingFileAppender in log4j/log4net/log4cxx

Using log4j or one of the ports (like log4net or log4cxx), you can configure the appender to "roll" to a new file every day:
log4j.rootLogger=INFO, logfile

# logfile appender: writes its output to a file that is rolled each midnight.
log4j.appender.logfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.logfile.File=c:/logs/MyLogFile.log
log4j.appender.logfile.Append=true
log4j.appender.logfile.DatePattern='.'yyyy-MM-dd
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d{}{GMT} %X{pid} %X{pname} [%t-%X{tname}] %-5p - %c %m%n
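
For comparison, log4net has no separate DailyRollingFileAppender; date-based rolling is folded into its RollingFileAppender. A roughly equivalent, untested sketch of the XML configuration:

<appender name="logfile" type="log4net.Appender.RollingFileAppender">
  <file value="c:/logs/MyLogFile.log" />
  <appendToFile value="true" />
  <rollingStyle value="Date" />
  <datePattern value="'.'yyyy-MM-dd" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level - %logger %message%newline" />
  </layout>
</appender>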


The log4j configuration above gives you one log file per day, each suffixed with its date; only the current day's file has no suffix:

  • MyLogFile.log.2012-07-21
  • MyLogFile.log.2012-07-22
  • MyLogFile.log
When running the query "select distinct $filename from [*.*]" in FxGqlC, you get this list:
MyLogFile.log
MyLogFile.log.2012-07-21
MyLogFile.log.2012-07-22

So the most recent file (the one without a date suffix) is scanned first, because that is the "ascending" order as returned by the operating system.

A workaround for this is to change the query like this: "select distinct $filename from [*.log.2*], [*.log]"

This workaround also applies to the regular RollingFileAppender, based on filesize instead of date.

A future feature will be to sort the files on modification date, by extending the FROM-clause option -fileorder.  Possibly in version 2.3.

Update: the FROM-clause option -fileorder has been extended in v2.3 to allow an order based on modification time.  For more information, have a look at:
https://sites.google.com/site/fxgqlc/home/fxgqlc-manual/changes-in-fxgqlc-2-3

Saturday, July 21, 2012

Gource video of FxGqlC 2.2

I refreshed the Gource video of FxGqlC to reflect the latest version:





This time, the video was created on Ubuntu, and it is even simpler than on Windows.  These are the commands that I ran in a terminal window:

  • sudo apt-get install gource
  • sudo apt-get install ffmpeg
  • cd ~/Projects/FxGqlC
    (the GIT-directory where the FxGqlC source code is located)
  • gource -s 0.25 -title 'FxGqlC' -i 1000 -o /tmp/fxgqlc-gource.ppm
    (-s 0.25 to speed up to 4 days per second, -i 1000 to prevent fading out of idle items)
  • avconv -y -r 25 -f image2pipe -vcodec ppm -i /tmp/fxgqlc-gource.ppm -vcodec wmv1 -r 25 -same_quant /tmp/fxgqlc-gource.wmv
  • upload the video /tmp/fxgqlc-gource.wmv to youtube
Initially, I used ffmpeg, as I did on Windows. That worked without problems, but I got this warning:
*** THIS PROGRAM IS DEPRECATED ***
This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.

Migrating to "avconv" was no problem, since the exact same parameters could be used with it, except that the parameter -sameq is replaced by -same_quant.

Clock problem when running Windows and Linux on same computer

When you install both Windows and Linux on the same machine (using dual/multi-boot), you run into an annoying problem with the system time:

  • Linux uses the internal clock as UTC/GMT
  • Windows uses the internal clock as Local Time.
On my box, in Belgium, this currently gives a difference of 2 hours (1 hour + 1 hour for daylight saving time) between my Windows 7 and Ubuntu.  After some time, the time is corrected because the operating system syncs its clock with an NTP server on the internet.  But after booting into the other operating system, the problem obviously repeats itself.

This problem can be solved either in Windows or in Ubuntu.  I have chosen to change the Windows configuration:
  • Start regedit.exe
  • Go to the key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation
  • Add a new DWORD value "RealTimeIsUniversal" (or change the existing one) and set it to "1"
  • Reboot
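Equivalently, the same registry change can be made from an elevated command prompt:
reg add HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation /v RealTimeIsUniversal /t REG_DWORD /d 1 /f
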
You can also solve it in Linux:
  • Open the file "/etc/default/rcS". When using a graphical shell like Unity or Gnome:
    • start a terminal window by pressing Ctrl-Alt-T
    • run "sudo gedit /etc/default/rcS" 
  • Look for the line "UTC=yes", and change it to "UTC=no"
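Or, to make the same change non-interactively, run:
sudo sed -i 's/^UTC=yes/UTC=no/' /etc/default/rcS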

Friday, July 20, 2012

Creating symlinks and hardlinks in Explorer

A very useful tool to create hardlinks, junctions and symbolic links ("symlinks") in Windows Explorer:
Link Shell Extension

It includes an explanation on hardlinks, junctions and symbolic links used in the NTFS file system:
http://schinagl.priv.at/nt/hardlinkshellext/hardlinkshellext.html#hardlinks
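
If you only need an occasional link, Windows Vista and later also include the built-in mklink command (run it from an elevated command prompt; symbolic links require administrator rights):

mklink Link Target       (file symbolic link)
mklink /D Link Target    (directory symbolic link)
mklink /H Link Target    (hardlink)
mklink /J Link Target    (junction)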

Thursday, July 19, 2012

Colorize GIT output

To colorize the output of GIT, you can enable the "color.ui" flag by executing:
git config --global --add color.ui true
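
If you want finer-grained control, the individual color settings can be enabled instead:

git config --global color.branch auto
git config --global color.diff auto
git config --global color.status auto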


More information can be found in the GIT book:
http://git-scm.com/book/ch7-1.html

Wednesday, July 18, 2012

FxGqlC 2.2 released

A new version of FxGqlC has been released.  There are many improvements, both in terms of performance and capabilities. 
Check it out at https://sites.google.com/site/fxgqlc/home and give it a try.
And many more things are in the pipeline, so come back in a few weeks for the next version.
The most important new features are:
  • Change the working directory with the USE statement, similar to the cd/chdir commands in command prompts.
    USE [c:\temp]
    USE [../subdir]
    USE ['sub directory']
  • Added support for variables.  Setting variables in SELECT output is not yet supported.
    DECLARE @var string
    SET @var = 'US' + ' ' + 'Open'
    SELECT [Winner] FROM ['Tennis-ATP-2011.csv' -heading=on] WHERE [Tournament] = @var AND [Round] = 'The Final'
  • System variable $filename: returns the current filename (without path).  The implementation of this system variable has been changed: before v2.2, the full filename was returned (the same behavior as the current system variable $fullfilename).
    SELECT DISTINCT $filename FROM ['SampleFiles\*' -recurse]
    -- Returns:
         AirportCodes.csv
         AirportCodes.csv.zip
         AirportCodesTwice.zip
         CountryList.csv
         IP2Country.csv.zip
         Tennis-ATP-2011.csv
         AirportCodes2.csv
         AirportCodes2.csv.zip
  • System variable $fullfilename: returns the current full filename (with complete absolute path).  This variable is only valid in the context of a running query.
    SELECT DISTINCT $fullfilename FROM ['SampleFiles\*' -recurse]
    -- Returns:
         C:\Data\SampleFiles\AirportCodes.csv
         C:\Data\SampleFiles\AirportCodes.csv.zip
         C:\Data\SampleFiles\AirportCodesTwice.zip
         C:\Data\SampleFiles\CountryList.csv
         C:\Data\SampleFiles\IP2Country.csv.zip
         C:\Data\SampleFiles\Tennis-ATP-2011.csv
         C:\Data\SampleFiles\SubFolder\AirportCodes2.csv
         C:\Data\SampleFiles\SubFolder\AirportCodes2.csv.zip
  • Added FROM-clause options '-Heading=On', '-Heading=OnWithRule' and '-Heading=Off' (default).
    SELECT [Winner] FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
    -- Returns:
         Djokovic N.

  • Added the possibility to show column headers in output, using !SET HEADING.
    !SET HEADING OFF
    SELECT [Winner] FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
    -- Returns:
         Djokovic N.

    !SET HEADING ON
    SELECT [Winner] FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
    -- Returns:
         Winner
         Djokovic N.

    !SET HEADING ONWITHRULE
    SELECT [Winner] FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
    -- Returns:
         Winner
         ======
         Djokovic N.
  • The -Heading option can also be used in the INTO-clause:
    SELECT [Winner]
         INTO ['US OPEN Winner.txt' -heading=onwithrule]
         FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
  • Added support for VIEWs:
    CREATE VIEW Tennis AS
         SELECT [Tournament], [Winner]
              FROM ['Tennis-ATP-2011.csv' -heading=on]
              WHERE [Round] = 'The final'
    SELECT * FROM Tennis
    DROP VIEW Tennis
  • Added support for parameterized VIEWs:
    CREATE VIEW Tennis(@file string, @round string) AS
         SELECT [Tournament], [Winner]
              FROM [@file -heading=on]
              WHERE [Round] = @round
    SELECT * FROM Tennis('Tennis-ATP-2011.csv', 'The final')
    DROP VIEW Tennis
  • Added support for count(*) as alternative to count(<expression>):
    SELECT count(*) FROM ['Tennis-ATP-2011.csv' -heading=on]
  • Added support for count(distinct <expression>) to count unique values:
    SELECT count(distinct [Tournament]) FROM ['Tennis-ATP-2011.csv' -heading=on] 
  • Block comments are now also supported.
    SELECT distinct [Tournament] /* block comment */ FROM ['Tennis-ATP-2011.csv' -heading=on] -- line comment
  • Added support for option -columndelimiter in FROM-clause and INTO-clause.  Until now, the tab character "\t" was always used as delimiter, which is still the default.  The string specified is unescaped using the regular expression syntax (e.g. \t becomes a tab character).
    SELECT [Date], [Winner]
         INTO ['output.txt' -columndelimiter=';']
         FROM ['Tennis-ATP-2011.csv' -heading=on]
         WHERE [Tournament] = 'US OPEN' AND [Round] = 'The Final'
    -- Output.txt contains:
         12/09/2011;Djokovic N.
  • HAVING-clause: adds a filter that is applied AFTER the GROUP BY aggregation.
    SELECT [Winner], count(*) FROM ['Tennis-ATP-2011.csv' -heading=on]
         GROUP BY [Winner] HAVING count(*) > 60

  • Added support for "alias" in FROM-clause, which makes it possible to link subquery columns to outer query columns.
    SELECT [Date], [Tournament], [Winner],      (     SELECT count(*)           FROM ['Tennis-ATP-2011.csv' -heading=on] [inner]           WHERE [outer].[Winner] = [inner].[Winner]     )      FROM ['Tennis-ATP-2011.csv' -heading=on] [outer]      WHERE [Round] = 'The Final'
  • A startup script file is automatically executed when FxGqlC.exe is started in command mode (-c, -command), in file mode (-gqlfile) or in prompt mode (-p, -prompt).  This can be useful to create regularly used views or variables, or to execute any command such as USE or SET.  The startup script file path can be configured using the startup option -autoexec <filename>.  When the startup option -autoexec is not present, the default startup script file "autoexec.gql" is searched for, first in the current directory and then in the directory where FxGqlC.exe is located.  An example is sketched below.
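
For example, a hypothetical autoexec.gql (the view name and paths are invented for illustration) could contain:
    USE ['C:\Data\SampleFiles']
    CREATE VIEW Finals AS
         SELECT [Tournament], [Winner]
              FROM ['Tennis-ATP-2011.csv' -heading=on]
              WHERE [Round] = 'The Final'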

Tuesday, July 17, 2012

Reducing PDF file size

Using GhostScript, it is very easy to make PDF files smaller. Run this command:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/screen -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=NewFile.pdf OriginalFile.pdf
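
The /screen preset downsamples images the most aggressively (72 dpi).  If the result looks too degraded, the /ebook (150 dpi) or /printer (300 dpi) presets trade a larger file for better quality:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=NewFile.pdf OriginalFile.pdf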

Monday, July 16, 2012

Subversion working copy locked

When you get the Subversion (svn) error "Working copy <directory> locked" in TortoiseSVN, you can try this to unlock the directory:

  • Open a command prompt, and change the directory to your locked subversion directory.
  • Run svn cleanup

Friday, July 13, 2012

Google at SIGMOD/PODS 2012



Over the years, SIGMOD has expanded beyond a traditional "database" conference to include several areas related to information management. This year’s ACM SIGMOD/PODS conference (on Management of Data, and Principles of Database Systems), held in Scottsdale, Arizona, was no different. We were impressed by the wide variety of researchers from industry and academia alike that the conference attracted, and enjoyed learning how others are pushing the limits of scalability in data storage and processing. In addition to an excellent set of papers on a large number of topics, we saw a couple of recurring themes:

1) Data Visualization
  • Pat Hanrahan from Stanford gave a keynote on some of the challenges involved in building systems to enable "data enthusiasts" to manage and visualize data. 

2) Big Data


As has been the case for the last couple of years, “Big Data" has been of ever-growing interest to the entire community, particularly from industry. Google presented a talk on F1, a new distributed database system we’ve built to power the AdWords system. A complex business application like AdWords has different requirements than many systems at Google, which often use storage systems like Bigtable. We have a single database shared by hundreds of developers and systems, so we need the robustness and ease of use we’re used to from traditional databases. F1 is built to scale like Bigtable, without giving up the database features we also need, like strong consistency, ACID transactions, schema enforcement, and most importantly, SQL queries.

There’s been a widespread trend over the last several years away from databases, towards highly scalable “NoSQL” systems. We don’t think that trade-off is necessary, and were happy to see several other speakers advocate a similar theme -- yes, databases are useful, and developers shouldn’t need to give up database features and ease of use in the name of scalability.

This theme was supported by an industry session on Big Data featuring talks from other companies: Facebook (TAO: How Facebook Serves the Social Graph), Twitter (Large-Scale Machine Learning at Twitter), and Microsoft (Recurring Job Optimization in Scope). Googler Kristen LeFevre was a panelist on the "Perspectives on Big Data" panel organized by Surajit Chaudhuri from Microsoft, which also featured Donald Kossmann from ETHZ, Sam Madden from MIT, and Anand Rajaraman from Walmart Labs. Last but not least, Surajit Chaudhuri also gave an excellent keynote outlining some of the research challenges that the new era of "Big Data and Cloud" poses.

As has been the practice for several years now, to continue generating great interest in data management research, SIGMOD has been organizing panels such as this year's "New Researcher Symposium" (which included Anish Das Sarma from Google as a panelist).

In addition to sponsoring the conference, many Googlers attended, contributing to a robust presence and affording us the opportunity to interact with the broader information management community. We've been pushing the frontiers of science with cutting-edge research in many aspects of data management, and we were eager to share our innovations and see what others have been working on. We found Amin Vahdat's keynote on the intersection of Networking and Databases to be a highlight of Google’s participation, which also included presenting papers, participating on panels, and taking part in planning and program committees:

Program Committee Members


Anish Das Sarma, Venkatesh Ganti, Zoltan Gyongyi, Alon Halevy (Tutorials Chair), Kristen LeFevre, Cong Yu

Talks


Symbiosis in Scale Out Networking and Data Management
Amin Vahdat, Google (Keynote)

F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong (Googlers)

Finding Related Tables
Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu (Googlers)

Papers


CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster
Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee (Google), Pradeep Dubey, Jatin Chhugani

Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy (Googlers)

Panels


Perspectives on Big Data Plenary Session: Privacy and Big Data 
Kristen LeFevre, Google

SIGMOD New Researcher Symposium - How to be a good advisor/advisee? 
Anish Das Sarma, Google

Overall, this year’s SIGMOD was a great conference, widely attended by researchers from industry and academia, and comprised of a very interesting mix of research presentations and discussions. Google had a good showing at the conference, and we look forward to continuing this trend in the coming years.

Thursday, July 12, 2012

Reflections on the Google Faculty Institute



Extending the school year by one day can result in a year’s worth of learning. This proved true on June 8, when the 2011 Google Faculty Institute (GFI) cohort was welcomed back for a day to share best practices and perspectives from their funded research over the 2011-12 school year.

For the past year, the GFI Fellows collaborated across 16 California State University campuses, Stanford and UC Berkeley, to execute on ten research initiatives proposed on the final day of the conference. GFI themes of faculty collaboration, project-based learning, universal design and others were implemented in the Fellows’ projects, each of which focused on ways to enhance teaching practices through the use of educational technologies.

At the GFI Redux earlier this month, participants reviewed research initiatives, attended panel discussions, and defined plans for the 2012-13 school year. In a packed day of sessions, the cohort showcased projects ranging from mobile application development to geospatial tool utilization to the success of the flipped classroom. Some highlights of GFI projects:

  • Making Teachers “Appy” presented workshops on UC and CSU campuses on mobile application development using App Inventor. While building confidence with new technologies, participants learned to create their own applications to enhance classroom instruction.
  • Bird’s Eye Detective encouraged CSU pre-service teachers to explore the world from a new perspective utilizing geospatial tools including Google Earth, Google Maps, and Fusion Tables.
  • Transforming STEM Educators included nine hands-on workshops on three CSU campuses, presenting creative ways to engage students in science and engineering courses through the use of technology.
  • CSU Digital Learning Ambassadors are faculty creating collaborative communities and customized initiatives from the inside. Initiatives include tech infusion prizes, Hangouts on Air for academic discussions, and webinars.

The Google Faculty Institute served as a catalyst and incubator for innovative educational technology. Congratulations to the GFI Fellows on a year of excellent research and application.


Tuesday, July 3, 2012

Google Research Awards: Summer, 2012



We’ve just finished the review process for the latest round of the Google Research Awards, which is our bi-annual open call for proposals on research in areas of mutual interest with Google. Our funding provides full-time faculty the opportunity to fund a graduate student and work directly with Google research scientists and engineers.

This round, we are funding 104 awards across 21 different focus areas for a total of nearly $6 million. The subject areas that received the highest level of support this time were systems and infrastructure, human computer interaction, and mobile. In addition, 28% of the funding was awarded to universities outside the U.S.

Given that our program is merit-based, we make funding decisions via committees of experts, who assess each proposal by its impact, innovation, relevance to Google, and other factors. Over the past two years, we have seen significant growth in the Research Award program. This round we received 815 proposals, up 11% from last round; evaluating them required 1,946 reviews by 654 reviewers.

Our award committees represent a microcosm of Research @ Google. Not only do we work with research scientists in making funding decisions, but also engineers—many of whom have advanced degrees in Computer Science. Our research organization has a similar make-up: both research scientists and engineers working together on innovative projects that are product-focused and relevant to our customers.

Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is October 15), please visit our website for more information.

The following packages have been kept back

When executing "sudo apt-get upgrade" on an Ubuntu server, you can get the message:
The following packages have been kept back

This means that those updates would change the set of installed packages (for example, by pulling in new dependencies or removing existing packages), which a plain upgrade never does.  You can install the kept-back packages by executing this command:
sudo apt-get dist-upgrade
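
Alternatively, you can upgrade a single kept-back package together with its new dependencies by running "sudo apt-get install <packagename>", where <packagename> is one of the packages listed in the message.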

Monday, July 2, 2012

Our Unique Approach to Research



Google started as a research project—and research has remained a core part of our culture. But we also do research differently than many other places. To shed more light on Google’s unique approach to research, Peter Norvig (Director of Research), Slav Petrov (Senior Research Scientist) and I recently published a paper, “Google’s Hybrid Approach to Research,” in the July issue of Communications of the ACM.
   
In the paper, we describe our hybrid approach to research, which integrates research and development to maximize our impact on users and the speed at which we make progress. Our model allows us to work at unparalleled scale and conduct research in vivo on real systems with millions of users, rather than on artificial prototypes. This yields not only innovative research results and new technologies, but valuable new capabilities for the company—think of MapReduce, Voice Search or open source projects such as Android and Chrome. 

Breaking up long-term research projects into shorter-term, measurable components is another aspect of our integrated model. This is not to say our model precludes longer-term objectives, but we try to achieve these in stages. For example, Google Translate is a multi-year project characterized by the need for both research and complex systems, but we’ve achieved many small objectives along the way—such as adding languages over time for a current total of 64, developing features like two-step translation functionality, enabling users to make corrections, and consideration of syntactic structure.

Overall, our success in the areas of systems, speech recognition, language translation, machine learning, market algorithms, computer vision and many other areas has stemmed from our hybrid research approach. While there are risks associated with the close integration of research and development activities—namely the concern that research will take a back seat in favor of shorter-term projects—we mitigate those by focusing on the user and empirical data, maintaining a flexible organizational structure, and engaging with the academic community. We have a portfolio of timescales, with some researchers working with engineers to rapidly iterate on existing products, and others working on forward-looking projects that will benefit people in the future.

We hope “Google’s Hybrid Approach to Research” helps explain our method. We feel it will bring some clarification and transparency to our approach, and perhaps merit consideration by other technology companies and academic labs that organize research differently.

To learn more about what we do and see real-time applications of our hybrid research model, add Research at Google to your circles on Google+.

(Cross-posted on the Official Google Blog)