Run your first query using RubiX
Start Hive Client
hive --hiveconf hive.metastore.uris="" --hiveconf fs.rubix.impl=com.qubole.rubix.hadoop2.CachingNativeS3FileSystem
Create External Table
CREATE EXTERNAL TABLE wikistats_orc_rubix
(language STRING, page_title STRING,
hits BIGINT, retrived_size BIGINT)
STORED AS ORC
LOCATION 'rubix://emr.presto.airpal/wikistats/orc';
Run Query (Presto or Hive CLI)
SELECT language, page_title, AVG(hits) AS avg_hits
FROM default.wikistats_orc_rubix
WHERE language = 'en'
AND page_title NOT IN ('Main_Page', '404_error/')
AND page_title NOT LIKE '%index%'
AND page_title NOT LIKE '%Search%'
GROUP BY language, page_title
ORDER BY avg_hits DESC
LIMIT 10;
RubiX Stats (supported on Presto only)
RubiX publishes its cache statistics to an MBean named rubix:name=stats. To view them, run the following query from Presto:
SELECT Node, CachedReads,
ROUND(extrareadfromremote, 2) AS ExtraReadFromRemote,
ROUND(hitrate, 2) AS HitRate,
ROUND(missrate, 2) AS MissRate,
ROUND(nonlocaldataread, 2) AS NonLocalDataRead,
NonLocalReads,
ROUND(readfromcache, 2) AS ReadFromCache,
ROUND(readfromremote, 2) AS ReadFromRemote,
RemoteReads
FROM jmx.current."rubix:name=stats";
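The query above returns one row per node. To get a rough cluster-wide picture, the same MBean table can be aggregated. The following is a minimal sketch, not part of the official RubiX documentation. It uses only columns that appear in the query above and assumes that CachedReads and RemoteReads are per-node request counters, so that their sums are meaningful. The alias names (TotalCachedReads, TotalRemoteReads, ApproxClusterHitRate) are illustrative.

```sql
-- Sketch: aggregate the per-node RubiX stats into a single cluster-wide row.
-- Assumption: cachedreads and remotereads are per-node request counts.
SELECT
  SUM(cachedreads) AS TotalCachedReads,
  SUM(remotereads) AS TotalRemoteReads,
  -- NULLIF avoids a division-by-zero error before any reads have happened.
  ROUND(
    CAST(SUM(cachedreads) AS DOUBLE)
      / NULLIF(SUM(cachedreads) + SUM(remotereads), 0),
    2
  ) AS ApproxClusterHitRate
FROM jmx.current."rubix:name=stats";
```

If the counters have different semantics in your RubiX version, treat ApproxClusterHitRate as indicative only and rely on the per-node HitRate column instead.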