Data Sets

An overview of the two families of data you can choose from for labs 5-8. Each family contains several data sets from the same instrument.

 

HERA data

The HERA data are in HDF5 files (this is a good thing). Unfortunately, they are too large for Canvas to host, so here are Dropbox links to the three files in this family:

https://www.dropbox.com/s/vcn6jmkvtqpapih/zen.2459122.34011.mini.sum.uvh5?dl=0

https://www.dropbox.com/s/8calcpfy7c1l18o/zen.2459122.48015.mini.sum.uvh5?dl=0

https://www.dropbox.com/s/2gn6djtme7dd4vf/zen.2459122.62018.mini.sum.uvh5?dl=0

 

LHC data

The LHC data have two files per set. If you are working in Python, it is best to read them with the pickle module, which is part of the Python standard library (you may need to add to your environment any library that the pickled objects depend on).

Set 1: higgs_100000_pt_250_500.pkl; qcd_100000_pt_250_500.pkl

Set 2: higgs_100000_pt_1000_1200.pkl; qcd_100000_pt_1000_1200.pkl
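
A minimal sketch of reading one of these files with pickle. The helper name load_pickle is hypothetical, and the example assumes the downloaded files sit in the current working directory. What type of object each file holds is not stated here, so the sketch only prints the type. As with any pickle, only load files from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Return the object stored in a pickle file (hypothetical helper)."""
    # Pickle files must be opened in binary mode.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Set 1 file names; assumed to be in the current working directory.
    for name in ("higgs_100000_pt_250_500.pkl", "qcd_100000_pt_250_500.pkl"):
        if Path(name).exists():
            data = load_pickle(name)
            print(name, type(data))
        else:
            print(name, "not found; download it first")
```

If loading fails with an import error, the pickled object probably depends on a library that is missing from your environment.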

 

If you have issues reading these data because of the pickle version, here are HDF5 versions of the files:

Set 1: higgs_100000_pt_250_500.h5; qcd_100000_pt_250_500.h5

Set 2: higgs_100000_pt_1000_1200.h5; qcd_100000_pt_1000_1200.h5

One disadvantage of the HDF5 versions is that the data attributes are not labelled. The rows are in the same order as described in the document.
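
Since the HDF5 files carry no labels, it can help to list what each file contains before matching rows against the document. The sketch below assumes the third-party h5py package is installed. The helper name list_datasets is hypothetical, and the internal layout of the files is not assumed: the code simply walks whatever is there.

```python
from pathlib import Path


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file."""
    # Third-party dependency, imported here so it is only needed when called.
    import h5py

    found = []

    def visit(name, obj):
        # Groups are skipped; only datasets hold array data.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if __name__ == "__main__":
    # Set 1 file name; assumed to be in the current working directory.
    name = "higgs_100000_pt_250_500.h5"
    if Path(name).exists():
        for entry in list_datasets(name):
            print(entry)
    else:
        print(name, "not found; download it first")
```

The same helper should also work on the HERA .uvh5 files, since those are HDF5 files as well.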