CMS NanoAOD interface #45
Comments
The links I'm referring to are for cross-cleaning.
This is such a great proposal! I am doing something similar in my analysis, but unfortunately there is a large overhead when loading NanoAODs, because individual columns are spread over several files accessed via xrootd (about 20 per dataset). You may want to keep this in mind when thinking about a solution. I hope that some day data will be stored in a columnar format in CMS anyway! Some more things I observed when working with awkward and NanoAOD:
All in all, I think that not only should uproot/awkward adapt better to NanoAOD, but NanoAOD could in turn benefit from the lessons learned in awkward.
Actually, they evolved together: we were talking with each other when NanoAOD and awkward were both being developed. I had some suggestions about the branch type: to use ROOT arrays instead of std::vector (which adds 10 bytes per event per branch). This is the JaggedArray format, almost byte for byte. (ROOT's offsets are byte offsets relative to the TKey, rather than item offsets relative to the start of data, but that's a subtraction and a bit-shift.) NanoAOD can save space by storing one set of counts (

What I'm talking about in this issue is not about changing any formats or making anything more efficient, just packaging it up in a more intuitive way. Turning NanoAOD's links into
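The offset arithmetic mentioned above ("a subtraction and a bit-shift") can be sketched in plain Python. This is only an illustration: the 64-byte `data_start` and the 4-byte item size are made-up values, not ROOT's actual TKey layout, and the function names are hypothetical. The same expressions would apply elementwise to integer NumPy arrays.

```python
def byte_to_item_offsets(byte_offsets, data_start, itemsize_log2):
    """Convert byte offsets measured from an earlier reference point
    into item offsets measured from the start of the data."""
    # Subtract the reference point, then divide by the item size via a shift.
    return [(b - data_start) >> itemsize_log2 for b in byte_offsets]


def counts_to_offsets(counts):
    """Turn per-event counts into cumulative offsets (one more entry than counts)."""
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return offsets


# Hypothetical example: three events holding 2, 0 and 3 four-byte items,
# with the data assumed to start 64 bytes after the reference point.
byte_offsets = [64, 72, 72, 84]
item_offsets = byte_to_item_offsets(byte_offsets, data_start=64, itemsize_log2=2)

# A single counts array describes the same jagged structure.
counts = [2, 0, 3]
```

Either representation yields the same offsets, which is why one set of counts can describe the jagged structure of every branch that shares it.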
Thank you for your explanations! As someone relatively new to CMS, I'm always very glad when someone explains to me how things evolved historically. I did not know much of this, so thanks for taking the time to answer, even though my previous comment was not really on topic, as I now see.
That's okay; it's good to hear about the level of interest! The thread here will be replaced with a PR as soon as I start actually working on it anyway.
Hey, can we use the recursively defined
Yes. If the gen particles look something like this:

import awkward  # awkward-array 0.x

tree = awkward.fromiter([
{"value": 1.23, "left": 1, "right": 2}, # node 0
{"value": 3.21, "left": 3, "right": 4}, # node 1
{"value": 9.99, "left": 5, "right": 6}, # node 2
{"value": 3.14, "left": 7, "right": None}, # node 3
{"value": 2.71, "left": None, "right": 8}, # node 4
{"value": 5.55, "left": None, "right": None}, # node 5
{"value": 8.00, "left": None, "right": None}, # node 6
{"value": 9.00, "left": None, "right": None}, # node 7
{"value": 0.00, "left": None, "right": None}, # node 8
])
left = tree.contents["left"].content
right = tree.contents["right"].content
left[(left < 0) | (left > 8)] = 0 # satisfy overzealous validity checks
right[(right < 0) | (right > 8)] = 0
tree.contents["left"].content = awkward.IndexedArray(left, tree)
tree.contents["right"].content = awkward.IndexedArray(right, tree)
tree[0].tolist()

we can make a tree. (That's what the above does:
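To make the idea of integer links concrete without depending on awkward, here is a plain-Python sketch that resolves the same "left"/"right" indexes into nested nodes. It is not how awkward implements IndexedArray; the `resolve` helper is hypothetical and only illustrates what following the links from node 0 produces.

```python
# The same nine nodes as above, as a flat list with integer links.
nodes = [
    {"value": 1.23, "left": 1, "right": 2},        # node 0
    {"value": 3.21, "left": 3, "right": 4},        # node 1
    {"value": 9.99, "left": 5, "right": 6},        # node 2
    {"value": 3.14, "left": 7, "right": None},     # node 3
    {"value": 2.71, "left": None, "right": 8},     # node 4
    {"value": 5.55, "left": None, "right": None},  # node 5
    {"value": 8.00, "left": None, "right": None},  # node 6
    {"value": 9.00, "left": None, "right": None},  # node 7
    {"value": 0.00, "left": None, "right": None},  # node 8
]


def resolve(index):
    """Follow integer links recursively, returning a nested dict (or None)."""
    if index is None:
        return None
    node = nodes[index]
    return {
        "value": node["value"],
        "left": resolve(node["left"]),
        "right": resolve(node["right"]),
    }


root = resolve(0)  # nested view of the whole tree, starting at node 0
```

This eager version copies the whole tree; the point of an indexed view is to get the same navigation lazily, without materializing the nesting.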
lol that's a BDT
BDT was a motivating case. Yes, these are top-down arrows, so that you can walk from root to leaf. If gen particle arrows point from leaf to root, then a new calculation would be needed. Since we'd only want to do that on demand, it could be in a

There are quite a few good things the CMS NanoAOD extension could have. It's not short-timescale like the awkward/uproot-methods version management, though. By the way, I lost track of something you said about mocking
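For the leaf-to-root case mentioned above, the "new calculation" could be as simple as inverting the parent links into per-node child lists. The sketch below assumes a flat list of parent indexes for one event, with -1 meaning "no parent"; both that convention and the name `children_from_parents` are assumptions for illustration, not part of any existing API.

```python
def children_from_parents(parent_index):
    """Invert leaf-to-root links: given each node's parent index
    (negative meaning no parent), return (roots, children lists)."""
    children = [[] for _ in parent_index]
    roots = []
    for i, p in enumerate(parent_index):
        if p < 0:
            roots.append(i)        # no parent: this node is a root
        else:
            children[p].append(i)  # register node i under its parent
    return roots, children


# Hypothetical event: node 0 is the root, 1 and 2 are its children, and so on.
parents = [-1, 0, 0, 1, 1, 2]
roots, children = children_from_parents(parents)
```

Because this is a single pass over the indexes, it is cheap enough to run on demand rather than up front.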
There should be a mechanism that recognizes a TTree as NanoAOD and presents a virtual, formatted view of the data, using knowledge of NanoAOD idioms. For instance, Muon_* should be collected into a single jagged table called muons with the muon branches as its columns. It should use VirtualArrays, so that you can carry an array of muons around without having loaded all of the branches. References between particles and jets (expressed as integer indexes in NanoAOD) should be IndexedArrays. I'm on the fence about making them ChunkedArrays at the basket level; that may be too small. Perhaps they could be ChunkedArrays at the file level (or there could be a function for loading them that takes the chunking size as an option).

This was inspired by scikit-hep/awkward-0.x#95.
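A rough, library-free sketch of the three ideas in this proposal: grouping branches by prefix, loading columns only on access, and resolving integer references. Everything here is hypothetical: `group_branches`, `LazyTable`, the fake in-memory "file", and the assumption that a negative index means "no match" are illustrations, not the uproot or awkward API.

```python
from collections import defaultdict


def group_branches(branch_names):
    """Group names like 'Muon_pt' into {'Muon': {'pt': 'Muon_pt'}}."""
    groups = defaultdict(dict)
    for name in branch_names:
        prefix, sep, column = name.partition("_")
        if sep:  # names without an underscore (e.g. 'nMuon') are skipped
            groups[prefix][column] = name
    return dict(groups)


class LazyTable:
    """Columns are loaded through `load` only when first accessed."""

    def __init__(self, columns, load):
        self._columns = columns  # column name -> branch name
        self._load = load
        self._cache = {}

    def __getitem__(self, column):
        if column not in self._cache:
            self._cache[column] = self._load(self._columns[column])
        return self._cache[column]


# Fake file contents: one list per event (jagged), made-up numbers.
fake_file = {
    "nMuon": [2, 0, 1],
    "Muon_pt": [[30.0, 12.0], [], [45.0]],
    "Muon_jetIdx": [[1, -1], [], [0]],  # assumed: -1 means no matching jet
    "Jet_pt": [[80.0, 50.0], [20.0], [60.0]],
}
loaded = []  # records which branches were actually read


def load(branch):
    loaded.append(branch)
    return fake_file[branch]


groups = group_branches(fake_file)
muons = LazyTable(groups["Muon"], load)
jets = LazyTable(groups["Jet"], load)

# Resolve each muon's jet reference within its own event.
muon_jet_pt = [
    [jet_pt[j] if j >= 0 else None for j in idxs]
    for idxs, jet_pt in zip(muons["jetIdx"], jets["pt"])
]
```

Here only `Muon_jetIdx` and `Jet_pt` are read; `Muon_pt` stays untouched, which is the behavior the virtual view is meant to provide.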