The first part is easily done by using the ASCII codes of the characters and two offsets. For the second part, I first thought we needed to find the groups on our own, but (RTFM...) a group is just every three lines, so it's just a matter of using Python's set.
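A minimal sketch of both ideas. The scoring is an assumption, since the note does not spell it out: lowercase letters map to 1..26 and uppercase letters to 27..52, which is where the two offsets come from. The names `priority` and `group_score` are made up for illustration.

```python
def priority(ch: str) -> int:
    """Score one letter from its ASCII code (assumed scoring scheme)."""
    if ch.islower():
        # Offset 1: 'a' (97) becomes 1, 'z' becomes 26.
        return ord(ch) - ord("a") + 1
    # Offset 2: 'A' (65) becomes 27, 'Z' becomes 52.
    return ord(ch) - ord("A") + 27


def group_score(lines: list[str], size: int = 3) -> int:
    """Sum the scores of the letters shared by every line of each group."""
    total = 0
    # Walk the input in consecutive chunks of `size` lines,
    # ignoring an incomplete chunk at the end.
    for start in range(0, len(lines) - len(lines) % size, size):
        group = lines[start:start + size]
        # Set intersection keeps only letters present in all lines.
        common = set(group[0]).intersection(*group[1:])
        total += sum(priority(ch) for ch in common)
    return total
```

Calling `group_score` on the stripped input lines then gives the answer for the second part, provided the assumed scoring matches the puzzle.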