A multimodal sensing architecture utilizes an array of single sensors or multi-sensor groups (superpixels) to facilitate advanced object-manipulation and recognition tasks performed by mechanical end effectors in robotic systems. The single sensors/superpixels are spatially arrayed over contact surfaces of the end effector fingers and include, e.g., pressure sensors and vibration sensors that facilitate the simultaneous detection of both static and dynamic events occurring on the end effector, and optionally include proximity sensors and/or temperature sensors. A readout circuit receives the sensor data from the superpixels and transmits the sensor data onto a shared sensor data bus. An optional multimodal control generator receives and processes the sensor data and generates multimodal control signals that cause the robot system’s control circuit to adjust control operations performed by the end effector or other portions of the robot mechanism when the sensor data indicates non-standard operating conditions.
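
The data flow described above (superpixels, readout circuit, shared sensor data bus, optional multimodal control generator) may be illustrated by the following minimal sketch. It is not the disclosed implementation: the names `SuperpixelReading`, `readout`, and `generate_control_signals`, the signal labels, and the threshold values are all hypothetical, and a simple threshold test is assumed here as one possible way of deciding that sensor data indicates non-standard operating conditions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SuperpixelReading:
    """One sample from a single superpixel (all field names are illustrative)."""
    finger: int                          # index of the end effector finger
    position: int                        # index of the superpixel on that finger
    pressure: float                      # static event measurement (arbitrary units)
    vibration: float                     # dynamic event measurement (arbitrary units)
    proximity: Optional[float] = None    # optional sensor, may be absent
    temperature: Optional[float] = None  # optional sensor, may be absent


def readout(readings: List[SuperpixelReading]) -> List[SuperpixelReading]:
    """Stand-in for the readout circuit: place all readings on one shared
    bus, ordered by finger and then by position on the finger."""
    return sorted(readings, key=lambda r: (r.finger, r.position))


def generate_control_signals(
    bus: List[SuperpixelReading],
    max_pressure: float = 5.0,   # hypothetical limit
    max_vibration: float = 1.0,  # hypothetical limit
) -> List[Tuple[str, int, int]]:
    """Stand-in for the optional multimodal control generator: emit one
    signal per reading that exceeds an assumed limit. Each signal is
    (label, finger, position)."""
    signals = []
    for r in bus:
        if r.pressure > max_pressure:
            signals.append(("PRESSURE_OUT_OF_RANGE", r.finger, r.position))
        if r.vibration > max_vibration:
            signals.append(("VIBRATION_OUT_OF_RANGE", r.finger, r.position))
    return signals
```

In this sketch the pressure and vibration fields are checked in the same pass over the bus, which mirrors the simultaneous handling of static and dynamic events; how a control circuit would respond to each signal is left open.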