Issue
Displaying h264 video from an mpegts stream over udp:// on android.
I've been trying for a few days to get this to work with no success. I have a device that produces an h264 video stream and multicasts it in an mpegts container over raw udp (not rtp). I'm trying to get this to display in a custom app on android.
I read that android's built-in MediaPlayer supports both h264 (avc) and mpegts, but that it does not handle udp:// streams, so I cannot use it (which would be by far the simplest option). Instead, I have tried to manually parse the mpegts stream into an elementary stream and pass that to a MediaCodec that has been configured with the surface of a SurfaceView. No matter what I try, two things always happen (once I fix exceptions, etc.):
- The SurfaceView is always black.
- The MediaCodec always accepts about 6-9 buffers, and then dequeueInputBuffer starts failing instantly (returning -1) and I cannot queue anything else.
I can split the mpeg stream into TS packets and then join their payloads into PES packets. I've tried passing full PES packets (minus the PES header) into MediaCodec.
I've also tried splitting the PES packets into individual NAL units by splitting on \x00\x00\x01 and passing them individually into the MediaCodec.
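For reference, here is a minimal sketch of splitting an Annex B byte stream on start codes. It is not the code used in this question: the class name AnnexBSplitter is hypothetical, and it assumes each PES payload starts at a start code. It accepts both the 3-byte (00 00 01) and 4-byte (00 00 00 01) prefix and drops zero padding left in front of a 4-byte prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits an H.264 Annex B buffer into NAL units.
final class AnnexBSplitter {
    /** Returns each NAL unit without its 00 00 01 / 00 00 00 01 prefix. */
    static List<byte[]> split(byte[] data) {
        List<byte[]> nals = new ArrayList<>();
        int start = -1; // first byte of the current NAL unit, -1 before any start code
        int i = 0;
        while (i + 2 < data.length) {
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
                if (start >= 0) addTrimmed(nals, data, start, i);
                start = i + 3;
                i += 3;
            } else {
                i++;
            }
        }
        if (start >= 0) addTrimmed(nals, data, start, data.length);
        return nals;
    }

    // Trailing zero bytes are padding (or the first byte of a 4-byte start code),
    // not part of the NAL unit, so they are removed before copying.
    private static void addTrimmed(List<byte[]> out, byte[] data, int from, int to) {
        while (to > from && data[to - 1] == 0) to--;
        if (to > from) out.add(Arrays.copyOfRange(data, from, to));
    }
}
```

The NAL unit type is then the low five bits of the first byte of each returned array (7 for SPS, 8 for PPS, 5 for an IDR slice).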
I've also tried holding off on passing in NAL units until I've received the SPS NAL unit, and passing that first with BUFFER_FLAG_CODEC_CONFIG.
All of these result in the same two symptoms mentioned above. I am out of ideas about what to try, so any help would be greatly appreciated.
Some points I'm still not sure about:
Nearly all the examples I've seen get the MediaFormat from MediaExtractor, which I can't use on the stream. The few that don't use MediaExtractor explicitly set csd-0 and csd-1 from byte strings that aren't explained. I read that the SPS NAL unit can be queued in an input buffer instead, so that's what I tried.
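For "video/avc", the Android MediaCodec documentation describes csd-0 as the SPS and csd-1 as the PPS, each preceded by a start code, which is most likely what those unexplained byte strings contain. The sketch below pulls both out of an Annex B buffer. The class and method names are hypothetical, and it assumes the stream carries SPS and PPS in-band.

```java
// Hypothetical helper: finds the SPS and PPS in an Annex B buffer.
final class CodecConfigFinder {
    private static final byte[] START_CODE = {0, 0, 0, 1};

    /** Returns {sps, pps}, each with a 4-byte start code; an entry is null if absent. */
    static byte[][] findSpsPps(byte[] annexB) {
        byte[] sps = null;
        byte[] pps = null;
        int nalStart = nextNalStart(annexB, 0);
        while (nalStart >= 0) {
            int next = nextNalStart(annexB, nalStart);
            // 'next' points just past a 00 00 01, so the current unit ends 3 bytes earlier
            int end = (next < 0) ? annexB.length : next - 3;
            while (end > nalStart && annexB[end - 1] == 0) end--; // drop zero padding
            if (end > nalStart) {
                int type = annexB[nalStart] & 0x1f;
                if (type == 7 && sps == null) sps = withStartCode(annexB, nalStart, end);
                if (type == 8 && pps == null) pps = withStartCode(annexB, nalStart, end);
            }
            nalStart = next;
        }
        return new byte[][] {sps, pps};
    }

    /** Index of the first byte after the next 00 00 01 at or after 'from', or -1. */
    private static int nextNalStart(byte[] d, int from) {
        for (int i = from; i + 2 < d.length; i++) {
            if (d[i] == 0 && d[i + 1] == 0 && d[i + 2] == 1) return i + 3;
        }
        return -1;
    }

    private static byte[] withStartCode(byte[] d, int from, int to) {
        byte[] out = new byte[START_CODE.length + (to - from)];
        System.arraycopy(START_CODE, 0, out, 0, START_CODE.length);
        System.arraycopy(d, from, out, START_CODE.length, to - from);
        return out;
    }
}
```

On Android the two arrays would then be attached before configure(), for example with format.setByteBuffer("csd-0", ByteBuffer.wrap(sps)) and format.setByteBuffer("csd-1", ByteBuffer.wrap(pps)).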
I'm not sure what to pass into presentationTimeUs. The TS packets have a PCR and the PES packets have a PTS, but I don't know what's expected by the api and how these relate.
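As background for this point: the PES PTS counts ticks of a 90 kHz clock, while presentationTimeUs is in microseconds, so the PTS can be scaled by 100/9. The PCR is a clock reference meant for receiver clock recovery and is not something MediaCodec asks for. A minimal sketch follows; the class name PesTimestamps is hypothetical, and it assumes the five PTS bytes come first in the optional PES header data.

```java
// Hypothetical helper for PES timestamps.
final class PesTimestamps {
    /** Decodes the 33-bit PTS from the first 5 bytes of the optional PES header data. */
    static long parsePts(byte[] h) {
        // Every term is widened to long before shifting so the top bits are not lost.
        return ((long) (h[0] & 0x0e) << 29)
             | ((long) (h[1] & 0xff) << 22)
             | ((long) (h[2] & 0xfe) << 14)
             | ((long) (h[3] & 0xff) << 7)
             | ((long) (h[4] & 0xfe) >>> 1);
    }

    /** Converts 90 kHz ticks to microseconds (1,000,000 / 90,000 = 100 / 9). */
    static long toMicros(long pts90k) {
        return pts90k * 100 / 9;
    }
}
```

The result of toMicros(parsePts(headerData)) is what would go into the presentationTimeUs argument of queueInputBuffer. This sketch ignores the 33-bit wrap-around, which happens roughly every 26.5 hours.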
I'm not sure how the data needs to be passed into MediaCodec (is this why it stops giving me buffers?). I got the idea of passing in individual NAL units from this SO post: Decoding Raw H264 stream in android?
other references I used to make this example:
code (sorry it's pretty long):
I created a test app from the basic template in Android Studio. Most of it is boilerplate, so I'll only paste the video-related parts.
The SurfaceView is defined in the xml, so grab it and get the surface when it's created/changed:
public class VideoPlayer extends Activity implements SurfaceHolder.Callback {
private static final String TAG = VideoPlayer.class.getName();
PlayerThread playerThread;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_video_player);
SurfaceView view = (SurfaceView) findViewById(R.id.surface);
view.getHolder().addCallback(this);
}
...
@Override
public void surfaceCreated(SurfaceHolder surfaceHolder) {
Log.d(TAG,"surfaceCreated");
}
@Override
public void surfaceChanged(SurfaceHolder surfaceHolder, int i, int i2, int i3) {
Log.d("main","surfaceChanged");
if( playerThread == null ) {
playerThread = new PlayerThread(surfaceHolder.getSurface());
playerThread.start();
}
}
...
PlayerThread is an internal class that reads data from a multicast port and passes it to a parsing function on a background thread:
class PlayerThread extends Thread {
private final String TAG = PlayerThread.class.getName();
MediaExtractor extractor;
MediaCodec decoder;
Surface surface;
boolean running;
ByteBuffer[] inputBuffers;
public PlayerThread(Surface surface)
{
this.surface = surface;
MediaFormat format = MediaFormat.createVideoFormat("video/avc",720,480);
decoder = MediaCodec.createDecoderByType("video/avc");
decoder.configure(format, surface, null, 0);
decoder.start();
inputBuffers = decoder.getInputBuffers();
}
...
@Override
public void run() {
running = true;
try {
String mcg = "239.255.0.1";
MulticastSocket ms;
ms = new MulticastSocket(1841);
ms.joinGroup(new InetSocketAddress(mcg, 1841), NetworkInterface.getByName("eth0"));
ms.setSoTimeout(4000);
ms.setReuseAddress(true);
byte[] buffer = new byte[65535];
DatagramPacket dp = new DatagramPacket(buffer, buffer.length);
while (running) {
try {
ms.receive(dp);
parse(dp.getData());
} catch (SocketTimeoutException e) {
Log.d("thread", "timeout");
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Receiving works fine; each datagram contains two TS packets, which get passed to the parse function:
boolean first = true;
ByteArrayOutputStream current = new ByteArrayOutputStream();
void parse(byte[] data) {
ByteBuffer stream = ByteBuffer.wrap(data);
// mpeg-ts stream header is 4 bytes starting with the sync byte
if( stream.get(0) != 0x47 ) {
Log.w(TAG, "got packet w/out mpegts header!");
return;
}
ByteBuffer raw = stream.duplicate();
// ts packets are 188 bytes
raw.limit(188);
TSPacket ts = new TSPacket(raw);
if( ts.pid == 0x10 ) {
processTS(ts);
}
// move to second packet
stream.position(188);
stream.limit(188*2);
if( stream.get(stream.position()) != 0x47 ) {
Log.w(TAG, "missing mpegts header!");
return;
}
raw = stream.duplicate();
raw.limit(188*2);
ts = new TSPacket(raw);
if( ts.pid == 0x10 ) {
processTS(ts);
}
}
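The parse function above hard-codes two TS packets per datagram. A more general variant, sketched below, walks the received bytes in 188-byte steps, so it works for any whole number of packets. The class name is hypothetical, and the length argument is assumed to come from dp.getLength(), because dp.getData() returns the whole 65535-byte buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts one UDP datagram into aligned TS packets.
final class TsDatagramSplitter {
    static final int TS_PACKET_SIZE = 188;
    static final byte SYNC_BYTE = 0x47;

    /** Returns a 188-byte view for each aligned TS packet in the first 'length' bytes. */
    static List<ByteBuffer> split(byte[] data, int length) {
        List<ByteBuffer> packets = new ArrayList<>();
        for (int off = 0; off + TS_PACKET_SIZE <= length; off += TS_PACKET_SIZE) {
            if (data[off] != SYNC_BYTE) {
                break; // alignment lost; ignore the rest of this datagram
            }
            // slice() gives a buffer whose position 0 is the sync byte
            packets.add(ByteBuffer.wrap(data, off, TS_PACKET_SIZE).slice());
        }
        return packets;
    }
}
```

Each returned buffer can be handed to the TSPacket constructor as-is, since that constructor only uses relative reads and remaining().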
TS packets are parsed by the TSPacket class:
public class TSPacket {
private final static String TAG = TSPacket.class.getName();
class AdaptationField {
boolean di;
boolean rai;
boolean espi;
boolean hasPcr;
boolean hasOpcr;
boolean spf;
boolean tpdf;
boolean hasExtension;
byte[] data;
public AdaptationField(ByteBuffer raw) {
// first byte is size of field minus size byte
int count = raw.get() & 0xff;
// a zero-length field is a single stuffing byte with no flags byte
if( count == 0 ) return;
// second byte is flags
BitSet flags = BitSet.valueOf(new byte[]{ raw.get()});
di = flags.get(7);
rai = flags.get(6);
espi = flags.get(5);
hasPcr = flags.get(4);
hasOpcr = flags.get(3);
spf = flags.get(2);
tpdf = flags.get(1);
hasExtension = flags.get(0);
// the rest is 'data'
if( count > 1 ) {
data = new byte[count-1];
raw.get(data);
}
}
}
boolean tei;
boolean pus;
boolean tp;
int pid;
boolean hasAdapt;
boolean hasPayload;
int counter;
AdaptationField adaptationField;
byte[] payload;
public TSPacket(ByteBuffer raw) {
// check for sync byte
if( raw.get() != 0x47 ) {
Log.e(TAG, "missing sync byte");
throw new InvalidParameterException("missing sync byte");
}
// next 3 bits are flags
byte b = raw.get();
BitSet flags = BitSet.valueOf(new byte[] {b});
tei = flags.get(7);
pus = flags.get(6);
tp = flags.get(5);
// then 13 bits for pid
pid = ((b << 8) | (raw.get() & 0xff) ) & 0x1fff;
b = raw.get();
flags = BitSet.valueOf(new byte[]{b});
// then 4 more flags
if( flags.get(7) || flags.get(6) ) {
Log.e(TAG, "scrambled?!?!");
// todo: bail on this packet?
}
hasAdapt = flags.get(5);
hasPayload = flags.get(4);
// counter
counter = b & 0x0f;
// optional adaptation field
if( hasAdapt ) {
adaptationField = new AdaptationField(raw);
}
// optional payload field
if( hasPayload ) {
payload = new byte[raw.remaining()];
raw.get(payload);
}
}
}
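For comparison, the same 4-byte TS header can be read with plain bit masks instead of BitSet. This is only a sketch with a hypothetical class name; it keeps just the fields that processTS needs and also covers the case of a zero-length adaptation field.

```java
// Hypothetical compact TS header parser using bit masks.
final class TsHeader {
    final boolean payloadUnitStart;
    final int pid;
    final int continuityCounter;
    final int payloadOffset; // index of the first payload byte, or -1 if there is no payload

    TsHeader(byte[] p) {
        if (p.length != 188 || p[0] != 0x47) {
            throw new IllegalArgumentException("not a TS packet");
        }
        payloadUnitStart = (p[1] & 0x40) != 0;
        pid = ((p[1] & 0x1f) << 8) | (p[2] & 0xff);
        int adaptationControl = (p[3] >> 4) & 0x03; // bit 1: adaptation field, bit 0: payload
        continuityCounter = p[3] & 0x0f;
        int offset = 4;
        if ((adaptationControl & 0x02) != 0) {
            // skip the length byte plus the field itself, which may be empty
            offset += 1 + (p[4] & 0xff);
        }
        boolean hasPayload = (adaptationControl & 0x01) != 0;
        payloadOffset = (hasPayload && offset < 188) ? offset : -1;
    }
}
```

The payload is then the byte range from payloadOffset to 188 of the same array.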
Then passed to the processTS function:
// a PES packet can span multiple TS packets, so we keep track of the 'current' one
PESPacket currentPES;
void processTS(TSPacket ts) {
// payload unit start?
if( ts.pus ) {
if( currentPES != null ) {
Log.d(TAG,String.format("replacing pes: len=%d,size=%d", currentPES.length, currentPES.data.size()));
}
// start of new PES packet
currentPES = new PESPacket(ts);
} else if (currentPES != null ) {
// continued PES
currentPES.Add(ts);
} else {
// haven't got a start pes yet
return;
}
if( currentPES.isFull() ) {
long pts = currentPES.getPts();
byte[] data = currentPES.data.toByteArray();
int idx = 0;
do {
int sidx = idx;
// find next NAL prefix
idx = Utility.indexOf(data, sidx+4, data.length-(sidx+4), new byte[]{0,0,1});
byte[] NAL;
if( idx >= 0 ) {
NAL = Arrays.copyOfRange(data, sidx, idx);
} else {
NAL = Arrays.copyOfRange(data, sidx, data.length);
}
// send SPS NAL before anything else
if( first ) {
byte type = NAL[3] == 0 ? NAL[4] : NAL[3];
if( (type & 0x1f) == 7 ) {
Log.d(TAG, "found sps!");
int ibs = decoder.dequeueInputBuffer(1000);
if (ibs >= 0) {
ByteBuffer sinput = inputBuffers[ibs];
sinput.clear();
sinput.put(NAL);
decoder.queueInputBuffer(ibs, 0, NAL.length, 0, MediaCodec.BUFFER_FLAG_CODEC_CONFIG);
Log.d(TAG, "sent sps");
first = false;
} else
Log.d(TAG, String.format("could not send sps! %d", ibs));
}
} else {
// put in decoder?
int ibs = decoder.dequeueInputBuffer(1000);
if (ibs >= 0) {
ByteBuffer sinput = inputBuffers[ibs];
sinput.clear();
sinput.put(NAL);
decoder.queueInputBuffer(ibs, 0, NAL.length, 0, 0);
Log.d(TAG, "buffa");
}
}
} while( idx >= 0 );
// finished with this pes
currentPES = null;
}
}
PES packets are parsed by the PESPacket class:
public class PESPacket {
private final static String TAG = PESPacket.class.getName();
int id;
int length;
boolean priority;
boolean dai;
boolean copyright;
boolean origOrCopy;
boolean hasPts;
boolean hasDts;
boolean hasEscr;
boolean hasEsRate;
boolean dsmtmf;
boolean acif;
boolean hasCrc;
boolean pesef;
int headerDataLength;
byte[] headerData;
ByteArrayOutputStream data = new ByteArrayOutputStream();
public PESPacket(TSPacket ts) {
if( ts == null || !ts.pus) {
Log.e(TAG, "invalid ts passed in");
throw new InvalidParameterException("invalid ts passed in");
}
ByteBuffer pes = ByteBuffer.wrap(ts.payload);
// start code
if( pes.get() != 0 || pes.get() != 0 || pes.get() != 1 ) {
Log.e(TAG, "invalid start code");
throw new InvalidParameterException("invalid start code");
}
// stream id
id = pes.get() & 0xff;
// packet length
length = pes.getShort() & 0xffff;
// this is supposedly allowed for video
if( length == 0 ) {
Log.w(TAG, "got zero-length PES?");
}
if( id != 0xe0 ) {
Log.e(TAG, String.format("unexpected stream id: 0x%x", id));
// todo: ?
}
// for 0xe0 there is an extension header starting with 2 bits '10'
byte b = pes.get();
if( (b & 0x30) != 0 ) {
Log.w(TAG, "scrambled ?!?!");
// todo: ?
}
BitSet flags = BitSet.valueOf(new byte[]{b});
priority = flags.get(3);
dai = flags.get(2);
copyright = flags.get(1);
origOrCopy = flags.get(0);
flags = BitSet.valueOf(new byte[]{pes.get()});
hasPts = flags.get(7);
hasDts = flags.get(6);
hasEscr = flags.get(5);
hasEsRate = flags.get(4);
dsmtmf = flags.get(3);
acif = flags.get(2);
hasCrc = flags.get(1);
pesef = flags.get(0);
headerDataLength = pes.get() & 0xff;
if( headerDataLength > 0 ) {
headerData = new byte[headerDataLength];
pes.get(headerData);
}
WritableByteChannel channel = Channels.newChannel(data);
try {
channel.write(pes);
} catch (IOException e) {
e.printStackTrace();
}
// length includes optional pes header,
length = length - (3 + headerDataLength);
}
public void Add(TSPacket ts) {
if( ts.pus ) {
Log.e(TAG, "don't add start of PES packet to another packet");
throw new InvalidParameterException("ts packet marked as new pes");
}
int size = data.size();
int len = length - size;
len = ts.payload.length > len ? len : ts.payload.length;
data.write(ts.payload, 0, len);
}
public boolean isFull() {
return (data.size() >= length );
}
public long getPts() {
if( !hasPts || headerDataLength < 5 )
return 0;
ByteBuffer hd = ByteBuffer.wrap(headerData);
// cast to long first: the 33-bit PTS does not fit in int arithmetic
long pts = ( ((long)(hd.get() & 0x0e) << 29)
| ((hd.get() & 0xff) << 22)
| ((hd.get() & 0xfe) << 14)
| ((hd.get() & 0xff) << 7)
| ((hd.get() & 0xfe) >>> 1));
return pts;
}
}
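One case the class above only logs is a PES packet length of 0, which is allowed for video in a transport stream. With length 0 the adjusted length goes negative, so isFull() returns true after the first TS packet. A common way around this is to treat the next payload unit start as the end of the previous PES packet. The sketch below assumes the caller has already stripped the PES header from the first payload; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical assembler for PES packets that do not declare their length.
final class UnboundedPesAssembler {
    private ByteArrayOutputStream current; // null until the first unit start is seen

    /**
     * Feeds one TS payload. When unitStart is true and a packet was in progress,
     * returns that completed packet's bytes; otherwise returns null.
     */
    byte[] feed(boolean unitStart, byte[] payload) {
        byte[] completed = null;
        if (unitStart) {
            if (current != null) {
                completed = current.toByteArray();
            }
            current = new ByteArrayOutputStream();
        }
        if (current != null) {
            current.write(payload, 0, payload.length);
        }
        return completed; // payloads seen before the first unit start are dropped
    }
}
```

The trade-off is that each access unit is only released once the following one begins, which adds about one frame of latency.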
Solution
So I eventually figured out that, even though I was using an output surface, I had to manually drain the output buffers. Once I called decoder.dequeueOutputBuffer and then decoder.releaseOutputBuffer, the input buffers worked as expected.
I was also able to get output by passing in either individual NAL units or full access units (one per PES packet), but I got the clearest video by passing in full access units.
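A rough sketch of such a drain loop is below. To keep it runnable off-device it is written against a hypothetical OutputSide interface rather than MediaCodec itself. On Android the matching calls are decoder.dequeueOutputBuffer(info, timeoutUs), which also takes a MediaCodec.BufferInfo, and decoder.releaseOutputBuffer(index, true), where true renders the frame to the surface.

```java
// Hypothetical stand-in for the two MediaCodec calls named in the answer.
interface OutputSide {
    int TRY_AGAIN_LATER = -1; // same value as MediaCodec.INFO_TRY_AGAIN_LATER

    /** Index of a filled output buffer, or a negative status code. */
    int dequeueOutputBuffer(long timeoutUs);

    /** Hands the buffer back to the codec; render=true shows it on the surface first. */
    void releaseOutputBuffer(int index, boolean render);
}

final class OutputDrainer {
    /** Releases every output buffer that is ready right now; returns how many there were. */
    static int drain(OutputSide codec) {
        int released = 0;
        while (true) {
            int index = codec.dequeueOutputBuffer(0); // 0 = do not block
            if (index >= 0) {
                codec.releaseOutputBuffer(index, true);
                released++;
            } else if (index == OutputSide.TRY_AGAIN_LATER) {
                return released;
            }
            // other negative values are informational (e.g. output format changed); poll again
        }
    }
}
```

Calling something like this after every queueInputBuffer keeps output buffers flowing back to the decoder. Presumably that is why dequeueInputBuffer stopped returning -1: the decoder has a small buffer pool and stalls once all of its output buffers are held.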
Answered By - bj0
Answer Checked By - David Marino (JavaFixing Volunteer)